Publications

Analyzing and interpreting automatically learned rules across dialects

Published in:
INTERSPEECH 2012: 13th Annual Conf. of the Int. Speech Communication Assoc., 9-13 September 2012.

Summary

In this paper, we demonstrate how informative dialect recognition systems such as the acoustic pronunciation model (APM) help speech scientists locate and analyze phonetic rules efficiently. In particular, we analyze dialect-specific characteristics automatically learned by the APM across two American English dialects. We show that unsupervised rule retrieval performs similarly to supervised retrieval, indicating that the APM is useful in practical applications, where word transcripts are often unavailable. We also demonstrate that the top-ranking rules learned by the APM generally correspond to those in the linguistic literature, and can even pinpoint potential research directions for refining existing knowledge. Thus, the APM system can help phoneticians analyze rules efficiently by characterizing large amounts of data to postulate rule candidates, freeing them to conduct more targeted investigations. Potential applications of informative dialect recognition systems include forensic phonetics and the diagnosis of spoken language disorders.
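
The claim that top-ranking rules generally correspond to the linguistic literature suggests a simple retrieval-style check. The sketch below, with invented rule names that have no connection to the paper's actual rule inventory, shows one way such agreement could be measured as precision over the top-ranked rules.

```python
# Invented rule names, purely to illustrate checking automatically retrieved
# rules against rules documented in the dialectology literature.
retrieved_rules = ["t -> dx / V _ V", "r -> 0 / _ #", "ae -> eh / _ n", "l -> w / _ #"]
literature_rules = {"t -> dx / V _ V", "r -> 0 / _ #", "ay -> aa / _ C"}

def precision_at_k(retrieved, reference, k):
    """Fraction of the top-k retrieved rules that match the reference set."""
    return sum(rule in reference for rule in retrieved[:k]) / k

for k in (2, 4):
    print(f"precision@{k} = {precision_at_k(retrieved_rules, literature_rules, k):.2f}")
```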

Assessing the speaker recognition performance of naive listeners using Mechanical Turk

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 22-27 May 2011, pp. 5916-5919.

Summary

In this paper, we attempt to quantify the ability of naive listeners to perform speaker recognition in the context of the NIST evaluation task. We describe our protocol: a series of listening experiments on Amazon's Mechanical Turk, using a large number of naive listeners (432), designed to measure how well the average human listener can perform speaker recognition. Our goal was to compare the performance of the average human listener to both forensic experts and state-of-the-art automatic systems. We show that naive listeners vary substantially in their performance, but that aggregating listener responses can achieve performance similar to that of expert forensic examiners.
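
The aggregation of listener responses is not detailed in this summary; the sketch below illustrates one plausible scheme, a simple majority vote over per-trial same-speaker judgments, using invented trial data.

```python
from collections import Counter

# Invented example: each trial gets a list of binary judgments from naive
# listeners (1 = "same speaker", 0 = "different speakers").
trials = {
    "trial_01": [1, 1, 0, 1, 1, 1, 0, 1],
    "trial_02": [0, 0, 1, 0, 0, 0, 0, 1],
}

def majority_vote(judgments):
    """Aggregate individual listener decisions into a single decision."""
    votes = Counter(judgments)
    return 1 if votes[1] > votes[0] else 0

for trial_id, judgments in trials.items():
    decision = majority_vote(judgments)
    agreement = judgments.count(decision) / len(judgments)
    print(f"{trial_id}: aggregated = {'same' if decision else 'different'} "
          f"(agreement {agreement:.0%})")
```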

Informative dialect recognition using context-dependent pronunciation modeling

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 22-27 May 2011, pp. 4396-4399.

Summary

We propose an informative dialect recognition system that learns phonetic transformation rules and uses them to identify dialects. A hidden Markov model is used to align reference phones with dialect-specific pronunciations to characterize when and how often substitutions, insertions, and deletions occur. Decision tree clustering is used to find context-dependent phonetic rules. We ran recognition tasks on four Arabic dialects. Not only do the proposed systems perform well on their own, but when fused with baselines they improve performance by 21-36% relative. In addition, our proposed decision-tree system beats the baseline monophone system at recovering phonetic rules by 21% relative. The pronunciation rules learned by our proposed system quantify the occurrence frequency of known rules and suggest rule candidates for further linguistic study.
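
The paper aligns reference phones to dialect-specific pronunciations with a hidden Markov model; as a rough stand-in, the sketch below uses a plain edit-distance alignment to show how substitution, insertion, and deletion statistics can be tallied from such an alignment. The phone sequences are invented.

```python
def align_counts(ref, hyp):
    """Levenshtein-style alignment between a reference phone sequence and an
    observed pronunciation, returning counts of substitutions, insertions,
    and deletions (a simplified stand-in for the paper's HMM alignment)."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, (subs, ins, dels)) for aligning ref[:i] with hyp[:j]
    dp = [[(0, (0, 0, 0))] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, (0, 0, i))              # delete all of ref[:i]
    for j in range(1, m + 1):
        dp[0][j] = (j, (0, j, 0))              # insert all of hyp[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            cands = [
                (dp[i - 1][j - 1][0] + sub,
                 (dp[i - 1][j - 1][1][0] + sub, dp[i - 1][j - 1][1][1], dp[i - 1][j - 1][1][2])),
                (dp[i][j - 1][0] + 1,
                 (dp[i][j - 1][1][0], dp[i][j - 1][1][1] + 1, dp[i][j - 1][1][2])),
                (dp[i - 1][j][0] + 1,
                 (dp[i - 1][j][1][0], dp[i - 1][j][1][1], dp[i - 1][j][1][2] + 1)),
            ]
            dp[i][j] = min(cands)
    subs, ins, dels = dp[n][m][1]
    return {"substitutions": subs, "insertions": ins, "deletions": dels}

# Invented example: canonical phones vs. a dialect-specific realization.
print(align_counts(["w", "ao", "t", "er"], ["w", "aa", "dx", "er"]))
```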

USSS-MITLL 2010 human assisted speaker recognition

Summary

The United States Secret Service (USSS) teamed with MIT Lincoln Laboratory (MIT/LL) in the US National Institute of Standards and Technology's 2010 Speaker Recognition Evaluation of Human-Assisted Speaker Recognition (HASR). We describe our qualitative and automatic speaker comparison processes, as well as our fusion of the two, which are adapted from USSS casework. The USSS-MIT/LL 2010 HASR results are presented, along with post-evaluation results. The results are encouraging within the resolving power of the evaluation, which was limited to keep the required human effort reasonable. Future ideas and efforts are discussed, including new features and ways to capitalize on naive listeners.
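
How the qualitative and automatic processes were fused is not specified in this summary; the snippet below sketches only the generic idea of score-level fusion with a weighted sum, using invented scores and weights.

```python
def fuse(examiner_score, automatic_score, w_examiner=0.5, w_automatic=0.5):
    """Weighted score-level fusion of a human comparison judgment and an
    automatic system score (weights and scores here are purely illustrative)."""
    return w_examiner * examiner_score + w_automatic * automatic_score

# Decide "same speaker" when the fused score clears a calibration threshold.
fused = fuse(examiner_score=1.2, automatic_score=-0.4)
print("same speaker" if fused > 0.0 else "different speakers", f"(score {fused:+.2f})")
```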

Using United States government language proficiency standards for MT evaluation

Published in:
Chapter 5.3.3 in Handbook of Natural Language Processing and Machine Translation, 2011, pp. 775-782.

Summary

The purpose of this section is to discuss a method of measuring the degree to which the essential meaning of the original text is communicated in the MT output. We view this test as a measurement of the fundamental goal of MT: to convey information accurately from one language to another. We conducted a series of experiments in which educated native readers of English answered test questions about translated versions of texts originally written in Arabic and Chinese. We compared the results for subjects using machine translations of the texts with those for subjects using professional reference translations. These comparisons serve as a baseline for determining the level of foreign-language reading comprehension that a native English reader can achieve by relying on machine translation technology. They also allow us to explore the relationship between the current, broadly accepted automatic measures of machine translation performance and a test derived from the Defense Language Proficiency Test, which is used throughout the Defense Department to measure foreign language proficiency. Our goal is to put MT system performance evaluation into terms that are meaningful to US government consumers of MT output.
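
The numbers below are invented and serve only to show the arithmetic of the comparison described above: comprehension accuracy under each condition, and MT comprehension expressed relative to the reference-translation baseline.

```python
# Invented counts of correctly answered test questions under each condition.
results = {
    "reference_translation": {"correct": 172, "questions": 200},
    "machine_translation":   {"correct": 138, "questions": 200},
}

accuracy = {cond: r["correct"] / r["questions"] for cond, r in results.items()}
for cond, acc in accuracy.items():
    print(f"{cond}: {acc:.1%} of questions answered correctly")

relative = accuracy["machine_translation"] / accuracy["reference_translation"]
print(f"MT readers reach {relative:.0%} of the comprehension achieved with reference translations")
```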

The MIT-LL/AFRL IWSLT-2010 MT system

Published in:
Proc. Int. Workshop on Spoken Language Translation, IWSLT, 2 December 2010.

Summary

This paper describes the MIT-LL/AFRL statistical MT system and the improvements that were developed during the IWSLT 2010 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance on the Arabic-to-English and Turkish-to-English translation tasks. We also participated in the new French-to-English BTEC and English-to-French TALK tasks. We discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2008 system, and experiments we ran during the IWSLT-2010 evaluation. Specifically, we focus on 1) cross-domain translation using MAP adaptation, 2) Turkish morphological processing and translation, 3) improved Arabic morphology for MT preprocessing, and 4) system combination methods for machine translation.
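
MAP adaptation for cross-domain translation is only named here; the sketch below follows a common count-based formulation for adapting phrase translation probabilities toward in-domain data, which may differ from the paper's exact formulation. The counts and phrases are invented.

```python
# MAP-style adaptation of phrase translation probabilities: combine in-domain
# counts with an out-of-domain model via a prior weight tau.
def map_adapt(c_in_pair, c_in_src, p_out, tau=10.0):
    """p_map(e|f) = (c_in(f,e) + tau * p_out(e|f)) / (c_in(f) + tau)."""
    return (c_in_pair + tau * p_out) / (c_in_src + tau)

# In-domain counts for one source phrase f and two candidate translations.
c_in_src = 8
candidates = {
    "translation_a": {"c_in": 6, "p_out": 0.2},
    "translation_b": {"c_in": 2, "p_out": 0.7},
}

for e, stats in candidates.items():
    p = map_adapt(stats["c_in"], c_in_src, stats["p_out"])
    print(f"p_map({e} | f) = {p:.3f}")
```

With a large tau the adapted model stays close to the out-of-domain estimate; with a small tau the sparse in-domain counts dominate.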

A linguistically-informative approach to dialect recognition using dialect-discriminating context-dependent phonetic models

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15 March 2010, pp. 5014-5017.

Summary

We propose supervised and unsupervised learning algorithms to extract dialect-discriminating phonetic rules and use these rules to adapt biphones to identify dialects. Despite many challenges (e.g., sub-dialect issues and no word transcriptions), we discovered dialect-discriminating biphones compatible with the linguistic literature, while outperforming a baseline monophone system by 7.5% (relative). Our proposed dialect-discriminating biphone system achieves performance similar to a baseline all-biphone system despite using 25% fewer biphone models. In addition, our system complements PRLM (Phone Recognition followed by Language Modeling), verified by relative gains of 15-29% when fused with PRLM. Our work is an encouraging first step towards a linguistically-informative dialect recognition system, with potential applications in forensic phonetics, accent training, and language learning.
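
The summary does not state how dialect-discriminating biphones are chosen; the sketch below illustrates one plausible selection criterion, ranking biphones by the absolute log-likelihood ratio of their relative frequencies in two dialects and keeping only the most discriminating ones. The biphone labels and counts are invented.

```python
import math

# Invented per-biphone occurrence counts in two dialects.
biphone_counts = {
    "aa+r":  {"dialect_A": 300, "dialect_B": 120},
    "t+ah":  {"dialect_A": 210, "dialect_B": 205},
    "ih+ng": {"dialect_A": 95,  "dialect_B": 260},
    "s+t":   {"dialect_A": 400, "dialect_B": 390},
}

total_a = sum(c["dialect_A"] for c in biphone_counts.values())
total_b = sum(c["dialect_B"] for c in biphone_counts.values())

def llr(counts, smoothing=0.5):
    """Log-likelihood ratio of a biphone's relative frequency across dialects."""
    p_a = (counts["dialect_A"] + smoothing) / total_a
    p_b = (counts["dialect_B"] + smoothing) / total_b
    return math.log(p_a / p_b)

# Keep the most dialect-discriminating biphones (here, the top 75%), mirroring
# the idea of modeling fewer biphones without losing recognition accuracy.
keep = int(0.75 * len(biphone_counts))
selected = sorted(biphone_counts, key=lambda b: abs(llr(biphone_counts[b])), reverse=True)[:keep]
print("selected biphones:", selected)
```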

Query-by-example spoken term detection using phonetic posteriorgram templates

Published in:
Proc. IEEE Workshop on Automatic Speech Recognition & Understanding, ASRU, 13-17 December 2009, pp. 421-426.

Summary

This paper examines a query-by-example approach to spoken term detection in audio files. The approach is designed for low-resource situations in which limited or no in-domain training material is available and accurate word-based speech recognition capability is unavailable. Instead of using word or phone strings as search terms, the user presents the system with audio snippets of desired search terms to act as the queries. Query and test materials are represented using phonetic posteriorgrams obtained from a phonetic recognition system. Query matches in the test data are located using a modified dynamic time warping search between query templates and test utterances. Experiments using this approach are presented using data from the Fisher corpus.
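
As a rough sketch of the matching step, the code below runs a basic dynamic time warping search between a query posteriorgram and a test posteriorgram, using the negative log inner product as the local distance. It anchors the query at both ends of the test segment, whereas the paper's modified search locates the query anywhere within a test utterance; the tiny posteriorgrams are invented.

```python
import numpy as np

def posteriorgram_dtw(query, test):
    """DTW between a query posteriorgram (frames x phone classes) and a test
    posteriorgram, with -log inner product as the local frame distance."""
    eps = 1e-6
    Q, T = query.shape[0], test.shape[0]
    local = -np.log(np.clip(query @ test.T, eps, None))   # (Q, T) local distances
    acc = np.full((Q, T), np.inf)
    acc[0, 0] = local[0, 0]
    for i in range(Q):
        for j in range(T):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = local[i, j] + best_prev
    # Normalize by a path-length proxy so scores are comparable across lengths.
    return acc[-1, -1] / (Q + T)

# Tiny invented posteriorgrams over 3 phone classes (rows sum to 1).
query = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]])
test  = np.array([[0.7, 0.2, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
print(f"DTW distance: {posteriorgram_dtw(query, test):.3f}")
```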

The MIT-LL/AFRL IWSLT-2008 MT System

Published in:
Proc. Int. Workshop on Spoken Language Translation, IWSLT, 1-2 December 2009.

Summary

This paper describes the MIT-LL/AFRL statistical MT system and the improvements that were developed during the IWSLT 2008 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance for both text and speech-based translation on Chinese and Arabic translation tasks. We discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2007 system, and experiments we ran during the IWSLT-2008 evaluation. Specifically, we focus on 1) novel segmentation models for phrase-based MT, 2) improved lattice and confusion network decoding of speech input, 3) improved Arabic morphology for MT preprocessing, and 4) system combination methods for machine translation.
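
Confusion network decoding is mentioned only in passing; the toy snippet below shows the basic consensus idea of keeping the most probable word (or skip arc) in each slot. The network and probabilities are invented, and a real MT front end would pass the alternatives on to the translation decoder rather than committing to a single path.

```python
# A toy confusion network: each slot holds alternative words (or a skip arc)
# with posterior probabilities from the speech recognizer.
confusion_network = [
    {"i": 0.9, "eye": 0.1},
    {"want": 0.6, "won't": 0.3, "*DELETE*": 0.1},
    {"tea": 0.7, "t": 0.3},
]

best_path = []
for slot in confusion_network:
    word = max(slot, key=slot.get)
    if word != "*DELETE*":        # skip arc: emit nothing for this slot
        best_path.append(word)

print(" ".join(best_path))
```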

A comparison of query-by-example methods for spoken term detection

Published in:
INTERSPEECH 2009, 6-10 September 2009.

Summary

In this paper we examine an alternative interface for phonetic search, namely query-by-example, that avoids the OOV issues associated with both standard word-based and phonetic search methods. We develop three methods that compare query lattices derived from example audio against a standard n-gram-based phonetic index, and we analyze the factors affecting the performance of these systems. We show that the best systems under this paradigm achieve 77% precision when retrieving utterances from conversational telephone speech and returning 10 results from a single query (better than a similar dictionary-based approach), suggesting significant utility for applications requiring high precision. We also show that these systems can be further improved using relevance feedback: by incorporating four additional queries, the precision of the best system can be improved by 13.7% relative. Our systems perform well despite high phone recognition error rates (>40%) and use no pronunciation or letter-to-sound resources.
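
The paper matches query lattices against an n-gram phonetic index; the sketch below shows only the simplest version of that idea, scoring indexed utterances by phone trigram overlap with a 1-best phone decoding of the query. The phone strings are invented.

```python
from collections import Counter

def phone_ngrams(phones, n=3):
    """Phone n-grams used as index and query terms."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))

# Invented phone strings standing in for 1-best phonetic recognition output.
index = {
    "utt_001": ["s", "ih", "t", "ax", "d", "eh", "l"],
    "utt_002": ["b", "aa", "s", "t", "ax", "n"],
}
query = ["s", "t", "ax", "n"]   # an example audio query, already decoded to phones

q_grams = phone_ngrams(query)
scores = {utt: sum((phone_ngrams(phones) & q_grams).values())   # shared n-gram mass
          for utt, phones in index.items()}

# Return utterances ranked by n-gram overlap with the query.
for utt, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(utt, score)
```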