Publications


Corpora design and score calibration for text dependent pronunciation proficiency recognition

Published in:
8th ISCA Workshop on Speech and Language Technology in Education, SLaTE 2019, 20-21 September 2019.

Summary

This work investigates methods for improving a pronunciation proficiency recognition system, both in terms of phonetic-level posterior probability calibration and in ordinal utterance-level classification, for Modern Standard Arabic (MSA), Spanish, and Russian. To support this work, utterance-level labels were obtained by crowd-sourcing the annotation of language learners' recordings. Phonetic posterior probability estimates extracted using automatic speech recognition systems trained in each language were calibrated using a beta calibration approach [1], and language proficiency level was estimated using ordinal regression [2]. Fusion with language recognition (LR) scores from an i-vector system [3] trained on 23 languages is also explored. Initial results were promising for all three languages, and it was demonstrated that the calibrated posteriors were effective for predicting pronunciation proficiency. Significant relative gains of 16% in mean absolute error for the ordinal regression and 17% in normalized cross entropy for the binary beta regression were achieved on MSA through fusion with LR scores.
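The beta calibration step referenced above [1] can be implemented as logistic regression over log-transformed raw posteriors. The sketch below is a minimal illustration with scikit-learn, assuming hypothetical raw scores and binary proficiency labels; it is not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def beta_calibrate(scores, labels):
    """Fit a beta-calibration map to raw posterior scores in (0, 1).

    Beta calibration is equivalent to logistic regression on the
    features [log(s), -log(1 - s)] of the raw score s.
    """
    s = np.clip(scores, 1e-6, 1 - 1e-6)
    X = np.column_stack([np.log(s), -np.log1p(-s)])
    model = LogisticRegression().fit(X, labels)

    def calibrate(new_scores):
        t = np.clip(new_scores, 1e-6, 1 - 1e-6)
        Xt = np.column_stack([np.log(t), -np.log1p(-t)])
        return model.predict_proba(Xt)[:, 1]

    return calibrate


# Toy data: deliberately over-confident raw posteriors vs. binary labels.
rng = np.random.default_rng(0)
raw = rng.uniform(0.01, 0.99, 200)
labels = (rng.uniform(size=200) < raw ** 2).astype(int)
cal = beta_calibrate(raw, labels)
print(cal(np.array([0.2, 0.5, 0.9])))
```

The fitted map pulls over-confident scores toward the empirical label rate while leaving the output a valid probability.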

Using K-means in SVR-based text difficulty estimation

Published in:
8th ISCA Workshop on Speech and Language Technology in Education, SLaTE 2019, 20-21 September 2019.

Summary

A challenge for second language learners, educators, and test creators is the identification of authentic materials at the right level of difficulty. In this work, we present an approach to automatically measure text difficulty, integrated into Auto-ILR, a web-based system that helps find text material at the right level for learners in 18 languages. The Auto-ILR subscription service scans web feeds, extracts article content, evaluates the difficulty, and notifies users of documents that match their skill level. Difficulty is measured on the standard ILR scale with language-specific support vector machine regression (SVR) models built from vectors incorporating length features, term frequencies, relative entropy, and K-means clustering.
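A rough sketch of the feature pipeline described above (term frequencies plus K-means cluster distances and a length feature feeding an SVR model) is shown below with scikit-learn. The documents and ILR ratings are toy stand-ins, not Auto-ILR data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVR

# Toy stand-ins for documents with human ILR difficulty ratings.
docs = [
    "the cat sat on the mat",
    "economic indicators suggest a gradual recovery in exports",
    "the committee deliberated on fiscal policy ramifications",
    "see spot run",
] * 5
ilr_levels = [1.0, 2.5, 3.0, 0.5] * 5

# Term-frequency vectors; K-means distances to cluster centers become
# extra features, alongside a simple length feature, for the SVR model.
vec = TfidfVectorizer()
X_tf = vec.fit_transform(docs).toarray()
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_tf)
lengths = np.array([[len(d.split())] for d in docs])
X = np.hstack([X_tf, km.transform(X_tf), lengths])

svr = SVR(kernel="rbf").fit(X, ilr_levels)
preds = svr.predict(X[:4])
print(preds.round(2))
```

Here `KMeans.transform` supplies each document's distance to every cluster centroid, which gives the regressor a coarse topical signal on top of raw term frequencies.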

A new multiple choice comprehension test for MT

Published in:
Automatic and Manual Metrics for Operational Translation Evaluation Workshop, 9th Int. Conf. on Language Resources and Evaluation (LREC 2014), 26 May 2014.

Summary

We present results from a new machine translation comprehension test, similar to those developed in previous work (Jones et al., 2007). This test has documents in four conditions: (1) original English documents; (2) human translations of the documents into Arabic; conditions (3) and (4) are machine translations of the Arabic documents into English from two different MT systems. We created two forms of the test: Form A has the original English documents and output from the two Arabic-to-English MT systems; Form B has English, Arabic, and one of the MT system outputs. We administered the comprehension test to three subject types recruited in the greater Boston area: (1) native English speakers with no Arabic skills, (2) Arabic language learners, and (3) native Arabic speakers who also have English language skills. There were 36 native English speakers, 13 Arabic learners, and 11 native Arabic speakers with English skills. Subjects needed an average of 3.8 hours to complete the test, which had 191 questions and 59 documents. Native English speakers with no Arabic skills saw Form A; Arabic learners and native Arabic speakers saw Form B.

Standardized ILR-based and task-based speech-to-speech MT evaluation

Published in:
Automatic and Manual Metrics for Operational Translation Evaluation Workshop, 9th Int. Conf. on Language Resources and Evaluation (LREC 2014), 26 May 2014.

Summary

This paper describes a new method for task-based speech-to-speech machine translation evaluation, in which tasks are defined and assessed according to independent published standards, both for the military tasks performed and for the foreign language skill levels used. We analyze task success rates and automatic MT evaluation scores (BLEU and METEOR) for 220 role-play dialogs. Each role-play team consisted of one native English-speaking soldier role player, one native Pashto-speaking local national role player, and one Pashto/English interpreter. The overall PASS score, averaged over all of the MT dialogs, was 44%. The average PASS rate for HT was 95%, which is important because a PASS requires that the role-players know the tasks. Without a high PASS rate in the HT condition, we could not be sure that the MT condition was not being unfairly penalized. We learned that success rates depended as much on task simplicity as they did on the translation condition: 67% of simple, base-case scenarios were successfully completed using MT, whereas only 35% of contrasting scenarios with even minor obstacles received passing scores. We observed that MT had the greatest chance of success when the task was simple and the language complexity needs were low.
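For readers unfamiliar with the automatic metrics mentioned above, the sketch below shows a deliberately simplified BLEU (up to bigrams, single reference, with brevity penalty). Real evaluations such as this one use 4-gram BLEU with standard tooling; the phrases here are made up for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate, reference, max_n=2):
    """Toy single-reference BLEU (up to bigrams) with brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_counts = Counter(ngrams(cand, n))
        r_counts = Counter(ngrams(ref, n))
        # Clipped n-gram matches, as in standard BLEU modified precision.
        overlap = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(sum(c_counts.values()), 1)
        log_prec += math.log(max(overlap, 1e-9) / total) / max_n
    # Penalize candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return bp * math.exp(log_prec)


print(round(bleu("the soldier asked for water", "the soldier asked for water"), 3))  # 1.0
print(round(bleu("soldier water", "the soldier asked for water"), 3))
```

A perfect match scores 1.0; a short fragment is punished by both the missing bigrams and the brevity penalty, which is why fluent-but-wrong and terse outputs both score low.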

Development and use of a comprehensive humanitarian assessment tool in post-earthquake Haiti

Summary

This paper describes a comprehensive humanitarian assessment tool designed and used following the January 2010 Haiti earthquake. The tool was developed under Joint Task Force-Haiti coordination using indicators of humanitarian needs to support decision making by the United States Government, agencies of the United Nations, and various non-governmental organizations. A set of questions and data collection methodology were developed by a collaborative process involving a broad segment of the Haiti humanitarian relief community and used to conduct surveys in internally displaced person settlements and surrounding communities for a four-month period starting on 15 March 2010. Key considerations in the development of the assessment tool and data collection methodology, representative analysis results, and observations from the operational use of the tool for decision making are reported. The paper concludes with lessons learned and recommendations for design and use of similar tools in the future.

Using United States government language proficiency standards for MT evaluation

Published in:
Chapter 5.3.3 in Handbook of Natural Language Processing and Machine Translation, 2011, pp. 775-782.

Summary

The purpose of this section is to discuss a method of measuring the degree to which the essential meaning of the original text is communicated in the MT output. We view this test to be a measurement of the fundamental goal of MT; that is, to convey information accurately from one language to another. We conducted a series of experiments in which educated native readers of English responded to test questions about translated versions of texts originally written in Arabic and Chinese. We compared the results for those subjects using machine translations of the texts with those using professional reference translations. These comparisons serve as a baseline for determining the level of foreign language reading comprehension that can be achieved by a native English reader relying on machine translation technology. This also allows us to explore the relationship between the current, broadly accepted automatic measures of performance for machine translation and a test derived from the Defense Language Proficiency Test, which is used throughout the Defense Department for measuring foreign language proficiency. Our goal is to put MT system performance evaluation into terms that are meaningful to US government consumers of MT output.

Machine translation for government applications

Published in:
Lincoln Laboratory Journal, Vol. 18, No. 1, June 2009, pp. 41-53.

Summary

The idea of a mechanical process for converting one human language into another can be traced to a letter written by René Descartes in 1629, and after nearly 400 years, this vision has not been fully realized. Machine translation (MT) using digital computers has been a grand challenge for computer scientists, mathematicians, and linguists since the first international conference on MT was held at the Massachusetts Institute of Technology in 1952. Currently, Lincoln Laboratory is achieving success in a highly focused research program that specializes in developing speech translation technology for limited language resource domains and in adapting foreign-language proficiency standards for MT evaluation. Our specialized research program is situated within a general framework for multilingual speech and text processing for government applications.

Two protocols comparing human and machine phonetic discrimination performance in conversational speech

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 1630-1633.

Summary

This paper describes two experimental protocols for direct comparison of human and machine phonetic discrimination performance in continuous speech. These protocols attempt to isolate phonetic discrimination while controlling for language and segmentation biases. Results of two human experiments are described, including comparisons with automatic phonetic recognition baselines. Our experiments suggest that in conversational telephone speech, human performance on these tasks exceeds that of machines by 15%. Furthermore, in a related language model control experiment, human subjects were 45% better at correctly predicting words in conversational speech.

ILR-based MT comprehension test with multi-level questions

Published in:
Human Language Technology, North American Chapter of the Association for Computational Linguistics, HLT/NAACL, 22-27 April 2007.

Summary

We present results from a new Interagency Language Roundtable (ILR)-based comprehension test. This new test design presents questions at multiple ILR difficulty levels within each document. We incorporated Arabic machine translation (MT) output from three independent research sites, arbitrarily merging these materials into one MT condition. We contrast the MT condition, for both text and audio data types, with high-quality human reference Gold Standard (GS) translations. Overall, subjects achieved 95% comprehension for GS and 74% for MT, across all genres and difficulty levels. Interestingly, comprehension rates do not correlate highly with translation error rates, suggesting that we are measuring an additional dimension of MT quality.

Experimental facility for measuring the impact of environmental noise and speaker variation on speech-to-speech translation devices

Published in:
Proc. IEEE Spoken Language Technology Workshop, 10-13 December 2006, pp. 250-253.

Summary

We describe the construction and use of a laboratory facility for testing the performance of speech-to-speech translation devices. Approximately 1500 English phrases from various military domains were recorded as spoken by each of 30 male and 12 female English speakers with variation in speaker accent, for a total of approximately 60,000 phrases available for experimentation. We describe an initial experiment using the facility that shows the impact of environmental noise and speaker variability on phrase recognition accuracy for two commercially available one-way speech-to-speech translation devices configured for English-to-Arabic translation.