Publications

A new multiple choice comprehension test for MT

Published in:
Automatic and Manual Metrics for Operational Translation Evaluation Workshop, 9th Int. Conf. on Language Resources and Evaluation (LREC 2014), 26 May 2014.

Summary

We present results from a new machine translation comprehension test, similar to those developed in previous work (Jones et al., 2007). This test has documents in four conditions: (1) original English documents; (2) human translations of the documents into Arabic; and (3) and (4) machine translations of the Arabic documents into English from two different MT systems. We created two forms of the test: Form A has the original English documents and output from the two Arabic-to-English MT systems. Form B has English, Arabic, and one of the MT system outputs. We administered the comprehension test to three subject types recruited in the greater Boston area: (1) native English speakers with no Arabic skills, (2) Arabic language learners, and (3) native Arabic speakers who also have English language skills. There were 36 native English speakers, 13 Arabic learners, and 11 native Arabic speakers with English skills. Subjects needed an average of 3.8 hours to complete the test, which had 191 questions and 59 documents. Native English speakers with no Arabic skills saw Form A. Arabic learners and native Arabic speakers saw Form B.

Standardized ILR-based and task-based speech-to-speech MT evaluation

Published in:
Automatic and Manual Metrics for Operational Translation Evaluation Workshop, 9th Int. Conf. on Language Resources and Evaluation (LREC 2014), 26 May 2014.

Summary

This paper describes a new method for task-based speech-to-speech machine translation evaluation, in which tasks are defined and assessed according to independent published standards, both for the military tasks performed and for the foreign language skill levels used. We analyze task success rates and automatic MT evaluation scores (BLEU and METEOR) for 220 role-play dialogs. Each role-play team consisted of one native English-speaking soldier role player, one native Pashto-speaking local national role player, and one Pashto/English interpreter. The overall PASS score, averaged over all of the MT dialogs, was 44%. The average PASS rate for HT was 95%, which is important because a PASS requires that the role-players know the tasks. Without a high PASS rate in the HT condition, we could not be sure that the MT condition was not being unfairly penalized. We learned that success rates depended as much on task simplicity as they did on the translation condition: 67% of simple, base-case scenarios were successfully completed using MT, whereas only 35% of contrasting scenarios with even minor obstacles received passing scores. We observed that MT had the greatest chance of success when the task was simple and the language complexity needs were low.

Development and use of a comprehensive humanitarian assessment tool in post-earthquake Haiti

Summary

This paper describes a comprehensive humanitarian assessment tool designed and used following the January 2010 Haiti earthquake. The tool was developed under Joint Task Force -- Haiti coordination using indicators of humanitarian needs to support decision making by the United States Government, agencies of the United Nations, and various non-governmental organizations. A set of questions and data collection methodology were developed by a collaborative process involving a broad segment of the Haiti humanitarian relief community and used to conduct surveys in internally displaced person settlements and surrounding communities for a four-month period starting on 15 March 2010. Key considerations in the development of the assessment tool and data collection methodology, representative analysis results, and observations from the operational use of the tool for decision making are reported. The paper concludes with lessons learned and recommendations for design and use of similar tools in the future.

Using United States government language proficiency standards for MT evaluation

Published in:
Chapter 5.3.3 in Handbook of Natural Language Processing and Machine Translation, 2011, pp. 775-82.

Summary

The purpose of this section is to discuss a method of measuring the degree to which the essential meaning of the original text is communicated in the MT output. We view this test as a measurement of the fundamental goal of MT; that is, to convey information accurately from one language to another. We conducted a series of experiments in which educated native readers of English responded to test questions about translated versions of texts originally written in Arabic and Chinese. We compared the results for those subjects using machine translations of the texts with those using professional reference translations. These comparisons serve as a baseline for determining the level of foreign language reading comprehension that can be achieved by a native English reader relying on machine translation technology. This also allows us to explore the relationship between the current, broadly accepted automatic measures of performance for machine translation and a test derived from the Defense Language Proficiency Test, which is used throughout the Defense Department for measuring foreign language proficiency. Our goal is to put MT system performance evaluation into terms that are meaningful to US government consumers of MT output.

Machine translation for government applications

Published in:
Lincoln Laboratory Journal, Vol 18, No. 1, June 2009, pp. 41-53.

Summary

The idea of a mechanical process for converting one human language into another can be traced to a letter written by René Descartes in 1629, and after nearly 400 years, this vision has not been fully realized. Machine translation (MT) using digital computers has been a grand challenge for computer scientists, mathematicians, and linguists since the first international conference on MT was held at the Massachusetts Institute of Technology in 1952. Currently, Lincoln Laboratory is achieving success in a highly focused research program that specializes in developing speech translation technology for limited language resource domains and in adapting foreign-language proficiency standards for MT evaluation. Our specialized research program is situated within a general framework for multilingual speech and text processing for government applications.

Two protocols comparing human and machine phonetic discrimination performance in conversational speech

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 1630-1633.

Summary

This paper describes two experimental protocols for direct comparison of human and machine phonetic discrimination performance in continuous speech. These protocols attempt to isolate phonetic discrimination while controlling for language and segmentation biases. Results of two human experiments are described, including comparisons with automatic phonetic recognition baselines. Our experiments suggest that in conversational telephone speech, human performance on these tasks exceeds that of machines by 15%. Furthermore, in a related language model control experiment, human subjects were better able to correctly predict words in conversational speech by 45%.

ILR-based MT comprehension test with multi-level questions

Published in:
Human Language Technology, North American Chapter of the Association for Computational Linguistics, HLT/NAACL, 22-27 April 2007.

Summary

We present results from a new Interagency Language Roundtable (ILR) based comprehension test. This new test design presents questions at multiple ILR difficulty levels within each document. We incorporated Arabic machine translation (MT) output from three independent research sites, arbitrarily merging these materials into one MT condition. We contrast the MT condition, for both text and audio data types, with high quality human reference Gold Standard (GS) translations. Overall, subjects achieved 95% comprehension for GS and 74% for MT, across all genres and difficulty levels. Interestingly, comprehension rates do not correlate highly with translation error rates, suggesting that we are measuring an additional dimension of MT quality.

Experimental facility for measuring the impact of environmental noise and speaker variation on speech-to-speech translation devices

Published in:
Proc. IEEE Spoken Language Technology Workshop, 10-13 December 2006, pp. 250-253.

Summary

We describe the construction and use of a laboratory facility for testing the performance of speech-to-speech translation devices. Approximately 1500 English phrases from various military domains were recorded as spoken by each of 30 male and 12 female English speakers with variation in speaker accent, for a total of approximately 60,000 phrases available for experimentation. We describe an initial experiment using the facility which shows the impact of environmental noise and speaker variability on phrase recognition accuracy for two commercially available one-way speech-to-speech translation devices configured for English-to-Arabic.

Toward an interagency language roundtable based assessment of speech-to-speech translation capabilities

Published in:
AMTA 2006, 7th Biennial Conf. of the Association for Machine Translation in the Americas, 8-12 August 2006.

Summary

We present observations from three exercises designed to map the effective listening and speaking skills of an operator of a speech-to-speech translation system (S2S) to the Interagency Language Roundtable (ILR) scale. Such a mapping is nontrivial, but will be useful for government and military decision makers in managing expectations of S2S technology. We observed domain-dependent S2S capabilities in the ILR range of Level 0+ to Level 1, and interactive text-based machine translation in the Level 3 range.

Two experiments comparing reading with listening for human processing of conversational telephone speech

Published in:
6th Annual Conf. of the Int. Speech Communication Association, INTERSPEECH 2005, 4-8 September 2005.

Summary

We report on results of two experiments designed to compare subjects' ability to extract information from audio recordings of conversational telephone speech (CTS) with their ability to extract information from text transcripts of these conversations, with and without the ability to hear the audio recordings. Although progress in machine processing of CTS is well documented, human processing of these materials has not been as well studied. These experiments compare subjects' processing time and comprehension of widely available CTS data in audio and written formats: one experiment involves careful reading and one involves visual scanning for information. We observed a very modest improvement using transcripts compared with the audio-only condition for the careful reading task (speed-up by a factor of 1.2) and a much more dramatic improvement using transcripts in the visual scanning task (speed-up by a factor of 2.9). The implications of the experiments are twofold: (1) we expect to see similar gains in human productivity for comparable applications outside the laboratory environment and (2) the gains can vary widely, depending on the specific tasks involved.