Publications

Refine Results

(Filters Applied) Clear All

The MIT-LL/AFRL IWSLT-2011 MT System

Summary

This paper describes the MIT-LL/AFRL statistical MT system and the improvements that were developed during the IWSLT 2011 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance on the Arabic to English and English to French TED-talk translation tasks. We also applied our existing ASR system to the TED-talk lecture ASR task. We discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2010 system, and experiments we ran during the IWSLT-2011 evaluation. Specifically, we focus on 1) speech recognition for lecture-like data, 2) cross-domain translation using MAP adaptation, and 3) improved Arabic morphology for MT preprocessing.
READ LESS

Summary

This paper describes the MIT-LL/AFRL statistical MT system and the improvements that were developed during the IWSLT 2011 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance on the Arabic to English and English to French TED-talk...

READ MORE

The MIT-LL/AFRL IWSLT-2010 MT system

Published in:
Proc. Int. Workshop on Spoken Language Translation, IWSLT, 2 December 2010.

Summary

This paper describes the MIT-LUAFRL statistical MT system and the improvements that were developed during the IWSLT 2010 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance on the Arabic and Turkish to English translation tasks. We also participated in the new French to English BTEC and English to French TALK tasks. We discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2008 system, and experiments we ran during the IWSLT-2010 evaluation. Specifically, we focus on 1) cross-domain translation using MAP adaptation, 2) Turkish morphological processing and translation, 3) improved Arabic morphology for MT preprocessing, and 4) system combination methods for machine translation.
READ LESS

Summary

This paper describes the MIT-LUAFRL statistical MT system and the improvements that were developed during the IWSLT 2010 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance on the Arabic and Turkish to English translation tasks. We...

READ MORE

Advocate: a distributed voice-oriented computing architecture

Published in:
North American Chapter of the Association for Computational Linguistics - Human Language Technologies Conf. (NAACL HLT 2009), 31 May - 5 June 2009.

Summary

Advocate is a lightweight and easy-to-use computing architecture that supports real-time, voice-oriented computing. It is designed to allow the combination of multiple speech and language processing components to create cohesive distributed applications. It is scalable, supporting local processing of all NLP/speech components when sufficient processing resources are available to one machine, or fully distributed/networked processing over an arbitrarily large compute structure when more compute resources are needed. Advocate is designed to operate in a large distributed test-bed in which an arbitrary number of NLP/speech services interface with an arbitrary number of Advocate clients applications. In this configuration, each Advocate client application employs automatic service discovery, calling them as required.
READ LESS

Summary

Advocate is a lightweight and easy-to-use computing architecture that supports real-time, voice-oriented computing. It is designed to allow the combination of multiple speech and language processing components to create cohesive distributed applications. It is scalable, supporting local processing of all NLP/speech components when sufficient processing resources are available to one...

READ MORE

Low-resource speech translation of Urdu to English using semi-supervised part-of-speech tagging and transliteration

Author:
Published in:
SLT 2008, IEEE Spoken Language Technology Workshop 2008, 15-10 December 2008, pp. 265-268.

Summary

This paper describes the construction of ASR and MT systems for translation of speech from Urdu into English. As both Urdu pronunciation lexicons and Urdu-English bitexts are sparse, we employ several techniques that make use of semi-supervised annotation to improve ASR and MT training. Specifically, we describe 1) the construction of a semi-supervised HMM-based part-of-speech tagger that is used to train factored translation models and 2) the use of an HMM-based transliterator from which we derive a spelling-to-pronunciation model for Urdu used in ASR training. We describe experiments performed for both ASR and MT training in the context of the Urdu-to-English task of the NIST MT08 Evaluation and we compare methods making use of additional annotation with standard statistical MT and ASR baselines.
READ LESS

Summary

This paper describes the construction of ASR and MT systems for translation of speech from Urdu into English. As both Urdu pronunciation lexicons and Urdu-English bitexts are sparse, we employ several techniques that make use of semi-supervised annotation to improve ASR and MT training. Specifically, we describe 1) the construction...

READ MORE

Showing Results

1-4 of 4