Publications


Cognitive services for the user

Published in:
Chapter 10, Cognitive Radio Technology, 2009, pp. 305-324.

Summary

Software-defined cognitive radios (CRs) use voice as a primary input/output (I/O) modality and are expected to have substantial computational resources capable of supporting advanced speech- and audio-processing applications. This chapter extends previous work on speech applications (e.g., [1]) to cognitive services that enhance military mission capability by capitalizing on automatic processes, such as speech information extraction and understanding the environment. Such capabilities go beyond interaction with the intended user of the software-defined radio (SDR): they extend to speech and audio applications that can be applied to information extracted from voice and acoustic noise gathered from other users and entities in the environment. For example, in a military environment, situational awareness and understanding could be enhanced by informing users based on processing voice and noise from both friendly and hostile forces operating in a given battle space. This chapter provides a survey of a number of speech- and audio-processing technologies and their potential applications to CR, including:

- A description of each technology and its current state of practice.
- An explanation of how the technology is currently being applied, or could be applied, to CR.
- Descriptions and concepts of operations for how the technology can be applied to benefit users of CRs.
- A description of relevant future research directions for both the speech and audio technologies and their applications to CR.

A pictorial overview of many of the core technologies, with some applications presented in the following sections, is shown in Figure 10.1. Also shown are some overlapping components between the technologies. For example, Gaussian mixture models (GMMs) and support vector machines (SVMs) are used in both speaker and language recognition technologies [2]. These technologies and components are described in further detail in the following sections.
Speech and concierge cognitive services and their corresponding applications are covered in the following sections. The services covered include speaker recognition, language identification (LID), text-to-speech (TTS) conversion, speech-to-text (STT) conversion, machine translation (MT), background noise suppression, speech coding, speaker characterization, noise management, noise characterization, and concierge services. These technologies and their potential applications to CR are discussed at varying levels of detail commensurate with their innovation and utility.
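As a concrete illustration of the GMM component shared between speaker and language recognition, the minimal sketch below scores a short feature sequence against two toy speaker models and picks the better match. All parameters and the 1-D "features" are hypothetical; a real recognizer would use multivariate cepstral features and models trained on speech data.

```python
import math

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of a single 1-D feature under a diagonal GMM."""
    total = 0.0
    for w, m, v in zip(weights, means, variances):
        total += w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
    return math.log(total)

def score_model(features, model):
    """Average per-frame log-likelihood; the same scoring applies whether
    the GMM models a speaker or a language."""
    return sum(gmm_loglik(x, *model) for x in features) / len(features)

# Two toy speaker models: (weights, means, variances). Hypothetical values.
speaker_a = ([0.5, 0.5], [-1.0, 1.0], [0.5, 0.5])
speaker_b = ([0.5, 0.5], [-4.0, 4.0], [0.5, 0.5])

features = [-1.1, 0.9, 1.2, -0.8]  # toy 1-D feature frames
best = max([("A", speaker_a), ("B", speaker_b)],
           key=lambda kv: score_model(features, kv[1]))[0]
```

The frames cluster near the means of speaker A's mixture, so model A receives the higher average log-likelihood.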

Low-resource speech translation of Urdu to English using semi-supervised part-of-speech tagging and transliteration

Published in:
SLT 2008, IEEE Spoken Language Technology Workshop 2008, 15-19 December 2008, pp. 265-268.

Summary

This paper describes the construction of ASR and MT systems for translation of speech from Urdu into English. As both Urdu pronunciation lexicons and Urdu-English bitexts are sparse, we employ several techniques that make use of semi-supervised annotation to improve ASR and MT training. Specifically, we describe 1) the construction of a semi-supervised HMM-based part-of-speech tagger that is used to train factored translation models and 2) the use of an HMM-based transliterator from which we derive a spelling-to-pronunciation model for Urdu used in ASR training. We describe experiments performed for both ASR and MT training in the context of the Urdu-to-English task of the NIST MT08 Evaluation and we compare methods making use of additional annotation with standard statistical MT and ASR baselines.
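The HMM tagging machinery underlying part 1) can be sketched as a standard Viterbi decode. The noun/verb states, probabilities, and English example words below are toy assumptions for illustration, not the paper's semi-supervised Urdu tagger.

```python
def viterbi(words, states, start, trans, emit):
    """Standard Viterbi decode over an HMM tagger: returns the most
    likely tag sequence for the word sequence."""
    # V[t][s] = (best prob of reaching state s at time t, tag path)
    V = [{s: (start.get(s, 1e-9) * emit[s].get(words[0], 1e-9), [s]) for s in states}]
    for w in words[1:]:
        row = {}
        for s in states:
            p, path = max(
                (V[-1][ps][0] * trans[ps].get(s, 1e-9) * emit[s].get(w, 1e-9),
                 V[-1][ps][1])
                for ps in states
            )
            row[s] = (p, path + [s])
        V.append(row)
    return max(V[-1].values())[1]

# Toy noun/verb tables (hypothetical probabilities, not Urdu data).
states = ["N", "V"]
start = {"N": 0.7, "V": 0.3}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"dogs": 0.6, "run": 0.1}, "V": {"dogs": 0.05, "run": 0.7}}

tags = viterbi(["dogs", "run"], states, start, trans, emit)
```

In a semi-supervised setting, the transition and emission tables would be estimated from a small annotated seed set plus unlabeled text, rather than written by hand as here.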

Efficient speech translation through confusion network decoding

Published in:
IEEE Trans. Audio Speech Lang. Proc., Vol. 16, No. 8, November 2008, pp. 1696-1705.

Summary

This paper describes advances in the use of confusion networks as an interface between automatic speech recognition and machine translation. In particular, it presents a decoding algorithm for confusion networks that extends a state-of-the-art phrase-based text translation decoder. The confusion network decoder significantly improves on previous work in this direction in both efficiency and performance, and outperforms the background text translation system. Experimental results in terms of translation accuracy and decoding efficiency are reported for the task of translating plenary speeches of the European Parliament from Spanish to English and from English to Spanish.
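The core idea of decoding a confusion network can be sketched as follows: each slot holds competing words with ASR posteriors (plus an optional "*eps*" entry meaning the slot may be skipped), and a dynamic program combines the posteriors with a language model score. The network, bigram values, and weighting below are toy assumptions, not the paper's decoder.

```python
import math

# A confusion network: one slot per position, each slot maps word -> ASR posterior.
confnet = [
    {"the": 0.6, "a": 0.4},
    {"cat": 0.5, "cap": 0.5},
    {"sat": 0.7, "*eps*": 0.3},
]

# Toy bigram log-probabilities (hypothetical values, not a trained LM).
bigram = {("<s>", "the"): -0.4, ("<s>", "a"): -1.2,
          ("the", "cat"): -0.5, ("the", "cap"): -2.0,
          ("a", "cat"): -0.9, ("a", "cap"): -2.0,
          ("cat", "sat"): -0.3, ("cap", "sat"): -1.5}

def lm(prev, word):
    return bigram.get((prev, word), -5.0)  # crude backoff for unseen bigrams

def decode(confnet, lm_weight=1.0):
    """Dynamic program over the slots: each hypothesis is scored by its
    summed log ASR posteriors plus a weighted bigram LM score."""
    hyps = {("<s>",): 0.0}  # word-history tuple -> best log score
    for slot in confnet:
        new = {}
        for hist, score in hyps.items():
            for word, post in slot.items():
                if word == "*eps*":   # skip the slot; history unchanged
                    h, s = hist, score + math.log(post)
                else:
                    h = hist + (word,)
                    s = score + math.log(post) + lm_weight * lm(hist[-1], word)
                if s > new.get(h, -math.inf):
                    new[h] = s
        hyps = new
    best = max(hyps, key=hyps.get)
    return list(best[1:])  # drop the <s> marker
```

For a bigram LM only the last word of the history is needed as state; full histories are kept here purely for readability.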

The MIT-LL/AFRL IWSLT-2007 MT System

Published in:
Int. Workshop on Spoken Language Translation, IWSLT, 15-16 October 2007.

Summary

The MIT-LL/AFRL MT system implements a standard phrase-based, statistical translation model. It incorporates a number of extensions that improve performance for speech-based translation. During this evaluation our efforts focused on the rapid porting of our SMT system to a new language (Arabic) and novel approaches to translation from speech input. This paper discusses the architecture of the MIT-LL/AFRL MT system, improvements over our 2006 system, and experiments we ran during the IWSLT-2007 evaluation. Specifically, we focus on 1) experiments comparing the performance of confusion network decoding and direct lattice decoding techniques for speech machine translation, 2) the application of lightweight morphology for Arabic MT pre-processing, and 3) improved confusion network decoding.

ILR-based MT comprehension test with multi-level questions

Published in:
Human Language Technology, North American Chapter of the Association for Computational Linguistics, HLT/NAACL, 22-27 April 2007.

Summary

We present results from a new Interagency Language Roundtable (ILR)-based comprehension test. This new test design presents questions at multiple ILR difficulty levels within each document. We incorporated Arabic machine translation (MT) output from three independent research sites, arbitrarily merging these materials into one MT condition. We contrast the MT condition, for both text and audio data types, with high-quality human reference Gold Standard (GS) translations. Overall, subjects achieved 95% comprehension for GS and 74% for MT, across all genres and difficulty levels. Interestingly, comprehension rates do not correlate highly with translation error rates, suggesting that we are measuring an additional dimension of MT quality.

Experimental facility for measuring the impact of environmental noise and speaker variation on speech-to-speech translation devices

Published in:
Proc. IEEE Spoken Language Technology Workshop, 10-13 December 2006, pp. 250-253.

Summary

We describe the construction and use of a laboratory facility for testing the performance of speech-to-speech translation devices. Approximately 1500 English phrases from various military domains were recorded as spoken by each of 30 male and 12 female English speakers with variation in speaker accent, for a total of approximately 60,000 phrases available for experimentation. We describe an initial experiment using the facility which shows the impact of environmental noise and speaker variability on phrase recognition accuracy for two commercially available one-way speech-to-speech translation devices configured for English-to-Arabic.

An efficient graph search decoder for phrase-based statistical machine translation

Published in:
Int. Workshop on Spoken Language Translation, 28 November 2006.

Summary

In this paper we describe an efficient implementation of a graph search algorithm for phrase-based statistical machine translation. Our goal was to create a decoder that could be used for both our research system and a real-time speech-to-speech machine translation demonstration system. The search algorithm is based on a Viterbi graph search with an A* heuristic. We were able to increase the speed of our decoder substantially through the use of on-the-fly beam pruning and other algorithmic enhancements. The decoder supports a variety of reordering constraints as well as arbitrary n-gram decoding. In addition, we have implemented disk-based translation models and a messaging interface to communicate with other components for use in our real-time speech translation system.
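A minimal sketch of the stack-based search idea with beam pruning, assuming monotone reordering and a simple histogram beam only; the paper's decoder adds an A* heuristic, reordering constraints, and further refinements. The phrase table entries below are hypothetical.

```python
# Toy phrase table: source phrase tuple -> list of (target phrase, log prob).
# Entries are hypothetical, for illustration only.
phrase_table = {
    ("la",): [("the", -0.2), ("it", -1.5)],
    ("casa",): [("house", -0.3), ("home", -0.8)],
    ("la", "casa"): [("the house", -0.4)],
}

def decode_monotone(source, beam_size=3):
    """Monotone phrase-based Viterbi search with histogram beam pruning:
    stacks[i] holds partial hypotheses covering the first i source words."""
    stacks = [[] for _ in range(len(source) + 1)]
    stacks[0] = [(0.0, [])]  # (log score, target words so far)
    for i in range(len(source)):
        # Prune: keep only the beam_size best hypotheses in this stack.
        stacks[i] = sorted(stacks[i], reverse=True)[:beam_size]
        for score, target in stacks[i]:
            # Extend with every phrase starting at source position i.
            for j in range(i + 1, len(source) + 1):
                src = tuple(source[i:j])
                for tgt, logp in phrase_table.get(src, []):
                    stacks[j].append((score + logp, target + tgt.split()))
    return max(stacks[len(source)])[1]

translation = decode_monotone(["la", "casa"])
```

Here the multi-word phrase ("la", "casa") -> "the house" wins over composing single-word phrases, since its log probability (-0.4) beats the best composed score (-0.2 + -0.3).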

The MIT-LL/AFRL IWSLT-2006 MT system

Published in:
Proc. Int. Workshop on Spoken Language Translation, IWSLT, 27-28 November 2006.

Summary

The MIT-LL/AFRL MT system is a statistical phrase-based translation system that implements many modern SMT training and decoding techniques. Our system was designed with the long-term goal of dealing with corrupted ASR input and limited amounts of training data for speech-to-speech MT applications. This paper will discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2005 system, and experiments with manual and ASR transcription data that were run as part of the IWSLT-2006 evaluation campaign.

The JHU Workshop 2006 IWSLT System

Published in:
Int. Workshop on Spoken Language Translation, IWSLT, 27-28 November 2006.

Summary

This paper describes the SMT system we built during the 2006 JHU Summer Workshop for the IWSLT 2006 evaluation. Our effort focuses on two parts of the speech translation problem: 1) efficient decoding of word lattices and 2) novel applications of factored translation models to IWSLT-specific problems. In this paper, we present results from the open-track Chinese-to-English condition. Improvements of 5-10% relative BLEU are obtained over a high-performing baseline. We introduce a new open-source decoder that implements the state-of-the-art in statistical machine translation.

Toward an interagency language roundtable based assessment of speech-to-speech translation capabilities

Published in:
AMTA 2006, 7th Biennial Conf. of the Association for Machine Translation in the Americas, 8-12 August 2006.

Summary

We present observations from three exercises designed to map the effective listening and speaking skills of an operator of a speech-to-speech translation system (S2S) to the Interagency Language Roundtable (ILR) scale. Such a mapping is nontrivial, but will be useful for government and military decision makers in managing expectations of S2S technology. We observed domain-dependent S2S capabilities in the ILR range of Level 0+ to Level 1, and interactive text-based machine translation in the Level 3 range.