Publications

Measuring human readability of machine generated text: three case studies in speech recognition and machine translation

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 5, 19-23 March 2005, pp. V-1009 - V-1012.

Summary

We present highlights from three experiments that test the readability of current state-of-the-art system output from (1) an automated English speech-to-text (STT) system, (2) a text-based Arabic-to-English machine translation (MT) system, and (3) an audio-based Arabic-to-English MT process. We measure readability in terms of reaction time and passage comprehension in each case, applying standard psycholinguistic testing procedures and a modified version of the standard Defense Language Proficiency Test for Arabic called the DLPT*. We learned that (1) subjects are slowed down about 25% when reading STT system output, (2) text-based MT systems enable an English speaker to pass Arabic Level 2 on the DLPT*, and (3) audio-based MT systems do not enable English speakers to pass Arabic Level 2. We intend for these generic measures of readability to predict performance on more application-specific tasks.

The 2004 MIT Lincoln Laboratory speaker recognition system

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, 19-23 March 2005, pp. I-177 - I-180.

Summary

The MIT Lincoln Laboratory submission for the 2004 NIST Speaker Recognition Evaluation (SRE) was built upon seven core systems using speaker information from short-term acoustics, pitch and duration prosodic behavior, and phoneme and word usage. These different levels of information were modeled and classified using Gaussian Mixture Models, Support Vector Machines, and N-gram language models and were combined using a single-layer perceptron fuser. The 2004 SRE used a new multi-lingual, multi-channel speech corpus that provided a challenging speaker detection task for the above systems. In this paper we describe the core systems used and provide an overview of their performance on the 2004 SRE detection tasks.
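
The fuser mentioned above is essentially a single-layer perceptron trained on per-system scores. As a rough illustration of that idea (not the actual 2004 submission, whose seven subsystems and training data are described only in the paper), a minimal score-fusion sketch in Python might look like this; all scores, dimensions, and labels below are invented:

```python
# Hypothetical score-level fusion with a single-layer perceptron
# (logistic-regression-style weights over per-system scores).  The data and
# subsystem count are invented; the paper's seven core systems are not
# reproduced here.
import numpy as np

def train_fuser(scores, labels, lr=0.1, epochs=500):
    """scores: (n_trials, n_systems) per-system detection scores.
    labels: (n_trials,) 1.0 for target trials, 0.0 for non-target trials."""
    n_trials, n_systems = scores.shape
    w, b = np.zeros(n_systems), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(scores @ w + b)))   # perceptron output
        grad = p - labels                             # gradient of log-loss
        w -= lr * (scores.T @ grad) / n_trials
        b -= lr * grad.mean()
    return w, b

def fuse(scores, w, b):
    """Fused detection score per trial (higher = more likely the target speaker)."""
    return scores @ w + b

# Toy usage: three subsystems, ten invented development trials.
rng = np.random.default_rng(0)
dev_scores = rng.normal(size=(10, 3))
dev_labels = (dev_scores.mean(axis=1) > 0).astype(float)
w, b = train_fuser(dev_scores, dev_labels)
print(fuse(dev_scores, w, b))
```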

Evaluating static analysis tools for detecting buffer overflows in C code

Published in:
Thesis (MLA)--Harvard University, 2005.

Summary

This project evaluated five static analysis tools using a diagnostic test suite to determine their strengths and weaknesses in detecting a variety of buffer overflow flaws in C code. Detection, false alarm, and confusion rates were measured, along with execution time. PolySpace demonstrated a superior detection rate on the basic test suite, missing only one out of a possible 291 detections. It may benefit from improving its treatment of signal handlers, and reducing both its false alarm rate (particularly for C library functions) and execution time. ARCHER performed quite well with no false alarms whatsoever; a few key enhancements, such as in its inter-procedural analysis and handling of C library functions, would boost its detection rate and should improve its performance on real-world code. Splint detected significantly fewer overflows and exhibited the highest false alarm rate. Improvements in its loop handling, and reductions in its false alarm rate would make it a much more useful tool. UNO had no false alarms, but missed a broad variety of overflows amounting to nearly half of the possible detections in the test suite. It would need improvement in many areas to become a very useful tool. BOON was clearly at the back of the pack, not even performing well on the subset of test cases where it could have been expected to function. The project also provides a buffer overflow taxonomy, along with a test suite generator and other tools, that can be used by others to evaluate code analysis tools with respect to buffer overflow detection.
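
For context on the metrics reported above, a minimal sketch of how detection and false alarm rates can be tallied over such a test suite is shown below; the helper function and the counts are illustrative, not the paper's actual scoring tooling:

```python
# Illustrative tally of detection and false alarm rates over a diagnostic
# test suite.  The helper and the counts are invented; the paper's suite has
# 291 possible detections, so that figure is reused in the toy example.
def rates(flagged_flawed, total_flawed, flagged_patched, total_patched):
    """flagged_flawed: flawed test cases the tool reported (true detections).
    flagged_patched: corresponding patched cases it still reported (false alarms)."""
    return flagged_flawed / total_flawed, flagged_patched / total_patched

# Hypothetical tool that flags 290 of 291 flawed cases and 40 of 291 patched cases.
detection, false_alarm = rates(290, 291, 40, 291)
print(f"detection rate = {detection:.3f}, false alarm rate = {false_alarm:.3f}")
```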

Advances in channel compensation for SVM speaker recognition

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, Vol. 1, 19-23 March 2005, pp. I-629 - I-631.

Summary

Cross-channel degradation is one of the significant challenges facing speaker recognition systems. We study the problem for speaker recognition using support vector machines (SVMs). We perform channel compensation in SVM modeling by removing non-speaker nuisance dimensions in the SVM expansion space via projections. Training to remove these dimensions is accomplished via an eigenvalue problem. The eigenvalue problem attempts to reduce multisession variation for the same speaker, reduce different channel effects, and increase "distance" between different speakers. We apply our methods to a subset of the Switchboard 2 corpus. Experiments show dramatic improvement in performance for the cross-channel case.
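
As a rough, hypothetical illustration of the projection idea described above (not the paper's exact eigenvalue formulation), the following Python sketch estimates dominant within-speaker (session/channel) directions and projects them out of the expansion space; the data, dimensionality, and objective are invented:

```python
# Rough sketch of nuisance-dimension removal by projection: estimate
# directions dominated by within-speaker (session/channel) variability from
# an eigenvalue problem and project them out of the SVM expansion space.
# The data, dimensionality, and exact objective here are invented.
import numpy as np

def nuisance_projection(expansions, speaker_ids, n_nuisance=2):
    """expansions: (n_utterances, dim) SVM expansion vectors.
    Returns a (dim, dim) projection that removes the top nuisance directions."""
    dim = expansions.shape[1]
    scatter = np.zeros((dim, dim))
    for spk in np.unique(speaker_ids):
        x = expansions[speaker_ids == spk]
        xc = x - x.mean(axis=0)          # variation about the speaker's mean
        scatter += xc.T @ xc             # within-speaker (nuisance) scatter
    _, eigvecs = np.linalg.eigh(scatter)
    U = eigvecs[:, -n_nuisance:]         # top nuisance directions
    return np.eye(dim) - U @ U.T         # project them out

# Toy usage: six utterances from two speakers in a 5-dimensional expansion space.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 5))
speakers = np.array([0, 0, 0, 1, 1, 1])
P = nuisance_projection(X, speakers)
X_compensated = X @ P
```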

Automatic dysphonia recognition using biologically-inspired amplitude-modulation features

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, 19-23 March 2005, pp. I-873 - I-876.

Summary

A dysphonia, or disorder of the mechanisms of phonation in the larynx, can create time-varying amplitude fluctuations in the voice. A model for band-dependent analysis of this amplitude modulation (AM) phenomenon in dysphonic speech is developed from a traditional communications engineering perspective. This perspective challenges current dysphonia analysis methods that analyze AM in the time-domain signal. An automatic dysphonia recognition system is designed to exploit AM in voice using a biologically-inspired model of the inferior colliculus. This system, built upon a Gaussian-mixture-model (GMM) classification backend, recognizes the presence of dysphonia in the voice signal. Recognition experiments using data obtained from the Kay Elemetrics Voice Disorders Database suggest that the system provides complementary information to state-of-the-art mel-cepstral features. We present dysphonia recognition as an approach to developing features that capture glottal source differences in normal speech.
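
A minimal sketch of the general recipe (band-limited envelope extraction followed by a GMM backend) appears below; it is illustrative only and does not reproduce the paper's inferior colliculus model. The band edges, modulation range, model sizes, and synthetic data are assumptions:

```python
# Illustrative band-dependent amplitude-modulation (AM) features with a GMM
# backend.  The band edges, modulation range, model sizes, and synthetic data
# are invented; this is not the paper's inferior colliculus model.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert
from sklearn.mixture import GaussianMixture

def am_features(signal, fs, band=(300.0, 800.0), n_mod_bins=8):
    """Low-frequency envelope modulation spectrum of one acoustic band."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    envelope = np.abs(hilbert(sosfiltfilt(sos, signal)))      # AM envelope
    spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
    return spectrum[1:n_mod_bins + 1]

# Toy usage on one second of synthetic audio: unmodulated noise vs. noise
# with a 5 Hz amplitude modulation standing in for a dysphonic fluctuation.
rng = np.random.default_rng(2)
fs = 8000
t = np.arange(fs) / fs
normal = [am_features(rng.normal(size=fs), fs) for _ in range(20)]
dysphonic = [am_features(rng.normal(size=fs) * (1 + 0.5 * np.sin(2 * np.pi * 5 * t)), fs)
             for _ in range(20)]
gmm_normal = GaussianMixture(2, covariance_type="diag", random_state=0).fit(normal)
gmm_dysph = GaussianMixture(2, covariance_type="diag", random_state=0).fit(dysphonic)
test = am_features(rng.normal(size=fs), fs).reshape(1, -1)
print("dysphonia log-likelihood ratio:", gmm_dysph.score(test) - gmm_normal.score(test))
```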

Estimating and evaluating confidence for forensic speaker recognition

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, Vol. 1, 19-23 March 2005, pp. I-717 - I-720.

Summary

Estimating and evaluating confidence has become a key aspect of the speaker recognition problem because of the increased use of this technology in forensic applications. We discuss evaluation measures for speaker recognition and some of their properties. We then propose a framework for confidence estimation based upon scores and meta-information, such as utterance duration, channel type, and SNR. The framework uses regression techniques with multilayer perceptrons to estimate confidence with a data-driven methodology. As an application, we show the use of the framework in a speaker comparison task drawn from the NIST 2000 evaluation. A relative comparison of different types of meta-information is given. We demonstrate that the new framework can give substantial improvements over standard distribution methods of estimating confidence.
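
As a hypothetical illustration of regressing confidence from a score plus meta-information (not the paper's actual framework or data), a small sketch using a multilayer-perceptron regressor might look like the following; the features, targets, and network size are invented:

```python
# Hypothetical confidence estimation by MLP regression on a recognition score
# plus meta-information (duration and SNR).  The features, targets, and
# network size are invented; the paper's data-driven framework is only
# summarized in the abstract above.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)

# Invented training trials: [score, duration_sec, snr_db], with a 0/1
# correctness label used as the regression target (an estimated confidence).
X = np.column_stack([
    rng.normal(size=200),            # raw detection score
    rng.uniform(5, 60, size=200),    # utterance duration (s)
    rng.uniform(5, 30, size=200),    # signal-to-noise ratio (dB)
])
y = (X[:, 0] + 0.02 * X[:, 2] + rng.normal(scale=0.5, size=200) > 0).astype(float)

mlp = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
mlp.fit(X, y)

# Predicted confidence for one new trial, clipped to [0, 1].
confidence = min(max(mlp.predict([[0.7, 30.0, 20.0]])[0], 0.0), 1.0)
print(f"estimated confidence: {confidence:.2f}")
```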

New measures of effectiveness for human language technology

Summary

The field of human language technology (HLT) encompasses algorithms and applications dedicated to processing human speech and written communication. We focus on two types of HLT systems: (1) machine translation systems, which convert text and speech files from one human language to another, and (2) speech-to-text (STT) systems, which produce text transcripts when given audio files of human speech as input. Although both processes are subject to machine errors and can produce varying levels of garbling in their output, HLT systems are improving at a remarkable pace, according to system-internal measures of performance. To learn how these system-internal measurements correlate with improved capabilities for accomplishing real-world language-understanding tasks, we have embarked on a collaborative, interdisciplinary project involving Lincoln Laboratory, the MIT Department of Brain and Cognitive Sciences, and the Defense Language Institute Foreign Language Center to develop new techniques to scientifically measure the effectiveness of these technologies when they are used by human subjects.

Parallel MATLAB for extreme virtual memory

Published in:
Proc. of the HPCMP Users Group Conf., 27-30 June 2005, pp. 381-387.

Summary

Many DoD applications have extreme memory requirements, often with data sets larger than memory on a single computer. Such data sets can be addressed with out-of-core methods, which use memory as a "window" to view a section of the data stored on disk at a time. The Parallel Matlab for eXtreme Virtual Memory (pMatlab XVM) library adds out-of-core extensions to the Parallel Matlab (pMatlab) library. The DARPA High Productivity Computing Systems' HPC challenge FFT benchmark has been implemented in C+MPI, pMatlab, pMatlab hand-coded for out-of-core, and pMatlab XVM. We found that 1) the performance of the C+MPI and pMatlab versions was comparable; 2) the out-of-core versions deliver 80% of the performance of the in-core versions; 3) the out-of-core versions were able to perform a 1 TB (64 billion point) FFT; and 4) the pMatlab XVM program was smaller, easier to implement and verify, and more efficient than its hand-coded equivalent. We plan to apply pMatlab XVM to the full HPC challenge benchmark suite. Using next generation hardware, problem sizes a factor of 100 to 1000 times larger should be feasible. We are also transitioning this technology to several DoD signal processing applications. Finally, the flexibility of pMatlab XVM allows hardware designers to experiment with FFT parameters in software before designing hardware for a real-time, ultra-long FFT.
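
A minimal Python sketch of the out-of-core "window" idea, using a disk-backed array and processing one block at a time, is shown below. It is not pMatlab XVM syntax, covers only the per-block FFT pass, and the sizes and file names are illustrative:

```python
# Sketch of the out-of-core "window" idea with a NumPy memory map: the full
# array lives on disk and only one block at a time is brought into memory.
# This is not pMatlab XVM syntax; the sizes and file names are illustrative
# (the paper's FFT was 64 billion points, far larger than this demo).
import numpy as np

N = 1 << 20        # total points stored on disk (kept small for the sketch)
BLOCK = 1 << 16    # size of the in-core "window"

# Create a disk-backed array and fill it one block at a time.
data = np.memmap("xvm_demo.dat", dtype=np.complex64, mode="w+", shape=(N,))
for start in range(0, N, BLOCK):
    data[start:start + BLOCK] = np.random.standard_normal(BLOCK).astype(np.complex64)
data.flush()

# One pass of a block-decomposed FFT: transform each block independently.
# (A complete out-of-core FFT also needs twiddle-factor multiplies and a
# transpose pass between block FFTs; that bookkeeping is omitted here.)
out = np.memmap("xvm_fft.dat", dtype=np.complex64, mode="w+", shape=(N,))
for start in range(0, N, BLOCK):
    out[start:start + BLOCK] = np.fft.fft(data[start:start + BLOCK])
out.flush()
```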

Technology requirements for supporting on-demand interactive grid computing

Summary

It is increasingly being recognized that a large pool of High Performance Computing (HPC) users requires interactive, on-demand access to HPC resources. How to provide these resources is a significant technical challenge that can be addressed from two directions. The first approach is to adapt existing batch-queue-based HPC systems to make them more interactive. The second approach is to start with existing interactive desktop environments (e.g., MATLAB) and design a system from the ground up that allows interactive parallel computing. The Lincoln Laboratory Grid (LLGrid) project has taken the latter approach. The LLGrid system has been operational for over a year with a few hundred processors and roughly 70 users, having run over 13,000 interactive jobs and consumed approximately 10,000 processor days of computation. This paper compares the on-demand and interactive computing features of four prominent batch queuing systems: openPBS, Sun GridEngine, Condor, and LSF. It goes on to briefly describe the LLGrid system and how interactive, on-demand computing was achieved on it by binding to a resource management system. Finally, usage characteristics of the LLGrid system are discussed.

Parallel VSIPL++: an open standard software library for high-performance parallel signal processing

Published in:
Proc. IEEE, Vol. 93, No. 2, February 2005, pp. 313-330.

Summary

Real-time signal processing consumes the majority of the world's computing power. Increasingly, programmable parallel processors are used to address a wide variety of signal processing applications (e.g., scientific, video, wireless, medical, communication, encoding, radar, sonar, and imaging). In programmable systems, the major challenge is no longer hardware but software. Specifically, the key technical hurdle lies in allowing the user to write programs at a high level, while still achieving performance and preserving the portability of the code across parallel computing hardware platforms. The Parallel Vector, Signal, and Image Processing Library (Parallel VSIPL++) addresses this hurdle by providing high-level C++ array constructs, a simple mechanism for mapping data and functions onto parallel hardware, and a community-defined portable interface. This paper presents an overview of the Parallel VSIPL++ standard as well as a deeper description of the technical foundations and expected performance of the library. Parallel VSIPL++ supports adaptive optimization at many levels. The C++ arrays are designed to support automatic hardware specialization by the compiler. The computation objects (e.g., fast Fourier transforms) are built with explicit setup and run stages to allow for runtime optimization. Parallel arrays and functions in Parallel VSIPL++ also support explicit setup and run stages, which are used to accelerate communication operations. The parallel mapping mechanism provides an external interface that allows optimal mappings to be generated offline and read into the system at runtime. Finally, the standard has been developed in collaboration with high-performance embedded computing vendors and is compatible with their proprietary approaches to achieving performance.
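
The explicit setup/run pattern mentioned above can be illustrated with a small Python analogue, offered as a sketch only (VSIPL++ itself is a C++ library, and the class and names below are invented): planning work happens once in the setup stage, and the run stage only executes.

```python
# Python analogue of the explicit setup/run stages described above for
# VSIPL++ computation objects (VSIPL++ itself is C++; the class and names
# here are invented).  Planning work happens once at setup; the run stage
# only executes, which is what makes runtime optimization possible.
import numpy as np

class PlannedFFT:
    def __init__(self, n):
        # Setup stage: do reusable work up front.  A real library might select
        # an optimized kernel or pre-arrange communication for a parallel
        # array; here we just fix the length and allocate a scratch buffer.
        self.n = n
        self._scratch = np.empty(n, dtype=np.complex64)

    def __call__(self, frame):
        # Run stage: no planning, just the transform on a pre-sized buffer.
        np.copyto(self._scratch, frame)
        return np.fft.fft(self._scratch)

fft_1k = PlannedFFT(1024)                                   # setup once
frame = np.random.standard_normal(1024).astype(np.complex64)
spectrum = fft_1k(frame)                                    # run many times
```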