Publications


Large-scale analysis of formant frequency estimation variability in conversational telephone speech

Published in:
INTERSPEECH 2009, 6-10 September 2009.

Summary

We quantify how the telephone channel and regional dialect influence formant estimates extracted with Wavesurfer from spontaneous conversational speech from over 3,600 native American English speakers. To the best of our knowledge, this is the largest-scale study on this topic. We found that F1 estimates are higher over cellular channels than over landline channels, while F2 generally shows the opposite trend. We also characterized vowel shift trends in the northern United States and compared them with the Northern Cities Chain Shift (NCCS). Our analysis is useful in forensic applications, where it is important to distinguish among speaker, dialect, and channel characteristics.

The MIT Lincoln Laboratory 2008 speaker recognition system

Summary

In recent years, methods for modeling and mitigating variational nuisances have been introduced and refined. A primary emphasis of the NIST 2008 Speaker Recognition Evaluation (SRE) was to greatly expand the use of auxiliary microphones. This introduced additional channel variation, which has historically been a challenge for speaker verification systems. In this paper we present the MIT Lincoln Laboratory speaker recognition system applied to the task in the NIST 2008 SRE. Our approach during the evaluation was two-fold: 1) utilize recent advances in variational nuisance modeling (latent factor analysis and nuisance attribute projection) to allow our spectral speaker verification systems to better compensate for the channel variation introduced, and 2) fuse systems targeting the different linguistic tiers of information, high and low. The performance of the system is presented on the NIST 2008 SRE task. Post-evaluation analysis is conducted on the subtask in which interview microphones are present.

Automatic registration of LIDAR and optical images of urban scenes

Published in:
CVPR 2009, IEEE Conf. on Computer Vision and Pattern Recognition, 20-25 June 2009, pp. 2639-2646.

Summary

Fusion of 3D laser radar (LIDAR) imagery and aerial optical imagery is an efficient method for constructing 3D virtual reality models. One difficult aspect of creating such models is registering the optical image with the LIDAR point cloud, which is characterized as a camera pose estimation problem. We propose a novel application of mutual information registration methods, which exploits the statistical dependency, in urban scenes, between optical appearance and measured LIDAR elevation. We utilize the well-known downhill simplex optimization to infer camera pose parameters. We discuss three methods for measuring mutual information between LIDAR imagery and optical imagery. Utilization of OpenGL and graphics hardware in the optimization process yields registration times dramatically lower than previous methods. Using an initial registration comparable to GPS/INS accuracy, we demonstrate the utility of our algorithm with a collection of urban images and present 3D models created with the fused imagery.
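The mutual-information criterion at the heart of this registration approach can be sketched in a few lines. The following is a toy illustration, not the paper's implementation: it estimates MI (in bits) from a joint histogram of two quantized 1-D sequences standing in for co-registered LIDAR elevation and optical intensity samples.

```python
import math
from collections import Counter

def mutual_information(a, b, bins=8):
    """Estimate MI (in bits) between two equal-length sequences by
    quantizing each into `bins` levels and forming a joint histogram."""
    def quantize(xs):
        lo, hi = min(xs), max(xs)
        span = (hi - lo) or 1.0
        return [min(int((x - lo) / span * bins), bins - 1) for x in xs]
    qa, qb = quantize(a), quantize(b)
    n = len(qa)
    joint = Counter(zip(qa, qb))    # joint histogram counts
    pa, pb = Counter(qa), Counter(qb)  # marginal histogram counts
    mi = 0.0
    for (i, j), c in joint.items():
        pij = c / n
        # p(i,j) * log2( p(i,j) / (p(i) p(j)) ), with counts rescaled by n
        mi += pij * math.log2(pij * n * n / (pa[i] * pb[j]))
    return mi

# A perfectly dependent pair has higher MI than a pair with a constant partner.
dep = mutual_information([1, 2, 3, 4] * 10, [2, 4, 6, 8] * 10)
ind = mutual_information([1, 2, 3, 4] * 10, [5, 5, 5, 5] * 10)
```

Registration would then maximize such a score over camera pose parameters (in the paper, via downhill simplex); here the dependent pair scores 2 bits while the constant pair scores 0.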

Compressed sensing arrays for frequency-sparse signal detection and geolocation

Published in:
Proc. of the 2009 DoD High Performance Computing Modernization Program Users Group Conf., HPCMP-UGC, 15 June 2009, pp. 297-301.

Summary

Compressed sensing (CS) can be used to monitor very wide bands when the received signals are sparse in some basis. We have developed a compressed sensing receiver architecture with the ability to detect, demodulate, and geolocate signals that are sparse in frequency. In this paper, we evaluate detection, reconstruction, and angle of arrival (AoA) estimation via Monte Carlo simulation and find that, using a linear 4-sensor array and undersampling by a factor of 8, we achieve near-perfect detection when the received signals occupy up to 5% of the bandwidth being monitored and have an SNR of 20 dB or higher. The signals in our band of interest include frequency-hopping signals, which are detected due to their consistent AoA. We compare CS array performance using sensor-frequency and space-frequency bases, and determine that using the sensor-frequency basis is more practical for monitoring wide bands. Though it requires that the received signals be sparse in frequency, the sensor-frequency basis still provides spatial information and is not affected by correlation between uncompressed basis vectors.
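As a toy illustration of what "sparse in frequency" buys the detection step (this is not the CS receiver itself, which works from undersampled measurements), the sketch below flags frequency-sparse tones by thresholding a naive DFT magnitude spectrum; the tone bins, amplitudes, and threshold are all invented for the example.

```python
import cmath
import math

def dft_magnitudes(x):
    """Naive DFT magnitude spectrum (O(n^2), fine for a toy example)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

def detect_sparse_tones(x, rel_threshold=0.5):
    """Report DFT bins whose magnitude exceeds rel_threshold * peak magnitude."""
    mags = dft_magnitudes(x)
    peak = max(mags)
    return [k for k, m in enumerate(mags) if m >= rel_threshold * peak]

n = 64
# Two real tones centered on bins 5 and 12 -- a frequency-sparse signal.
sig = [math.cos(2 * math.pi * 5 * t / n) + 0.8 * math.cos(2 * math.pi * 12 * t / n)
       for t in range(n)]
# For a real signal each tone also appears in its conjugate-mirror bin (n - k).
bins = detect_sparse_tones(sig)
```

Both tones (and their mirror bins 52 and 59) stand out cleanly against an otherwise empty spectrum, which is the sparsity property the CS receiver exploits.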

Polyphase nonlinear equalization of time-interleaved analog-to-digital converters

Published in:
IEEE J. Sel. Top. Sig. Process., Vol. 3, No. 3, June 2009, pp. 362-373.

Summary

As the demand for higher data rates increases, commercial analog-to-digital converters (ADCs) are more commonly being implemented with multiple on-chip converters whose outputs are time-interleaved. The distortion generated by time-interleaved ADCs is now not only a function of the nonlinear behavior of the constituent circuitry, but also mismatches associated with interleaving multiple output streams. To mitigate distortion generated by time-interleaved ADCs, we have developed a polyphase NonLinear EQualizer (pNLEQ) which is capable of simultaneously mitigating distortion generated by both the on-chip circuitry and mismatches due to time interleaving. In this paper, we describe the pNLEQ architecture and present measurements of its performance.
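The interleaving-plus-distortion problem, and the per-phase correction idea behind a polyphase equalizer, can be sketched with a toy 2-way interleaved converter whose sub-ADCs add different quadratic distortions. The distortion coefficients here are invented, and the "equalizer" is a first-order inverse that assumes those coefficients are known; a real pNLEQ estimates its correction from training data.

```python
import math

def interleaved_adc(x, coeffs=(0.05, 0.08)):
    """Toy 2-way time-interleaved ADC: even samples pass through sub-ADC 0,
    odd samples through sub-ADC 1, each adding its own quadratic distortion."""
    return [v + coeffs[i % 2] * v * v for i, v in enumerate(x)]

def pnleq(y, coeffs=(0.05, 0.08)):
    """Polyphase equalizer sketch: subtract each phase's estimated quadratic
    distortion term (first-order inverse with known coefficients)."""
    return [v - coeffs[i % 2] * v * v for i, v in enumerate(y)]

x = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
y = interleaved_adc(x)          # distorted, phase-dependent output
z = pnleq(y)                    # per-phase corrected output
err_before = max(abs(a - b) for a, b in zip(x, y))
err_after = max(abs(a - b) for a, b in zip(x, z))
```

Because the correction is applied per interleave phase, each sub-ADC's distinct distortion is addressed separately, which is the structural point of the polyphase architecture; the residual error after correction is an order of magnitude smaller than before.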

Machine translation for government applications

Published in:
Lincoln Laboratory Journal, Vol. 18, No. 1, June 2009, pp. 41-53.

Summary

The idea of a mechanical process for converting one human language into another can be traced to a letter written by René Descartes in 1629, and after nearly 400 years, this vision has not been fully realized. Machine translation (MT) using digital computers has been a grand challenge for computer scientists, mathematicians, and linguists since the first international conference on MT was held at the Massachusetts Institute of Technology in 1952. Currently, Lincoln Laboratory is achieving success in a highly focused research program that specializes in developing speech translation technology for limited language resource domains and in adapting foreign-language proficiency standards for MT evaluation. Our specialized research program is situated within a general framework for multilingual speech and text processing for government applications.

Advocate: a distributed architecture for speech-to-speech translation

Published in:
Lincoln Laboratory Journal, Vol. 18, No. 1, June 2009, pp. 54-65.

Summary

Advocate is a set of communications application programming interfaces and service wrappers that serve as a framework for creating complex and scalable real-time software applications from component processing algorithms. Advocate can be used for a variety of distributed processing applications, but was initially designed to use existing speech processing and machine translation components in the rapid construction of large-scale speech-to-speech translation systems. Many such speech-to-speech translation applications require real-time processing, and Advocate provides this speed with low-latency communication between services.

Advocate: a distributed voice-oriented computing architecture

Published in:
North American Chapter of the Association for Computational Linguistics - Human Language Technologies Conf. (NAACL HLT 2009), 31 May - 5 June 2009.

Summary

Advocate is a lightweight and easy-to-use computing architecture that supports real-time, voice-oriented computing. It is designed to allow the combination of multiple speech and language processing components to create cohesive distributed applications. It is scalable, supporting local processing of all NLP/speech components when sufficient processing resources are available on one machine, or fully distributed/networked processing over an arbitrarily large compute structure when more compute resources are needed. Advocate is designed to operate in a large distributed test-bed in which an arbitrary number of NLP/speech services interface with an arbitrary number of Advocate client applications. In this configuration, each Advocate client application employs automatic service discovery, calling services as required.
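The service-discovery pattern the abstract describes can be sketched as a minimal in-process registry. This is an illustrative stand-in, not Advocate's actual interface: the registry API and the service names `asr` and `mt` are invented for the example.

```python
class ServiceRegistry:
    """Toy stand-in for automatic service discovery: services register
    under a capability name; clients look them up and call them."""
    def __init__(self):
        self._services = {}

    def register(self, name, handler):
        self._services.setdefault(name, []).append(handler)

    def discover(self, name):
        if name not in self._services:
            raise LookupError(f"no service provides {name!r}")
        return self._services[name][0]  # naive policy: first registered wins

registry = ServiceRegistry()
registry.register("asr", lambda audio: f"transcript of {audio}")
registry.register("mt", lambda text: f"translation of {text}")

# A client chains discovered services, as the front half of a
# speech-to-speech pipeline might: recognize, then translate.
asr = registry.discover("asr")
mt = registry.discover("mt")
result = mt(asr("utterance.wav"))
```

In a real deployment the registry and handlers would live on different machines behind the low-latency communication layer the abstract describes; the lookup-then-call structure is the same.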

Modeling and detection techniques for counter-terror social network analysis and intent recognition

Summary

In this paper, we describe our approach and initial results on modeling, detection, and tracking of terrorist groups and their intents based on multimedia data. While research on automated information extraction from multimedia data has yielded significant progress in areas such as the extraction of entities, links, and events, less progress has been made in the development of automated tools for analyzing the results of information extraction to "connect the dots." Hence, our Counter-Terror Social Network Analysis and Intent Recognition (CT-SNAIR) work focuses on development of automated techniques and tools for detection and tracking of dynamically-changing terrorist networks as well as recognition of capability and potential intent. In addition to obtaining and working with real data for algorithm development and test, we have a major focus on modeling and simulation of terrorist attacks based on real information about past attacks. We describe the development and application of a new Terror Attack Description Language (TADL), which is used as a basis for modeling and simulation of terrorist attacks. Examples are shown which illustrate the use of TADL and a companion simulator based on a Hidden Markov Model (HMM) structure to generate transactions for attack scenarios drawn from real events. We also describe our techniques for generating realistic background clutter traffic to enable experiments to estimate performance in the presence of a mix of data. An important part of our effort is to produce scenarios and corpora for use in our own research, which can be shared with a community of researchers in this area. We describe our scenario and corpus development, including specific examples from the September 2004 bombing of the Australian embassy in Jakarta and a fictitious scenario which was developed in a prior project for research in social network analysis. The scenarios can be created by subject matter experts using a graphical editing tool.
Given a set of time ordered transactions between actors, we employ social network analysis (SNA) algorithms as a filtering step to divide the actors into distinct communities before determining intent. This helps reduce clutter and enhances the ability to determine activities within a specific group. For modeling and simulation purposes, we generate random networks with structures and properties similar to real-world social networks. Modeling of background traffic is an important step in generating classifiers that can separate harmless activities from suspicious activity. An algorithm for recognition of simulated potential attack scenarios in clutter based on Support Vector Machine (SVM) techniques is presented. We show performance examples, including probability of detection versus probability of false alarm tradeoffs, for a range of system parameters.
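The HMM-style transaction simulation described above can be sketched as follows. All states, transaction types, and probabilities here are invented for illustration and are not drawn from TADL: hidden attack-phase states emit observable transaction types, and the simulator walks the chain to generate a time-ordered sequence.

```python
import random

# Hypothetical attack-phase states and emissions, for illustration only.
TRANS = {  # state -> [(next_state, prob), ...]
    "planning":  [("planning", 0.6), ("financing", 0.4)],
    "financing": [("financing", 0.5), ("execution", 0.5)],
    "execution": [("execution", 1.0)],
}
EMIT = {  # state -> [(transaction_type, prob), ...]
    "planning":  [("meeting", 0.7), ("phone_call", 0.3)],
    "financing": [("wire_transfer", 0.6), ("purchase", 0.4)],
    "execution": [("travel", 0.5), ("rental", 0.5)],
}

def sample(pairs, rng):
    """Draw one item from a list of (item, probability) pairs."""
    r, acc = rng.random(), 0.0
    for item, p in pairs:
        acc += p
        if r <= acc:
            return item
    return pairs[-1][0]

def simulate(n_steps, seed=0):
    """Generate a time-ordered transaction sequence from the toy HMM."""
    rng = random.Random(seed)
    state, seq = "planning", []
    for _ in range(n_steps):
        seq.append(sample(EMIT[state], rng))   # emit an observable transaction
        state = sample(TRANS[state], rng)      # advance the hidden phase
    return seq

transactions = simulate(10)
```

Sequences generated this way, mixed with simulated background clutter, are the kind of input on which an SVM-based detector such as the one described here would be trained and scored.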

Forensic speaker recognition: a need for caution

Summary

There has long been a desire to be able to identify a person on the basis of his or her voice. For many years, judges, lawyers, detectives, and law enforcement agencies have wanted to use forensic voice authentication to investigate a suspect or to confirm a judgment of guilt or innocence. Challenges, realities, and cautions regarding the use of speaker recognition applied to forensic-quality samples are presented.