Publications

Speaker verification using support vector machines and high-level features

Published in:
IEEE Trans. on Audio, Speech, and Language Process., Vol. 15, No. 7, September 2007, pp. 2085-2094.

Summary

High-level characteristics such as word usage, pronunciation, phonotactics, prosody, etc., have seen a resurgence for automatic speaker recognition over the last several years. With the availability of many conversation sides per speaker in current corpora, high-level systems now have the amount of data needed to sufficiently characterize a speaker. Although a significant amount of work has been done in finding novel high-level features, less work has been done on modeling these features. We describe a method of speaker modeling based upon support vector machines. Current high-level feature extraction produces sequences or lattices of tokens for a given conversation side. These sequences can be converted to counts and then to frequencies of n-grams for a given conversation side. We use support vector machine modeling of these n-gram frequencies for speaker verification. We derive a new kernel based upon linearizing a log-likelihood ratio scoring system. Generalizations of this method are shown to produce excellent results on a variety of high-level features. We demonstrate that our methods produce results significantly better than standard log-likelihood ratio modeling. We also demonstrate that our system can perform well in conjunction with standard cepstral speaker recognition systems.
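
The conversion from token sequences to n-gram frequencies can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and the background-frequency weighting in the kernel is an assumption, since the summary does not state the kernel's formula.

```python
from collections import Counter


def ngram_frequencies(tokens, n=2):
    """Return the relative frequency of each n-gram in one token sequence."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


def weighted_linear_kernel(freq_a, freq_b, background):
    """Inner product of two frequency vectors.

    Assumption: each frequency is scaled by 1/sqrt(background frequency),
    so the product of the two scalings is 1/background frequency.
    """
    shared = set(freq_a) & set(freq_b)
    return sum(
        freq_a[g] * freq_b[g] / background[g]
        for g in shared
        if background.get(g, 0) > 0
    )


if __name__ == "__main__":
    side_a = ngram_frequencies("you know i mean you know".split())
    side_b = ngram_frequencies("i mean you know what i mean".split())
    # Hypothetical background model: here simply the average of both sides.
    grams = set(side_a) | set(side_b)
    background = {g: (side_a.get(g, 0) + side_b.get(g, 0)) / 2 for g in grams}
    print(weighted_linear_kernel(side_a, side_b, background))
```

A kernel of this form is an ordinary inner product after per-dimension scaling, so it could be passed to any SVM trainer that accepts precomputed kernels.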

pMATLAB parallel MATLAB library

Published in:
Int. J. High Perform. Comp. Appl., Vol. 21, No. 3, Fall 2007, pp. 336-359.

Summary

MATLAB has emerged as one of the languages most commonly used by scientists and engineers for technical computing, with approximately one million users worldwide. The primary benefits of MATLAB are reduced code development time via high levels of abstraction (e.g., first-class multi-dimensional arrays and thousands of built-in functions), interpretive, interactive programming, and powerful mathematical graphics. The compute-intensive nature of technical computing means that many MATLAB users have codes that can significantly benefit from the increased performance offered by parallel computing. pMatlab provides this capability by implementing parallel global array semantics using standard operator overloading techniques. The core data structure in pMatlab is a distributed numerical array whose distribution onto multiple processors is specified with a "map" construct. Communication operations between distributed arrays are abstracted away from the user, and pMatlab transparently supports redistribution between any block-cyclic-overlapped distributions up to four dimensions. pMatlab is built on top of the MatlabMPI communication library and runs on any combination of heterogeneous systems that support MATLAB, which includes Windows, Linux, MacOS X, and SunOS. This paper describes the overall design and architecture of the pMatlab implementation. Performance is validated by implementing the HPC Challenge benchmark suite and comparing pMatlab performance with the equivalent C+MPI codes. These results indicate that pMatlab can often achieve comparable performance to C+MPI, usually at one tenth the code size. Finally, we present implementation data collected from a sample of real pMatlab applications drawn from the approximately one hundred users at MIT Lincoln Laboratory. These data indicate that users are typically able to go from a serial code to an efficient pMatlab code in about 3 hours while changing less than 1% of their code.
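
To make the idea of a block-cyclic distribution concrete, the sketch below shows how global array indices map to processors in one dimension. It is written in Python for illustration only and does not use or reproduce the pMatlab API; the function names and parameters are hypothetical.

```python
def block_cyclic_owner(index, block_size, num_procs):
    """Return the processor that owns a global index.

    Blocks of `block_size` consecutive indices are dealt out to
    processors in round-robin order.
    """
    return (index // block_size) % num_procs


def local_indices(proc, length, block_size, num_procs):
    """List the global indices of a length-`length` array held by `proc`."""
    return [
        i for i in range(length)
        if block_cyclic_owner(i, block_size, num_procs) == proc
    ]


if __name__ == "__main__":
    # Pure block distribution: one block per processor.
    print(local_indices(0, 8, block_size=4, num_procs=2))
    # Pure cyclic distribution: blocks of size one.
    print(local_indices(0, 8, block_size=1, num_procs=2))
    # Block-cyclic distribution: blocks of size two.
    print(local_indices(0, 8, block_size=2, num_procs=2))
```

Block and cyclic distributions are the two extremes of the same rule, which is why a single block-size parameter can describe all three cases.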

Back-illuminated three-dimensionally integrated CMOS image sensors for scientific applications

Published in:
SPIE Vol. 6690, Focal Plane Arrays for Space Telescopes III, 27-28 August 2007, 669009.

Summary

SOI-based active pixel image sensors have been built in both monolithic and vertically interconnected pixel technologies. The latter easily supports the inclusion of more complex pixel circuitry without compromising pixel fill factor. A wafer-scale back-illumination process is used to achieve 100% fill factor photodiodes. Results from 256 x 256 and 1024 x 1024 pixel arrays are presented, with discussion of dark current improvement in the differing technologies.

Construction of a phonotactic dialect corpus using semiautomatic annotation

Summary

In this paper, we discuss rapid, semiautomatic annotation techniques of detailed phonological phenomena for large corpora. We describe the use of these techniques for the development of a corpus of American English dialects. The resulting annotations and corpora will support both large-scale linguistic dialect analysis and automatic dialect identification. We delineate the semiautomatic annotation process that we are currently employing and a set of experiments we ran to validate this process. From these experiments, we learned that the use of ASR techniques could significantly increase the throughput and consistency of human annotators.

A comparison of speaker clustering and speech recognition techniques for air situational awareness

Published in:
INTERSPEECH 2007, 27-31 August 2007, pp. 2421-2424.

Summary

In this paper we compare speaker clustering and speech recognition techniques applied to the problem of understanding patterns of air traffic control communications. For a given radio transmission, our goal is to identify the talker and to whom he/she is speaking. This information, in combination with knowledge of the roles (e.g., takeoff, approach, hand-off, taxi) of different radio frequencies within an air traffic control region, could allow tracking of pilots through various stages of flight, thus providing the potential to monitor the airspace in great detail. Both techniques must contend with degraded audio channels and significant non-native accents. We report results from experiments using the nn-MATC database showing 9.3% and 32.6% clustering error for speaker clustering and ASR methods, respectively.

A new kernel for SVM MLLR based speaker recognition

Published in:
INTERSPEECH, 27-31 August 2007.

Summary

Speaker recognition using support vector machines (SVMs) with features derived from generative models has been shown to perform well. Typically, a universal background model (UBM) is adapted to each utterance, yielding a set of features that are used in an SVM. We consider the case where the UBM is a Gaussian mixture model (GMM), and maximum likelihood linear regression (MLLR) adaptation is used to adapt the means of the UBM. We examine two possible SVM feature expansions that arise in this context: the first, a GMM supervector, is constructed by stacking the means of the adapted GMM, and the second consists of the elements of the MLLR transform. We examine several kernels associated with these expansions. We show that both expansions are equivalent given an appropriate choice of kernels. Experiments performed on the NIST SRE 2006 corpus clearly highlight that our choice of kernels, which are motivated by distance metrics between GMMs, outperforms ad-hoc ones. We also apply SVM nuisance attribute projection (NAP) to the kernels as a form of channel compensation and show that, with a proper choice of kernel, we achieve results comparable to existing SVM-based recognizers.
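
The supervector construction mentioned above can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's kernels: the function names are hypothetical, covariances are assumed diagonal, and the sqrt(weight)/standard-deviation scaling is one commonly used normalization for GMM mean supervectors that the summary itself does not specify.

```python
from math import sqrt


def supervector(means):
    """Stack per-component mean vectors into one flat list."""
    return [x for mean in means for x in mean]


def normalized_supervector(means, weights, variances):
    """Stack means after scaling each entry by sqrt(weight) / std.

    Assumption: diagonal covariances, given as one variance list
    per mixture component.
    """
    return [
        sqrt(w) * m / sqrt(v)
        for mean, w, var in zip(means, weights, variances)
        for m, v in zip(mean, var)
    ]


def linear_kernel(a, b):
    """Plain inner product between two supervectors."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Toy two-component, two-dimensional adapted GMM (made-up numbers).
    means = [[1.0, 2.0], [3.0, 4.0]]
    weights = [0.5, 0.5]
    variances = [[1.0, 4.0], [1.0, 1.0]]
    sv = normalized_supervector(means, weights, variances)
    print(supervector(means))
    print(linear_kernel(sv, sv))
```

With this scaling, the inner product of two normalized supervectors equals a weighted, variance-normalized sum of products of the component means, which is the sense in which such a kernel reflects a distance between two adapted GMMs.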

Improving phonotactic language recognition with acoustic adaptation

Published in:
INTERSPEECH 2007, 27-31 August 2007, pp. 358-361.

Summary

In recent evaluations of automatic language recognition systems, phonotactic approaches have proven highly effective. However, as most of these systems rely on underlying ASR techniques to derive a phonetic tokenization, these techniques are potentially susceptible to acoustic variability from non-language sources (e.g., gender, speaker, channel). In this paper we apply techniques from ASR research to normalize and adapt HMM-based phonetic models to improve phonotactic language recognition performance. Experiments we conducted with these techniques show an EER reduction of 29% over traditional PRLM-based approaches.

Variable projection and unfolding in compressed sensing

Published in:
Proc. 14th IEEE/SP Workshop on Statistical Signal Processing, 26-28 August 2007, pp. 358-362.

Summary

The performance of linear programming techniques that are applied in the signal identification and reconstruction process in compressed sensing (CS) is governed by both the number of measurements taken and the number of nonzero coefficients in the discrete basis used to represent the signal. To enhance the capabilities of CS, we have developed a technique called Variable Projection and Unfolding (VPU). VPU extends the identification and reconstruction capability of linear programming techniques to signals with a much greater number of nonzero coefficients in the basis in which the signals are compressible, with significantly better reconstruction error.
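
For context, the linear programming step that CS reconstruction typically relies on can be written as follows. This is the standard basis pursuit formulation, given here as an assumption about the baseline being extended; the summary does not state which linear program is used, and the symbols A (measurement matrix), y (measurements), and x (coefficient vector) are introduced only for illustration.

```latex
% Basis pursuit: find the sparsest-in-l1 coefficients consistent with the measurements.
\min_{x \in \mathbb{R}^{n}} \; \|x\|_{1}
\quad \text{subject to} \quad A x = y

% Equivalent linear program, using the split x = u - v with u, v >= 0:
\min_{u, v \in \mathbb{R}^{n}} \; \mathbf{1}^{\top} (u + v)
\quad \text{subject to} \quad A (u - v) = y, \quad u \ge 0, \quad v \ge 0
```

Here A has fewer rows (measurements) than columns (basis coefficients), which is why recovery depends on both the number of measurements and the number of nonzero entries of x.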

Multifocal multiphoton microscopy (MMM) at a frame rate beyond 600 Hz

Published in:
Opt. Express, Vol. 15, No. 17, 20 August 2007, pp. 10998-11005.

Summary

We introduce a multiphoton microscope for high-speed three-dimensional (3D) fluorescence imaging. The system combines parallel illumination by a multifocal multiphoton microscope (MMM) with parallel detection via a segmented high-sensitivity charge-coupled device (CCD) camera. The instrument consists of a Ti-sapphire laser illuminating a microlens array that projects 36 foci onto the focal plane. The foci are scanned using a resonance scanner and imaged with a custom-made CCD camera. The MMM increases the imaging speed by parallelizing the illumination; the CCD camera can operate at a frame rate of 1428 Hz while maintaining a low read noise of 11 electrons per pixel by dividing its chip into 16 independent segments for parallelized readout. We image fluorescent specimens at a frame rate of 640 Hz. The calcium wave of fluo3-labeled cardiac myocytes is measured by imaging the spontaneous contraction of the cells in a 0.625-second sequence movie, consisting of 400 single images.
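
The quoted sequence length follows directly from the frame count and frame rate, as the short Python check below shows. The function name is hypothetical; the numbers are those given in the summary.

```python
def sequence_duration_s(num_frames, frame_rate_hz):
    """Duration in seconds of a sequence recorded at a fixed frame rate."""
    return num_frames / frame_rate_hz


if __name__ == "__main__":
    # 400 single images recorded at 640 Hz.
    print(sequence_duration_s(400, 640))
    # Frame period in milliseconds at the camera's 1428 Hz maximum rate.
    print(1000 / 1428)
```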

Analysis of ground surveillance assets to support Global Hawk airspace access at Beale Air Force Base

Summary

This study, performed from May 2006 to January 2007 by MIT Lincoln Laboratory, investigated the feasibility of providing ground-sensor-based traffic data directly to Global Hawk operators at Beale AFB. The system concept involves detecting and producing tracks for all cooperative (transponder-equipped) and non-cooperative aircraft from the surface to 18,000 ft MSL, extending from the Beale AFB Class C airspace cylinder northward to the China Military Operations Area (MOA). Data from multiple sensors can be fused together to create a comprehensive air surveillance picture, with the altitudes of non-cooperative targets estimated by fusing returns from all available sensor data. Such a capability, if accepted by the FAA, could mitigate the need for Temporary Flight Restrictions (TFR) to satisfy Certificate of Waiver or Authorization (COA) requirements. There are no existing specifications for ground-sensor-based Unmanned Aerial Systems (UAS) traffic avoidance procedures, nor is it yet known how precisely altitude needs to be estimated. It may be possible to avoid traffic laterally, in which case traffic altitude need not be known accurately. If, however, it is necessary to also avoid traffic vertically, then altitudes will need to be estimated to some (as yet undefined) level of accuracy.