Publications


A comparison of subspace feature-domain methods for language recognition

Summary

Compensation of cepstral features for mismatch due to dissimilar train and test conditions has been critical for good performance in many speech applications. Mismatch is typically due to variability from changes in speaker, channel, gender, and environment. Common methods for compensation include RASTA, mean and variance normalization, VTLN, and feature warping. Recently, a new class of subspace methods for model compensation has become popular in language and speaker recognition: nuisance attribute projection (NAP) and factor analysis. A feature space version of latent factor analysis has been proposed. In this work, a feature space version of NAP is presented. This new approach, fNAP, is contrasted with feature domain latent factor analysis (fLFA). Both methods are applied to a NIST language recognition task. Results show the viability of the new fNAP method and indicate when the different methods perform best.
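
As a rough illustration of the feature-domain projection idea (not the paper's exact fNAP recipe), the sketch below estimates a low-rank nuisance subspace from session-to-session variation and removes that component from each cepstral frame. The function names and the SVD-based subspace estimate are assumptions made for illustration only.

```python
import numpy as np

def nuisance_subspace(session_means, rank):
    """Estimate a nuisance (channel/session) subspace from per-session mean
    feature vectors of the same class; returns U with shape (dim, rank)."""
    deviations = session_means - session_means.mean(axis=0)
    _, _, vt = np.linalg.svd(deviations, full_matrices=False)
    return vt[:rank].T

def nap_project(features, U):
    """Remove the nuisance component from frame features (n_frames, dim):
    x -> x - U U^T x, assuming U has orthonormal columns."""
    return features - (features @ U) @ U.T
```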

A hybrid SVM/MCE training approach for vector space topic identification of spoken audio recordings

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 2542-2545.

Summary

The success of support vector machines (SVMs) for classification problems is often dependent on an appropriate normalization of the input feature space. This is particularly true in topic identification, where the relative contribution of the common but uninformative function words can overpower the contribution of the rare but informative content words in the SVM kernel function score if the feature space is not normalized properly. In this paper we apply the discriminative minimum classification error (MCE) training approach to the problem of learning an appropriate feature space normalization for use with an SVM classifier. Results are presented showing significant error rate reductions for an SVM-based system on a topic identification task using the Fisher corpus of audio recordings of human conversations.
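
To make the normalization issue concrete, here is a small sketch of a per-dimension weighting applied inside a linear SVM kernel. The IDF-style weights are only a stand-in for the discriminatively learned MCE scaling described in the paper, and the function names are illustrative.

```python
import numpy as np

def idf_weights(doc_term_counts):
    """One weight per vocabulary term: rare (content) words get large weights,
    common (function) words get small ones. A heuristic stand-in for the
    MCE-trained feature-space normalization."""
    n_docs = doc_term_counts.shape[0]
    df = np.count_nonzero(doc_term_counts, axis=0) + 1.0
    return np.log(n_docs / df)

def weighted_linear_kernel(x, y, w):
    """Kernel score after normalization: K(x, y) = sum_i (w_i x_i)(w_i y_i)."""
    return float(np.dot(w * x, w * y))
```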

Dialect recognition using adapted phonetic models

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 763-766.

Summary

In this paper, we introduce a dialect recognition method that makes use of phonetic models adapted per dialect without phonetically labeled data. We show that this method can be implemented efficiently within an existing PRLM system. We compare the performance of this system with other state-of-the-art dialect recognition methods (both acoustic and token-based) on the NIST LRE 2007 English and Mandarin dialect recognition tasks. Our experimental results indicate that this system performs better than baseline GMM and adapted PRLM systems and yields consistent gains of 15-23% when combined with other systems.
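
For readers unfamiliar with the PRLM backbone, the sketch below scores a decoded phone sequence against per-dialect phone bigram models and picks the best-scoring dialect. The adapted phonetic (acoustic) models that are the paper's contribution are not represented here, and all names are illustrative.

```python
def bigram_log_likelihood(phones, bigram_logprobs, backoff=-10.0):
    """Log-likelihood of a decoded phone sequence under one dialect's bigram
    model (a dict mapping (prev, cur) phone pairs to log probabilities)."""
    return sum(bigram_logprobs.get(pair, backoff)
               for pair in zip(phones, phones[1:]))

def classify_dialect(phones, dialect_models):
    """Return the dialect whose phone language model scores the sequence highest."""
    return max(dialect_models,
               key=lambda d: bigram_log_likelihood(phones, dialect_models[d]))
```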

Eigen-channel compensation and discriminatively trained Gaussian mixture models for dialect and accent recognition

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 723-726.

Summary

This paper presents a series of dialect/accent identification results for three sets of dialects with discriminatively trained Gaussian mixture models and feature compensation using eigen-channel decomposition. The classification tasks evaluated in the paper include: 1) the Chinese language classes, 2) American- and Indian-accented English, and 3) discrimination between three Arabic dialects. The first two tasks were evaluated on the 2007 NIST LRE corpus. The Arabic discrimination task was evaluated using data derived from the LDC Arabic set collected by Appen. Analysis is performed for the English accent problem studied, and an approach to open-set dialect scoring is introduced. The system resulted in equal error rates at or below 10% for each of the tasks studied.
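
A stripped-down view of the eigen-channel feature compensation step, assuming a pre-trained channel basis U, is sketched below. The channel-factor estimate here is a plain least-squares fit to the utterance mean; systems like those in the paper would typically estimate the factors from GMM-posterior statistics instead.

```python
import numpy as np

def eigenchannel_compensate(features, U):
    """Remove an estimated per-utterance channel offset from cepstral frames.

    features : (n_frames, dim) features of one utterance
    U        : (dim, n_factors) eigen-channel basis trained offline
    """
    offset = features.mean(axis=0)
    # Least-squares point estimate of this utterance's channel factors
    y, *_ = np.linalg.lstsq(U, offset, rcond=None)
    return features - U @ y
```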

The MITLL NIST LRE 2007 language recognition system

Summary

This paper presents a description of the MIT Lincoln Laboratory language recognition system submitted to the NIST 2007 Language Recognition Evaluation. This system consists of a fusion of four core recognizers, two based on tokenization and two based on spectral similarity. Results for NIST's 14-language detection task are presented for both the closed-set and open-set tasks and for the 30-, 10-, and 3-second durations. On the 30-second, 14-language closed-set detection task, the system achieves a 1% equal error rate.
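
As a rough sketch of score-level fusion for a closed-set detection task (the abstract does not specify the submission's actual backend), the snippet below combines per-recognizer, per-language scores with a weighted sum and converts them to detection log-likelihood ratios. The weights and the LLR construction are generic assumptions for illustration.

```python
import numpy as np
from scipy.special import logsumexp

def closed_set_llrs(scores, weights):
    """scores : (n_recognizers, n_languages) per-language log-likelihoods from
    each core recognizer for one trial; weights : (n_recognizers,) fusion weights."""
    fused = weights @ scores                       # (n_languages,) fused log-likelihoods
    llrs = np.empty_like(fused)
    for i in range(fused.size):
        others = np.delete(fused, i)
        # Closed-set assumption: non-target likelihood is the average over the others
        llrs[i] = fused[i] - (logsumexp(others) - np.log(others.size))
    return llrs
```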

Two protocols comparing human and machine phonetic discrimination performance in conversational speech

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 1630-1633.

Summary

This paper describes two experimental protocols for direct comparison of human and machine phonetic discrimination performance in continuous speech. These protocols attempt to isolate phonetic discrimination while controlling for language and segmentation biases. Results of two human experiments are described, including comparisons with automatic phonetic recognition baselines. Our experiments suggest that in conversational telephone speech, human performance on these tasks exceeds that of machines by 15%. Furthermore, in a related language model control experiment, human subjects were better able to correctly predict words in conversational speech by 45%.

Beyond frame independence: parametric modelling of time duration in speaker and language recognition

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 767-770.

Summary

In this work, we address the question of generating accurate likelihood estimates from multi-frame observations in speaker and language recognition. Using a simple theoretical model, we extend the basic assumption of independent frames to include two refinements: a local correlation model across neighboring frames, and a global uncertainty due to train/test channel mismatch. We present an algorithm for discriminative training of the resulting duration model based on logistic regression combined with a bisection search. We show that using this model we can achieve state-of-the-art performance for the NIST LRE07 task. Finally, we show that these more accurate class likelihood estimates can be combined to solve multiple problems using Bayes' rule, so that we can expand our single parametric backend to replace all six separate backends used in our NIST LRE submission for both closed and open sets.
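
The training recipe names logistic regression combined with a bisection search. As a generic illustration of the bisection part, the helper below tunes a single scalar parameter of a smooth, unimodal objective, such as a duration-model parameter inside a cross-entropy calibration loop; the objective and parameter in the example are hypothetical.

```python
def bisect_minimize(objective, lo, hi, iters=40, eps=1e-6):
    """Minimize a smooth unimodal 1-D objective by bisecting on the sign of a
    finite-difference slope; a simple stand-in for the 1-D search used to tune
    a duration-model parameter during discriminative training."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if objective(mid + eps) > objective(mid - eps):
            hi = mid          # slope positive: minimum lies to the left
        else:
            lo = mid          # slope non-positive: minimum lies to the right
    return 0.5 * (lo + hi)

# Example: recover the minimizer of a toy quadratic objective on [0, 10]
assert abs(bisect_minimize(lambda r: (r - 3.2) ** 2, 0.0, 10.0) - 3.2) < 1e-3
```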

Detection probability modeling for airport wind-shear sensors

Published in:
MIT Lincoln Laboratory Report ATC-340

Summary

An objective wind-shear detection probability estimation model is developed for radar, lidar, and sensor combinations. The model includes effects of system sensitivity, site-specific wind-shear, clutter, and terrain blockage characteristics, range-aliased obscuration statistics, antenna beam filling and attenuation, and signal processing differences, allowing a sensor- and site-specific performance analysis of deployed and future systems. A total of 161 sites are analyzed for the study, consisting of airports currently serviced by the Terminal Doppler Weather Radar (TDWR) (46), Airport Surveillance Radar Weather Systems Processor (ASR-9 WSP) (35), Low Altitude Wind Shear Alert System-Relocation/Sustainment (LLWAS-RS) (40), and no wind-shear detection system (40). Sensors considered are the TDWR, WSP, LLWAS, Weather Surveillance Radar 1988-Doppler (WSR-88D, commonly known as NEXRAD), and the Lockheed Martin Coherent Technologies (LMCT) Doppler lidar and proposed X-band radar. [not complete]
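
For sensor combinations, one simple way to combine per-sensor detection probabilities is sketched below under an independence assumption. This is only an illustrative simplification; the report's model also folds in clutter, terrain blockage, obscuration, and other site-specific effects not represented here, and the function name is hypothetical.

```python
def combined_detection_probability(sensor_probs):
    """Probability that at least one sensor in a combination detects an event,
    assuming independent misses; each per-sensor probability would already
    reflect sensitivity, clutter, blockage, and obscuration at the site."""
    p_miss = 1.0
    for p in sensor_probs:
        p_miss *= (1.0 - p)
    return 1.0 - p_miss

# Example: a radar/lidar pair with individual detection probabilities 0.85 and 0.70
print(combined_detection_probability([0.85, 0.70]))   # about 0.955
```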

Amplitude spectroscopy of a solid-state artificial atom

Summary

The energy-level structure of a quantum system, which has a fundamental role in its behaviour, can be observed as discrete lines and features in absorption and emission spectra. Conventionally, spectra are measured using frequency spectroscopy, whereby the frequency of a harmonic electromagnetic driving field is tuned into resonance with a particular separation between energy levels. Although this technique has been successfully employed in a variety of physical systems, including natural and artificial atoms and molecules, its application is not universally straightforward and becomes extremely challenging for frequencies in the range of tens to hundreds of gigahertz. Here we introduce a complementary approach, amplitude spectroscopy, whereby a harmonic driving field sweeps an artificial atom through the avoided crossings between energy levels at a fixed frequency. Spectroscopic information is obtained from the amplitude dependence of the system's response, thereby overcoming many of the limitations of a broadband-frequency-based approach. The resulting 'spectroscopy diamonds', the regions in parameter space where transitions between specific pairs of levels can occur, exhibit interference patterns and population inversion that serve to distinguish the atom's spectrum. Amplitude spectroscopy provides a means of manipulating and characterizing systems over an extremely broad bandwidth, using only a single driving frequency that may be orders of magnitude smaller than the energy scales being probed.
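
A toy numerical illustration of the measurement idea is sketched below: drive a two-level system at a fixed frequency, sweep the drive amplitude and static bias, and record the transition probability. The Hamiltonian, parameter values, and readout are generic textbook choices, not those of the device or protocol in the paper.

```python
import numpy as np
from scipy.linalg import expm

# Pauli matrices for a two-level system
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def transition_probability(eps0, A, delta=1.0, omega=0.5,
                           n_periods=40, steps_per_period=200):
    """Drive H(t) = -(delta/2)*SX - (eps0 + A*cos(omega*t))/2 * SZ at a fixed
    frequency omega, starting in the diabatic state |0>, and return the
    probability of ending in |1> after n_periods of the drive."""
    dt = (2.0 * np.pi / omega) / steps_per_period
    psi = np.array([1.0, 0.0], dtype=complex)
    for k in range(n_periods * steps_per_period):
        eps = eps0 + A * np.cos(omega * k * dt)
        H = -0.5 * delta * SX - 0.5 * eps * SZ
        psi = expm(-1j * H * dt) @ psi        # exact small-step unitary evolution
    return abs(psi[1]) ** 2

# Sweeping A (and eps0) at fixed omega and plotting the result maps out the
# interference patterns ("spectroscopy diamonds") described in the abstract.
```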

A 64 x 64-pixel CMOS test chip for the development of large-format ultra-high-speed snapshot imagers

Summary

A 64 x 64-pixel test circuit was designed and fabricated in 0.18-µm CMOS technology for investigating high-speed imaging with large-format imagers. Several features are integrated into the circuit architecture to achieve fast exposure times with low skew and jitter for simultaneous pixel snapshots. These features include an H-tree clock distribution with local and global repeaters, single-edge trigger propagation, local exposure control, and current-steering sampling circuits. To evaluate the circuit performance, test structures are periodically located throughout the 64 x 64-pixel device. Measured devices have exposure times that can be varied from 75 ps to 305 ps, with skew times for all pixels less than ±3 ps and jitter less than ±1.2 ps rms. Other performance characteristics are a readout noise of approximately 115 e- rms and an upper dynamic range of 310,000 e-.