Publications


Preserving the character of perturbations in scaled pitch contours

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 5 March 2010, pp. 417-420.

Summary

The global and fine dynamic components of a pitch contour in voice production, as in the speaking and singing voice, are important for both the meaning and character of an utterance. In speech, for example, slow pitch inflections, rapid pitch accents, and irregular regions all comprise the pitch contour. In applications where all components of a pitch contour are stretched or compressed in the same way, as for example in time-scale modification, an unnatural scaled contour may result. In this paper, we develop a framework for scaling pitch contours, motivated by the goal of maintaining naturalness in time-scale modification of voice. Specifically, we develop a multi-band algorithm to independently modify the slow trajectory and fast perturbation components of a contour for a more natural synthesis, and we present examples where pitch contours representative of speaking and singing voice are lengthened. In the speaking voice, the frequency content of flutter or irregularity is maintained, while slow pitch inflection is simply stretched or compressed. In the singing voice, rapid vibrato is preserved while slower note-to-note variation is scaled as desired.
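The two-band idea in this summary can be illustrated with a deliberately naive sketch. It assumes a pitch contour sampled at a fixed frame rate, treats a moving average (window `win`, a hypothetical parameter) as the slow trajectory and the residual as the fast perturbation, stretches only the slow part by interpolation, and replays the perturbation at its original frame rate. This is not the paper's multi-band algorithm; in particular, cycling the perturbation is a crude stand-in that keeps its rate but can introduce a discontinuity where it wraps around.

```python
import numpy as np

def split_contour(f0, win=9):
    """Split a pitch contour into a slow trajectory and a fast perturbation residual."""
    if win % 2 == 0:
        raise ValueError("win must be odd so the smoothed contour keeps its length")
    f0 = np.asarray(f0, dtype=float)
    # Edge padding keeps the moving average from sagging toward 0 Hz at the contour ends.
    padded = np.pad(f0, win // 2, mode="edge")
    slow = np.convolve(padded, np.ones(win) / win, mode="valid")
    return slow, f0 - slow

def scale_contour(f0, factor, win=9):
    """Lengthen (factor > 1) or shorten (factor < 1) a contour, scaling only the slow part."""
    slow, fast = split_contour(f0, win)
    n_in = len(slow)
    n_out = int(round(n_in * factor))
    # Slow trajectory: resampled onto n_out frames, so inflections stretch or compress.
    t_out = np.linspace(0, n_in - 1, n_out)
    slow_scaled = np.interp(t_out, np.arange(n_in), slow)
    # Fast perturbation: replayed frame by frame at its original rate (cycled when
    # lengthening, truncated when shortening), so vibrato or flutter keeps its rate.
    fast_scaled = np.resize(fast, n_out)
    return slow_scaled + fast_scaled
```

For example, `scale_contour(f0, 1.5)` on a 200-frame contour returns 300 frames: the slow inflection is spread over all 300 frames, while the residual is replayed at its original frame rate, so a vibrato with an 8-frame period still has an 8-frame period.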

Multi-class SVM optimization using MCE training with application to topic identification

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15 March 2010, pp. 5350-5353.

Summary

This paper presents a minimum classification error (MCE) training approach for improving the accuracy of multi-class support vector machine (SVM) classifiers. We have applied this approach to topic identification (topic ID) for human-human telephone conversations from the Fisher corpus using ASR lattice output. The new approach yields improved performance over the traditional techniques for training multi-class SVM classifiers on this task.
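For readers unfamiliar with MCE, the quantity being minimized is a smoothed count of classification errors. The sketch below computes one common form of that loss from a matrix of per-class discriminant scores (for example, the outputs of one-vs-rest SVMs). The log-sum-exp competitor term and the smoothing parameters `eta` and `gamma` are standard MCE choices assumed here, not details taken from the paper, and the gradient step that would update the SVM parameters is omitted.

```python
import numpy as np

def mce_loss(scores, labels, eta=1.0, gamma=1.0):
    """Average smoothed MCE loss.

    scores: (n, k) array of discriminant values g_j(x) for k >= 2 classes
    labels: (n,) array of true class indices
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    n, k = scores.shape
    rows = np.arange(n)
    correct = scores[rows, labels]
    # Gather the k-1 competing scores for each sample (row-major order keeps rows intact).
    mask = np.ones((n, k), dtype=bool)
    mask[rows, labels] = False
    competitors = scores[mask].reshape(n, k - 1)
    # Soft maximum over competitors: (1/eta) * log(mean(exp(eta * g_j))), shifted for stability.
    top = competitors.max(axis=1)
    soft_max = top + np.log(np.mean(np.exp(eta * (competitors - top[:, None])), axis=1)) / eta
    # Misclassification measure: positive when competitors outscore the true class.
    d = soft_max - correct
    # The sigmoid turns the 0/1 error count into a differentiable loss.
    return float(np.mean(1.0 / (1.0 + np.exp(-gamma * d))))
```

A correctly classified sample with a wide margin contributes close to 0, a badly misclassified one close to 1, and samples near the decision boundary dominate the gradient; larger `gamma` makes the loss a sharper approximation of the error count.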

Kalman filter based speech synthesis

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15 March 2010, pp. 4618-4621.

Summary

Preliminary results are reported from a very simple speech-synthesis system based on clustered-diphone Kalman filter modeling of line-spectral-frequency features. Parameters were estimated using maximum-likelihood EM training, with a constraint that prevented the eigenvalue magnitudes of the transition matrix from exceeding 1. Frames of training data were assigned diphone unit labels by forced alignment with an HMM recognition system, and the HMM cluster tree was also used to assign Kalman filter units to clusters. The result is a simple synthesis system that has few parameters, synthesizes intelligible speech without audible discontinuities, and can be adapted using MLLR techniques to synthesize a broad range of speakers from a single base model with small amounts of training data. This makes the approach attractive for embedded synthesis applications.
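The stability constraint mentioned above keeps every eigenvalue of the state-transition matrix on or inside the unit circle, so synthesized feature trajectories cannot grow without bound. The summary does not say how the constraint was enforced; one simple way, assumed here purely for illustration, is to project an unconstrained M-step estimate by rescaling any eigenvalue whose magnitude exceeds a limit (`max_mag`, a hypothetical parameter) and rebuilding the matrix.

```python
import numpy as np

def clip_spectral_radius(A, max_mag=1.0):
    """Rescale eigenvalues of A with |lambda| > max_mag onto the circle |lambda| = max_mag.

    Assumes A is diagonalizable. Conjugate eigenvalue pairs are scaled by the same
    factor, so the reconstructed matrix is real up to round-off.
    """
    A = np.asarray(A, dtype=float)
    eigvals, eigvecs = np.linalg.eig(A)
    mags = np.abs(eigvals)
    # Guard the division so zero eigenvalues do not trigger warnings.
    scale = np.where(mags > max_mag, max_mag / np.maximum(mags, 1e-300), 1.0)
    clipped = eigvals * scale
    rebuilt = eigvecs @ np.diag(clipped) @ np.linalg.inv(eigvecs)
    return rebuilt.real
```

Eigenvalues already within the limit, and their eigenvectors, are left untouched, so the projection changes only the unstable modes of the estimate.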

The MITLL NIST LRE 2009 language recognition system

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15 March 2010, pp. 4994-4997.

Summary

This paper presents a description of the MIT Lincoln Laboratory language recognition system submitted to the NIST 2009 Language Recognition Evaluation (LRE). This system consists of a fusion of three core recognizers, two based on spectral similarity and one based on tokenization. The 2009 LRE differed from previous ones in that test data included narrowband segments from worldwide Voice of America broadcasts as well as conventional recorded conversational telephone speech. Results are presented for the 23-language closed-set and open-set detection tasks at the 30, 10, and 3 second durations along with a discussion of the language-pair task. On the 30 second 23-language closed set detection task, the system achieved a 1.64 average error rate.
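The summary describes fusing three core recognizers but not the fusion back end. As an illustration only, the sketch below shows the simplest form of score-level fusion: a weighted sum of each recognizer's per-language scores, followed by choosing the top-scoring language for a closed-set decision. The weights, bias, and function names are hypothetical; in practice such weights would be trained on held-out development data.

```python
import numpy as np

def fuse_scores(system_scores, weights, bias=None):
    """Linearly fuse per-language scores from several recognizers.

    system_scores: (s, n, L) array for s recognizers, n test segments, L languages
    weights: (s,) array, one weight per recognizer (assumed trained elsewhere)
    bias: optional (L,) per-language offset
    """
    system_scores = np.asarray(system_scores, dtype=float)
    # Sum over the recognizer axis: result has shape (n, L).
    fused = np.tensordot(np.asarray(weights, dtype=float), system_scores, axes=1)
    if bias is not None:
        fused = fused + np.asarray(bias, dtype=float)
    return fused

def closed_set_decision(fused):
    """Index of the highest fused score for each segment."""
    return np.argmax(fused, axis=1)
```

For the detection tasks, each fused per-language score would instead be compared against a threshold rather than reduced to a single argmax.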

Toward signal processing theory for graphs and non-Euclidean data

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15 March 2010, pp. 5415-5417.

Summary

Graphs are canonical examples of high-dimensional non-Euclidean data sets, and are emerging as a common data structure in many fields. While there are many algorithms to analyze such data, a signal processing theory for evaluating these techniques akin to detection and estimation in the classical Euclidean setting remains to be developed. In this paper we show the conceptual advantages gained by formulating graph analysis problems in a signal processing framework by way of a practical example: detection of a subgraph embedded in a background graph. We describe an approach based on detection theory and provide empirical results indicating that the test statistic proposed has reasonable power to detect dense subgraphs in large random graphs.
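To make the detection framing concrete, the sketch below computes a simple test statistic for a dense embedded subgraph: the largest eigenvalue of the modularity matrix B = A - k k^T / (2m), where A is the adjacency matrix, k the degree vector, and m the edge count. Subtracting the degree-based expectation removes structure explained by a random-graph null model, leaving residual structure from a planted dense subgraph. This statistic is an assumption chosen for illustration; the summary does not specify the statistic used in the paper, and a real detector would set its threshold from the statistic's distribution under the null hypothesis.

```python
import numpy as np

def modularity_matrix(adj):
    """B = A - k k^T / (2m) for an undirected graph given as a symmetric 0/1 adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()
    return adj - np.outer(degrees, degrees) / two_m

def largest_residual_eigenvalue(adj):
    """Test statistic: large values suggest structure beyond the degree-based null model."""
    # B is symmetric, so eigvalsh applies; eigenvalues come back in ascending order.
    return float(np.linalg.eigvalsh(modularity_matrix(adj))[-1])

def random_graph(n, p, rng):
    """Erdos-Renyi G(n, p) background graph as a symmetric adjacency matrix."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

def plant_clique(adj, nodes):
    """Return a copy of adj with every pair of vertices in `nodes` connected."""
    out = adj.copy()
    idx = np.asarray(list(nodes))
    out[np.ix_(idx, idx)] = 1.0
    np.fill_diagonal(out, 0.0)
    return out
```

Comparing the statistic on `random_graph(400, 0.01, rng)` with and without a 15-vertex planted clique shows the kind of separation a detector would exploit.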

A linguistically-informative approach to dialect recognition using dialect-discriminating context-dependent phonetic models

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15 March 2010, pp. 5014-5017.

Summary

We propose supervised and unsupervised learning algorithms to extract dialect discriminating phonetic rules and use these rules to adapt biphones to identify dialects. Despite many challenges (e.g., sub-dialect issues and no word transcriptions), we discovered dialect discriminating biphones compatible with the linguistic literature, while outperforming a baseline monophone system by 7.5% (relative). Our proposed dialect discriminating biphone system achieves similar performance to a baseline all-biphone system despite using 25% fewer biphone models. In addition, our system complements PRLM (Phone Recognition followed by Language Modeling), verified by obtaining relative gains of 15-29% when fused with PRLM. Our work is an encouraging first step towards a linguistically-informative dialect recognition system, with potential applications in forensic phonetics, accent training, and language learning.

NextGen Weather Processor architecture study

Published in:
MIT Lincoln Laboratory Report ATC-361

Summary

The long-term objectives for the NextGen Weather Processor (NWP) include consolidation of today's multiple weather systems, incorporation of recent and emerging Federal Aviation Administration (FAA) infrastructure (Federal Telecommunications Infrastructure (FTI), System Wide Information Management (SWIM), NextGen Network-Enabled Weather (NNEW)), leveraging National Oceanic and Atmospheric Administration (NOAA) and/or commercial weather resources, and providing a solid development and run-time platform for advanced aviation weather capabilities. These objectives are to be achieved in a staged fashion, ideally with new components coming online in time to replace existing capabilities before their end-of-life dates. As part of NWP Segment 1, a number of alternative implementations for the NWP as it might exist in the 2013 time frame have been proposed. This report examines the alternatives from a top-down technical perspective, assessing how well each maps to a high-level NWP architecture consistent with the long-term NextGen information sharing vision. Technical challenges and opportunities for weather product improvements associated with each alternative are discussed. Additional alternatives consistent with the high-level NWP architecture, as well as a number of suggested follow-on analysis efforts, are also presented.

Airspace encounter models for estimating collision risk

Published in:
J. Guidance, Control, and Dynamics, Vol. 33, No. 2, March-April 2010, pp. 487-499.

Summary

Airspace encounter models, providing a statistical representation of geometries and aircraft behavior during a close encounter, are required to estimate the safety and robustness of collision avoidance systems. Prior encounter models, developed to certify the Traffic Alert and Collision Avoidance System, have been limited in their ability to capture important characteristics of encounters as revealed by recorded surveillance data, do not capture the current mix of aircraft types or noncooperative aircraft, and do not represent more recent airspace procedures. This paper describes a methodology for encounter model construction based on a Bayesian statistical framework connected to an extensive set of national radar data. In addition, this paper provides examples of using several such high-fidelity models to evaluate the safety of collision avoidance systems for manned and unmanned aircraft.
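A Bayesian-network encounter model generates synthetic encounters by sampling each variable conditioned on its parents, in topological order (ancestral sampling). The toy network below, with hypothetical variables `airspace_class` and `speed_bin` and made-up probability tables, only illustrates that mechanism; the variables, network structure, and probabilities of the models in the paper are derived from radar data and are not reproduced here.

```python
import random

# Toy network (hypothetical): airspace_class -> speed_bin.
# Probabilities are placeholders, not values from any real encounter model.
PRIOR = {"B": 0.2, "C": 0.3, "other": 0.5}
SPEED_GIVEN_AIRSPACE = {
    "B": {"slow": 0.3, "fast": 0.7},
    "C": {"slow": 0.5, "fast": 0.5},
    "other": {"slow": 0.8, "fast": 0.2},
}

def draw(dist, rng):
    """Sample a key from a {value: probability} table."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values], k=1)[0]

def sample_encounter(rng):
    """Ancestral sampling: draw the parent first, then the child conditioned on it."""
    airspace = draw(PRIOR, rng)
    speed = draw(SPEED_GIVEN_AIRSPACE[airspace], rng)
    return {"airspace_class": airspace, "speed_bin": speed}
```

Each call with a seeded `random.Random` yields one synthetic encounter description; drawing many of them and simulating each against a collision avoidance system is the kind of Monte Carlo safety evaluation the summary refers to.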

FDSOI process technology for subthreshold-operation ultralow-power electronics

Published in:
Proc. of the IEEE, Vol. 98, No. 2, February 2010, pp. 333-342.

Summary

Ultralow-power electronics will expand the technological capability of handheld and wireless devices by dramatically improving battery life and portability. In addition to innovative low-power design techniques, a complementary process technology is required to enable the highest-performance devices possible while maintaining extremely low power consumption. Transistors optimized for subthreshold operation at 0.3 V may achieve a 97% reduction in switching energy compared to conventional transistors. The process technology described in this article takes advantage of the capacitance and performance benefits of thin-body silicon-on-insulator devices, combined with a workfunction-engineered mid-gap metal gate.

Model-based optimization of airborne collision avoidance logic

Summary

The Traffic Alert and Collision Avoidance System (TCAS) is designed to reduce the risk of mid-air collisions by providing resolution advisories to pilots. The current version of the collision avoidance logic was hand-crafted over the course of many years and contains many parameters that have been tuned to varying extents, as well as heuristic rules whose justification has been lost. Further development of TCAS is required to make the system compatible with next-generation air traffic control procedures and surveillance systems that will reduce separation between aircraft. This report presents a decision-theoretic approach to optimizing the TCAS logic using probabilistic models of aircraft behavior and a cost metric that balances the cost of alerting against the cost of collision. Such an approach has the potential to meet or exceed the current safety level while lowering the false alert rate and simplifying the process of re-optimizing the logic in response to changes in the airspace and sensor capabilities.
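The core of a decision-theoretic formulation is that an alert is issued only when it lowers expected cost. The one-step sketch below makes the alerting-versus-collision trade-off explicit; the collision probabilities, the cost weights, and the function names are hypothetical placeholders. A full treatment optimizes over sequences of advisories using probabilistic models of aircraft behavior, which this single-decision rule does not capture.

```python
def expected_cost(p_collision, alert, collision_cost, alert_cost):
    """Expected cost of one decision: collision risk plus the cost of alerting, if any."""
    return p_collision * collision_cost + (alert_cost if alert else 0.0)

def should_alert(p_coll_quiet, p_coll_alert, collision_cost=1.0, alert_cost=0.01):
    """Alert only if doing so lowers expected cost.

    p_coll_quiet / p_coll_alert: collision probabilities that some (hypothetical)
    probabilistic model of aircraft behavior predicts without and with an advisory.
    The default cost weights are placeholders that set the safety/false-alert trade-off.
    """
    cost_quiet = expected_cost(p_coll_quiet, False, collision_cost, alert_cost)
    cost_alert = expected_cost(p_coll_alert, True, collision_cost, alert_cost)
    return cost_alert < cost_quiet
```

Raising `alert_cost` relative to `collision_cost` suppresses marginal alerts, which is the lever for trading collision risk against the false alert rate.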