Publications

Retrieval and browsing of spoken content

Published in:
IEEE Signal Process. Mag., Vol. 25, No. 3, May 2008, pp. 39-49.

Summary

Ever-increasing computing power and connectivity bandwidth, together with falling storage costs, are resulting in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, information search and retrieval has emerged as a key application area. Text-based search is the most active area, with applications that range from Web and local network search to searching for personal information residing on one's own hard drive. Speech search has received less attention, perhaps because large collections of spoken material have not previously been available. However, with cheaper storage and increased broadband access, there has been an increase in the availability of online spoken audio content such as news broadcasts, podcasts, and academic lectures. A variety of personal and commercial uses also exist. As data availability increases, the lack of adequate technology for processing spoken documents becomes the limiting factor for large-scale access to spoken content. In this article, we discuss the technical issues involved in developing information retrieval systems for spoken audio documents, concentrating on handling the errorful or incomplete output provided by automatic speech recognition (ASR) systems. We focus on the use case where a user enters search terms into a search engine and is returned a collection of spoken document hits.

Elementary surveillance (ELS) and enhanced surveillance (EHS) validation via Mode S secondary radar surveillance

Published in:
MIT Lincoln Laboratory Report ATC-337

Summary

Several applications of the Mode S data link are currently being implemented and equipage requirements have been issued in countries around the world. Elementary surveillance (ELS) and enhanced surveillance (EHS) applications have been mandated in Europe with full equipage of all aircraft in the airspace required by 2009. Exemptions to the ELS requirement include aircraft that will be out of service by 31 December 2009, and aircraft undergoing flight-testing, delivery, or transit into or out of maintenance bases. Transport type aircraft (defined as having a maximum take-off weight in excess of 250 knots) are to be equipped to support ELS and EHS. Exemptions to the requirements for EHS include those listed above for ELS and: (a) fighter and training aircraft; (b) rotary-wing aircraft; (c) existing/older transport type aircraft undergoing avionics upgrades which will then support ELS/EHS; and (d) aircraft types granted special exemptions (e.g., B1-B, B2-A, and B-52H bombers). [not complete]

Effect of carrier lifetime on forward-biased silicon Mach-Zehnder modulators

Summary

We present a systematic study of Mach-Zehnder silicon optical modulators based on carrier injection. Detailed comparisons between modeling and measurement results are made, with good agreement obtained for both DC and AC characteristics. A figure of merit, static VπL, as low as 0.24 V·mm is achieved. The effect of carrier lifetime variation with doping concentration is explored and found to be important for the modulator characteristics.

Organometallic vapor phase epitaxy of relaxed InPAs/InP as multiplication layers for avalanche photodiodes

Published in:
J. Cryst. Growth, Vol. 310, No. 7-9, April 2008, pp. 1583-1589 (Proc. 13th Int. Conf. on Crystal Growth, in conjunction with Int. Conf. on Vapor Growth and Epitaxy and US Biennial Workshop on Organometallic Vapor Phase Epitaxy, 12-17 August 2007).

Summary

InP1-yAsy epitaxial layers grown lattice-mismatched (LMM) on InP substrates were investigated as a new materials system for multiplication layers in Geiger-mode avalanche photodiodes (GM APDs) for detection of photons in the range 1.6-2.5 µm. LMM InP1-yAsy epilayers were grown on semi-insulating (1 0 0) InP substrates misoriented 0.2° and 2° toward [1 1 0] by organometallic vapor phase epitaxy at a growth temperature of 580 °C. The growth scheme used for the InP1-yAsy buffer layer was optimized based on surface step structure and X-ray diffraction (XRD). It was found that step-flow growth is a minimum criterion for obtaining good material quality. Narrower XRD full-width at half-maximum values were measured for 2°-miscut substrates compared to 0.2°-miscut substrates. A high-quality buffer was obtained by step-grading the InP1-yAsy composition in increments of y = 0.05 over a layer thickness of 0.5 µm to a final y = 0.25. The device performance of LMM GM APDs was compared to that of more traditional lattice-matched GaSb-based devices. At 77 K, dark count rates of LMM devices are ~50 kHz at 5 V overbias, and are comparable to GaSb-based p-i-n diodes operated in Geiger mode, while reset times of 0.02 µs are approximately 3 orders of magnitude lower than GaSb-based GM APDs.

Adaptive short-time analysis-synthesis for speech enhancement

Published in:
2008 IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 31 March - 4 April 2008.

Summary

In this paper we propose a multiresolution short-time analysis method for speech enhancement. It is well known that fixed-resolution methods such as the traditional short-time Fourier transform do not generally match the time-frequency structure of the signal being analyzed, resulting in poor estimates of the speech and noise spectra required for enhancement. This can reduce the quality of the enhanced signal through the introduction of artifacts such as musical noise. To counter these limitations, we propose an adaptive short-time analysis-synthesis scheme for speech enhancement in which the adaptation is based on a measure of local time-frequency concentration. Synthesis is made possible through a modified overlap-add procedure. Empirical results using voiced speech indicate a clear improvement over a fixed time-frequency resolution enhancement scheme, both in terms of mean-square error and as indicated by informal listening tests.
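
As a rough illustration of the overlap-add synthesis mentioned in the abstract above, the following Python sketch shows only the standard fixed-resolution case: frames are weighted with a periodic Hann window at 50% overlap and summed back at their original offsets. It is not the paper's adaptive analysis or its modified overlap-add procedure. The frame length, hop size, test signal, and all function names are assumptions chosen for illustration.

```python
import math

def hann_periodic(n):
    """Periodic Hann window; copies shifted by n/2 samples sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def analyze(signal, frame_len, hop):
    """Split the signal into overlapping, windowed frames (fixed resolution)."""
    window = hann_periodic(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames

def overlap_add(frames, hop, length):
    """Sum the frames back into one signal at their original offsets."""
    out = [0.0] * length
    for k, frame in enumerate(frames):
        for i, value in enumerate(frame):
            out[k * hop + i] += value
    return out

# Assumed toy parameters: 8-sample frames with a 4-sample hop (50% overlap).
frame_len, hop = 8, 4
signal = [math.sin(0.3 * n) for n in range(40)]

frames = analyze(signal, frame_len, hop)
rebuilt = overlap_add(frames, hop, len(signal))

# Samples covered by two frames are recovered; the first and last
# half-frame are covered by only one frame and are excluded here.
interior_error = max(
    abs(a - b) for a, b in zip(signal[hop:-hop], rebuilt[hop:-hop])
)
print(interior_error)
```

In an enhancement system the frames would be modified in the spectral domain between analysis and synthesis; with no modification, as here, the interior of the signal is reconstructed up to floating-point rounding.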

A covariance kernel for SVM language recognition

Published in:
ICASSP 2008, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 31 March - 4 April 2008, pp. 4141-4144.

Summary

Discriminative training for language recognition has been a key tool for improving system performance. In addition, recognition directly from shifted-delta cepstral features has proven effective. A recent successful example of this paradigm is SVM-based discrimination of languages based on GMM mean supervectors (GSVs). GSVs are created through MAP adaptation of a universal background model (UBM) GMM. This work proposes a novel extension of this idea that applies the supervector framework to the covariances of the UBM. We demonstrate a new SVM kernel including this covariance structure. In addition, we propose a method for pushing SVM model parameters back to GMM models. These GMM models can be used as an alternate form of scoring. The new approach is demonstrated on a fourteen-language task with substantial performance improvements over prior techniques.

A multi-class MLLR kernel for SVM speaker recognition

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 31 March - 4 April 2008, pp. 4117-4120.

Summary

Speaker recognition using support vector machines (SVMs) with features derived from generative models has been shown to perform well. Typically, a universal background model (UBM) is adapted to each utterance, yielding a set of features that are used in an SVM. We consider the case where the UBM is a Gaussian mixture model (GMM), and maximum likelihood linear regression (MLLR) adaptation is used to adapt the means of the UBM. Recent work has examined this setup for the case where a global MLLR transform is applied to all the mixture components of the GMM UBM. This work produced positive results that warrant examining this setup with multi-class MLLR adaptation, which groups the UBM mixture components into classes and applies a different transform to each class. This paper extends the MLLR/GMM framework to the multi-class case. Experiments on the NIST SRE 2006 corpus show that multi-class MLLR improves on global MLLR and that the proposed system's performance is comparable with state-of-the-art systems.

Exploiting temporal change in pitch in formant estimation

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 31 March - 4 April 2008, pp. 3929-3932.

Summary

This paper considers the problem of obtaining an accurate spectral representation of speech formant structure when the voicing source exhibits a high fundamental frequency. Our work is inspired by auditory perception and physiological modeling studies implicating the use of temporal changes in speech by humans. Specifically, we develop and assess signal processing schemes aimed at exploiting temporal change of pitch as a basis for formant estimation. Our methods are cast in a generalized framework of two-dimensional processing of speech and show quantitative improvements under certain conditions over representations derived from traditional and homomorphic linear prediction. We conclude by highlighting potential benefits of our framework in the particular application of speaker recognition with preliminary results indicating a performance gender-gap closure on subsets of the TIMIT corpus.

Language recognition with discriminative keyword selection

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 31 March - 4 April 2008, pp. 4145-4148.

Summary

One commonly used approach for language recognition is to convert the input speech into a sequence of tokens such as words or phones and then to use these token sequences to determine the target language. The language classification is typically performed by extracting N-gram statistics from the token sequences and then using an N-gram language model or support vector machine (SVM) to perform the classification. One problem with these approaches is that the number of N-grams grows exponentially as the order N is increased. This is especially problematic for an SVM classifier, as each utterance is represented as a distinct N-gram vector. In this paper we propose a novel approach for modeling higher-order N-grams using an SVM via an alternating filter-wrapper feature selection method. We demonstrate the effectiveness of this technique on the NIST 2007 language recognition task.
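
To make the N-gram growth problem described in the abstract above concrete, the following Python sketch counts N-grams in a token sequence and tabulates how many distinct N-grams are possible as the order increases. It does not implement the paper's filter-wrapper feature selection. The token sequence, the inventory size of 40 tokens, and the function names are hypothetical values chosen for illustration.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count the N-grams of order n in one token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def relative_frequencies(tokens, n):
    """Turn N-gram counts into the relative frequencies used as features."""
    counts = ngram_counts(tokens, n)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}

# Hypothetical phone-token sequence for one utterance.
tokens = ["s", "ih", "t", "s", "ih", "p"]
bigrams = ngram_counts(tokens, 2)
bigram_features = relative_frequencies(tokens, 2)

# With an assumed inventory of 40 distinct tokens, the number of possible
# N-grams (and so the feature-vector dimension) is 40 ** N.
inventory_size = 40
possible = {n: inventory_size ** n for n in (1, 2, 3, 4)}

for n, size in possible.items():
    print(n, size)
```

Each utterance yields one sparse vector indexed by N-gram, which is why the dimension grows so quickly with N and some form of selection is needed for higher orders.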

Multisensor very low bit rate speech coding using segment quantization

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 31 March - 4 April 2008, pp. 3997-4000.

Summary

We present two approaches to noise-robust very low bit rate speech coding using wideband MELP analysis/synthesis. Both methods exploit multiple acoustic and non-acoustic input sensors, using our previously presented dynamic waveform fusion algorithm to simultaneously perform waveform fusion, noise suppression, and cross-channel noise cancellation. One coder uses a 600 bps scalable phonetic vocoder, with a phonetic speech recognizer followed by joint predictive vector quantization of the error in wideband MELP parameters. The second coder operates at 300 bps with fixed 80 ms segments, using novel variable-rate multistage matrix quantization techniques. Formal test results show that both coders achieve equivalent intelligibility to the 2.4 kbps NATO standard MELPe coder in harsh acoustic noise environments, at much lower bit rates, with only modest quality loss.
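
The bit rates quoted in the abstract above can be put in perspective with a small calculation of the bit budget available per 80 ms of speech. This Python sketch is plain arithmetic on the stated rates. Applying the 80 ms window to the 600 bps and 2.4 kbps coders is an assumption made only for comparison, since the abstract gives a segment length only for the 300 bps coder, and because that coder is described as variable-rate, its figure should be read as an average. The function name is hypothetical.

```python
def bits_per_segment(bit_rate_bps, segment_ms):
    """Average number of bits available for one segment at a given bit rate."""
    return bit_rate_bps * segment_ms / 1000

SEGMENT_MS = 80  # segment length stated for the 300 bps coder

budget_300 = bits_per_segment(300, SEGMENT_MS)    # 300 bps segment coder
budget_600 = bits_per_segment(600, SEGMENT_MS)    # 600 bps phonetic vocoder
budget_2400 = bits_per_segment(2400, SEGMENT_MS)  # 2.4 kbps reference rate

# Rate reduction of the 300 bps coder relative to the 2.4 kbps reference.
reduction_factor = 2400 / 300

print(budget_300, budget_600, budget_2400, reduction_factor)
```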