Publications
A linguistically-informative approach to dialect recognition using dialect-discriminating context-dependent phonetic models
Summary
We propose supervised and unsupervised learning algorithms to extract dialect-discriminating phonetic rules and use these rules to adapt biphones to identify dialects. Despite many challenges (e.g., sub-dialect issues and no word transcriptions), we discovered dialect-discriminating biphones compatible with the linguistic literature, while outperforming a baseline monophone system by...
High-pitch formant estimation by exploiting temporal change of pitch
Summary
This paper considers the problem of obtaining an accurate spectral representation of speech formant structure when the voicing source exhibits a high fundamental frequency. Our work is inspired by auditory perception and physiological studies implicating the use of pitch dynamics in speech by humans. We develop and assess signal processing...
Query-by-example spoken term detection using phonetic posteriorgram templates
Summary
This paper examines a query-by-example approach to spoken term detection in audio files. The approach is designed for low-resource situations in which limited or no in-domain training material is available and accurate word-based speech recognition capability is unavailable. Instead of using word or phone strings as search terms, the user...
Speaker comparison with inner product discriminant functions
Summary
Speaker comparison, the process of finding the speaker similarity between two speech signals, occupies a central role in a variety of applications: speaker verification, clustering, and identification. Speaker comparison can be placed in a geometric framework by casting the problem as a model comparison process. For a given speech...
The MIT-LL/AFRL IWSLT-2008 MT System
Summary
This paper describes the MIT-LL/AFRL statistical MT system and the improvements that were developed during the IWSLT 2008 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance for both text and speech-based translation on Chinese and Arabic...
A multi-sensor compressed sensing receiver: performance bounds and simulated results
Summary
Multi-sensor receivers are commonly tasked with detecting, demodulating and geolocating target emitters over very wide frequency bands. Compressed sensing can be applied to persistently monitor a wide bandwidth, given that the received signal can be represented using a small number of coefficients in some basis. In this paper we present...
Sinewave parameter estimation using the fast Fan-Chirp Transform
Summary
Sinewave analysis/synthesis has long been an important tool for audio analysis, modification and synthesis [1]. The recently introduced Fan-Chirp Transform (FChT) [2,3] has been shown to improve the fidelity of sinewave parameter estimates for a harmonic audio signal with rapid frequency modulation [4]. A fast version of the FChT [3]...
Towards co-channel speaker separation by 2-D demodulation of spectrograms
Summary
This paper explores a two-dimensional (2-D) processing approach for co-channel speaker separation of voiced speech. We analyze localized time-frequency regions of a narrowband spectrogram using 2-D Fourier transforms and propose a 2-D amplitude modulation model based on pitch information for single and multi-speaker content in each region. Our model maps...
A log-frequency approach to the identification of the Wiener-Hammerstein model
Summary
In this paper we present a simple closed-form solution to the Wiener-Hammerstein (W-H) identification problem. The identification process occurs in the log-frequency domain where magnitudes and phases are separable. We show that the theoretically optimal W-H identification is unique up to an amplitude, phase and delay ambiguity, and that the...
2-D processing of speech for multi-pitch analysis
Summary
This paper introduces a two-dimensional (2-D) processing approach for the analysis of multi-pitch speech sounds. Our framework invokes the short-space 2-D Fourier transform magnitude of a narrowband spectrogram, mapping harmonically related signal components to multiple concentrated entities in a new 2-D space. First, localized time-frequency regions of the spectrogram are...