Publications
Retrieval and browsing of spoken content
Summary
Ever-increasing computing power and connectivity bandwidth, together with falling storage costs, are resulting in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, information search and retrieval has emerged as a key application area. Text-based search is the most active area, with applications that range...
Elementary surveillance (ELS) and enhanced surveillance (EHS) validation via Mode S secondary radar surveillance
Summary
Several applications of the Mode S data link are currently being implemented, and equipage requirements have been issued in countries around the world. Elementary surveillance (ELS) and enhanced surveillance (EHS) applications have been mandated in Europe, with full equipage of all aircraft in the airspace required by 2009. Exemptions to...
Effect of carrier lifetime on forward-biased silicon Mach-Zehnder modulators
Summary
We present a systematic study of Mach-Zehnder silicon optical modulators based on carrier injection. Detailed comparisons between modeling and measurement results are made, with good agreement obtained for both DC and AC characteristics. A static VπL figure of merit as low as 0.24 V·mm is achieved. The effect of carrier lifetime variation...
Organometallic vapor phase epitaxy of relaxed InPAs/InP as multiplication layers for avalanche photodiodes
Summary
InP1-yAsy epitaxial layers grown lattice-mismatched (LMM) on InP substrates were investigated as a new materials system for multiplication layers in Geiger-mode avalanche photodiodes (GM APDs) for detection of photons in the range 1.6-2.5 μm. LMM InP1-yAsy epilayers were grown on semi-insulating (1 0 0) InP substrates misoriented 0.2 and 2...
Adaptive short-time analysis-synthesis for speech enhancement
Summary
In this paper we propose a multiresolution short-time analysis method for speech enhancement. It is well known that fixed-resolution methods such as the traditional short-time Fourier transform do not generally match the time-frequency structure of the signal being analyzed, resulting in poor estimates of the speech and noise spectra...
A covariance kernel for SVM language recognition
Summary
Discriminative training for language recognition has been a key tool for improving system performance. In addition, recognition directly from shifted-delta cepstral features has proven effective. A recent successful example of this paradigm is SVM-based discrimination of languages based on GMM mean supervectors (GSVs). GSVs are created through MAP adaptation of...
A multi-class MLLR kernel for SVM speaker recognition
Summary
Speaker recognition using support vector machines (SVMs) with features derived from generative models has been shown to perform well. Typically, a universal background model (UBM) is adapted to each utterance, yielding a set of features that are used in an SVM. We consider the case where the UBM is a...
Exploiting temporal change in pitch in formant estimation
Summary
This paper considers the problem of obtaining an accurate spectral representation of speech formant structure when the voicing source exhibits a high fundamental frequency. Our work is inspired by auditory perception and physiological modeling studies implicating the use of temporal changes in speech by humans. Specifically, we develop and assess...
Language recognition with discriminative keyword selection
Summary
One commonly used approach for language recognition is to convert the input speech into a sequence of tokens such as words or phones and then to use these token sequences to determine the target language. The language classification is typically performed by extracting N-gram statistics from the token sequences and...
Multisensor very low bit rate speech coding using segment quantization
Summary
We present two approaches to noise-robust very low bit rate speech coding using wideband MELP analysis/synthesis. Both methods exploit multiple acoustic and non-acoustic input sensors, using our previously presented dynamic waveform fusion algorithm to simultaneously perform waveform fusion, noise suppression, and cross-channel noise cancellation. One coder uses a 600 bps...