Publications
Speech recognition by humans and machines under conditions with severe channel variability and noise
Summary
Despite dramatic recent advances in speech recognition technology, speech recognizers still perform much worse than humans. The difference in performance between humans and machines is most dramatic when variable amounts and types of filtering and noise are present during testing. For example, humans readily understand speech that is low-pass filtered...
AM-FM separation using auditory-motivated filters
Summary
An approach to the joint estimation of sine-wave amplitude modulation (AM) and frequency modulation (FM) is described. It is based on the transduction of frequency modulation into amplitude modulation by linear filters and is motivated by the hypothesis that the auditory system uses a similar transduction mechanism in measuring sine-wave FM. An AM-FM...
Automated English-Korean translation for enhanced coalition communications
Summary
This article describes our progress on automated, two-way English-Korean translation of text and speech for enhanced military coalition communications. Our goal is to improve multilingual communications by producing accurate translations across a number of languages. Therefore, we have chosen an interlingua-based approach to machine translation that readily extends to multiple...
Automatic English-to-Korean text translation of telegraphic messages in a limited domain
Summary
This paper describes our work in progress on automatic English-to-Korean text translation. This work is an initial step toward the ultimate goal of text and speech translation for enhanced multilingual and multinational operations. For this purpose, we have adopted an interlingual approach with natural language understanding (TINA) and generation (GENESIS) modules at...
Improving wordspotting performance with artificially generated data
Summary
Lack of training data is a major problem that limits the performance of speech recognizers. Performance can often only be improved by expensive collection of data from many different talkers. This paper demonstrates that artificially transformed speech can increase the variability of training data and increase the performance of a...
Automatic dialect identification of extemporaneous, conversational, Latin American Spanish speech
Summary
A dialect identification technique is described that takes as input extemporaneous, conversational speech spoken in Latin American Spanish and produces as output a hypothesis of the dialect. The system has been trained to recognize Cuban and Peruvian dialects of Spanish, but could be extended easily to other dialects (and languages)...
Fine structure features for speaker identification
Summary
The performance of speaker identification (SID) systems can be improved by the addition of the rapidly varying "fine structure" features of formant amplitude and/or frequency modulation and multiple excitation pulses. This paper shows how the estimation of such fine structure features can be improved further by obtaining better estimates of...
Low rate coding of the spectral envelope using channel gains
Summary
A dual-rate embedded sinusoidal transform coder is described in which a core 14th-order all-pole coder operating at 2400 b/s is augmented with a set of channel gain residuals in order to operate at the higher 4800 b/s rate. The channel gains are a set of non-uniformly spaced samples...
The effects of handset variability on speaker recognition performance: experiments on the Switchboard corpus
Summary
This paper presents an empirical study of the effects of handset variability on text-independent speaker recognition performance using the Switchboard corpus. Handset variability occurs when training speech is collected using one type of handset, but a different handset is used for collecting test speech. For the Switchboard corpus, the calling...
Unsupervised topic clustering of Switchboard speech messages
Summary
This paper presents a statistical technique that can be used to automatically group speech data records based on the similarity of their content. A tree-based clustering algorithm is used to generate a hierarchical structure for the corpus. This structure can then be used to guide the search for similar material...