Publications

Improving wordspotting performance with artificially generated data

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 9 May 1996, pp. 526-9.

Summary

Lack of training data is a major problem that limits the performance of speech recognizers. Performance can often only be improved by expensive collection of data from many different talkers. This paper demonstrates that artificially transformed speech can increase the variability of training data and increase the performance of a wordspotter without additional expensive data collection. This approach was shown to be effective on a high-performance whole-word wordspotter on the Switchboard Credit Card database. The proposed approach, used in combination with a discriminative training approach, increased the Figure of Merit of the wordspotting system by 9.4 percentage points (from 62.5% to 71.9%). The increase in performance provided by artificially transforming speech was roughly equivalent to the increase that would have been provided by doubling the amount of training data. The performance of the wordspotter was also compared to that of human listeners, who were able to achieve lower error rates because of improved consonant recognition.

Automatic dialect identification of extemporaneous, conversational, Latin American Spanish Speech

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 2, ICASSP, 7-10 May 1996, pp. 777-780.

Summary

A dialect identification technique is described that takes as input extemporaneous, conversational speech spoken in Latin American Spanish and produces as output a hypothesis of the dialect. The system has been trained to recognize Cuban and Peruvian dialects of Spanish, but could be extended easily to other dialects (and languages) as well. Building on our experience in automatic language identification, the dialect-ID system uses an English phone recognizer trained on the TIMIT corpus to tokenize training speech spoken in each Spanish dialect. Phonotactic language models generated from this tokenized training speech are used during testing to compute dialect likelihoods for each unknown message. This system has an error rate of 16% on the Cuban/Peruvian two-alternative forced-choice test. We introduce the new "Miami" Latin American Spanish speech corpus that is capable of supporting our research into the future.
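
The phonotactic scoring step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the system from the paper: the bigram order, the add-one smoothing, the class and function names, and the toy phone strings are all invented here, and the phone recognizer that produces the tokens is not modeled.

```python
import math
from collections import Counter


class BigramPhoneModel:
    """Add-one-smoothed bigram model over phone tokens (hypothetical helper)."""

    def __init__(self, sequences, vocab):
        # Vocabulary includes sentence-boundary markers.
        self.vocab = set(vocab) | {"<s>", "</s>"}
        self.bigrams = Counter()
        self.contexts = Counter()
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_likelihood(self, seq):
        """Log-probability of a token sequence under this model."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        v = len(self.vocab)
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.contexts[prev] + v
            total += math.log(num / den)
        return total


def identify_dialect(models, seq):
    """Return the name of the model that gives seq the highest likelihood."""
    return max(models, key=lambda name: models[name].log_likelihood(seq))


# Toy token strings, invented purely for illustration.
vocab = ["a", "s", "h", "t", "o"]
models = {
    "dialect_A": BigramPhoneModel([["a", "h", "t", "a"], ["o", "h", "t", "o"]], vocab),
    "dialect_B": BigramPhoneModel([["a", "s", "t", "a"], ["o", "s", "t", "o"]], vocab),
}
print(identify_dialect(models, ["a", "h", "t", "o"]))
```

One model is trained per dialect from that dialect's tokenized speech, and an unknown message is assigned to the dialect whose model scores its token string highest.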

Fine structure features for speaker identification

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, Speech (Part II), 7-10 May 1996, pp. 689-692.

Summary

The performance of speaker identification (SID) systems can be improved by the addition of the rapidly varying "fine structure" features of formant amplitude and/or frequency modulation and multiple excitation pulses. This paper shows how the estimation of such fine structure features can be improved further by obtaining better estimates of formant frequency locations and uncovering various sources of error in the feature extraction systems. Most female telephone speech showed "spurious" formants, due to distortion in the telephone network. Nevertheless, SID performance was greatest with these spurious formants as formant estimates. A new feature has also been identified which can increase SID performance: cepstral coefficients from noise in the estimated excitation waveform. Finally, statistical tools have been developed to explore the relative importance of features used for SID, with the ultimate goal of uncovering the source of the features that provide SID performance improvement.

Low rate coding of the spectral envelope using channel gains

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, 7-10 May 1996, pp. 769-772.

Summary

A dual rate embedded sinusoidal transform coder is described in which a core 14th order allpole coder operating at 2400 b/s is augmented with a set of channel gain residuals in order to operate at the higher 4800 b/s rate. The channel gains are a set of non-uniformly spaced samples of the spline envelope and constitute a lowpass estimate of the short-time vocal tract magnitude spectrum. The channel gain residuals represent the difference between the spline envelope and the quantized 14th order allpole spectrum at the channel gain frequencies. The channel gain residuals are coded using pitch dependent scalar quantization. Informal listening indicates that the quality of the embedded coder at 4800 b/s is comparable to that of an existing high quality 4800 b/s allpole coder.

The effects of handset variability on speaker recognition performance: experiments on the switchboard corpus

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, 7-10 May 1996, pp. 113-116.

Summary

This paper presents an empirical study of the effects of handset variability on text-independent speaker recognition performance using the Switchboard corpus. Handset variability occurs when training speech is collected using one type of handset, but a different handset is used for collecting test speech. For the Switchboard corpus, the calling telephone number associated with a file is used to infer the handset used. Analysis of experiments designed to focus on handset variability on the SPIDRE database and the May95 NIST speaker recognition evaluation database shows that a performance gap between matched and mismatched handset tests persists even after applying several standard channel compensation techniques. Error rates for the mismatched tests are over 4 times those for the matched tests. Lastly, a new energy-dependent cepstral mean subtraction technique is proposed to compensate for nonlinear distortions, but is not found to improve performance on the databases used.
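
For orientation, conventional cepstral mean subtraction, a widely used channel compensation technique, can be sketched as below. This is the standard per-utterance form only; the energy-dependent variant proposed in the paper is not specified in the summary and is not reproduced here. The function name and the toy frames are assumptions made for illustration.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean from every cepstral frame.

    frames: list of equal-length lists of cepstral coefficients, one per frame.
    A fixed linear channel adds a constant vector in the cepstral domain,
    so removing the utterance mean removes that constant offset.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[k] for frame in frames) / n for k in range(dim)]
    return [[frame[k] - mean[k] for k in range(dim)] for frame in frames]


# Toy two-frame utterance, and the same utterance with a constant channel offset.
clean = [[1.0, 2.0], [3.0, 6.0]]
shifted = [[11.0, 12.0], [13.0, 16.0]]
print(cepstral_mean_subtraction(clean))
print(cepstral_mean_subtraction(shifted))
```

Both calls yield the same normalized frames, which is why the technique compensates for linear channel differences but not for the nonlinear handset distortions the paper targets.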

Unsupervised topic clustering of switchboard speech messages

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, 7-10 May 1996, pp. 315-318.

Summary

This paper presents a statistical technique which can be used to automatically group speech data records based on the similarity of their content. A tree-based clustering algorithm is used to generate a hierarchical structure for the corpus. This structure can then be used to guide the search for similar material in data from other corpora. The SWITCHBOARD Speech Corpus was used to demonstrate these techniques, since it provides sets of speech files which are nominally on the same topic. Excellent automatic clustering was achieved on the truth text transcripts provided with the SWITCHBOARD corpus, with an average cluster purity of 97.3%. Degraded clustering was achieved using the output transcriptions of a speech recognizer, with a clustering purity of 61.4%.
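
As a rough illustration of the purity figures quoted above, the sketch below computes one common form of cluster purity: the fraction of items that carry the majority topic label of their cluster. The paper may average purity differently (for example, per cluster rather than per item), so this definition, the function name, and the toy data are assumptions.

```python
from collections import Counter


def cluster_purity(cluster_ids, topic_labels):
    """Fraction of items whose topic is the majority topic of their cluster."""
    if len(cluster_ids) != len(topic_labels):
        raise ValueError("cluster_ids and topic_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one item is required")
    # Count topic labels inside each cluster.
    by_cluster = {}
    for cluster, topic in zip(cluster_ids, topic_labels):
        by_cluster.setdefault(cluster, Counter())[topic] += 1
    # Sum the size of the majority topic in every cluster.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_ids)


# Toy example: six messages, two clusters, one message placed off-topic.
purity = cluster_purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"])
print(purity)
```

In the toy example, five of the six messages match their cluster's majority topic, so the purity is 5/6.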

ASR-9 Weather System Processor (WSP): wind shear algorithms performance assessment

Published in:
MIT Lincoln Laboratory Report ATC-247

Summary

Lincoln Laboratory has developed a prototype Airport Surveillance Radar Weather Systems Processor (ASR-WSP) that has been used for field measurements and operational demonstrations since 1987. Measurements acquired with this prototype provide an extensive database for development and validation of the algorithms the WSP uses to generate operational wind shear information for Air Traffic Controllers. This report addresses the performance of the current versions of the WSP's microburst and gust front wind shear detection algorithms on available data from each of the WSP's operational sites. Evaluation of the associated environmental characteristics (e.g., storm structure, radar ground clutter environment) allows for generalization of results to the other major U.S. climatic regimes where the production version of WSP will be deployed.

Beacon radar and TCAS interrogation rates: airborne measurements in the 1030 MHz band

Published in:
MIT Lincoln Laboratory Report ATC-239

Summary

Airborne measurements were made of the rates of beacon-radar interrogations and suppressions in the 1030 MHz band. These measurements were undertaken in order to provide a basis for interference analysis of the proposed system of GPS-Squitter. The measurements were made during a flight along the East Coast, including New York, Philadelphia, Baltimore, and Washington. Measurements were also made at Atlanta and in the Dallas-Fort Worth area. Results are given in a form that shows the rates of interrogations and suppressions as a function of time and location of the aircraft. Interrogations are also separated into those that were transmitted by ground-based interrogators and those that were transmitted by airborne TCAS equipment. Mode S interrogations were also separated from other modes. The number of TCAS aircraft in the vicinity was also measured during the flights. The results indicate that the rates of interrogations and suppressions were consistent in most respects from location to location. The rates of Mode A and C interrogations from the ground were consistently less than 100 per second, with two brief exceptions. Previous measurements had indicated a trend of decreasing interrogation rates with time since the early 1970s. The new measurements support this observation and indicate that the trend has continued.

ASR-9 processor augmentation card scan-scan correlator algorithms

Published in:
MIT Lincoln Laboratory Report ATC-245

Summary

This report documents the Scan-Scan correlator algorithms for the ASR-9 Processor Augmentation Card (9-PAC) project. The 9-PAC is a processor card that serves as a processing enhancement to the existing ASR-9's post-processor system. It provides increased speed and memory capabilities to the processor, which allows for the introduction of more complex scan-scan correlator algorithms. These more complex algorithms improve the ASR-9's system performance through decreased false alarms and increased detection of aircraft. The 9-PAC Scan-Scan correlator, also known as the Tracker, consists of three basic processing tasks: initialization, input/output, and the actual Tracker. The Tracker can be broken down further into four main processing functions: report-to-track association, report-to-track correlation, track update, and track initiation.

Anomalous propagation ground clutter suppression with the Airport Surveillance Radar (ASR) Weather Systems Processor (WSP)

Published in:
MIT Lincoln Laboratory Report ATC-244

Summary

Ground-clutter breakthrough caused by anomalous propagation (AP)--ducting of the radar beam when passing through significant atmospheric temperature and/or moisture gradients--is a significant issue for air traffic controllers who use Airport Surveillance Radar (ASR) weather channel data to guide aircraft through the airport terminal area. At present, these data are often contaminated with AP, leaving the controller unsure about the validity of information on storm location and intensity. The Weather System Processor (WSP), which is scheduled for deployment at 33 airports in the U.S., includes an AP-Editing algorithm designed to remove AP based on its Doppler-spectrum characteristics in ASR-9 data. This report provides a description of the algorithm currently used in the FAA/Lincoln Laboratory WSP prototype and a measurement of the performance of the algorithm during nine episodes of AP and/or true weather in Orlando, Florida in 1991 and 1992.