Publications

Refine Results

(Filters Applied) Clear All

Dynamic response of an electronically shuttered CCD imager

Published in:
IEEE. Trans. Electron Devices, Vol. 51, No. 6, June 2004, pp. 864-869.

Summary

The dynamic response of an electronically shuttered charge-coupled device (CCD) imager to nanosecond voltage pulses has been investigated. Measurements show that the shutter can be dynamically opened and closed in nanosecond times. For the shutter opening, simulations indicate that the collection of photoelectrons occurs in times much shorter than that needed to form the steady-state depletion region under the CCD well. In addition, the shutter closing occurs faster than the reconstitution of the p-buried (shutter) layer. Simulations further indicate that electric fields created in the neutral substrate by the shutter clocks enable photogenerated charge collection/rejection on nanosecond time scales despite the fact that the depletion-region formation and collapse take much longer times.
READ LESS

Summary

The dynamic response of an electronically shuttered charge-coupled device (CCD) imager to nanosecond voltage pulses has been investigated. Measurements show that the shutter can be dynamically opened and closed in nanosecond times. For the shutter opening, simulations indicate that the collection of photoelectrons occurs in times much shorter than that...

READ MORE

Channel compensation for SVM speaker recognition

Published in:
Odyssey, The Speaker and Language Recognition Workshop, 31 May - 3 June 2004.

Summary

One of the major remaining challenges to improving accuracy in state-of-the-art speaker recognition algorithms is reducing the impact of channel and handset variations on system performance. For Gaussian Mixture Model based speaker recognition systems, a variety of channel-adaptation techniques are known and available for adapting models between different channel conditions, but for the much more recent Support Vector Machine (SVM) based approaches to this problem, much less is known about the best way to handle this issue. In this paper we explore techniques that are specific to the SVM framework in order to derive fully non-linear channel compensations. The result is a system that is less sensitive to specific kinds of labeled channel variations observed in training.
READ LESS

Summary

One of the major remaining challenges to improving accuracy in state-of-the-art speaker recognition algorithms is reducing the impact of channel and handset variations on system performance. For Gaussian Mixture Model based speaker recognition systems, a variety of channel-adaptation techniques are known and available for adapting models between different channel conditions...

READ MORE

Fusing discriminative and generative methods for speaker recognition: experiments on switchboard and NFI/TNO field data

Published in:
ODYSSEY 2004, Speaker and Language Recognition Workshop, 31 May - 3 June 2004.

Summary

Discriminatively trained support vector machines have recently been introduced as a novel approach to speaker recognition. Support vector machines (SVMs) have a distinctly different modeling strategy in the speaker recognition problem. The standard Gaussian mixture model (GMM) approach focuses on modeling the probability density of the speaker and the background (a generative approach). In contrast, the SVM models the boundary between the classes. Another interesting aspect of the SVM is that it does not directly produce probabilistic scores. This poses a challenge for combining results with a GMM. We therefore propose strategies for fusing the two approaches. We show that the SVM and GMM are complementary technologies. Recent evaluations by NIST (telephone data) and NFI/TNO (forensic data) give a unique opportunity to test the robustness and viability of fusing GMM and SVM methods. We show that fusion produces a system which can have relative error rates 23% lower than individual systems.
READ LESS

Summary

Discriminatively trained support vector machines have recently been introduced as a novel approach to speaker recognition. Support vector machines (SVMs) have a distinctly different modeling strategy in the speaker recognition problem. The standard Gaussian mixture model (GMM) approach focuses on modeling the probability density of the speaker and the background...

READ MORE

Dialect identification using Gaussian mixture models

Published in:
ODYSSEY 2004, Speaker and Language Recognition Workshop, 31 May - 3 June 2004.

Summary

Recent results in the area of language identification have shown a significant improvement over previous systems. In this paper, we evaluate the related problem of dialect identification using one of the techniques recently developed for language identification, the Gaussian mixture models with shifted-delta-cepstral features. The system shown is developed using the same methodology followed for the language identification case. Results show that the use of the GMM techniques yields an average of 30% equal error rate for the dialects in the Miami corpus and about 13% equal error rate for the dialects in the CallFriend corpus.
READ LESS

Summary

Recent results in the area of language identification have shown a significant improvement over previous systems. In this paper, we evaluate the related problem of dialect identification using one of the techniques recently developed for language identification, the Gaussian mixture models with shifted-delta-cepstral features. The system shown is developed using...

READ MORE

Speaker diarisation for broadcast news

Published in:
Odyssey 2004, 31 May - 4 June 2004.

Summary

It is often important to be able to automatically label 'who spoke when' during some audio data. This paper describes two systems for audio segmentation developed at CUED and MIT-LL and evaluates their performance using the speaker diarisation score defined in the 2003 Rich Transcription Evaluation. A new clustering procedure and BIC-based stopping criterion for the CUED system is introduced which improves both performance and robustness to changes in segmentation. Finally a hybrid 'Plug and Play' system is built which combines different parts of the CUED and MIT-LL systems to produce a single system which outperforms both the individual systems.
READ LESS

Summary

It is often important to be able to automatically label 'who spoke when' during some audio data. This paper describes two systems for audio segmentation developed at CUED and MIT-LL and evaluates their performance using the speaker diarisation score defined in the 2003 Rich Transcription Evaluation. A new clustering procedure...

READ MORE

Language recognition with support vector machines

Published in:
ODYSSEY 2004, Speaker and Language Recognition Workshop, 31 May - 3 June 2004.

Summary

Support vector machines (SVMs) have become a popular tool for discriminative classification. Powerful theoretical and computational tools for support vector machines have enabled significant improvements in pattern classification in several areas. An exciting area of recent application of support vector machines is in speech processing. A key aspect of applying SVMs to speech is to provide a SVM kernel which compares sequences of feature vectors--a sequence kernel. We propose the use of sequence kernels for language recognition. We apply our methods to the NIST 2003 language evaluation task. Results demonstrate the potential of the new SVM methods.
READ LESS

Summary

Support vector machines (SVMs) have become a popular tool for discriminative classification. Powerful theoretical and computational tools for support vector machines have enabled significant improvements in pattern classification in several areas. An exciting area of recent application of support vector machines is in speech processing. A key aspect of applying...

READ MORE

The MMSR bilingual and crosschannel corpora for speaker recognition research and evaluation

Summary

We describe efforts to create corpora to support and evaluate systems that meet the challenge of speaker recognition in the face of both channel and language variation. In addition to addressing ongoing evaluation of speaker recognition systems, these corpora are aimed at the bilingual and crosschannel dimensions. We report on specific data collection efforts at the Linguistic Data Consortium, the 2004 speaker recognition evaluation program organized by the National Institute of Standards and Technology (NIST), and the research ongoing at the US Federal Bureau of Investigation and MIT Lincoln Laboratory. We cover the design and requirements, the collections and evaluation integrating discussions of the data preparation, research, technology development and evaluation on a grand scale.
READ LESS

Summary

We describe efforts to create corpora to support and evaluate systems that meet the challenge of speaker recognition in the face of both channel and language variation. In addition to addressing ongoing evaluation of speaker recognition systems, these corpora are aimed at the bilingual and crosschannel dimensions. We report on...

READ MORE

The effect of text difficulty on machine translation performance -- a pilot study with ILR-related texts in Spanish, Farsi, Arabic, Russian and Korean

Published in:
4th Int. Conf. on Language Resources and Evaluation, LREC, 26-28 May 2004.

Summary

We report on initial experiments that examine the relationship between automated measures of machine translation performance (Doddington, 2003, and Papineni et al. 2001) and the Interagency Language Roundtable (ILR) scale of language proficiency/difficulty that has been in standard use for U.S. government language training and assessment for the past several decades (Child, Clifford and Lowe 1993). The main question we ask is how technology-oriented measures of MT performance relate to the ILR difficulty levels, where we understand that a linguist with ILR proficiency level N is expected to be able to understand a document rated at level N, but to have increasing difficulty with documents at higher levels. In this paper, we find that some key aspects of MT performance track with ILR difficulty levels, primarily for MT output whose quality is good enough to be readable by human readers.
READ LESS

Summary

We report on initial experiments that examine the relationship between automated measures of machine translation performance (Doddington, 2003, and Papineni et al. 2001) and the Interagency Language Roundtable (ILR) scale of language proficiency/difficulty that has been in standard use for U.S. government language training and assessment for the past several...

READ MORE

Conversational telephone speech corpus collection for the NIST speaker recognition evaluation 2004

Published in:
Proc. Language Resource Evaluation Conf., LREC, 24-30 May 2004, pp. 587-590.

Summary

This paper discusses some of the factors that should be considered when designing a speech corpus collection to be used for text independent speaker recognition evaluation. The factors include telephone handset type, telephone transmission type, language, and (non-telephone) microphone type. The paper describes the design of the new corpus collection being undertaken by the Linguistic Data Consortium (LDC) to support the 2004 and subsequent NIST speech recognition evaluations. Some preliminary information on the resulting 2004 evaluation test set is offered.
READ LESS

Summary

This paper discusses some of the factors that should be considered when designing a speech corpus collection to be used for text independent speaker recognition evaluation. The factors include telephone handset type, telephone transmission type, language, and (non-telephone) microphone type. The paper describes the design of the new corpus collection...

READ MORE

The mixer corpus of multilingual, multichannel speaker recognition data

Published in:
Proc. Language Resource Evaluation Conf., LREC, 24-30 May 2004, pp. 627-630.

Summary

This paper describes efforts to create corpora to support and evaluate systems that perform speaker recognition where channel and language may vary. Beyond the ongoing evaluation of speaker recognition systems, these corpora are aimed at the bilingual and cross channel dimensions. We report on specific data collection efforts at the Linguistic Data Consortium and the research ongoing at the US Federal Bureau of Investigation and MIT Lincoln Laboratories. We cover the design and requirements, the collections and final properties of the corpus integrating discussions of the data preparation, research, technology development and evaluation on a grand scale.
READ LESS

Summary

This paper describes efforts to create corpora to support and evaluate systems that perform speaker recognition where channel and language may vary. Beyond the ongoing evaluation of speaker recognition systems, these corpora are aimed at the bilingual and cross channel dimensions. We report on specific data collection efforts at the...

READ MORE