Publications
Tagged As
Magnitude-only estimation of handset nonlinearity with application to speaker recognition
Summary
Summary
A method is described for estimating telephone handset nonlinearity by matching the spectral magnitude of the distorted signal to the output of a nonlinear channel model, driven by an undistorted reference. The "magnitude-only" representation allows the model to directly match unwanted speech formants that arise over nonlinear channels and that...
Comparison of background normalization methods for text-independent speaker verification
Summary
Summary
This paper compares two approaches to background model representation for a text-independent speaker verification task using Gaussian mixture models. We compare speaker-dependent background speaker sets to the use of a universal, speaker-independent background model (UBM). For the UBM, we describe how Bayesian adaptation can be used to derive claimant speaker...
Improving wordspotting performance with artificially generated data
Summary
Summary
Lack of training data is a major problem that limits the performance of speech recognizers. Performance can often only be improved by expensive collection of data from many different talkers. This paper demonstrates that artificially transformed speech can increase the variability of training data and increase the performance of a...
Fine structure features for speaker identification
Summary
Summary
The performance of speaker identification (SID) systems can be improved by the addition of the rapidly varying "fine structure" features of formant amplitude and/or frequency modulation and multiple excitation pulses. This paper shows how the estimation of such fine structure features can be improved further by obtaining better estimates of...
The effects of handset variability on speaker recognition performance: experiments on the switchboard corpus
Summary
Summary
This paper presents an empirical study of the effects of handset variability on text-independent speaker recognition performance using the Switchboard corpus. Handset variability occurs when training speech is collected using one type of handset, but a different handset is used for collecting test speech. For the Switchboard corpus, the calling...
Unsupervised topic clustering of switchboard speech messages
Summary
Summary
This paper presents a statistical technique which can be used to automatically group speech data records based on the similarity of their content. A tree-based clustering algorithm is used to generate a hierarchical structure for the corpus. This structure can then be used to guide the search for similar material...
Measuring fine structure in speech: application to speaker identification
Summary
Summary
The performance of systems for speaker identification (SID) can be quite good with clean speech, though much lower with degraded speech. Thus it is useful to search for new features for SID, particularly features that are robust over a degraded channel. This paper investigates features that are based on amplitude...
The effects of telephone transmission degradations on speaker recognition performance
Summary
Summary
The two largest factors affecting automatic speaker identification performance are the size of the population an the degradations introduced by noisy communication, channels (e.g., telephone transmission). To examine experimentally these two factors, this paper presents text-independent speaker identification results for varying speaker population sizes up to 630 speakers for both...
Large population speaker identification using clean and telephone speech
Summary
Summary
This paper presents text-independent speaker identification results for varying speaker population sizes up to 630 speakers for both clean, wideband speech, and telephone speech. A system based on Gaussian mixture speaker models is used for speaker identification, and experiments are conducted on the TIMIT and NTIMIT databases. The TIMIT results...
Robust text-independent speaker identification using Gaussian mixture speaker models
Summary
Summary
This paper introduces and motivates the use of Gaussian mixture models (GMM) for robust text-independent speaker identification. The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identify. The focus of this work is on applications which require...