Publications


Towards reduced false-alarms using cohorts

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 22-27 May 2011, pp. 4512-4515.

Summary

The focus of the 2010 NIST Speaker Recognition Evaluation (SRE) was the low-false-alarm regime of the detection error trade-off (DET) curve. This paper presents several approaches that specifically target this regime. It begins by highlighting the main problem with operating at low false-alarm rates. Two sets of methods, both requiring a large and diverse impostor set, are then presented: the first penalizes trials whose enrollment and test utterances are not nearest neighbors of each other, while the second takes an adaptive score normalization approach similar to TopNorm and ATNorm.
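
As a rough illustration of the cohort-based normalization idea above (a minimal sketch in the spirit of TopNorm/ATNorm, not the authors' implementation; the function name, the fixed top-N cutoff, and the toy scores are assumptions), a trial score can be renormalized using only the most competitive members of a large impostor cohort:

```python
import numpy as np

def adaptive_top_norm(trial_score, cohort_scores, top_n=200):
    """Normalize a trial score with the mean/std of the top-N scoring
    impostor cohort members for this model (TopNorm/ATNorm-style sketch)."""
    top = np.sort(cohort_scores)[-top_n:]          # most competitive impostors only
    return (trial_score - top.mean()) / (top.std() + 1e-8)

# Toy usage: a strong genuine-like score against a large, diverse impostor cohort.
rng = np.random.default_rng(0)
cohort = rng.normal(0.0, 1.0, size=5000)           # impostor trial scores
print(adaptive_top_norm(4.0, cohort))
```

Restricting the statistics to the top-scoring cohort members is what makes the normalization adaptive to each model, which is the behavior targeted at the low-false-alarm end of the DET curve.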

USSS-MITLL 2010 human assisted speaker recognition

Summary

The United States Secret Service (USSS) teamed with MIT Lincoln Laboratory (MIT/LL) in the US National Institute of Standards and Technology's 2010 Speaker Recognition Evaluation of Human Assisted Speaker Recognition (HASR). We describe our qualitative and automatic speaker comparison processes, adapted from USSS casework, and our fusion of these processes. The USSS-MIT/LL 2010 HASR results are presented, along with post-evaluation results. The results are encouraging within the resolving power of the evaluation, which was limited to keep the required human effort reasonable. Future ideas and efforts are discussed, including new features and ways to capitalize on naive listeners.

Graph-embedding for speaker recognition

Published in:
INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, 26-30 September 2010, pp. 2742-2745.

Summary

Popular methods for speaker classification perform speaker comparison in a high-dimensional space; however, recent work has shown that most of the speaker variability is captured by a low-dimensional subspace of that space. In this paper we examine whether additional structure, in the form of nonlinear manifolds, exists within the high-dimensional space. We use graph embedding as a proxy for the manifold and show the use of the embedding in data visualization and exploration. ISOMAP is used to explore the existence and dimension of the space. We also examine whether the manifold assumption can help in two classification tasks: data mining and standard NIST speaker recognition evaluations (SRE). Our results show that the data lives on a manifold and that exploiting this structure can yield significant improvements on the data-mining task. The improvement in preliminary experiments on all trials of the NIST SRE Eval-06 core task is smaller but still significant.
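
As a hedged sketch of how graph embedding and ISOMAP can be used to probe for low-dimensional manifold structure (the synthetic 512-dimensional vectors and the neighborhood size below are assumptions standing in for real speaker expansion vectors, not the paper's data or settings), scikit-learn's Isomap makes the experiment a few lines:

```python
import numpy as np
from sklearn.manifold import Isomap

# Synthetic stand-in for high-dimensional speaker vectors (e.g., GMM supervectors):
# points generated from a low-dimensional latent space, mapped up, and noised.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))                 # hypothetical 3-D manifold
lift = rng.normal(size=(3, 512))                   # random map into 512-D space
X = np.tanh(latent @ lift) + 0.01 * rng.normal(size=(500, 512))

# Graph embedding via ISOMAP: build a k-nearest-neighbor graph, then embed using
# geodesic (shortest-path) distances.  Watching the reconstruction error as the
# embedding dimension grows gives a rough estimate of the manifold's dimension.
for d in (2, 3, 5, 10):
    emb = Isomap(n_neighbors=10, n_components=d).fit(X)
    print(d, emb.reconstruction_error())
```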

Simple and efficient speaker comparison using approximate KL divergence

Published in:
INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, 26-30 September 2010, pp. 362-365.

Summary

We describe a simple, novel, and efficient system for speaker comparison with two main components. First, the system uses a new approximate KL divergence distance extending earlier GMM parameter vector SVM kernels. The approximate distance incorporates data-dependent mixture weights as well as the standard MAP-adapted GMM mean parameters. Second, the system applies a weighted nuisance projection method for channel compensation. A simple eigenvector method of training is presented. The resulting speaker comparison system is straightforward to implement and is computationally simple: only two low-rank matrix multiplies and an inner product are needed for comparison of two GMM parameter vectors. We demonstrate the approach on a NIST 2008 speaker recognition evaluation task. We provide insight into what methods, parameters, and features are critical for good performance.
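
The paper's extended distance is not reproduced here, but a minimal sketch of the standard approximate KL divergence between two MAP-adapted GMMs (shared UBM weights and diagonal covariances; the function names and toy dimensions are assumptions) shows why comparison reduces to a cheap inner-product-style computation on parameter vectors:

```python
import numpy as np

def approx_kl_distance(means_a, means_b, ubm_weights, ubm_vars):
    """Approximate KL divergence between two MAP-adapted GMMs differing only in
    their means (the classic GSV-kernel bound, not the paper's extended distance).

    means_a, means_b : (n_mix, n_dim) adapted mean vectors
    ubm_weights      : (n_mix,) shared UBM mixture weights
    ubm_vars         : (n_mix, n_dim) shared diagonal covariances
    """
    diff = means_a - means_b
    return 0.5 * np.sum(ubm_weights[:, None] * diff * diff / ubm_vars)

def to_supervector(means, ubm_weights, ubm_vars):
    """Map adapted means to a normalized supervector so that comparison
    becomes a simple squared distance (or inner product) of two vectors."""
    return (np.sqrt(ubm_weights)[:, None] * means / np.sqrt(ubm_vars)).ravel()

# Toy check that the two formulations agree.
rng = np.random.default_rng(0)
n_mix, n_dim = 8, 4
w, var = np.full(n_mix, 1.0 / n_mix), np.ones((n_mix, n_dim))
ma, mb = rng.normal(size=(n_mix, n_dim)), rng.normal(size=(n_mix, n_dim))
d1 = approx_kl_distance(ma, mb, w, var)
sa, sb = to_supervector(ma, w, var), to_supervector(mb, w, var)
print(np.isclose(d1, 0.5 * np.sum((sa - sb) ** 2)))
```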

Transcript-dependent speaker recognition using Mixer 1 and 2

Published in:
INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, 26-30 September 2010, pp. 2102-2105.

Summary

Transcript-dependent speaker-recognition experiments are performed with the Mixer 1 and 2 read-transcription corpus using the Lincoln Laboratory speaker recognition system. Our analysis shows how widely speaker-recognition performance can vary on transcript-dependent data compared to conversational data of the same durations, given enrollment data from the same spontaneous conversational speech. We also describe the techniques used to handle the unaudited data in order to create 171 male and 198 female text-dependent experiments from the Mixer 1 and 2 read-transcription corpus.

Weighted nuisance attribute projection

Published in:
Odyssey 2010, the Speaker and Language Recognition Workshop, 28 June - 1 July 2010.

Summary

Nuisance attribute projection (NAP) has become a common method for compensation of channel effects, session variation, speaker variation, and general mismatch in speaker recognition. NAP uses an orthogonal projection to remove a nuisance subspace from a larger expansion space that contains the speaker information. Training the NAP subspace is based on optimizing pairwise distances to reduce intraspeaker variability and retain interspeaker variability. In this paper, we introduce a novel form of NAP called weighted NAP (WNAP), which significantly extends the current methodology. For WNAP, we propose a training criterion that incorporates two critical extensions to NAP: variable metrics and instance-weighted training. Both an eigenvector method and an iterative method are proposed for solving the resulting optimization problem. The effectiveness of WNAP is shown on a NIST speaker recognition evaluation task where error rates are reduced by over 20%.
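
A minimal sketch of the plain-NAP flavor of this idea follows (learn nuisance directions from within-speaker scatter, then project them out); the weights argument only gestures at WNAP's instance-weighted criterion, and the function names, corank, and toy data are assumptions rather than the paper's setup:

```python
import numpy as np

def train_nap_subspace(X, speaker_ids, corank=2, weights=None):
    """Estimate a nuisance subspace from within-speaker scatter and return its
    top eigenvectors (eigenvector-style training of a plain NAP projection).

    X           : (n_utt, dim) expansion vectors (e.g., supervectors)
    speaker_ids : (n_utt,) speaker label per vector
    corank      : number of nuisance directions to remove
    """
    if weights is None:
        weights = np.ones(len(X))
    scatter = np.zeros((X.shape[1], X.shape[1]))
    for spk in np.unique(speaker_ids):
        idx = speaker_ids == spk
        centered = X[idx] - np.average(X[idx], axis=0, weights=weights[idx])
        scatter += (weights[idx][:, None] * centered).T @ centered  # within-speaker variability
    eigvals, eigvecs = np.linalg.eigh(scatter)
    return eigvecs[:, -corank:]                                     # top nuisance directions U

def nap_project(X, U):
    """Remove the nuisance subspace: x -> (I - U U^T) x."""
    return X - (X @ U) @ U.T

# Toy usage: two speakers whose utterances share a common 'channel' direction.
rng = np.random.default_rng(0)
channel = rng.normal(size=16)
X = np.vstack([rng.normal(size=16) + c * channel for c in rng.normal(size=20)])
spk = np.repeat([0, 1], 10)
U = train_nap_subspace(X, spk, corank=1)
Xc = nap_project(X, U)
print(np.abs(Xc @ channel).mean() < np.abs(X @ channel).mean())  # nuisance energy reduced
```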

Speaker comparison with inner product discriminant functions

Published in:
Neural Information Processing Symp., 7 December 2009.

Summary

Speaker comparison, the process of finding the speaker similarity between two speech signals, occupies a central role in a variety of applications: speaker verification, clustering, and identification. Speaker comparison can be placed in a geometric framework by casting the problem as a model comparison process. For a given speech signal, feature vectors are produced and used to adapt a Gaussian mixture model (GMM). Speaker comparison can then be viewed as the process of compensating and finding metrics on the space of adapted models. We propose a framework, inner product discriminant functions (IPDFs), which extends many common techniques for speaker comparison: support vector machines, joint factor analysis, and linear scoring. The framework uses inner products between the parameter vectors of GMM models, motivated by several statistical methods. Compensation of nuisances is performed via linear transforms on GMM parameter vectors. Using the IPDF framework, we show that many current techniques are simple variations of each other. We demonstrate, on a 2006 NIST speaker recognition evaluation task, new scoring methods using IPDFs which produce excellent error rates and require significantly less computation than current techniques.
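
A minimal sketch of the generic inner-product form described above, assuming a linear compensation P and a metric W (the function name and the toy choices of P and W are illustrative, not the specific IPDF variants evaluated in the paper):

```python
import numpy as np

def ipdf_score(a, b, W, P):
    """Score two GMM parameter vectors with an inner product discriminant
    function of the generic form  f(a, b) = (P a)^T W (P b),
    where P is a linear compensation (e.g., a nuisance projection) and W is a
    positive-definite metric.  Different choices of P and W recover different
    systems (SVM kernels, linear scoring, etc.)."""
    ca, cb = P @ a, P @ b
    return ca @ W @ cb

# Toy usage: a diagonal metric and a projection removing one nuisance direction.
rng = np.random.default_rng(0)
dim = 8
a, b = rng.normal(size=dim), rng.normal(size=dim)
u = rng.normal(size=dim); u /= np.linalg.norm(u)
P = np.eye(dim) - np.outer(u, u)            # compensation: remove direction u
W = np.diag(rng.uniform(0.5, 2.0, dim))     # metric (e.g., from UBM weights/covariances)
print(ipdf_score(a, b, W, P))
```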

A framework for discriminative SVM/GMM systems for language recognition

Published in:
INTERSPEECH 2009, 6-10 September 2009.

Summary

Language recognition with support vector machines and shifted-delta cepstral features has been an excellent performer in NIST-sponsored language evaluations for many years. A novel improvement of this method has been the introduction of hybrid SVM/GMM systems. These systems use GMM supervectors as an SVM expansion for classification. In prior work, methods for scoring SVM/GMM systems were introduced based upon either standard SVM scoring or GMM scoring with a pushed model. Although prior work showed experimentally that GMM scoring yielded better results, no framework was available to explain the connection between SVM scoring and GMM scoring. In this paper, we show that there are interesting connections between SVM scoring and GMM scoring. We provide a framework, both theoretical and experimental, that connects the two scoring techniques. This connection should provide the basis for further research in SVM discriminative training for GMM models.
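
A hedged sketch of the supervector-SVM half of such a hybrid system (the synthetic supervectors and the LinearSVC settings are assumptions; the GMM-scoring path with a pushed model is only noted in a comment because it requires the full UBM machinery):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical data: each row is a GMM supervector (stacked, normalized adapted
# means) for one utterance; labels mark the target language vs. all others.
rng = np.random.default_rng(0)
dim, n = 256, 400
target = rng.normal(0.3, 1.0, size=(n // 2, dim))
nontarget = rng.normal(-0.3, 1.0, size=(n // 2, dim))
X = np.vstack([target, nontarget])
y = np.array([1] * (n // 2) + [0] * (n // 2))

# SVM scoring: train a linear SVM in supervector space and score each test
# utterance's supervector with the SVM decision function.
svm = LinearSVC(C=1.0, max_iter=5000).fit(X, y)
test = rng.normal(0.3, 1.0, size=(5, dim))
print(svm.decision_function(test))

# GMM scoring with a "pushed" model would instead map the SVM weight vector back
# into GMM mean space and evaluate frame-level likelihoods; that step is omitted.
```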

The MIT Lincoln Laboratory 2008 speaker recognition system

Summary

In recent years, methods for modeling and mitigating variational nuisances have been introduced and refined. A primary emphasis of the NIST 2008 Speaker Recognition Evaluation (SRE) was to greatly expand the use of auxiliary microphones. This introduced additional channel variation, which has historically been a challenge for speaker verification systems. In this paper we present the MIT Lincoln Laboratory speaker recognition system applied to the task in the NIST 2008 SRE. Our approach during the evaluation was two-fold: 1) utilize recent advances in variational nuisance modeling (latent factor analysis and nuisance attribute projection) to allow our spectral speaker verification systems to better compensate for the channel variation introduced, and 2) fuse systems targeting the different linguistic tiers of information, high and low. The performance of the system is presented when applied to the NIST 2008 SRE task. Post-evaluation analysis is conducted on the sub-task in which interview microphones are present.

Variability compensated support vector machines applied to speaker verification

Published in:
INTERSPEECH 2009, Proc. of the 10th Annual Conf. of the International Speech Communication Association, 6-10 September 2009, pp. 1555-1558.

Summary

Speaker verification using SVMs has proven successful, specifically using the GSV kernel [1] with nuisance attribute projection (NAP) [2]. Also, the recent popularity and success of joint factor analysis [3] has led to promising attempts to use speaker factors directly as SVM features [4]. NAP projection and the use of speaker factors with SVMs are both methods of handling variability in SVM speaker verification: NAP by removing undesirable nuisance variability, and speaker factors by forcing the discrimination to be performed on inter-speaker variability. These successes have led us to propose a new method, which we call variability compensated SVM (VCSVM), to handle both inter- and intra-speaker variability directly in the SVM optimization. This is done by adding a regularized penalty to the optimization that biases the normal to the hyperplane to be orthogonal to the nuisance subspace or, alternatively, to the complement of the subspace containing the inter-speaker variability. This bias attempts to ensure that inter-speaker variability is used in the recognition while intra-speaker variability is ignored. In this paper we present the theory and promising results on nuisance compensation.
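
A simplified, subgradient-descent take on the idea (not the authors' formulation or solver; the function name, learning rate, and toy nuisance direction are assumptions): add a penalty lam * ||U^T w||^2 to a standard hinge-loss SVM objective so the learned normal avoids the nuisance subspace U.

```python
import numpy as np

def train_vcsvm_like(X, y, U, lam=10.0, C=1.0, lr=0.05, epochs=300):
    """Linear SVM with an extra penalty lam * ||U^T w||^2 that biases the
    hyperplane normal away from the nuisance subspace U (a simplified,
    subgradient-descent sketch of the VCSVM idea).

    X : (n, d) features,  y : (n,) labels in {-1, +1},  U : (d, k) nuisance basis
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1.0                    # hinge-loss violations
        hinge_w = -(y[viol, None] * X[viol]).sum(axis=0) / n
        hinge_b = -y[viol].sum() / n
        w -= lr * (w + 2.0 * lam * U @ (U.T @ w) + C * hinge_w)
        b -= lr * (C * hinge_b)
    return w, b

# Toy usage: class information along axis 0, strong nuisance variation along axis 1.
rng = np.random.default_rng(0)
n, d = 200, 10
y = np.where(rng.random(n) < 0.5, 1.0, -1.0)
X = rng.normal(size=(n, d))
X[:, 0] += 1.5 * y                                      # useful (speaker) direction
X[:, 1] += rng.normal(scale=3.0, size=n)                # nuisance direction
U = np.zeros((d, 1)); U[1, 0] = 1.0
w, b = train_vcsvm_like(X, y, U)
print(abs(w[1]) < abs(w[0]))                            # normal avoids the nuisance axis
```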