Publications
Model compression for GMM based speaker recognition systems
Summary
For large-scale deployments of speaker verification systems, model size can be an important issue, not only for minimizing storage requirements but also for reducing the transfer time of models over networks. Model size is also critical for deployments to small, portable devices. In this paper we present a new model compression technique...
Measuring the readability of automatic speech-to-text transcripts
Summary
This paper reports initial results from a novel psycholinguistic study that measures the readability of several types of speech transcripts. We define a four-part figure of merit to measure readability: accuracy of answers to comprehension questions, reaction-time for passage reading, reaction-time for question answering and a subjective rating of passage...
Combining cross-stream and time dimensions in phonetic speaker recognition
Summary
Recent studies show that phonetic sequences from multiple languages can provide effective features for speaker recognition. So far, only pronunciation dynamics in the time dimension, i.e., n-gram modeling on each of the phone sequences, have been examined. In the JHU 2002 Summer Workshop, we explored modeling the statistical pronunciation dynamics...
Channel robust speaker verification via feature mapping
Summary
In speaker recognition applications, channel variability is a major cause of errors. Techniques in the feature, model and score domains have been applied to mitigate channel effects. In this paper we present a new feature mapping technique that maps feature vectors into a channel independent space. The feature mapping learns...
Conditional pronunciation modeling in speaker detection
Summary
In this paper, we present a conditional pronunciation modeling method for the speaker detection task that does not rely on acoustic vectors. Aiming at exploiting higher-level information carried by the speech signal, it uses time-aligned streams of phones and phonemes to model a speaker's specific pronunciation. Our system uses phonemes...
The SuperSID project: exploiting high-level information for high-accuracy speaker recognition
Summary
The area of automatic speaker recognition has been dominated by systems using only short-term, low-level acoustic information, such as cepstral features. While these systems have indeed produced very low error rates, they ignore other levels of information beyond low-level acoustics that convey speaker information. Recently published work has shown examples...
Using prosodic and conversational features for high-performance speaker recognition: report from JHU WS'02
Summary
While there has been a long tradition of research seeking to use prosodic features, especially pitch, in speaker recognition systems, results have generally been disappointing when such features are used in isolation, and only modest improvements have been seen when they are used in conjunction with traditional cepstral GMM systems. In contrast...
Phonetic speaker recognition with support vector machines
Summary
A recent area of significant progress in speaker recognition is the use of high-level features: idiolect, phonetic relations, prosody, discourse structure, etc. A speaker not only has a distinctive acoustic sound but uses language in a characteristic manner. Large corpora of speech data available in recent years allow experimentation with...
Modeling prosodic dynamics for speaker recognition
Summary
Most current state-of-the-art automatic speaker recognition systems extract speaker-dependent features by looking at short-term spectral information. This approach ignores long-term information that can convey supra-segmental information, such as prosodics and speaking style. We propose two approaches that use the fundamental frequency and energy trajectories to capture long-term information. The first...
Approaches to language identification using Gaussian mixture models and shifted delta cepstral features
Summary
Published results indicate that automatic language identification (LID) systems that rely on multiple-language phone recognition and n-gram language modeling produce the best performance in formal LID evaluations. By contrast, Gaussian mixture model (GMM) systems, which measure acoustic characteristics, are far more efficient computationally but have tended to provide inferior levels...