Summary
High-level features such as word idiolect, pronunciation, phone usage, and prosody have recently been used successfully in speaker verification. The benefit of these features was demonstrated in the NIST extended data task for speaker verification: with enough conversational data, a recognition system can become familiar with a speaker and achieve excellent accuracy. Typically, high-level-feature recognition systems produce a sequence of symbols from the acoustic signal and then perform recognition using the frequency and co-occurrence of those symbols. We propose using support vector machines to perform speaker verification from these symbol frequencies. Support vector machines have been applied to text classification problems with much success. A potential difficulty in applying these methods is that standard text classification methods tend to smooth frequencies, which could degrade speaker verification performance. To address this limitation, we derive a new kernel based on standard log likelihood ratio scoring. We show that our methods achieve significant gains over standard methods for processing high-level features.
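As a rough illustration of the pipeline described above, the following Python sketch turns symbol sequences into n-gram relative frequencies and compares two conversations with a linear kernel whose terms are scaled by the inverse background probability. The function names, the choice of bigrams, and the specific weighting are assumptions made for illustration only; the summary does not state the form of the kernel, so this should be read as one plausible shape for a kernel motivated by log likelihood ratio scoring, not as the method itself.

```python
from collections import Counter


def ngram_counts(symbols, n=2):
    """Count the n-grams in one symbol sequence (e.g. phones or words)."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))


def normalize(counts):
    """Convert raw counts to relative frequencies; empty input gives {}."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.items()}


def ngram_probs(symbols, n=2):
    """Relative n-gram frequencies for a single conversation side."""
    return normalize(ngram_counts(symbols, n))


def background_probs(sequences, n=2):
    """Pooled relative n-gram frequencies over a background population."""
    pooled = Counter()
    for seq in sequences:
        pooled.update(ngram_counts(seq, n))
    return normalize(pooled)


def llr_weighted_kernel(p_a, p_b, p_bg):
    """Inner product of two frequency vectors, each term divided by the
    background probability (hypothetical weighting, assumed for this sketch).
    Rare n-grams therefore contribute more than common ones."""
    shared = set(p_a) & set(p_b)
    return sum(p_a[g] * p_b[g] / p_bg[g] for g in shared if g in p_bg)


if __name__ == "__main__":
    # Toy symbol sequences standing in for recognizer output.
    conv_a = ["a", "b", "a", "b"]
    conv_b = ["a", "b", "c", "c"]
    p_bg = background_probs([conv_a, conv_b])
    p_a, p_b = ngram_probs(conv_a), ngram_probs(conv_b)
    print(llr_weighted_kernel(p_a, p_b, p_bg))
```

Because the weighting is a fixed per-n-gram scale, the same values could be computed for every pair of training conversations and supplied to any support vector machine implementation that accepts a precomputed kernel matrix.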