Information Systems Technology
Publication Abstract
Peskin, B., Navratil, J., Abramson, J., Jones, D., Klusacek, D., Reynolds, D. A., Xiang, B. Using Prosodic and Conversational Features for High Performance Speaker Recognition: Report from JHU WS'02. In Proc. International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, China, IEEE, vol. IV, pp. 792-795, 6-10 April 2003.
Abstract
While there has been a long tradition of research seeking to use prosodic features, especially pitch, in speaker recognition systems, results have generally been disappointing when such features are used in isolation, and only modest improvements have been seen when they are used in conjunction with traditional cepstral GMM systems. In contrast, we report here on work from the JHU 2002 Summer Workshop exploring a range of prosodic features, using NIST's 2001 Extended Data task as a testbed. We examined a variety of modeling techniques, such as n-gram models of turn-level prosodic features and simple vectors of summary statistics per conversation side scored by kth nearest-neighbor classifiers. We found that purely prosodic models were able to achieve equal error rates of under 10% and yielded significant gains when combined with more traditional systems. We also report on exploratory work on conversational features, which capture properties of the interaction across conversation sides, such as turn-taking patterns.
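To make the idea of scoring per-side summary-statistics vectors with a kth nearest-neighbor classifier more concrete, here is a minimal sketch. It is not the workshop's implementation. The feature choice (for example, mean and standard deviation of log pitch), the use of Euclidean distance, the impostor cohort, and the scoring rule (cohort distance minus target distance) are all illustrative assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical: one vector of prosodic summary statistics per conversation side.
Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kth_nn_distance(query: Vector, references: List[Vector], k: int) -> float:
    """Distance from `query` to its k-th nearest neighbor among `references`."""
    if not 1 <= k <= len(references):
        raise ValueError("k must be between 1 and len(references)")
    distances = sorted(euclidean(query, r) for r in references)
    return distances[k - 1]


def knn_score(test: Vector, target: List[Vector], cohort: List[Vector], k: int = 1) -> float:
    """Hypothetical verification score: positive when the test side lies closer
    to the target speaker's training sides than to the impostor cohort."""
    return kth_nn_distance(test, cohort, k) - kth_nn_distance(test, target, k)


if __name__ == "__main__":
    # Made-up 2-D features: [mean log pitch, std of log pitch].
    target_sides = [[4.8, 0.30], [4.9, 0.28]]
    cohort_sides = [[5.4, 0.20], [4.2, 0.45], [5.6, 0.22]]
    print(knn_score([4.85, 0.29], target_sides, cohort_sides))  # positive: accept
    print(knn_score([5.45, 0.21], target_sides, cohort_sides))  # negative: reject
```

Because Euclidean distance is sensitive to feature scale, a real system would normally normalize each statistic (for example, to zero mean and unit variance over background data) before computing distances. Thresholding such scores over many trials is what yields an operating point such as the equal error rate.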
