Summary
An important concern in speaker recognition is channel mismatch: the performance degradation that occurs when speaker models trained on speech from one type of channel are subsequently used to score speech from another type of channel. This paper compares two spectral subtraction methods for additive noise compensation in the context of speaker verification. The first method, termed "soft" spectral subtraction, is performed in the spectral domain on the |DFT|^2 values of the speech frames; the second, termed "hard" spectral subtraction, is performed on the Mel-filter energy features. Both an analytical argument and a simulation show that soft spectral subtraction yields a higher signal-to-noise ratio in the resulting Mel-filter energy features. For Gaussian mixture model-based speaker verification with additive noise in the test utterances, this translates into an equal error rate improvement over a system without spectral subtraction of approximately 7% in absolute terms (21% in relative terms) over an additive white Gaussian noise range of 5-25 dB.
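To make the distinction between the two domains concrete, the following is a minimal sketch of subtracting a noise estimate either from the |DFT|^2 values before Mel filtering or from the Mel-filter energies after it. The summary does not state the subtraction rule, so everything here is an assumption for illustration: the flooring rule and its `floor` parameter, the triangular filterbank, the frame length, the sample rate, and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_filters, n_fft // 2 + 1). Assumed design."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Filter edges are equally spaced on the Mel scale from 0 Hz to Nyquist.
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def subtract_with_floor(energy, noise, floor=1e-3):
    """Subtract a noise estimate; never drop below floor * energy (assumed rule)."""
    return np.maximum(energy - noise, floor * energy)


def mel_energies_power_domain(frame, noise_power, bank, floor=1e-3):
    """Subtract on the |DFT|^2 values, then apply the Mel filters."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    return bank @ subtract_with_floor(power, noise_power, floor)


def mel_energies_mel_domain(frame, noise_power, bank, floor=1e-3):
    """Apply the Mel filters first, then subtract on the Mel-filter energies."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    return subtract_with_floor(bank @ power, bank @ noise_power, floor)


# Illustrative frame: a 440 Hz tone plus white Gaussian noise (sigma = 0.3).
rng = np.random.default_rng(0)
n_fft, sample_rate = 512, 8000
t = np.arange(n_fft) / sample_rate
frame = np.sin(2 * np.pi * 440 * t) + 0.3 * rng.standard_normal(n_fft)

# Flat noise estimate: expected |DFT|^2 of unwindowed white noise is N * sigma^2.
noise_power = np.full(n_fft // 2 + 1, n_fft * 0.3 ** 2)

bank = mel_filterbank(24, n_fft, sample_rate)
power_domain = mel_energies_power_domain(frame, noise_power, bank)
mel_domain = mel_energies_mel_domain(frame, noise_power, bank)
print(np.column_stack([power_domain, mel_domain]))
```

The two outputs differ only because flooring is nonlinear: without the floor, subtraction and Mel filtering are both linear and would commute. The sketch shows the order of operations only; it does not reproduce the paper's signal-to-noise analysis or its error-rate results.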