Summary
Lack of training data is a major problem that limits the performance of speech recognizers. Performance can often only be improved by expensive collection of data from many different talkers. This paper demonstrates that artificially transformed speech can increase the variability of training data and increase the performance of a wordspotter without additional expensive data collection. This approach was shown to be effective on a high-performance whole-word wordspotter on the Switchboard Credit Card database. The proposed approach used in combination with a discriminative training approach increased the Figure of Merit of the wordspotting system by 9.4% percentage points (62.5% to 71.9%). The increase in performance provided by artificially transforming speech was roughly equivalent to the increase that would have been provided by doubling the amount of training data. The performance of the wordspotter was also compared to that of human listeners who were able to achieve lower error rates because of improved consonant recognition.