Summary
Dialect recognition is a challenging and multifaceted problem. Distinguishing between dialects can rely upon many tiers of interpretation of speech data, e.g., prosodic, phonetic, spectral, and word-level. High-accuracy automatic methods for dialect recognition typically rely upon either phonetic or spectral characteristics of the input. A challenge with spectral systems, such as those based on shifted-delta cepstral coefficients, is that they achieve good performance but do not provide insight into distinctive dialect features. In this work, a novel method based upon discriminative training and phone N-grams is proposed. This approach achieves excellent classification performance, fuses well with other systems, and yields interpretable dialect characteristics in the phonetic tier. The method is demonstrated on data from the LDC and prior NIST language recognition evaluations. The method is also combined with spectral methods to demonstrate state-of-the-art performance in dialect recognition.