Summary
The two largest factors affecting automatic speaker identification performance are the size of the population to be distinguished among and the degradations introduced by noisy communication channels (e.g., telephone transmission). To examine these two factors experimentally, this paper presents text-independent speaker identification results for speaker population sizes of up to 630 speakers, for both clean, wideband speech and telephone speech. A system based on Gaussian mixture speaker models is used for speaker identification, and experiments are conducted on the TIMIT and NTIMIT databases. The aims of this study are to (1) establish how well text-independent speaker identification can perform under near-ideal conditions for very large populations (using the TIMIT database), (2) gauge the performance loss incurred by transmitting the speech over the telephone network (using the NTIMIT database), and (3) examine the validity of current models of telephone degradations commonly used in developing compensation techniques (using the NTIMIT calibration signals). These are believed to be the first speaker identification experiments on the complete 630-speaker TIMIT and NTIMIT databases and the largest text-independent speaker identification task reported to date. Identification accuracies of 99.5% and 60.7% are achieved on the TIMIT and NTIMIT databases, respectively.
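As an illustration of how identification with Gaussian mixture speaker models can work, the following is a minimal sketch and not the system evaluated in this paper. It assumes diagonal-covariance mixtures and a maximum-likelihood decision rule: each enrolled speaker is represented by one mixture, and a test utterance is assigned to the speaker whose model gives the highest total log-likelihood over its feature frames. The function names, the two toy speakers, and all parameter values are hypothetical; in practice the mixture parameters would be trained on each speaker's enrollment speech.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of a sequence of frames under one mixture model.

    `gmm` is a list of (weight, mean, variance) components. Frames are
    treated as independent, so their log-likelihoods are summed.
    """
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(comps)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total


def identify(frames, speaker_models):
    """Return the speaker label whose model scores the frames highest."""
    return max(
        speaker_models,
        key=lambda spk: gmm_log_likelihood(frames, speaker_models[spk]),
    )


# Hypothetical toy models: two speakers, two components each, 2-D features.
speaker_models = {
    "spk_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])],
    "spk_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [7.0, 7.0], [1.0, 1.0])],
}

# Hypothetical test utterance: three feature frames near spk_a's components.
test_frames = [[0.1, -0.2], [1.9, 2.1], [0.3, 0.4]]
best = identify(test_frames, speaker_models)
```

Because the decision is an argmax over all enrolled speakers, every additional speaker adds one more competing model, which is one way to see why population size affects identification accuracy.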