Summary
As ever greater numbers of telephone transactions are being conducted solely between a caller and an automated answering system, the need increases for software which can automatically identify and authenticate these callers without the need for an onerous speaker enrollment process. In this paper we introduce and investigate a novel speaker detection and tracking (SDT) technique, which dynamically merges the traditional enrollment and recognition phases of the static speaker recognition task. In this speaker recognition application, no prior speaker models exist and the goal is to detect and model new speakers as they call into the system while also recognizing utterances from the previously modeled callers. New speakers are added to the enrolled set of speakers and speech from speakers in the currently enrolled set is used to update models. We describe a system based on a GMM speaker identification (SID) system and develop a new measure to evaluate the performance of the system on the SDT task. Results for both static, open-set detection and the SDT task are presented using a portion of the Switchboard corpus of telephone speech communications. Static open-set detection produces an equal error rate of about 5%. As expected, performance for SDT is quite varied, depending greatly on the speaker set and ordering of the test sequence. These initial results, however, are quite promising and point to potential areas in which to improve the system performance.