This paper, based on three presentations made in 1998 at the RLA2C Workshop in Avignon, discusses the evaluation of speaker recognition systems from several perspectives. It offers a general discussion of the speaker recognition task and of the challenges and issues involved in its evaluation. The NIST evaluations in this area are described, with particular attention to the 1998 evaluation, its objectives, protocols and test data. The algorithms used by the systems developed for this evaluation are summarized, compared and contrasted. Overall performance results of the evaluation are presented by means of detection error trade-off (DET) curves, which show the trade-off between missed detections and false alarms for each system, along with the effects on performance of training condition, test segment duration, the speakers' sex, and the match or mismatch of training and test handsets. Several factors found to affect performance, including pitch frequency, handset type and noise, are discussed, and DET curves showing their effects are presented. The paper concludes with some perspective on the history of this technology and where it may be going.
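As an illustration of the DET methodology the abstract refers to, the sketch below shows how the operating points underlying a DET curve can be computed from detection scores. This is not code from the paper; the score lists and threshold grid are invented for illustration. Each point pairs the missed-detection rate (target trials scoring below threshold) with the false-alarm rate (impostor trials scoring at or above it).

```python
# Illustrative sketch: computing DET operating points from hypothetical
# detection scores. All data below is invented for the example.

def det_points(target_scores, impostor_scores, thresholds):
    """For each threshold, return (miss rate, false-alarm rate).

    A trial is accepted when its score meets or exceeds the threshold;
    a rejected target trial is a missed detection, and an accepted
    impostor trial is a false alarm.
    """
    points = []
    for t in thresholds:
        miss = sum(s < t for s in target_scores) / len(target_scores)
        fa = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        points.append((miss, fa))
    return points

# Toy scores: target (same-speaker) trials tend to score higher.
targets = [0.9, 0.8, 0.7, 0.6, 0.4]
impostors = [0.5, 0.3, 0.2, 0.1]

for miss, fa in det_points(targets, impostors, [0.25, 0.45, 0.65]):
    print(f"miss={miss:.2f}  false_alarm={fa:.2f}")
```

Sweeping the threshold traces out the full curve; a DET plot then displays these points on normal-deviate axes, which is what makes the curves in the paper appear close to linear.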