Cognitive services for the user
January 1, 2009
Software-defined cognitive radios (CRs) use voice as a primary input/output (I/O) modality and are expected to have substantial computational resources capable of supporting advanced speech- and audio-processing applications. This chapter extends previous work on speech applications to cognitive services that enhance military mission capability by capitalizing on automatic processes, such as extracting information from speech and understanding the environment. Such capabilities go beyond interaction with the intended user of the software-defined radio (SDR): they extend to speech and audio applications that operate on information extracted from voice and acoustic noise gathered from other users and entities in the environment. For example, in a military environment, situational awareness and understanding could be enhanced by informing users based on processing voice and noise from both friendly and hostile forces operating in a given battle space.
This chapter provides a survey of a number of speech- and audio-processing technologies and their potential applications to CR, including:
- A description of the technology and its current state of practice.
- An explanation of how the technology is currently being applied, or could be applied, to CR.
- Descriptions and concepts of operations for how the technology can be applied to benefit users of CRs.
- A description of relevant future research directions for both the speech and audio technologies and their applications to CR.
Figure 10.1 gives a pictorial overview of many of the core technologies, along with some of the applications presented in the following sections. It also shows components that overlap between technologies. For example, Gaussian mixture models (GMMs) and support vector machines (SVMs) are used in both speaker and language recognition technologies. These technologies and components are described in further detail in the following sections.
Speech and concierge cognitive services and their corresponding applications are covered in the following sections. The services covered include speaker recognition, language identification (LID), text-to-speech (TTS) conversion, speech-to-text (STT) conversion, machine translation (MT), background noise suppression, speech coding, speaker characterization, noise management, noise characterization, and concierge services. These technologies and their potential applications to CR are discussed at varying levels of detail commensurate with their innovation and utility.