Publications


Speaker detection and tracking for telephone transactions

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 13-17 May 2002, pp. 129-132.

Summary

As ever greater numbers of telephone transactions are being conducted solely between a caller and an automated answering system, the need increases for software which can automatically identify and authenticate these callers without the need for an onerous speaker enrollment process. In this paper we introduce and investigate a novel speaker detection and tracking (SDT) technique, which dynamically merges the traditional enrollment and recognition phases of the static speaker recognition task. In this speaker recognition application, no prior speaker models exist and the goal is to detect and model new speakers as they call into the system while also recognizing utterances from the previously modeled callers. New speakers are added to the enrolled set of speakers and speech from speakers in the currently enrolled set is used to update models. We describe a system based on a GMM speaker identification (SID) system and develop a new measure to evaluate the performance of the system on the SDT task. Results for both static, open-set detection and the SDT task are presented using a portion of the Switchboard corpus of telephone speech communications. Static open-set detection produces an equal error rate of about 5%. As expected, performance for SDT is quite varied, depending greatly on the speaker set and ordering of the test sequence. These initial results, however, are quite promising and point to potential areas in which to improve the system performance.
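The detect-then-enroll-or-update loop described in this summary can be illustrated with a minimal sketch. It is not the authors' system: a single one-dimensional Gaussian per speaker stands in for a GMM, and the function names, the threshold value, and the running-mean update are all assumptions made for illustration.

```python
import math

def log_likelihood(model, features):
    """Average log-likelihood of features under a 1-D Gaussian (stand-in for a GMM)."""
    mean, var = model["mean"], model["var"]
    total = 0.0
    for x in features:
        total += -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
    return total / len(features)

def detect_and_track(utterances, threshold=-3.0, var=1.0):
    """Process utterances in order and return the speaker label given to each.

    threshold and var are hypothetical tuning values, not taken from the paper.
    """
    models = []   # enrolled speaker models; empty at start (no prior speakers)
    labels = []
    for features in utterances:
        scores = [log_likelihood(m, features) for m in models]
        best = max(range(len(scores)), key=scores.__getitem__) if scores else None
        if best is None or scores[best] < threshold:
            # Open-set rejection: no enrolled model fits, so enroll a new speaker.
            models.append({"mean": sum(features) / len(features),
                           "var": var, "count": len(features)})
            labels.append(len(models) - 1)
        else:
            # Recognized speaker: fold the new speech into the model (running mean).
            m = models[best]
            n = m["count"] + len(features)
            m["mean"] = (m["mean"] * m["count"] + sum(features)) / n
            m["count"] = n
            labels.append(best)
    return labels
```

Because each decision changes the enrolled set, the labels depend on the order in which utterances arrive, which is consistent with the summary's remark that SDT performance varies with the ordering of the test sequence.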

Speech enhancement based on auditory spectral change

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. I, Speech Processing Neural Networks for Signal Processing, 13-17 May 2002, pp. I-257 - I-260.

Summary

In this paper, an adaptive approach to the enhancement of speech signals is developed based on auditory spectral change. The algorithm is motivated by sensitivity of aural biologic systems to signal dynamics, by evidence that noise is aurally masked by rapid changes in a signal, and by analogies to these two aural phenomena in biologic visual processing. Emphasis is on preserving nonstationarity, i.e., speech transient and time-varying components, such as plosive bursts, formant transitions, and vowel onsets, while suppressing additive noise. The essence of the enhancement technique is a Wiener filter that uses a desired signal spectrum whose estimation adapts to stationarity of the measured signal. The degree of stationarity is derived from a signal change measurement, based on an auditory spectrum that accentuates change in spectral bands. The adaptive filter is applied in an unconventional overlap-add analysis/synthesis framework, using a very short 4-ms analysis window and a 1-ms frame interval. In informal listening, the reconstructions are judged to be "crisp" corresponding to good temporal resolution of transient and rapidly-moving speech events.
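As a rough illustration of a Wiener gain whose signal estimate adapts to stationarity, the sketch below processes one spectral band frame by frame. The change measure, the smoothing rule, and the names `wiener_gains` and `max_smooth` are assumptions chosen for illustration; the paper's auditory-spectrum change measure is not reproduced here.

```python
def wiener_gains(power_frames, noise_power, max_smooth=0.9):
    """Per-frame Wiener gains for a single spectral band.

    power_frames: measured power in the band for successive frames.
    noise_power: noise power in the band, assumed known and positive.
    """
    gains = []
    signal_est = max(power_frames[0] - noise_power, 0.0)
    prev = power_frames[0]
    for p in power_frames:
        # Hypothetical change measure in [0, 1]: relative frame-to-frame power change.
        change = abs(p - prev) / (p + prev) if (p + prev) > 0 else 0.0
        # Smooth heavily when the band is stationary, lightly when it is changing.
        alpha = max_smooth * (1.0 - change)
        instant = max(p - noise_power, 0.0)   # power-subtraction estimate
        signal_est = alpha * signal_est + (1.0 - alpha) * instant
        denom = signal_est + noise_power
        gains.append(signal_est / denom if denom > 0 else 0.0)
        prev = p
    return gains
```

With this rule a sudden onset pulls the signal estimate up almost immediately, so the gain opens quickly on transients, while steady noise-only frames stay heavily smoothed and suppressed.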

Automated generation and analysis of attack graphs

Published in:
Proc. of the 2002 IEEE Symp. on Security and Privacy, 12-15 May 2002, pp. 254-265.

Summary

An integral part of modeling the global view of network security is constructing attack graphs. In practice, attack graphs are produced manually by Red Teams. Construction by hand, however, is tedious, error-prone, and impractical for attack graphs larger than a hundred nodes. In this paper we present an automated technique for generating and analyzing attack graphs. We base our technique on symbolic model checking algorithms, letting us construct attack graphs automatically and efficiently. We also describe two analyses to help decide which attacks would be most cost-effective to guard against. We implemented our techniques in a tool suite and tested it on a small network example, which includes models of a firewall and an intrusion detection system.

Speech-to-speech translation: technology and applications study

Published in:
MIT Lincoln Laboratory Report TR-1080

Summary

This report describes a study effort on the state-of-the-art and lessons learned in automated, two-way, speech-to-speech translation and its potential application to military problems. The study includes and comments upon an extensive set of references on prior and current work in speech translation. The study includes recommendations on future military applications and on R&D needed to successfully achieve those applications. Key findings of the study include: (1) R&D speech translation systems have been demonstrated, but only in limited domains, and their performance is inadequate for operational use; (2) as far as we have been able to determine, there are currently no operational two-way speech translation systems; (3) intensive, sustained R&D will be needed to develop usable two-way speech translation systems. Major recommendations include: (1) a substantial R&D program in speech translation is needed, especially including full end-to-end system prototyping and evaluation; (2) close cooperation among researchers and users speaking multiple languages will be needed for the development of useful application systems; (3) to get military users involved and interacting in a mode which enables them to provide useful inputs and feedback on system requirements and performance, it will be necessary to provide them at the start with a fairly robust, open-domain system which works to the degree that some two-way speech translation is operational.

Gender-dependent phonetic refraction for speaker recognition

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 13-17 May 2002, Vol. 1, pp. 149-152.

Summary

This paper describes improvements to an innovative high-performance speaker recognition system. Recent experiments showed that, with sufficient training data, phone strings from multiple languages are exceptional features for speaker recognition. The prototype phonetic speaker recognition system used phone sequences from six languages to produce an equal error rate of 11.5% on Switchboard-I audio files. The improved system described in this paper reduces the equal error rate to less than 4%. This is accomplished by incorporating gender-dependent phone models, pre-processing the speech files to remove cross-talk, and developing more sophisticated fusion techniques for the multi-language likelihood scores.
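For readers unfamiliar with the metric quoted above, the equal error rate is the operating point at which the miss rate on true-speaker trials equals the false-alarm rate on impostor trials. The following is a minimal sketch of one common way to estimate it from two score lists; the function name and the threshold sweep are illustrative assumptions, not the evaluation code used in the paper.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where miss and false-alarm rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer
```

On small trial sets the two rates rarely cross exactly, so this sketch reports the average of the two at the closest point.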

Language identification using Gaussian mixture model tokenization

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. I, 13-17 May 2002, pp. I-757 - I-760.

Summary

Phone tokenization followed by n-gram language modeling has consistently provided good results for the task of language identification. In this paper, this technique is generalized by using Gaussian mixture models as the basis for tokenizing. Performance results are presented for a system employing a GMM tokenizer in conjunction with multiple language processing and score combination techniques. On the 1996 CallFriend LID evaluation set, a 12-way closed set error rate of 17% was obtained.
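The tokenize-then-model pipeline can be sketched as follows, assuming the tokenizer has already turned each utterance into a sequence of integer token indices (for a GMM tokenizer, plausibly the index of the best-scoring mixture component per frame, which is an assumption here). Bigram models with add-one smoothing stand in for whatever n-gram order and smoothing the paper used, and all names are hypothetical.

```python
import math
from collections import Counter

def train_bigram(sequences, vocab_size):
    """Count token bigrams over the training sequences of one language."""
    pair_counts, context_counts = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1
    return {"pairs": pair_counts, "contexts": context_counts, "vocab": vocab_size}

def log_prob(model, seq):
    """Log-probability of a token sequence under an add-one-smoothed bigram model."""
    total = 0.0
    for a, b in zip(seq, seq[1:]):
        num = model["pairs"][(a, b)] + 1
        den = model["contexts"][a] + model["vocab"]
        total += math.log(num / den)
    return total

def identify(models, seq):
    """Return the language whose bigram model scores the sequence highest."""
    return max(models, key=lambda name: log_prob(models[name], seq))
```

A usage example with two toy "languages" over a two-token vocabulary:

```python
models = {
    "A": train_bigram([[0, 1, 0, 1, 0, 1]], vocab_size=2),
    "B": train_bigram([[0, 0, 1, 1, 0, 0, 1, 1]], vocab_size=2),
}
print(identify(models, [0, 1, 0, 1]))
```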

Interlingua-based English-Korean two-way speech translation of doctor-patient dialogues with CCLINC

Published in:
Machine Trans. Vol. 17, No. 3, 2002, pp. 213-243.

Summary

Development of a robust two-way real-time speech translation system exposes researchers and system developers to various challenges of machine translation (MT) and spoken language dialogues. The need for communicating in at least two different languages poses problems not present for a monolingual spoken language dialogue system, where no MT engine is embedded within the process flow. Integration of various component modules for real-time operation poses challenges not present for text translation. In this paper, we present the CCLINC (Common Coalition Language System at Lincoln Laboratory) English-Korean two-way speech translation system prototype trained on doctor-patient dialogues, which integrates various techniques to tackle the challenges of automatic real-time speech translation. Key features of the system include (i) language-independent meaning representation which preserves the hierarchical predicate-argument structure of an input utterance, providing a powerful mechanism for discourse understanding of utterances originating from different languages, word-sense disambiguation and generation of various word orders of many languages, (ii) adoption of the DARPA Communicator architecture, a plug-and-play distributed system architecture which facilitates integration of component modules and system operation in real time, and (iii) automatic acquisition of grammar rules and lexicons for easy porting of the system to different languages and domains. We describe these features in detail and present experimental results.

Detecting clusters of galaxies in the Sloan Digital Sky Survey. I. Monte Carlo comparison of cluster detection algorithms

Summary

We present a comparison of three cluster-finding algorithms from imaging data using Monte Carlo simulations of clusters embedded in a 25 deg² region of Sloan Digital Sky Survey (SDSS) imaging data: the matched filter (MF), the adaptive matched filter (AMF), and a color-magnitude filtered Voronoi tessellation technique (VTT). Among the two matched filters, we find that the MF is more efficient in detecting faint clusters, whereas the AMF evaluates the redshifts and richnesses more accurately, therefore suggesting a hybrid method (HMF) that combines the two. The HMF outperforms the VTT when using a background that is uniform, but it is more sensitive to the presence of a nonuniform galaxy background than is the VTT; this is due to the assumption of a uniform background in the HMF model. We thus find that for the detection thresholds we determine to be appropriate for the SDSS data, the performance of both algorithms is similar; we present the selection function for each method evaluated with these thresholds as a function of redshift and richness. For simulated clusters generated with a Schechter luminosity function (M_r* = -21.5 and α = -1.1), both algorithms are complete for Abell richness >~ clusters up to z ~ 0.4 for a sample magnitude limited to r = 21. While the cluster parameter evaluation shows a mild correlation with the local background density, the detection efficiency is not significantly affected by the background fluctuations, unlike previous shallower surveys.

Discrete optimization using decision-directed learning for distributed networked computing

Summary

Decision-directed learning (DDL) is an iterative discrete approach to finding a feasible solution for large-scale combinatorial optimization problems. DDL is capable of efficiently formulating a solution to network scheduling problems that involve load limiting device utilization, selecting parallel configurations for software applications and host hardware using a minimum set of resources, and meeting time-to-result performance requirements in a dynamic network environment. This paper quantifies the algorithms that constitute DDL and compares its performance to other popular combinatorial self-directed real-time networked resource configuration for dynamically building a mission specific signal-processor for real-time distributed and parallel applications.

The effect of personality type on the usage of a multimedia engineering education system

Published in:
32nd Annual ASEE/IEEE Frontiers in Education Conf., 6-9 November 2002, pp. T3A-7 - T3A-12.

Summary

Multimedia education has quickly entered our classrooms and offices providing tutorials and lessons on many different topics. The assumption that most people interact with these multimedia systems in similar ways can easily be made, but are these assumptions valid? What factors determine whether students will embrace computer-based multimedia-augmented learning? One factor may be the student's personality type. This paper explores the reasons why some students may enjoy learning using computer-based educational delivery systems while others may have absolutely no enthusiasm for this type of learning and how that enthusiasm may relate to the students' personality types.