Publications

Coverage maximization using dynamic taint tracing

Published in:
MIT Lincoln Laboratory Report TR-1112

Summary

We present COMET, a system that automatically assembles a test suite for a C program to improve line coverage, and give initial results for a prototype implementation. COMET works dynamically, running the program under a variety of instrumentations in a feedback loop that adds new inputs to an initial corpus with each iteration. One instrumentation in particular is crucial to the success of this approach: dynamic taint tracing. Inputs are labeled as tainted at the byte level and all read/write pairs in the program are augmented to track the flow of taint between memory objects. This allows COMET to determine from which bytes of which inputs the variables in conditions derive, thereby dramatically narrowing the search over inputs necessary to expose new code. On a test set of 13 example programs, COMET improves upon the level of coverage reached in random testing by an average of 23% relative, takes only about twice the time, and requires a tiny fraction of the number of inputs to do so.
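The byte-level taint propagation described in this summary can be illustrated with a toy sketch. This is not COMET's implementation, which instruments C programs; the names `label_input`, `add`, `const`, and `branch_sources` are hypothetical helpers invented for illustration, and the model tracks only one arithmetic operation.

```python
# Toy model of byte-level taint tracking (illustrative only, not COMET itself).
# A "cell" is a pair (value, labels); labels is a frozenset of
# (input_id, byte_offset) pairs naming the input bytes the value derives from.

def label_input(input_id, data):
    """Taint each input byte with its own (input_id, offset) label."""
    return [(b, frozenset({(input_id, i)})) for i, b in enumerate(data)]

def const(value):
    """A program constant carries no taint."""
    return (value, frozenset())

def add(a, b):
    """Propagate taint through an operation: the result derives from both operands."""
    return ((a[0] + b[0]) & 0xFF, a[1] | b[1])

def branch_sources(operands):
    """Union of the taint labels on every operand of a condition."""
    sources = frozenset()
    for _, labels in operands:
        sources |= labels
    return sources

cells = label_input("in0", b"\x10\x20\x30\x40")
checksum = add(cells[0], cells[2])

# Hypothetical condition `checksum == 0x41`: which input bytes influence it?
sources = branch_sources([checksum, const(0x41)])
print(sorted(sources))
```

Under these assumptions, only bytes 0 and 2 of the input feed the condition, so a search for inputs that flip the branch would need to vary just those two bytes rather than the whole input.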

Auditory modeling as a basis for spectral modulation analysis with application to speaker recognition

Published in:
MIT Lincoln Laboratory Report TR-1119

Summary

This report explores auditory modeling as a basis for robust automatic speaker verification. Specifically, we have developed feature-extraction front-ends that incorporate (1) time-varying, level-dependent filtering, (2) variations in analysis filterbank size, and (3) nonlinear adaptation. Our methods are motivated both by a desire to better mimic auditory processing relative to traditional front-ends (e.g., the mel-cepstrum) and by reported gains in automatic speech recognition robustness exploiting similar principles. Traditional mel-cepstral features in automatic speaker recognition are derived from ~20 invariant band-pass filter weights, thereby discarding temporal structure from phase. In contrast, cochlear frequency decomposition can be more precisely modeled as the output of ~3500 time-varying, level-dependent filters. Auditory signal processing is therefore more resolved in frequency than mel-cepstral analysis and also derives temporal information. Furthermore, loss of level-dependence has been suggested to reduce human speech reception in adverse acoustic environments. We were thus motivated to employ a recently proposed level-dependent compressed gammachirp filterbank in feature extraction as well as vary the number of filters or filter weights to improve frequency resolution. We are also simulating nonlinear adaptation models of inner hair cell function along the basilar membrane that presumably mimic temporal masking effects. Auditory-based front-ends are being evaluated with the Lincoln Laboratory Gaussian mixture model recognizer on the TIMIT database under clean and noisy (additive Gaussian white noise) conditions. Preliminary results of features derived from our auditory models suggest that they provide complementary information to the mel-cepstrum under clean and noisy conditions, resulting in speaker recognition performance improvements.

Automatic language recognition via spectral and token based approaches

Published in:
Chapter 41 in Springer Handbook of Speech Processing and Communication, 2007, pp. 811-24.

Summary

Automatic language recognition from speech consists of algorithms and techniques that model and classify the language being spoken. Current state-of-the-art language recognition systems fall into two broad categories: spectral- and token-sequence-based approaches. In this chapter, we describe algorithms for extracting features and models representing these types of language cues and systems for making recognition decisions using one or more of these language cues. A performance assessment of these systems is also provided, in terms of both accuracy and computation considerations, using the National Institute of Standards and Technology (NIST) language recognition evaluation benchmarks.

Practical attack graph generation for network defense

Published in:
Proc. of the 22nd Annual Computer Security Applications Conf., IEEE, 11-15 December 2006, pp. 121-130.

Summary

Attack graphs are a valuable tool to network defenders, illustrating paths an attacker can use to gain access to a targeted network. Defenders can then focus their efforts on patching the vulnerabilities and configuration errors that allow the attackers the greatest amount of access. We have created a new type of attack graph, the multiple-prerequisite graph, that scales nearly linearly as the size of a typical network increases. We have built a prototype system using this graph type. The prototype uses readily available source data to automatically compute network reachability, classify vulnerabilities, build the graph, and recommend actions to improve network security. We have tested the prototype on an operational network with over 250 hosts, where it helped to discover a previously unknown configuration error. It has processed complex simulated networks with over 50,000 hosts in under four minutes.

Experimental facility for measuring the impact of environmental noise and speaker variation on speech-to-speech translation devices

Published in:
Proc. IEEE Spoken Language Technology Workshop, 10-13 December 2006, pp. 250-253.

Summary

We describe the construction and use of a laboratory facility for testing the performance of speech-to-speech translation devices. Approximately 1500 English phrases from various military domains were recorded as spoken by each of 30 male and 12 female English speakers with variation in speaker accent, for a total of approximately 60,000 phrases available for experimentation. We describe an initial experiment using the facility which shows the impact of environmental noise and speaker variability on phrase recognition accuracy for two commercially available one-way speech-to-speech translation devices configured for English-to-Arabic.

The security of OpenBSD: milk or wine?

Published in:
;login:, Vol. 31, No. 6, December 2006, pp. 26-32.

Summary

Purchase a fine wine, place it in a cellar, and wait a few years: The aging will have resulted in a delightful beverage, a product far better than the original. Purchase a gallon of milk, place it in a cellar, and wait a few years. You will be sorry. We know how the passing of time affects milk and wine, but how does aging affect the security of software? Many in the security research community have criticized software developers both for releasing software with so many vulnerabilities and for the lack of any apparent improvement in this software over time. However, critics have lacked quantitative evidence that applying effort over time will result in software with fewer vulnerabilities. In short, we don't know whether software security is destined to age like milk or has the potential to become wine. We thus investigated whether or not the rate at which vulnerabilities are reported in OpenBSD is decreasing over time.

An efficient graph search decoder for phrase-based statistical machine translation

Published in:
Int. Workshop on Spoken Language Translation, 28 November 2006.

Summary

In this paper we describe an efficient implementation of a graph search algorithm for phrase-based statistical machine translation. Our goal was to create a decoder that could be used for both our research system and a real-time speech-to-speech machine translation demonstration system. The search algorithm is based on a Viterbi graph search with an A* heuristic. We were able to increase the speed of our decoder substantially through the use of on-the-fly beam pruning and other algorithmic enhancements. The decoder supports a variety of reordering constraints as well as arbitrary n-gram decoding. In addition, we have implemented disk based translation models and a messaging interface to communicate with other components for use in our real-time speech translation system.
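The beam pruning mentioned in this summary can be sketched in a few lines. This is a generic illustration, not the decoder described in the paper: the function name `prune`, the `(score, state)` hypothesis layout, and the two thresholds are assumptions made for the example, with scores treated as log-probabilities where higher is better.

```python
import heapq

def prune(hypotheses, beam_width, max_size):
    """Keep hypotheses within beam_width of the best score, capped at max_size.

    hypotheses: list of (score, state) pairs; higher score is better.
    """
    if not hypotheses:
        return []
    best = max(score for score, _ in hypotheses)
    # Threshold pruning: drop anything too far below the current best.
    kept = [h for h in hypotheses if h[0] >= best - beam_width]
    # Histogram pruning: cap the number of survivors, best first.
    return heapq.nlargest(max_size, kept, key=lambda h: h[0])

hyps = [(-1.0, "a"), (-1.5, "b"), (-4.0, "c"), (-2.0, "d")]
survivors = prune(hyps, beam_width=1.5, max_size=2)
print(survivors)
```

Applying such a filter each time a search stack is extended, rather than once at the end, is one common way to keep the number of live hypotheses bounded during decoding.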

The MIT-LL/AFRL IWSLT-2006 MT system

Published in:
Proc. Int. Workshop on Spoken Language Translation, IWSLT, 27-28 November 2006.

Summary

The MIT-LL/AFRL MT system is a statistical phrase-based translation system that implements many modern SMT training and decoding techniques. Our system was designed with the long-term goal of dealing with corrupted ASR input and limited amounts of training data for speech-to-speech MT applications. This paper will discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2005 system, and experiments with manual and ASR transcription data that were run as part of the IWSLT-2006 evaluation campaign.

The JHU Workshop 2006 IWSLT System

Published in:
Int. Workshop on Spoken Language Translation, IWSLT, 27-28 November 2006.

Summary

This paper describes the SMT system we built during the 2006 JHU Summer Workshop for the IWSLT 2006 evaluation. Our effort focuses on two parts of the speech translation problem: 1) efficient decoding of word lattices and 2) novel applications of factored translation models to IWSLT-specific problems. In this paper, we present results from the open-track Chinese-to-English condition. Improvements of 5-10% relative BLEU are obtained over a high-performing baseline. We introduce a new open-source decoder that implements the state of the art in statistical machine translation.

High productivity computing and usable petascale systems

Published in:
SC '06: Proceedings of the 2006 ACM/IEEE conference on Supercomputing

Summary

High Performance Computing has seen extraordinary growth in peak performance, which has been accompanied by a significant increase in the difficulty of using these systems. High Productivity Computing Systems (HPCS) seek to address this gap by producing petascale computers that are usable by a broader range of scientists and engineers. One of the most important HPCS innovations is the concept of a flatter memory hierarchy, which means that data from remote processors can be retrieved and used very efficiently. A flatter memory hierarchy increases performance and is easier to program.