Publications

The MITLL/AFRL IWSLT-2014 MT System

Summary

This report summarizes the MITLL-AFRL MT and ASR systems and the experiments run using them during the 2014 IWSLT evaluation campaign. Our MT system is much improved over last year, owing to integration of techniques such as PRO and DREM optimization, factored language models, neural network joint model rescoring, multiple phrase tables, and development set creation. We focused our efforts this year on the tasks of translating from Arabic, Russian, Chinese, and Farsi into English, as well as translating from English to French. ASR performance also improved, partly due to increased efforts with deep neural networks for hybrid and tandem systems. Work focused on both the English and Italian ASR tasks.

Comparing a high and low-level deep neural network implementation for automatic speech recognition

Published in:
1st Workshop for High Performance Technical Computing in Dynamic Languages, HPTCDL 2014, 17 November 2014.

Summary

The use of deep neural networks (DNNs) has improved performance in several fields including computer vision, natural language processing, and automatic speech recognition (ASR). The increased use of DNNs in recent years has been largely due to performance afforded by GPUs, as the computational cost of training large networks on a CPU is prohibitive. Many training algorithms are well-suited to the GPU; however, writing hand-optimized GPGPU code is a significant undertaking. More recently, high-level libraries have attempted to simplify GPGPU development by automatically performing tasks such as optimization and code generation. This work utilizes Theano, a high-level Python library, to implement a DNN for the purpose of phone recognition in ASR. Performance is compared against a low-level, hand-optimized C++/CUDA DNN implementation from Kaldi, a popular ASR toolkit. Results show that the DNN implementation in Theano has CPU and GPU runtimes on par with those of Kaldi, while requiring approximately 95% fewer lines of code.

Finding good enough: a task-based evaluation of query biased summarization for cross language information retrieval

Published in:
EMNLP 2014, Proc. of Conf. on Empirical Methods in Natural Language Processing, 25-29 October, 2014, pp. 657-69.

Summary

In this paper we present our task-based evaluation of query biased summarization for cross-language information retrieval (CLIR) using relevance prediction. We describe our 13 summarization methods, each drawn from one of four summarization strategies. We show how well our methods perform using Farsi text from the CLEF 2008 shared task, which we translated to English automatically. We report precision, recall, F1, accuracy, and time-on-task. We found that different summarization methods perform optimally for different evaluation metrics, but overall query biased word clouds are the best summarization strategy. In our analysis, we demonstrate that using the ROUGE metric on our sentence-based summaries cannot make the same kinds of distinctions as our evaluation framework does. Finally, we present our recommendations for creating much-needed evaluation standards and databases.
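
As a rough illustration of the relevance-prediction metrics named in this summary, the sketch below computes precision, recall, F1, and accuracy from binary relevance judgments. The function name `relevance_metrics` and the sample judgments are hypothetical and are not taken from the paper; time-on-task is not modeled.

```python
def relevance_metrics(predicted, actual):
    """Return (precision, recall, F1, accuracy) for binary relevance judgments."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # judged relevant, truly relevant
    fp = sum(1 for p, a in pairs if p and not a)      # judged relevant, truly irrelevant
    fn = sum(1 for p, a in pairs if not p and a)      # judged irrelevant, truly relevant
    tn = sum(1 for p, a in pairs if not p and not a)  # judged irrelevant, truly irrelevant

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, f1, accuracy


# Hypothetical judgments made from summaries versus gold-standard relevance.
predicted = [True, True, False, True, False, False]
actual = [True, False, False, True, True, False]
precision, recall, f1, accuracy = relevance_metrics(predicted, actual)
```

Because F1 and accuracy weight errors differently, two summarization methods can rank differently depending on which metric is used, which is consistent with the finding reported above.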

Using deep belief networks for vector-based speaker recognition

Published in:
INTERSPEECH 2014: 15th Annual Conf. of the Int. Speech Communication Assoc., 14-18 September 2014.

Summary

Deep belief networks (DBNs) have become a successful approach for acoustic modeling in speech recognition. DBNs exhibit strong approximation properties and improved performance, and they are parameter efficient. In this work, we propose methods for applying DBNs to speaker recognition. In contrast to prior work, our approach to DBNs for speaker recognition starts at the acoustic modeling layer. We use sparse-output DBNs trained with both unsupervised and supervised methods to generate statistics for use in standard vector-based speaker recognition methods. We show that a DBN can replace a GMM UBM in this processing. Methods, qualitative analysis, and results are given on a NIST SRE 2012 task. Overall, our results show that DBNs perform competitively with modern approaches in an initial implementation of our framework.

Talking Head Detection by Likelihood-Ratio Test

Published in:
Second Workshop on Speech, Language, Audio in Multimedia

Summary

Accurately detecting when a person whose face is visible in an audio-visual medium is the audible speaker is an enabling technology with a number of useful applications. The likelihood-ratio test formulation and feature signal processing employed here allow the use of high-dimensional feature sets in the audio and visual domains, and the approach appears to have good detection performance for AV segments as short as a few seconds.
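
To make the idea of a likelihood-ratio test concrete, here is a minimal generic sketch. It assumes diagonal Gaussian models for the two hypotheses (visible person is speaking versus is not speaking), which is an illustrative choice and not the model described in the paper; all names, parameters, and the threshold are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_likelihood_ratio(features, speaking, not_speaking):
    """Sum per-dimension log-likelihood ratios.

    `speaking` and `not_speaking` are lists of (mean, variance) pairs,
    one per feature dimension (assumed diagonal Gaussian models).
    """
    llr = 0.0
    for x, (m1, v1), (m0, v0) in zip(features, speaking, not_speaking):
        llr += gaussian_log_pdf(x, m1, v1) - gaussian_log_pdf(x, m0, v0)
    return llr


def is_talking_head(features, speaking, not_speaking, threshold=0.0):
    """Decide 'speaking' when the log-likelihood ratio exceeds the threshold."""
    return log_likelihood_ratio(features, speaking, not_speaking) > threshold


# Hypothetical two-dimensional models for the two hypotheses.
speaking_model = [(1.0, 1.0), (1.0, 1.0)]
not_speaking_model = [(-1.0, 1.0), (-1.0, 1.0)]
llr = log_likelihood_ratio([0.8, 1.2], speaking_model, not_speaking_model)
```

Raising the threshold trades missed detections for fewer false alarms, which is how an operating point would be chosen in a detector of this kind.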

Sparse matrix partitioning for parallel eigenanalysis of large static and dynamic graphs

Published in:
HPEC 2014: IEEE Conf. on High Performance Extreme Computing, 9-11 September 2014.

Summary

Numerous applications focus on the analysis of entities and the connections between them, and such data are naturally represented as graphs. In particular, the detection of a small subset of vertices with anomalous coordinated connectivity is of broad interest, for problems such as detecting strange traffic in a computer network or unknown communities in a social network. These problems become more difficult as the background graph grows larger and noisier and the coordination patterns become more subtle. In this paper, we discuss the computational challenges of a statistical framework designed to address this cross-mission challenge. The statistical framework is based on spectral analysis of the graph data, and three partitioning methods are evaluated for computing the principal eigenvector of the graph's residuals matrix. While a standard one-dimensional partitioning technique enables this computation for up to four billion vertices, the communication overhead prevents this method from being used for even larger graphs. Recent two-dimensional partitioning methods are shown to have much more favorable scaling properties. A data-dependent partitioning method, which has the best scaling performance, is also shown to improve computation time even as a graph changes over time, allowing amortization of the upfront cost.
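
The core computation named in this summary, finding the principal eigenvector of a graph's residuals matrix, can be sketched serially on a toy graph. This sketch assumes the residuals matrix is the observed adjacency matrix minus a degree-based expectation (k_i k_j / 2m); that definition, the shifted power iteration, and all names below are illustrative assumptions, and none of the paper's parallel partitioning methods are represented.

```python
def residuals_matrix(adj):
    """Assumed residuals: adjacency minus expected edges k_i * k_j / (2m)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    two_m = sum(deg)
    return [[adj[i][j] - deg[i] * deg[j] / two_m for j in range(n)]
            for i in range(n)]


def matvec(mat, vec):
    """Dense matrix-vector product."""
    return [sum(a * x for a, x in zip(row, vec)) for row in mat]


def principal_eigenpair(mat, iterations=500):
    """Shifted power iteration for the largest (algebraic) eigenvalue.

    Adding shift * I, with shift bounding every |eigenvalue|, makes all
    eigenvalues non-negative so the iteration converges to the top one.
    """
    n = len(mat)
    shift = max(sum(abs(a) for a in row) for row in mat)
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-uniform start
    for _ in range(iterations):
        nxt = [y + shift * x for y, x in zip(matvec(mat, vec), vec)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    eigenvalue = sum(x * y for x, y in zip(vec, matvec(mat, vec)))
    return eigenvalue, vec


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

eigenvalue, vec = principal_eigenpair(residuals_matrix(adj))
```

On this toy graph the entries of the principal eigenvector take one sign on the first triangle and the opposite sign on the second. Each iteration is dominated by a matrix-vector product, which is the operation whose cost depends on how the matrix is partitioned across processors.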

Content+context=classification: examining the roles of social interactions and linguist content in Twitter user classification

Published in:
Proc. Second Workshop on Natural Language Processing for Social Media, SocialNLP, 24 August 2014, pp. 59-65.

Summary

Twitter users demonstrate many characteristics via their online presence. Connections, community memberships, and communication patterns reveal both idiosyncratic and general properties of users. In addition, the content of tweets can be critical for distinguishing the role and importance of a user. In this work, we explore Twitter user classification using context and content cues. We construct a rich graph structure induced by hashtags and social communications in Twitter. We derive features from this graph structure: centrality, communities, and local flow of information. In addition, we perform detailed content analysis on tweets, looking at offensiveness and topics. We then examine user classification and the role of feature types (context, content) and learning methods (propositional, relational) through a series of experiments on annotated data. Our work contrasts with prior approaches in that we use relational learning and alternative, non-specialized feature sets. Our goal is to understand how both content and context are predictive of user characteristics. Experiments demonstrate that the best performance for user classification uses relational learning with varying content and context features.

VizLinc: integrating information extraction, search, graph analysis, and geo-location for the visual exploration of large data sets

Published in:
Proc. KDD 2014 Workshop on Interactive Data Exploration and Analytics, IDEA, 24 August 2014, pp. 10-18.

Summary

In this demo paper we introduce VizLinc, an open-source software suite that integrates automatic information extraction, search, graph analysis, and geo-location for interactive visualization and exploration of large data sets. VizLinc helps users with: 1) understanding the type of information the data set under study might contain, 2) finding patterns and connections between entities, and 3) narrowing down the corpus to a small fraction of relevant documents that users can quickly read. We apply the tools offered by VizLinc to a subset of the New York Times Annotated Corpus and present use cases that demonstrate VizLinc's search and visualization features.

Exploiting morphological, grammatical, and semantic correlates for improved text difficulty assessment

Published in:
Proc. 9th Workshop on Innovative Use of NLP for Building Educational Applications, 26 June 2014, pp. 155-162.

Summary

We present a low-resource, language-independent system for text difficulty assessment. We replicate and improve upon a baseline by Shen et al. (2013) on the Interagency Language Roundtable (ILR) scale. Our work demonstrates that adding morphological, information-theoretic, and language modeling features to a traditional readability baseline greatly improves performance. We use the Margin-Infused Relaxed Algorithm and Support Vector Machines for experiments on Arabic, Dari, English, and Pashto, and provide a detailed analysis of our results.

Audio-visual identity grounding for enabling cross media search

Published in:
IEEE Computer Vision and Pattern Recognition Big Data Workshop, 23 June 2014.

Summary

Automatically searching for media clips in large heterogeneous datasets is an inherently difficult challenge, and nearly impossible when searching across distinct media types (e.g., finding audio clips that match an image). In this paper we introduce the exploitation of identity grounding for enabling this cross media search and exploration capability. Through the use of grounding we leverage one media channel (e.g., visual identity) as a noisy label for training a model in a different channel (e.g., audio speaker model). Finally, we demonstrate this search capability using images from the Labeled Faces in the Wild (LFW) dataset to query audio files that have been extracted from the YouTube Faces (YTF) dataset.