Large-scale 3D scene reconstruction using Structure from Motion (SfM) continues to be very computationally challenging despite much active research in the area. We propose an efficient, scalable processing chain designed for cluster computing and suitable for use on aerial video. The sparse bundle adjustment step, which is iterative and difficult to parallelize, is accomplished by partitioning the input image set, generating independent point clouds in parallel, and then fusing the clouds and combining duplicate points. We compare this processing chain to a leading parallel SfM implementation, which exploits fine-grained parallelism in various matrix operations and is not designed to scale beyond a multi-core workstation with GPU. We show our cluster-based approach offers significant improvement in scalability and runtime while producing comparable point cloud density and more accurate point location estimates.

Cluster-based 3D reconstruction of aerial video

Benchmarking parallel eigen decomposition for residuals analysis of very large graphs

September 10, 2012

Conference Paper

Author:

Edward M. Rutledge

…

Published in:

HPEC 2012: IEEE Conf. on High Performance Extreme Computing, 10-12 September 2012.

Topic:

big data

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Graph analysis is used in many domains, from the social sciences to physics and engineering. The computational driver for one important class of graph analysis algorithms is the computation of leading eigenvectors of matrix representations of a graph. This paper explores the computational implications of performing an eigen decomposition of a directed graph's symmetrized modularity matrix using commodity cluster hardware and freely available eigensolver software, for graphs with 1 million to 1 billion vertices, and 8 million to 8 billion edges. For graphs of these sizes, parallel eigensolvers are of particular interest. Our results suggest that graph analysis approaches based on eigen space analysis of graph residuals are feasible even for graphs of these sizes.
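As a rough illustration of the computation this summary refers to, the sketch below finds the leading eigenpair of a directed graph's symmetrized modularity matrix without forming the dense matrix. It assumes the common definition B = A - k_out k_in^T / m with B_sym = (B + B^T) / 2, and it uses shifted power iteration in plain Python rather than the parallel eigensolvers benchmarked in the paper. The function names and the choice of shift are hypothetical, introduced only for this example.

```python
import math
import random


def degrees(edges, n):
    """Out-degree and in-degree of every vertex of a directed edge list."""
    k_out, k_in = [0.0] * n, [0.0] * n
    for u, v in edges:
        k_out[u] += 1.0
        k_in[v] += 1.0
    return k_out, k_in


def modularity_matvec(edges, k_out, k_in, x):
    """Return B_sym @ x, where B = A - k_out k_in^T / m (assumed definition)."""
    m = float(len(edges))
    y = [0.0] * len(x)
    # Sparse part: (A + A^T) x / 2, one pass over the edge list.
    for u, v in edges:
        y[u] += 0.5 * x[v]
        y[v] += 0.5 * x[u]
    # Rank-2 part: (k_out k_in^T + k_in k_out^T) x / (2m).
    in_dot = sum(k * xi for k, xi in zip(k_in, x))
    out_dot = sum(k * xi for k, xi in zip(k_out, x))
    for i in range(len(x)):
        y[i] -= (k_out[i] * in_dot + k_in[i] * out_dot) / (2.0 * m)
    return y


def leading_eigenpair(edges, n, iters=300, seed=0):
    """Shifted power iteration for the largest eigenvalue of B_sym."""
    k_out, k_in = degrees(edges, n)
    # Assumption: max total degree bounds every |eigenvalue| (Gershgorin),
    # so B_sym + shift * I is positive semidefinite and the iteration
    # converges to the largest algebraic eigenvalue.
    shift = max(o + i for o, i in zip(k_out, k_in))
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        y = modularity_matvec(edges, k_out, k_in, x)
        y = [yi + shift * xi for yi, xi in zip(y, x)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    bx = modularity_matvec(edges, k_out, k_in, x)
    eigenvalue = sum(xi * bi for xi, bi in zip(x, bx))  # Rayleigh quotient
    return eigenvalue, x
```

Each matrix-vector product costs one pass over the edge list plus two dot products, which is why the dense n-by-n modularity matrix never needs to be stored. On a toy graph made of two tightly connected groups joined by a single edge, the signs of the returned eigenvector separate the two groups.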

Supervector LDA - a new approach to reduced-complexity i-vector language recognition

September 9, 2012

Conference Paper

Author:

Alan V. McCree

…

Bengt J. Borgstrom

Published in:

INTERSPEECH 2012: 13th Annual Conf. of the Int. Speech Communication Assoc., 9-13 September 2012.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

In this paper, we extend our previous analysis of Gaussian Mixture Model (GMM) subspace compensation techniques using Gaussian modeling in the supervector space combined with additive channel and observation noise. We show that under the modeling assumptions of a total-variability i-vector system, full Gaussian supervector scoring can also be performed cheaply in the total subspace, and that i-vector scoring can be viewed as an approximation to this. Next, we show that covariance matrix estimation in the i-vector space can be used to generate PCA estimates of supervector covariance matrices needed for Joint Factor Analysis (JFA). Finally, we derive a new technique for reduced-dimension i-vector extraction which we call Supervector LDA (SV-LDA), and demonstrate a 100-dimensional i-vector language recognition system with equivalent performance to a 600-dimensional version at much lower complexity.

Vocal-source biomarkers for depression - a link to psychomotor activity

September 9, 2012

Conference Paper

Author:

Thomas F. Quatieri

…

Nicolas Malyska

Published in:

INTERSPEECH 2012: 13th Annual Conf. of the Int. Speech Communication Assoc., 9-13 September 2012.

Topic:

biometrics

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

A hypothesis in characterizing human depression is that change in the brain's basal ganglia results in a decline of motor coordination. Such a neuro-physiological change may therefore affect laryngeal control and dynamics. Under this hypothesis, toward the goal of objective monitoring of depression severity, we investigate vocal-source biomarkers for depression; specifically, source features that may relate to precision in motor control, including vocal-fold shimmer and jitter, degree of aspiration, fundamental frequency dynamics, and frequency-dependence of variability and velocity of energy. We use a 35-subject database collected by Mundt et al. in which subjects were treated over a six-week period, and investigate correlation of our features with clinical (HAMD), as well as self-reported (QIDS) Total subject assessment scores. To explicitly address the motor aspect of depression, we compute correlations with the Psychomotor Retardation component of clinical and self-reported Total assessments. For our longitudinal database, most correlations point to statistical relationships of our vocal-source biomarkers with psychomotor activity, as well as with depression severity.

Speech enhancement using sparse convolutive non-negative matrix factorization with basis adaptation

September 9, 2012

Conference Paper

Author:

Michael A. Carlin

…

Published in:

INTERSPEECH 2012: 13th Annual Conf. of the Int. Speech Communication Assoc., 9-13 September 2012.

Topic:

speech enhancement

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

We introduce a framework for speech enhancement based on convolutive non-negative matrix factorization that leverages available speech data to enhance arbitrary noisy utterances with no a priori knowledge of the speakers or noise types present. Previous approaches have shown the utility of a sparse reconstruction of the speech-only components of an observed noisy utterance. We demonstrate that an underlying speech representation which, in addition to applying sparsity, also adapts to the noisy acoustics improves overall enhancement quality. The proposed system performs comparably to a traditional Wiener filtering approach, and the results suggest that the proposed framework is most useful in moderate- to low-SNR scenarios.

Analyzing and interpreting automatically learned rules across dialects

September 9, 2012

Conference Paper

Author:

Nancy Chen

…

Published in:

INTERSPEECH 2012: 13th Annual Conf. of the Int. Speech Communication Assoc., 9-13 September 2012.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

In this paper, we demonstrate how informative dialect recognition systems such as the acoustic pronunciation model (APM) help speech scientists locate and analyze phonetic rules efficiently. In particular, we analyze dialect-specific characteristics automatically learned from APM across two American English dialects. We show that unsupervised rule retrieval performs similarly to supervised retrieval, indicating that APM is useful in practical applications, where word transcripts are often unavailable. We also demonstrate that the top-ranking rules learned from APM generally correspond to the linguistic literature, and can even pinpoint potential research directions to refine existing knowledge. Thus, the APM system can help phoneticians analyze rules efficiently by characterizing large amounts of data to postulate rule candidates, so they can reserve time to conduct more targeted investigations. Potential applications of informative dialect recognition systems include forensic phonetics and diagnosis of spoken language disorders.

Query-by-example using speaker content graphs

September 9, 2012

Conference Paper

Author:

William M. Campbell

…

Elliot Singer

Published in:

INTERSPEECH 2012: 13th Annual Conf. of the Int. Speech Communication Assoc., 9-13 September 2012.

Topic:

social network

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

We describe methods for constructing and using content graphs for query-by-example speaker recognition tasks within a large speech corpus. This goal is achieved as follows: First, we describe an algorithm for constructing speaker content graphs, where nodes represent speech signals and edges represent speaker similarity. Speech signal similarity can be based on any standard vector-based speaker comparison method, and the content graph can be constructed using an efficient incremental method for streaming data. Second, we apply random walk methods to the content graph to find matching examples to an unlabeled query set of speech signals. The content-graph based method is contrasted to a more traditional approach that uses supervised training and stack detectors. Performance is compared in terms of information retrieval measures and computational complexity. The new content-graph based method is shown to provide a promising low-complexity scalable alternative to standard speaker recognition methods.
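The two steps in this summary can be sketched on toy data as follows. This is only an illustration under stated assumptions: the summary does not say which similarity measure, graph construction rule, or random walk variant the authors use, so the cosine similarity, the k-nearest-neighbor edges, the random walk with restart, and all function names and parameters below are hypothetical choices made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def build_knn_graph(vectors, k):
    """Link each node to its k most similar nodes; edges are symmetric."""
    n = len(vectors)
    weights = [dict() for _ in range(n)]
    for i in range(n):
        sims = sorted(
            ((cosine(vectors[i], vectors[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for s, j in sims[:k]:
            if s > 0.0:  # keep only positively similar pairs
                weights[i][j] = s
                weights[j][i] = s
    return weights


def random_walk_with_restart(weights, query, restart=0.15, iters=100):
    """Stationary visit probabilities of a walk that restarts at the query."""
    query = list(query)
    n = len(weights)
    start = [0.0] * n
    for q in query:
        start[q] = 1.0 / len(query)
    p = start[:]
    for _ in range(iters):
        nxt = [restart * s for s in start]
        for i, nbrs in enumerate(weights):
            total = sum(nbrs.values())
            if total == 0.0:
                # Isolated node: send its probability mass back to the query.
                for q in query:
                    nxt[q] += (1.0 - restart) * p[i] / len(query)
                continue
            for j, w in nbrs.items():
                nxt[j] += (1.0 - restart) * p[i] * w / total
        p = nxt
    return p


def rank_matches(weights, query):
    """Non-query nodes ordered from most to least likely match."""
    scores = random_walk_with_restart(weights, query)
    others = [i for i in range(len(weights)) if i not in set(query)]
    return sorted(others, key=lambda i: scores[i], reverse=True)
```

Nodes reachable from the query through strong similarity edges collect most of the walk's probability, so they rank first. No labeled training data is needed, which is the contrast the summary draws with supervised detectors.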

Individual and group dynamics in the reality mining corpus

September 3, 2012

Conference Paper

Author:

Cagri K. Dagli

…

William M. Campbell

Published in:

Proc. 2012 ASE/IEEE Int. Conf. on Social Computing, 3-5 September 2012, pp. 61-70.

Topic:

social network

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Though significant progress has been made in recent years, traditional work in social networks has focused on static network analysis or dynamics in a large-scale sense. In this work, we explore ways in which temporal information from sociographic data can be used for the analysis and prediction of individual and group behavior in dynamic, real-world situations. Using the MIT Reality Mining corpus, we show how temporal information in highly-instrumented sociographic data can be used to gain insights otherwise unavailable from static snapshots. We show how pattern of life features extend from the individual to the group level. In particular, we show how anonymized location information can be used to infer individual identity. Additionally, we show how proximity information can be used in a multilinear clustering framework to detect interesting group behavior over time. Experimental results and discussion suggest temporal information has great potential for improving both individual and group level understanding of real-world, dense social network data.

Toward matched filter optimization for subgraph detection in dynamic networks

August 5, 2012

Conference Paper

Author:

Benjamin A. Miller

…

Nadya T. Bliss

Published in:

2012 SSP: 2012 IEEE Statistical Signal Processing Workshop, 5-8 August 2012, pp. 113-116.

Topic:

signal processing

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This paper outlines techniques for optimization of filter coefficients in a spectral framework for anomalous subgraph detection. Restricting the scope to the detection of a known signal in i.i.d. noise, the optimal coefficients for maximizing the signal's power are shown to be found via a rank-1 tensor approximation of the subgraph's dynamic topology. While this technique optimizes our power metric, a filter based on average degree is shown in simulation to work nearly as well in terms of power maximization and detection performance, and better separates the signal from the noise in the eigenspace.

Exploring the impact of advanced front-end processing on NIST speaker recognition microphone tasks

June 25, 2012

Conference Paper

Author:

William M. Campbell

…

Published in:

Odyssey 2012, the Speaker and Language Recognition Workshop, 25-28 June 2012.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

The NIST speaker recognition evaluation (SRE) featured microphone data in the 2005-2010 evaluations. The preprocessing and use of this data has typically been performed with telephone bandwidth and quantization. Although this approach is viable, it ignores the richer properties of the microphone data: multiple channels, high-rate sampling, linear encoding, ambient noise properties, etc. In this paper, we explore alternate choices of preprocessing and examine their effects on speaker recognition performance. Specifically, we consider the effects of quantization, sampling rate, enhancement, and two-channel speech activity detection. Experiments on the NIST 2010 SRE interview microphone corpus demonstrate that performance can be dramatically improved with a different preprocessing chain.
