Publications

Feedback-based social media filtering tool for improved situational awareness

Published in:
15th Annual IEEE Int. Symp. on Technologies for Homeland Security, HST 2016, 10-12 May 2016.

Summary

This paper describes a feature-rich model of data relevance, designed to aid first responder retrieval of useful information from social media sources during disasters or emergencies. The approach is meant to address the failure of traditional keyword-based methods to sufficiently suppress clutter during retrieval. The model iteratively incorporates relevance feedback to update feature space selection and classifier construction across a multimodal set of diverse content characterization techniques. This approach is advantageous because the aspects of the data (or even the modalities of the data) that signify relevance cannot always be anticipated ahead of time. Experiments with both microblog text documents and coupled imagery and text documents demonstrate the effectiveness of this model on sample retrieval tasks, in comparison to more narrowly focused models operating in a priori selected feature spaces. The experiments also show that even relatively low feedback levels (i.e., tens of examples) can lead to a significant performance boost during the interactive retrieval process.

Assessing functional neural connectivity as an indicator of cognitive performance

Published in:
5th NIPS Workshop on Machine Learning and Interpretation in Neuroimaging, MLINI 2015, 11-12 December 2015.

Summary

Studies in recent years have demonstrated that neural organization and structure impact an individual's ability to perform a given task. Specifically, individuals with greater neural efficiency have been shown to outperform those with less organized functional structure. In this work, we compare the predictive ability of properties of neural connectivity on a working memory task. We provide two novel approaches for characterizing functional network connectivity from electroencephalography (EEG), and compare these features to the average power across frequency bands in EEG channels. Our first novel approach represents functional connectivity structure through the distribution of eigenvalues making up channel coherence matrices in multiple frequency bands. Our second approach creates a connectivity network at each frequency band, and assesses variability in average path lengths of connected components and degree across the network. Failures in digit and sentence recall on single trials are detected using a Gaussian classifier for each feature set, at each frequency band. The classifier results are then fused across frequency bands, with the resulting detection performance summarized using the area under the receiver operating characteristic curve (AUC) statistic. Fused AUC results of 0.63/0.58/0.61 for digit recall failure and 0.58/0.59/0.54 for sentence recall failure are obtained from the connectivity structure, graph variability, and channel power features respectively.
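The AUC statistic used to summarize detection performance can be computed directly from per-trial detector scores. The following is a minimal sketch using the rank-sum (Mann-Whitney) identity; the function name `roc_auc` and the example scores are hypothetical and are not taken from the paper.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores: detector outputs, where higher means "more likely a recall failure".
    labels: 1 for a failure trial (positive), 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative trial")
    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical per-trial scores, for illustration only (not data from the paper).
example_scores = [0.9, 0.4, 0.6, 0.2, 0.7]
example_labels = [1, 0, 1, 0, 0]
example_auc = roc_auc(example_scores, example_labels)
```

An AUC of 0.5 corresponds to chance-level detection and 1.0 to perfect separation, which gives a scale for reading the fused values reported above.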

Improved hidden clique detection by optimal linear fusion of multiple adjacency matrices

Published in:
2015 Asilomar Conf. on Signals, Systems and Computers, 8-11 November 2015.

Summary

Graph fusion has emerged as a promising research area for addressing challenges associated with noisy, uncertain, multi-source data. While many ad hoc graph fusion techniques exist in the current literature, an analytical treatment of the fundamentals of the graph fusion problem is lacking. We consider the setting where we are given multiple Erdős-Rényi modeled adjacency matrices containing a common hidden or planted clique. The objective is to combine them linearly so that the principal eigenvectors of the resulting matrix best reveal the vertices associated with the clique. We utilize recent results from random matrix theory to derive the optimal weighting coefficients and use these insights to develop a data-driven fusion algorithm. We demonstrate the improved performance of the algorithm relative to other simple heuristics.
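The general shape of linear fusion followed by eigenvector inspection can be sketched as below. This is an illustration under stated assumptions, not the paper's algorithm: the equal weights are a placeholder (the optimal coefficients derived in the paper are not reproduced here), the toy graphs are hand-built rather than random, and all function names are hypothetical.

```python
import math


def adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix from an edge list."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return a


def fuse(adjacency_matrices, weights):
    """Return the weighted sum of equally sized square matrices."""
    n = len(adjacency_matrices[0])
    fused = [[0.0] * n for _ in range(n)]
    for matrix, w in zip(adjacency_matrices, weights):
        for i in range(n):
            for j in range(n):
                fused[i][j] += w * matrix[i][j]
    return fused


def principal_eigenvector(matrix, iterations=200):
    """Approximate the dominant eigenvector by power iteration."""
    n = len(matrix)
    x = [1.0] * n
    for _ in range(iterations):
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break
        x = [v / norm for v in y]
    return x


# Two toy observations sharing a clique on vertices {0, 1, 2}; each also
# carries one unrelated "clutter" edge that the other lacks.
clique_edges = [(0, 1), (0, 2), (1, 2)]
a1 = adjacency(6, clique_edges + [(3, 4)])
a2 = adjacency(6, clique_edges + [(4, 5)])

# Placeholder equal weights (an assumption, not the paper's optimal weights).
fused = fuse([a1, a2], weights=[0.5, 0.5])
v = principal_eigenvector(fused)

# Vertices with the largest eigenvector magnitudes are the clique candidates.
candidates = set(sorted(range(6), key=lambda i: -abs(v[i]))[:3])
```

In this toy case the shared clique edges keep full weight after fusion while each clutter edge is halved, so the principal eigenvector concentrates on the common vertices.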

Residuals-based subgraph detection with cue vertices

Published in:
2015 Asilomar Conf. on Signals, Systems and Computers, 8-11 November 2015.

Summary

A common problem in modern graph analysis is the detection of communities, an example of which is the detection of a single anomalously dense subgraph. Recent results have demonstrated a fundamental limit for this problem when using spectral analysis of modularity. In this paper, we demonstrate the implication of these results on subgraph detection when a cue vertex is provided, indicating one of the vertices in the community of interest. Several recent algorithms for local community detection are applied in this context, and we compare their empirical performance to that of the simple method used to derive the theoretical detection limits.

Sampling operations on big data

Published in:
2015 Asilomar Conf. on Signals, Systems and Computers, 8-11 November 2015.

Summary

The 3Vs -- Volume, Velocity and Variety -- of Big Data continue to pose a large challenge for systems and algorithms designed to store, process and disseminate information for discovery and exploration under real-time constraints. Common signal processing operations such as sampling and filtering, which have been used for decades to compress signals, are often undefined in data that is characterized by heterogeneity, high dimensionality, and lack of known structure. In this article, we describe and demonstrate an approach to sample large datasets such as social media data. We evaluate the effect of sampling on a common predictive analytic: link prediction. Our results indicate that even a heavily sampled dataset can still yield meaningful link prediction results.

Very large graphs for information extraction (VLG) - detection and inference in the presence of uncertainty

Summary

In numerous application domains relevant to the Department of Defense and the Intelligence Community, data of interest take the form of entities and the relationships between them, and these data are commonly represented as graphs. Under the Very Large Graphs for Information Extraction effort--a one-year proof-of-concept study--MIT LL developed novel techniques for anomalous subgraph detection, building on tools in the signal processing research literature. This report documents the technical results of this effort. Two datasets--a snapshot of Thomson Reuters' Web of Science database and a stream of web proxy logs--were parsed, and graphs were constructed from the raw data. From the phenomena in these datasets, several algorithms were developed to model the dynamic graph behavior, including a preferential attachment mechanism with memory, a streaming filter to model a graph as a weighted average of its past connections, and a generalized linear model for graphs where connection probabilities are determined by additional side information or metadata. A set of metrics was also constructed to facilitate comparison of techniques. The study culminated in a demonstration of the algorithms on the datasets of interest, in addition to simulated data. Performance in terms of detection, estimation, and computational burden was measured according to the metrics. Among the highlights of this demonstration were the detection of emerging coauthor clusters in the Web of Science data, detection of botnet activity in the web proxy data after 15 minutes (which took 10 days to detect using state-of-the-practice techniques), and demonstration of the core algorithm on a simulated 1-billion-vertex graph using a commodity computing cluster.
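One simple way to model a graph as a weighted average of its past connections is an exponentially weighted average over edge-weight snapshots. The sketch below assumes that form; the report's actual filter coefficients are not given in this summary, and the function name `filtered_graph`, the `decay` parameter, and the toy snapshots are all hypothetical.

```python
def filtered_graph(snapshots, decay=0.5):
    """Exponentially weighted average of past edge-weight snapshots.

    snapshots: list of dicts mapping an edge (u, v) to its weight at one
    time step, oldest first.
    decay: hypothetical forgetting factor in (0, 1]; older snapshots are
    down-weighted by decay ** age.
    """
    weights = [decay ** age for age in range(len(snapshots) - 1, -1, -1)]
    total = sum(weights)
    averaged = {}
    for w, snapshot in zip(weights, snapshots):
        for edge, value in snapshot.items():
            averaged[edge] = averaged.get(edge, 0.0) + w * value / total
    return averaged


# Toy stream: edge (1, 2) fades out while edge (2, 3) appears.
history = [{(1, 2): 1}, {(1, 2): 1, (2, 3): 1}, {(2, 3): 1}]
expected_now = filtered_graph(history, decay=0.5)
```

The averaged weights can serve as an expected-connectivity baseline, so that newly observed edges are judged against what the recent past predicts.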

Sampling large graphs for anticipatory analytics

Published in:
HPEC 2015: IEEE Conf. on High Performance Extreme Computing, 15-17 September 2015.

Summary

The characteristics of Big Data - often dubbed the 3V's for volume, velocity, and variety - will continue to outpace the ability of computational systems to process, store, and transmit meaningful results. Traditional techniques for dealing with large datasets often include the purchase of larger systems, greater human-in-the-loop involvement, or more complex algorithms. We are investigating the use of sampling to mitigate these challenges, specifically sampling large graphs. Often, large datasets can be represented as graphs where data entries may be edges, and vertices may be attributes of the data. In particular, we present the results of sampling for the task of link prediction. Link prediction is a process to estimate the probability of a new edge forming between two vertices of a graph, and it has numerous application areas in understanding social or biological networks. In this paper we propose a series of techniques for the sampling of large datasets. In order to quantify the effect of these techniques, we present the quality of link prediction tasks on sampled graphs, and the time saved in calculating link prediction statistics on these sampled graphs.
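The idea of scoring candidate links on a sampled graph can be sketched as follows. Both choices here are assumptions made for illustration: uniform random edge sampling stands in for the paper's sampling techniques, which the summary does not specify, and the common-neighbors count is a standard link prediction heuristic that yields a score rather than a calibrated probability. All names are hypothetical.

```python
import random
from itertools import combinations


def sample_edges(edges, keep_fraction, seed=0):
    """Keep each edge independently with probability keep_fraction."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < keep_fraction]


def neighbor_sets(edges):
    """Map each vertex to the set of its neighbors."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def common_neighbor_scores(edges):
    """Score each non-adjacent vertex pair by its number of shared neighbors."""
    nbrs = neighbor_sets(edges)
    scores = {}
    for u, v in combinations(sorted(nbrs), 2):
        if v in nbrs[u]:
            continue  # already linked, nothing to predict
        shared = len(nbrs[u] & nbrs[v])
        if shared:
            scores[(u, v)] = shared
    return scores


# Toy graph, for illustration only.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
full_scores = common_neighbor_scores(edges)

# The same scoring on a sampled graph touches fewer edges and pairs.
sampled = sample_edges(edges, keep_fraction=0.6, seed=1)
sampled_scores = common_neighbor_scores(sampled)
```

Comparing `sampled_scores` against `full_scores` (for example, by rank agreement of the top-scoring pairs) is one way to quantify how much prediction quality is traded for reduced computation.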

A spectral framework for anomalous subgraph detection

Published in:
IEEE Trans. Signal Process., Vol. 63, No. 16, 15 August 2015, 4191-4206.

Summary

A wide variety of application domains is concerned with data consisting of entities and their relationships or connections, formally represented as graphs. Within these diverse application areas, a common problem of interest is the detection of a subset of entities whose connectivity is anomalous with respect to the rest of the data. While the detection of such anomalous subgraphs has received a substantial amount of attention, no application-agnostic framework exists for analysis of signal detectability in graph-based data. In this paper, we describe a framework that enables such analysis using the principal eigenspace of a graph's residuals matrix, commonly called the modularity matrix in community detection. Leveraging this analytical tool, we show that the framework has a natural power metric in the spectral norm of the anomalous subgraph's adjacency matrix (signal power) and of the background graph's residuals matrix (noise power). We propose several algorithms based on spectral properties of the residuals matrix, with more computationally expensive techniques providing greater detection power. Detection and identification performance are presented for a number of signal and noise models, including clusters and bipartite foregrounds embedded into simple random backgrounds, as well as graphs with community structure and realistic degree distributions. The trends observed verify intuition gleaned from other signal processing areas, such as greater detection power when the signal is embedded within a less active portion of the background. We demonstrate the utility of the proposed techniques in detecting small, highly anomalous subgraphs in real graphs derived from Internet traffic and product co-purchases.
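For concreteness, the quantities named in this summary can be written out under the standard modularity-matrix definition from community detection. That definition is an assumption here, since the summary does not state the exact form used. The symbols are: $A$ is the adjacency matrix of the observed graph, $k$ its degree vector, $m$ its number of edges, $A_S$ the adjacency matrix of the anomalous subgraph, and $B_0$ the residuals matrix of the background graph alone.

```latex
B = A - \frac{k\,k^{\mathsf{T}}}{2m},
\qquad
\text{signal power} = \lVert A_S \rVert_2,
\qquad
\text{noise power} = \lVert B_0 \rVert_2 .
```

As a worked case, if the anomalous subgraph is a clique on $n_S$ vertices, then $\lVert A_S \rVert_2 = n_S - 1$, so signal power grows linearly with clique size.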

Temporal and multi-source fusion for detection of innovation in collaboration networks

Published in:
Proc. of the 18th Int. Conf. on Information Fusion, 6-9 July 2015.

Summary

A common problem in network analysis is detecting small subgraphs of interest within a large background graph. This includes multi-source fusion scenarios where data from several modalities must be integrated to form the network. This paper presents an application of novel techniques, leveraging the signal processing for graphs algorithmic framework, to well-studied collaboration networks in the field of evolutionary biology. Our multi-disciplinary approach allows us to leverage case studies of transformative periods in this scientific field as truth. We build on previous work by optimizing the temporal integration filters with respect to truth data using a tensor decomposition method that maximizes the spectral norm of the integrated subgraph's adjacency matrix. We also show that we can mitigate data corruption via fusion of different data sources, demonstrating the power of this analysis framework for incomplete and corrupted data.

Planted clique detection below the noise floor using low-rank sparse PCA

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 19-24 April 2015.

Summary

Detection of clusters and communities in graphs is useful in a wide range of applications. In this paper we investigate the problem of detecting a clique embedded in a random graph. Recent results have demonstrated a sharp detectability threshold for a simple algorithm based on principal component analysis (PCA). Sparse PCA of the graph's modularity matrix can successfully discover clique locations where PCA-based detection methods fail. In this paper, we demonstrate that applying sparse PCA to low-rank approximations of the modularity matrix is a viable solution to the planted clique problem that enables detection of small planted cliques in graphs where running the standard semidefinite program for sparse PCA is not possible.