Publications

Refine Results

(Filters Applied) Clear All

Influence estimation on social media networks using causal inference

Published in:
Proc. IEEE Statistical Signal Processing (SSP) Workshop, 10-13 June 2018.

Summary

Estimating influence on social media networks is an important practical and theoretical problem, especially because this new medium is widely exploited as a platform for disinformation and propaganda. This paper introduces a novel approach to influence estimation on social media networks and applies it to the real-world problem of characterizing active influence operations on Twitter during the 2017 French presidential elections. The new influence estimation approach attributes impact by accounting for narrative propagation over the network using a network causal inference framework applied to data arising from graph sampling and filtering. This causal framework infers the difference in outcome as a function of exposure, in contrast to existing approaches that attribute impact to activity volume or topological features, which do not explicitly measure nor necessarily indicate actual network influence. Cramér-Rao estimation bounds are derived for parameter estimation as a step in the causal analysis, and used to achieve geometrical insight on the causal inference problem. The ability to infer high causal influence is demonstrated on real-world social media accounts that are later independently confirmed to be either directly affiliated or correlated with foreign influence operations using evidence supplied by the U.S. Congress and journalistic reports.
READ LESS

Summary

Estimating influence on social media networks is an important practical and theoretical problem, especially because this new medium is widely exploited as a platform for disinformation and propaganda. This paper introduces a novel approach to influence estimation on social media networks and applies it to the real-world problem of characterizing...

READ MORE

Intersection and convex combination in multi-source spectral planted cluster detection

Published in:
IEEE Global Conf. on Signal and Information Processing, GlobalSIP, 7-9 December 2016.

Summary

Planted cluster detection is an important form of signal detection when the data are in the form of a graph. When there are multiple graphs representing multiple connection types, the method of aggregation can have significant impact on the results of a detection algorithm. This paper addresses the tradeoff between two possible aggregation methods: convex combination and intersection. For a spectral detection method, convex combination dominates when the cluster is relatively sparse in at least one graph, while the intersection method dominates in cases where it is dense across graphs. Experimental results confirm the theory. We consider the context of adversarial cluster placement, and determine how an adversary would distribute connections among the graphs to best avoid detection.
READ LESS

Summary

Planted cluster detection is an important form of signal detection when the data are in the form of a graph. When there are multiple graphs representing multiple connection types, the method of aggregation can have significant impact on the results of a detection algorithm. This paper addresses the tradeoff between...

READ MORE

Matching community structure across online social networks

Author:
Published in:
arXiv, 3 August 2016.

Summary

The discovery of community structure in networks is a problem of considerable interest in recent years. In online social networks, often times, users are simultaneously involved in multiple social media sites, some of which share common social relationships. It is of great interest to uncover a shared community structure across these networks. However, in reality, users typically identify themselves with different usernames across social media sites. This creates a great difficulty in detecting the community structure. In this paper, we explore several approaches for community detection across online social networks with limited knowledge of username alignment across the networks. We refer to the known alignment of usernames as seeds. We investigate strategies for seed selection and its impact on networks with a different fraction of overlapping vertices. The goal is to study the interplay between network topologies and seed selection strategies, and to understand how it affects the detected community structure. We also propose several measures to assess the performance of community detection and use them to measure the quality of the detected communities in both Twitter-Twitter networks and Twitter-Instagram networks.
READ LESS

Summary

The discovery of community structure in networks is a problem of considerable interest in recent years. In online social networks, often times, users are simultaneously involved in multiple social media sites, some of which share common social relationships. It is of great interest to uncover a shared community structure across...

READ MORE

Cross-domain entity resolution in social media

Summary

The challenge of associating entities across multiple domains is a key problem in social media understanding. Successful cross-domain entity resolution provides integration of information from multiple sites to create a complete picture of user and community activities, characteristics, and trends. In this work, we examine the problem of entity resolution across Twitter and Instagram using general techniques. Our methods fall into three categories: profile, content, and graph based. For the profile-based methods, we consider techniques based on approximate string matching. For content-based methods, we perform author identification. Finally, for graph-based methods, we apply novel cross-domain community detection methods and generate neighborhood-based features. The three categories of methods are applied to a large graph of users in Twitter and Instagram to understand challenges, determine performance, and understand fusion of multiple methods. Final results demonstrate an equal error rate less than 1%.
READ LESS

Summary

The challenge of associating entities across multiple domains is a key problem in social media understanding. Successful cross-domain entity resolution provides integration of information from multiple sites to create a complete picture of user and community activities, characteristics, and trends. In this work, we examine the problem of entity resolution...

READ MORE

Sparse matrix partitioning for parallel eigenanalysis of large static and dynamic graphs

Published in:
HPEC 2014: IEEE Conf. on High Performance Extreme Computing, 9-11 September 2014.

Summary

Numerous applications focus on the analysis of entities and the connections between them, and such data are naturally represented as graphs. In particular, the detection of a small subset of vertices with anomalous coordinated connectivity is of broad interest, for problems such as detecting strange traffic in a computer network or unknown communities in a social network. These problems become more difficult as the background graph grows larger and noisier and the coordination patterns become more subtle. In this paper, we discuss the computational challenges of a statistical framework designed to address this cross-mission challenge. The statistical framework is based on spectral analysis of the graph data, and three partitioning methods are evaluated for computing the principal eigenvector of the graph's residuals matrix. While a standard one-dimensional partitioning technique enables this computation for up to four billion vertices, the communication overhead prevents this method from being used for even larger graphs. Recent two-dimensional partitioning methods are shown to have much more favorable scaling properties. A data-dependent partitioning method, which has the best scaling performance, is also shown to improve computation time even as a graph changes over time, allowing amortization of the upfront cost.
READ LESS

Summary

Numerous applications focus on the analysis of entities and the connections between them, and such data are naturally represented as graphs. In particular, the detection of a small subset of vertices with anomalous coordinated connectivity is of broad interest, for problems such as detecting strange traffic in a computer network...

READ MORE

Efficient anomaly detection in dynamic, attributed graphs: emerging phenomena and big data

Published in:
ISI 2013: IEEE Int. Conf. on Intelligence and Security Informatics, 4-7 June 2013.

Summary

When working with large-scale network data, the interconnected entities often have additional descriptive information. This additional metadata may provide insight that can be exploited for detection of anomalous events. In this paper, we use a generalized linear model for random attributed graphs to model connection probabilities using vertex metadata. For a class of such models, we show that an approximation to the exact model yields an exploitable structure in the edge probabilities, allowing for efficient scaling of a spectral framework for anomaly detection through analysis of graph residuals, and a fast and simple procedure for estimating the model parameters. In simulation, we demonstrate that taking into account both attributes and dynamics in this analysis has a much more significant impact on the detection of an emerging anomaly than accounting for either dynamics or attributes alone. We also present an analysis of a large, dynamic citation graph, demonstrating that taking additional document metadata into account emphasizes parts of the graph that would not be considered significant otherwise.
READ LESS

Summary

When working with large-scale network data, the interconnected entities often have additional descriptive information. This additional metadata may provide insight that can be exploited for detection of anomalous events. In this paper, we use a generalized linear model for random attributed graphs to model connection probabilities using vertex metadata. For...

READ MORE

Link prediction methods for generating speaker content graphs

Published in:
ICASSP 2013, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 25-31 May 2013.

Summary

In a speaker content graph, vertices represent speech signals and edges represent speaker similarity. Link prediction methods calculate which potential edges are most likely to connect vertices from the same speaker; those edges are included in the generated speaker content graph. Since a variety of speaker recognition tasks can be performed on a content graph, we provide a set of metrics for evaluating the graph's quality independently of any recognition task. We then describe novel global and incremental algorithms for constructing accurate speaker content graphs that outperform the existing k nearest neighbors link prediction method. We evaluate those algorithms on a NIST speaker recognition corpus.
READ LESS

Summary

In a speaker content graph, vertices represent speech signals and edges represent speaker similarity. Link prediction methods calculate which potential edges are most likely to connect vertices from the same speaker; those edges are included in the generated speaker content graph. Since a variety of speaker recognition tasks can be...

READ MORE

Large-scale community detection on speaker content graphs

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 25-31 May 2013.

Summary

We consider the use of community detection algorithms to perform speaker clustering on content graphs built from large audio corpora. We survey the application of agglomerative hierarchical clustering, modularity optimization methods, and spectral clustering as well as two random walk algorithms: Markov clustering and Infomap. Our results on graphs built from the NIST 2005+2006 and 2008+2010 Speaker Recognition Evaluations (SREs) provide insight into both the structure of the speakers present in the data and the intricacies of the clustering methods. In particular, we introduce an additional parameter to Infomap that improves its clustering performance on all graphs. Lastly, we also develop an automatic technique to purify the neighbors of each node by pruning away unnecessary edges.
READ LESS

Summary

We consider the use of community detection algorithms to perform speaker clustering on content graphs built from large audio corpora. We survey the application of agglomerative hierarchical clustering, modularity optimization methods, and spectral clustering as well as two random walk algorithms: Markov clustering and Infomap. Our results on graphs built...

READ MORE

RECOG: Recognition and Exploration of Content Graphs

Published in:
Pacific Vision, 26 February - March 1, 2013.

Summary

We present RECOG (Recognition and Exploration of COntent Graphs), a system for visualizing and interacting with speaker content graphs constructed from large data sets of speech recordings. In a speaker content graph, nodes represent speech signals and edges represent speaker similarity. First, we describe a layout algorithm that optimizes content graphs for ease of navigability. We then present an interactive tool set that allows an end user to find and explore interesting occurrences in the corpus. We also present a tool set that allows a researcher to visualize the shortcomings of current content graph generation algorithms. RECOG's layout and toolsets were implemented as Gephi plugins [1].
READ LESS

Summary

We present RECOG (Recognition and Exploration of COntent Graphs), a system for visualizing and interacting with speaker content graphs constructed from large data sets of speech recordings. In a speaker content graph, nodes represent speech signals and edges represent speaker similarity. First, we describe a layout algorithm that optimizes content...

READ MORE

Social network analysis with content and graphs

Published in:
Lincoln Laboratory Journal, Vol. 20, No. 1, 2013, pp. 62-81.

Summary

Social network analysis has undergone a renaissance with the ubiquity and quantity of content from social media, web pages, and sensors. This content is a rich data source for constructing and analyzing social networks, but its enormity and unstructured nature also present multiple challenges. Work at Lincoln Laboratory is addressing the problems in constructing networks from unstructured data, analyzing the community structure of a network, and inferring information from networks. Graph analytics have proven to be valuable tools in solving these challenges. Through the use of these tools, Laboratory researchers have achieved promising results on real-world data. A sampling of these results are presented in this article.
READ LESS

Summary

Social network analysis has undergone a renaissance with the ubiquity and quantity of content from social media, web pages, and sensors. This content is a rich data source for constructing and analyzing social networks, but its enormity and unstructured nature also present multiple challenges. Work at Lincoln Laboratory is addressing...

READ MORE