Publications

Refine Results

(Filters Applied) Clear All

Graph matching via multi-scale heat diffusion

Author:
Published in:
IEEE Intl. Conf. on Big Data, 9-12 December 2019.

Summary

We propose a novel graph matching algorithm that uses ideas from graph signal processing to match vertices of graphs using alternative graph representations. Specifically, we consider a multi-scale heat diffusion on the graphs to create multiple weighted graph representations that incorporate both direct adjacencies as well as local structures induced from the heat diffusion. Then a multi-objective optimization method is used to match vertices across all pairs of graph representations simultaneously. We show that our proposed algorithm performs significantly better than the algorithm that only uses the adjacency matrices, especially when the number of known latent alignments between vertices (seeds) is small. We test the algorithm on a set of graphs and show that at the low seed level, the proposed algorithm performs at least 15–35% better than the traditional graph matching algorithm.
READ LESS

Summary

We propose a novel graph matching algorithm that uses ideas from graph signal processing to match vertices of graphs using alternative graph representations. Specifically, we consider a multi-scale heat diffusion on the graphs to create multiple weighted graph representations that incorporate both direct adjacencies as well as local structures induced...

READ MORE

Uncovering human trafficking networks through text analysis

Author:
Published in:
2019 Grace Hopper Celebration, 1-4 October 2019.

Summary

Human trafficking is a form of modern-day slavery affecting an estimated 40 million victims worldwide, primarily through the commercial sexual exploitation of women and children. In the last decade, the advertising of victims has moved from the streets to websites on the Internet, providing greater efficiency and anonymity for sex traffickers. This shift has allowed traffickers to list their victims in multiple geographic areas simultaneously, while also improving operational security by using multiple methods of electronic communication with buyers; complicating the ability of law enforcement to disrupt these illicit organizations. In this presentation, we present a novel unsupervised template matching algorithm for analyzing and detecting complex organizations operating on adult service websites. We apply this method to a large corpus of adult service advertisements retrieved from backpage.com, and show that the networks identified through the algorithm match well with surrogate truth data derived from phone number networks in the same corpus. Further exploration of the results show that the proposed method provides deeper insights into the complex structures of sex trafficking organizations, not possible through networks derived from phone numbers alone. This method provides a powerful new capability for law enforcement to more completely identify and gather evidence about trafficking operations
READ LESS

Summary

Human trafficking is a form of modern-day slavery affecting an estimated 40 million victims worldwide, primarily through the commercial sexual exploitation of women and children. In the last decade, the advertising of victims has moved from the streets to websites on the Internet, providing greater efficiency and anonymity for sex...

READ MORE

On large-scale graph generation with validation of diverse triangle statistics at edges and vertices

Published in:
2018 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW, 21 May 2018.

Summary

Researchers developing implementations of distributed graph analytic algorithms require graph generators that yield graphs sharing the challenging characteristics of real-world graphs (small-world, scale-free, heavy-tailed degree distribution) with efficiently calculable ground-truth solutions to the desired output. Reproducibility for current generators used in benchmarking are somewhat lacking in this respect due to their randomness: the output of a desired graph analytic can only be compared to expected values and not exact ground truth. Nonstochastic Kronecker product graphs meet these design criteria for several graph analytics. Here we show that many flavors of triangle participation can be cheaply calculated while generating a Kronecker product graph.
READ LESS

Summary

Researchers developing implementations of distributed graph analytic algorithms require graph generators that yield graphs sharing the challenging characteristics of real-world graphs (small-world, scale-free, heavy-tailed degree distribution) with efficiently calculable ground-truth solutions to the desired output. Reproducibility for current generators used in benchmarking are somewhat lacking in this respect due to...

READ MORE

Command and control for multifunction phased array radar

Published in:
IEEE Trans. Geosci. Remote Sens., Vol. 55, No. 10, October 2017, pp. 5899-5912.

Summary

We discuss the challenge of managing the Multifunction Phased Array Radar (MPAR) timeline to satisfy the requirements of its multiple missions, with a particular focus on weather surveillance. This command and control (C2) function partitions the available scan time among these missions, exploits opportunities to service multiple missions simultaneously, and utilizes techniques for increasing scan rate where feasible. After reviewing the candidate MPAR architectures and relevant previous research, we describe a specific C2 framework that is consistent with a demonstrated active array architecture using overlapped subarrays to realize multiple, concurrent receive beams. Analysis of recently articulated requirements for near-airport and national-scale aircraft surveillance indicates that with this architecture, 40–60% of the MPAR scan timeline would be available for the high-fidelity weather observations currently provided by the Weather Service Radar (WSR-88D) network. We show that an appropriate use of subarray generated concurrent receive beams, in concert with previously documented, complementary techniques to increase the weather scan rate, could enable MPAR to perform full weather volume scans at a rate of 1 per minute. Published observing system simulation experiments, human-in-the-loop studies and radar-data assimilation experiments indicate that high-quality weather radar observations at this rate may significantly improve the lead time and reliability of severe weather warnings relative to current observation capabilities.
READ LESS

Summary

We discuss the challenge of managing the Multifunction Phased Array Radar (MPAR) timeline to satisfy the requirements of its multiple missions, with a particular focus on weather surveillance. This command and control (C2) function partitions the available scan time among these missions, exploits opportunities to service multiple missions simultaneously, and...

READ MORE

Streaming graph challenge: stochastic block partition

Summary

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of the real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets and source code for the algorithm as well as metrics, with detailed documentation are available at GraphChallenge.org.
READ LESS

Summary

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive...

READ MORE

A linear algebra approach to fast DNA mixture analysis using GPUs

Published in:
HPEC 2017: IEEE Conf. on High Performance Extreme Computing, 12-14 September 2017.

Summary

Analysis of DNA samples is an important step in forensics, and the speed of analysis can impact investigations. Comparison of DNA sequences is based on the analysis of short tandem repeats (STRs), which are short DNA sequences of 2-5 base pairs. Current forensics approaches use 20 STR loci for analysis. The use of single nucleotide polymorphisms (SNPs) has utility for analysis of complex DNA mixtures. The use of tens of thousands of SNPs loci for analysis poses significant computational challenges because the forensic analysis scales by the product of the loci count and number of DNA samples to be analyzed. In this paper, we discuss the implementation of a DNA sequence comparison algorithm by re-casting the algorithm in terms of linear algebra primitives. By developing an overloaded matrix multiplication approach to DNA comparisons, we can leverage advances in GPU hardware and algorithms for Dense Generalized Matrix-Multiply (DGEMM) to speed up DNA sample comparisons. We show that it is possible to compare 2048 unknown DNA samples with 20 million known samples in under 6 seconds using a NVIDIA K80 GPU.
READ LESS

Summary

Analysis of DNA samples is an important step in forensics, and the speed of analysis can impact investigations. Comparison of DNA sequences is based on the analysis of short tandem repeats (STRs), which are short DNA sequences of 2-5 base pairs. Current forensics approaches use 20 STR loci for analysis...

READ MORE

Development of a new inanimate class for the WSR-88D hydrometeor classification algorithm

Published in:
38th Conf. on Radar Meteorology, 27 August-1 September 2017.

Summary

The current implementation of the Hydrometeor Classification Algorithm (HCA) on the WSR-88D network contains two non-hydrometeor-based classes: ground clutter/anomalous propagation and biologicals. A number of commonly observed non-hydrometeor-based phenomena do not fall into either of these two HCA categories, but often are misclassified as ground clutter, biologicals, unknown, or worse yet, weather hydrometeors. Some of these phenomena include chaff, sea clutter, combustion debris and smoke, and radio frequency interference. In order to address this discrepancy, a new class (nominally named "inanimate") is being developed that encompasses many of these targets. Using this class, a distinction between non-biological and biological non-hydrometeor targets can be made and potentially separated into sub-classes for more direct identification. A discussion regarding the fuzzy logic membership functions, optimization of membership weights, and class restrictions is presented, with a focus on observations of highly stochastic differential phase estimates in all of the aforementioned targets. Recent attempts to separate the results into sub-classes using a support vector machine are presented, and examples of each target type are detailed. Details concerning eventual implementation into the WSR-88D radar product generator are addressed.
READ LESS

Summary

The current implementation of the Hydrometeor Classification Algorithm (HCA) on the WSR-88D network contains two non-hydrometeor-based classes: ground clutter/anomalous propagation and biologicals. A number of commonly observed non-hydrometeor-based phenomena do not fall into either of these two HCA categories, but often are misclassified as ground clutter, biologicals, unknown, or worse...

READ MORE

A new radio frequency interference filter for weather radars

Author:
Published in:
J. Atmos. Ocean. Technol., Vol. 34, No. 7, 1 July 2017, pp. 1393-1406.

Summary

A new radio frequency interference (RFI) filter algorithm for weather radars is proposed in the two-dimensional (2D) range-time/sample-time domain. Its operation in 2D space allows RFI detection at lower interference-to-noise or interference-to-signal ratios compared to filters working only in the sample-time domain while maintaining very low false alarm rates. Simulations and real weather radar data with RFI are used to perform algorithm comparisons. Results are consistent with theoretical considerations and show the 2D RFI filter to be a promising addition to the signal processing arsenal against interference with weather radars. Increased computational burden is the only drawback relative to filters currently used by operational systems.
READ LESS

Summary

A new radio frequency interference (RFI) filter algorithm for weather radars is proposed in the two-dimensional (2D) range-time/sample-time domain. Its operation in 2D space allows RFI detection at lower interference-to-noise or interference-to-signal ratios compared to filters working only in the sample-time domain while maintaining very low false alarm rates. Simulations...

READ MORE

Automated provenance analytics: a regular grammar based approach with applications in security

Published in:
9th Intl. Workshop on Theory and Practice of Provenance, TaPP, 22-23 June 2017.

Summary

Provenance collection techniques have been carefully studied in the literature, and there are now several systems to automatically capture provenance data. However, the analysis of provenance data is often left "as an exercise for the reader". The provenance community needs tools that allow users to quickly sort through large volumes of provenance data and identify records that require further investigation. By detecting anomalies in provenance data that deviate from established patterns, we hope to actively thwart security threats. In this paper, we discuss issues with current graph analysis techniques as applied to data provenance, particularly Frequent Subgraph Mining (FSM). Then we introduce Directed Acyclic Graph regular grammars (DAGr) as a model for provenance data and show how they can detect anomalies. These DAGr provide an expressive characterization of DAGs, and by using regular grammars as a formalism, we can apply results from formal language theory to learn the difference between "good" and "bad" provenance. We propose a restricted subclass of DAGr called deterministic Directed Acyclic Graph automata (dDAGa) that guarantees parsing in linear time. Finally, we propose a learning algorithm for dDAGa, inspired by Minimum Description Length for Grammar Induction.
READ LESS

Summary

Provenance collection techniques have been carefully studied in the literature, and there are now several systems to automatically capture provenance data. However, the analysis of provenance data is often left "as an exercise for the reader". The provenance community needs tools that allow users to quickly sort through large volumes...

READ MORE

Fabrication security and trust of domain-specific ASIC processors

Summary

Application specific integrated circuits (ASICs) are commonly used to implement high-performance signal-processing systems for high-volume applications, but their high development costs and inflexible nature make ASICs inappropriate for algorithm development and low-volume DoD applications. In addition, the intellectual property (IP) embedded in the ASIC is at risk when fabricated in an untrusted foundry. Lincoln Laboratory has developed a flexible signal-processing architecture to implement a wide range of algorithms within one application domain, for example radar signal processing. In this design methodology, common signal processing kernels such as digital filters, fast Fourier transforms (FFTs), and matrix transformations are implemented as optimized modules, which are interconnected by a programmable wiring fabric that is similar to the interconnect in a field programmable gate array (FPGA). One or more programmable microcontrollers are also embedded in the fabric to sequence the operations. This design methodology, which has been termed a coarse-grained FPGA, has been shown to achieve a near ASIC level of performance. In addition, since the signal processing algorithms are expressed in firmware that is loaded at runtime, the important application details are protected from an unscrupulous foundry.
READ LESS

Summary

Application specific integrated circuits (ASICs) are commonly used to implement high-performance signal-processing systems for high-volume applications, but their high development costs and inflexible nature make ASICs inappropriate for algorithm development and low-volume DoD applications. In addition, the intellectual property (IP) embedded in the ASIC is at risk when fabricated in...

READ MORE