Publications

Streaming graph challenge: stochastic block partition

Summary

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge, including (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open-source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets and source code for the algorithm, as well as metrics and detailed documentation, are available at GraphChallenge.org.
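
To make the baseline concrete, here is a minimal sketch of the node-move step at the heart of stochastic block partition, written against a plain (not degree-corrected) stochastic block model with greedy acceptance; the challenge's baseline builds on this idea with a degree-corrected likelihood, Metropolis-Hastings acceptance, incremental count updates, and block merges. All function and variable names here are ours.

```python
# Minimal sketch: greedy node moves under a simple stochastic block model.
import numpy as np

def sbm_log_likelihood(A, b, B):
    """Log-likelihood (up to constants) of partition b of adjacency matrix A."""
    n = np.bincount(b, minlength=B).astype(float)   # block sizes
    onehot = np.eye(B)[b]                           # num_nodes x B membership matrix
    M = onehot.T @ A @ onehot                       # inter-block edge counts
    pairs = np.outer(n, n)                          # possible edge slots per block pair
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(M > 0, M * np.log(M / pairs), 0.0)
    return terms.sum()

def greedy_node_moves(A, b, B, sweeps=5):
    """Move each node to the block that most improves the score (toy scale only:
    the real algorithm updates edge counts incrementally instead of rescoring)."""
    for _ in range(sweeps):
        for v in range(len(b)):
            best_r, best_ll = b[v], sbm_log_likelihood(A, b, B)
            for r in range(B):
                b[v] = r
                ll = sbm_log_likelihood(A, b, B)
                if ll > best_ll:
                    best_r, best_ll = r, ll
            b[v] = best_r
    return b

# Toy usage: two planted blocks joined by sparse background edges.
rng = np.random.default_rng(0)
A = (rng.random((20, 20)) < 0.1).astype(float)
A[:10, :10] = rng.random((10, 10)) < 0.6
A[10:, 10:] = rng.random((10, 10)) < 0.6
A = np.triu(A, 1); A = A + A.T                      # symmetric, no self-loops
b = greedy_node_moves(A, rng.integers(0, 2, size=20), B=2)
print(b)  # the two planted blocks typically end up in different groups (up to label swap)
```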

Static graph challenge: subgraph isomorphism

Summary

The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and developed methodologies for creating challenges to move these communities forward. The proposed Subgraph Isomorphism Graph Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a graph challenge that is reflective of many real-world graph analytics processing systems. The Subgraph Isomorphism Graph Challenge is a holistic specification with multiple integrated kernels that can be run together or independently. Each kernel is well defined mathematically and can be implemented in any programming environment. Subgraph isomorphism is amenable to both vertex-centric implementations and array-based implementations (e.g., using the GraphBLAS.org standard). The computations are simple enough that performance predictions can be made based on simple computing hardware models. The surrounding kernels provide the context for each kernel, allowing rigorous definition of both its input and its output. Furthermore, since the proposed graph challenge is scalable in both problem size and hardware, it can be used to measure and quantitatively compare a wide range of present-day and future systems. Serial implementations have been written in C++, Python, Python with Pandas, Matlab, Octave, and Julia, and their single-threaded performance has been measured. Specifications, data, and software are publicly available at GraphChallenge.org.
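
To give a flavor of the array-based formulation, the sketch below counts triangles, one of the simplest subgraph patterns, using a linear-algebra identity: for a symmetric 0/1 adjacency matrix A with zero diagonal, every triangle contributes exactly six to the sum of the elementwise product A ∘ (A × A). This is our illustration of the style, not the challenge's reference implementation.

```python
# Array-based triangle counting: sum(A .* (A @ A)) counts each triangle 6 times,
# once per ordered pair of its three edges' endpoints.
import numpy as np

def count_triangles(A):
    """Triangle count of an undirected simple graph (dense 0/1 adjacency matrix)."""
    return int((A * (A @ A)).sum()) // 6

# Toy usage: a 4-clique contains C(4,3) = 4 triangles.
A = np.ones((4, 4)) - np.eye(4)
print(count_triangles(A))  # -> 4
```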

Predicting exploitation of disclosed software vulnerabilities using open-source data

Published in:
3rd ACM Int. Workshop on Security and Privacy Analytics, IWSPA 2017, 24 March 2017.

Summary

Each year, thousands of software vulnerabilities are discovered and reported to the public. Unpatched known vulnerabilities are a significant security risk. It is imperative that software vendors quickly provide patches once vulnerabilities are known and that users quickly install those patches as soon as they are available. However, most vulnerabilities are never actually exploited. Since writing, testing, and installing software patches can involve considerable resources, it would be desirable to prioritize the remediation of vulnerabilities that are likely to be exploited. Several published research studies have reported moderate success in applying machine learning techniques to the task of predicting whether a vulnerability will be exploited. These approaches typically use features derived from vulnerability databases (such as the summary text describing the vulnerability) or social media posts that mention the vulnerability by name. However, these prior studies share multiple methodological shortcomings that inflate the predictive power of these approaches. We replicate key portions of the prior work, compare their approaches, and show how the selection of training and test data critically affects the estimated performance of predictive models. The results of this study point to important methodological considerations that should be taken into account so that results reflect real-world utility.
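
The training/test selection issue is easy to sketch. A random split lets a model train on vulnerabilities disclosed after the ones it is tested on, leaking future information; a temporal split does not. Everything below (features, labels, dates, the classifier) is a hypothetical placeholder, not the paper's data or models.

```python
# Sketch of a leakage-avoiding temporal split for exploit prediction.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 1000
disclosed = np.sort(rng.uniform(2010, 2016, n))   # hypothetical disclosure years
X = rng.normal(size=(n, 8))                       # stand-in for text/social features
y = rng.random(n) < 0.1                           # stand-in labels: ~10% exploited

# Train only on vulnerabilities disclosed before the cutoff; test on later ones.
cutoff = 2014.0
train, test = disclosed < cutoff, disclosed >= cutoff

model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
auc = roc_auc_score(y[test], model.predict_proba(X[test])[:, 1])
print(f"temporal-split AUC: {auc:.3f}")           # ~0.5 here, since labels are random
```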

High-efficiency large-angle Pancharatnam phase deflector based on dual-twist design

Summary

We have previously shown through simulation that an optical beam deflector based on the Pancharatnam (geometric) phase can provide high efficiency with up to 80° deflection using a dual-twist structure for polarization-state control [Appl. Opt. 54, 10035 (2015)]. In this report, we demonstrate that its optical performance is as predicted and far beyond what could be expected for a conventional diffractive optical device. We provide details about construction and characterization of a ±40° beam-steering device with 90% diffraction efficiency based on our dual-twist design at a 633 nm wavelength.
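
As a rough consistency check of our own (not a calculation from the report): if the deflector behaves as a thin grating at normal incidence, steering light into the first diffraction orders, the grating equation gives the in-plane period required for a 40° deflection at 633 nm.

```latex
% First-order grating equation; thin-grating, normal-incidence assumption.
\[
  \sin\theta_m = \frac{m\lambda}{\Lambda}
  \quad\Longrightarrow\quad
  \Lambda = \frac{\lambda}{\sin 40^{\circ}}
          = \frac{633\ \mathrm{nm}}{0.643}
          \approx 985\ \mathrm{nm}
\]
```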

SIAM data mining "brings it" to annual meeting

Summary

The Data Mining Activity Group is one of SIAM's most vibrant and dynamic activity groups. To better share our enthusiasm for data mining with the broader SIAM community, our activity group organized six minisymposia at the 2016 Annual Meeting. These minisymposia included 48 talks organized by 11 SIAM members.

Causal inference under network interference: a framework for experiments on social networks

Published in:
Thesis (Ph.D.)--Harvard University, 2017.

Summary

No man is an island: individuals interact and influence one another daily in our society. When social influence takes place in experiments on a population of interconnected individuals, the treatment on one unit may affect the outcomes of other units, a phenomenon known as interference. This thesis develops a causal framework and inference methodology for experiments where interference takes place on a network of influence (i.e., network interference). In this framework, the network potential outcomes serve as the key quantity and flexible building blocks for causal estimands that represent a variety of primary, peer, and total treatment effects. These causal estimands are estimated via principled Bayesian imputation of missing outcomes. The theory on the unconfoundedness assumptions leading to simplified imputation highlights the importance of including relevant network covariates in the potential outcome model. Additionally, experimental designs that result in balanced covariates and sizes across treatment exposure groups further improve the causal estimate, especially by mitigating potential outcome model misspecification. The true potential outcome model is typically not known in real-world experiments, so the best practice is to account for interference and confounding network covariates through both balanced designs and model-based imputation. A full factorial simulated experiment is formulated to demonstrate this principle by comparing performance across different randomization schemes during the design phase and estimators during the analysis phase, under varying network topologies and true potential outcome models. Overall, this thesis asserts that interference is not just a nuisance for analysis but rather an opportunity for quantifying and leveraging peer effects in real-world experiments.
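
A minimal sketch of the framework's building blocks, with hypothetical names and numbers: outcomes depend on a unit's own treatment and on a simple exposure summary (the fraction of treated neighbors), treatment is assigned by a completely randomized design, and direct and peer effects are read off with difference-in-means contrasts. The thesis estimates such quantities by Bayesian imputation of the missing network potential outcomes; this sketch only illustrates the estimands, not that estimator.

```python
# Toy network experiment with interference via neighbor-treatment exposure.
import numpy as np

rng = np.random.default_rng(2)
n = 400
A = (rng.random((n, n)) < 0.02).astype(float)
A = np.triu(A, 1); A = A + A.T                       # undirected influence network
deg = A.sum(1).clip(min=1)                           # guard against isolated nodes

z = np.zeros(n, dtype=int)
z[rng.choice(n, n // 2, replace=False)] = 1          # completely randomized design
exposure = (A @ z) / deg                             # fraction of treated neighbors

# Hypothetical potential-outcome model: direct effect 2.0, peer effect 1.0.
y = 2.0 * z + 1.0 * exposure + rng.normal(size=n)

direct = y[z == 1].mean() - y[z == 0].mean()         # own-treatment contrast
high, low = exposure > 0.5, exposure < 0.25          # treatment-exposure groups
peer = y[(z == 0) & high].mean() - y[(z == 0) & low].mean()
print(f"direct contrast ~ {direct:.2f}, peer contrast ~ {peer:.2f}")
```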

Intersection and convex combination in multi-source spectral planted cluster detection

Published in:
IEEE Global Conf. on Signal and Information Processing, GlobalSIP, 7-9 December 2016.

Summary

Planted cluster detection is an important form of signal detection when the data are in the form of a graph. When there are multiple graphs representing multiple connection types, the method of aggregation can have significant impact on the results of a detection algorithm. This paper addresses the tradeoff between two possible aggregation methods: convex combination and intersection. For a spectral detection method, convex combination dominates when the cluster is relatively sparse in at least one graph, while the intersection method dominates in cases where it is dense across graphs. Experimental results confirm the theory. We consider the context of adversarial cluster placement, and determine how an adversary would distribute connections among the graphs to best avoid detection.
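
The two aggregation strategies are simple to state in code. Given several adjacency matrices over the same node set, a convex combination is a weighted sum of the matrices, while intersection keeps only edges present in every graph; a spectral detector then inspects the leading eigenvector of the (centered) aggregate. The synthetic model, equal weights, and detection statistic below are our illustrative choices, not the paper's exact setup.

```python
# Convex combination vs. intersection of multigraph layers, then a spectral look.
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 30                                       # nodes; planted cluster size

def planted_graph(p_bg, p_cl):
    """Random graph with a denser cluster planted on the first k nodes."""
    A = (rng.random((n, n)) < p_bg).astype(float)
    A[:k, :k] = rng.random((k, k)) < p_cl
    A = np.triu(A, 1)
    return A + A.T

layers = [planted_graph(0.05, 0.30) for _ in range(3)]

combo = sum(g / len(layers) for g in layers)         # convex combination, equal weights
inter = np.minimum.reduce(layers)                    # edge-set intersection

for name, M in [("combination", combo), ("intersection", inter)]:
    _, vecs = np.linalg.eigh(M - M.mean())           # center, take leading eigenvector
    v = np.abs(vecs[:, -1])
    print(f"{name}: eigenvector mass on planted cluster = {v[:k].sum() / v.sum():.2f}")
```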

Application of a resilience framework to military installations: a methodology for energy resilience business case decisions

Published in:
MIT Lincoln Laboratory Report TR-1216

Summary

The goal of the study was to develop and demonstrate an energy resilience framework at four DoD installations. This framework, predominantly focused on developing a business case, was established for broader application across the DoD. The methodology involves gathering data from an installation on critical energy load requirements and on energy costs and usage; quantifying the cost and performance of the installation's existing energy resilience solution; and then conducting an analysis of alternatives to evaluate new system designs. Improvements in data collection at the installation level, as recommended in this report, will further increase the fidelity of future analyses and the accuracy of the recommendations. Most importantly, increased collaboration between facility personnel and mission operators at the installation will encourage holistic solutions that improve both the life-cycle costs and the resilience of the installation's energy systems and supporting infrastructure.

Boston community energy study - zonal analysis for urban microgrids

Published in:
MIT Lincoln Laboratory Report TR-1201

Summary

Superstorm Sandy illustrated the economic and human impact that severe weather can have on urban areas such as New York City. While flooding and wind damaged or destroyed some of the energy infrastructure, all installed microgrids in the New York City region remained operational during Sandy, including those at Princeton University, Goldman Sachs, New York University, and Co-op City. The resilience provided by these microgrids sparked renewed interest in pursuing more microgrid deployments as a means to increase resiliency throughout the nation in the face of many potential threats, including severe weather events and terrorism. MIT Lincoln Laboratory has been engaged with the Department of Homeland Security (DHS), the Department of Energy (DOE), and the City of Boston in this Community Energy Study to explore the potential for microgrid deployment within Boston's thriving neighborhoods. Using hourly simulated energy data for every building in Boston, provided by the Sustainable Design Lab on the MIT campus, MIT Lincoln Laboratory developed an approach that can identify zones within the city where microgrids could be implemented with a high return on investment in terms of resiliency, offering both cost savings and social benefit in the face of grid outages. An important part of this approach leverages a microgrid optimization tool developed by Lawrence Berkeley National Laboratory, with whom MIT Lincoln Laboratory is now collaborating on microgrid modeling work. Using the microgrid optimization tool along with building energy use data, forty-two community microgrids were identified, including ten multiuser microgrids, ten energy justice microgrids, and twenty-two emergency microgrids.

Development of a real-time hardware-in-the-loop power systems simulation platform to evaluate commercial microgrid controllers

Summary

This report describes the development of a real-time hardware-in-the-loop (HIL) power system simulation platform to evaluate commercial microgrid controllers. The effort resulted in the successful demonstration of HIL simulation technology at a Technical Symposium organized by the Massachusetts Clean Energy Center (CEC) for utility distribution system engineers, project developers, systems integrators, equipment vendors, academia, regulators, City of Boston officials, and Commonwealth officials. Actual microgrid controller hardware was integrated with commercial genset controller hardware in a particular microgrid configuration, which included dynamic loads, distributed energy resources (DERs), and conventional power sources. The end product provides the ability to quickly and cost-effectively assess the performance of different microgrid controllers as quantified by metrics such as fuel consumption, power flow management precision at the point of common coupling, load-not-served (LNS) while islanded, peak-shaving kWh, and voltage stability. Additional applications include protection system testing and evaluation, distributed generation prime mover controller testing, integration and testing of distribution control systems, behavior testing and studies of DER controls, detailed power systems analysis, communications testing and integration, and implementation and evaluation of smart grid concepts. Microgrids and these additional applications promise to improve the reliability, resiliency, and efficiency of the nation's aging but critical power distribution systems. This achievement was a collaborative effort between MIT Lincoln Laboratory and industry microgrid controller manufacturers. This work was sponsored by the Department of Homeland Security (DHS) Science and Technology Directorate (S&T) and the Department of Energy (DOE) Office of Electricity Delivery and Energy Reliability.
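
Two of the metrics named above are easy to pin down with a formula, shown here on hypothetical numbers (the platform's actual metric definitions may differ): load-not-served is the demand the islanded system cannot meet, and peak-shaving energy is the demand clipped above a target at the point of common coupling.

```python
# Hypothetical one-day load profile; LNS and peak-shaving metrics.
import numpy as np

hours = np.arange(24)
load = 800 + 300 * np.sin((hours - 6) * np.pi / 12).clip(min=0)  # kW, hypothetical

# Islanded operation: assume on-site generation is capped at 950 kW.
supplied = np.minimum(load, 950)
lns_kwh = (load - supplied).sum()                # hourly samples, so kW * h = kWh

# Grid-connected operation: assume storage clips demand above 1000 kW at the PCC.
shaved_kwh = (load - 1000).clip(min=0).sum()     # peak-shaving energy

print(f"LNS = {lns_kwh:.0f} kWh, peak shaving = {shaved_kwh:.0f} kWh")
```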