Publications

Refine Results

(Filters Applied) Clear All

Streaming graph challenge: stochastic block partition

Summary

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of the real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets and source code for the algorithm as well as metrics, with detailed documentation are available at GraphChallenge.org.
READ LESS

Summary

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive...

READ MORE

Performance measurements of supercomputing and cloud storage solutions

Summary

Increasing amounts of data from varied sources, particularly in the fields of machine learning and graph analytics, are causing storage requirements to grow rapidly. A variety of technologies exist for storing and sharing these data, ranging from parallel file systems used by supercomputers to distributed block storage systems found in clouds. Relatively few comparative measurements exist to inform decisions about which storage systems are best suited for particular tasks. This work provides these measurements for two of the most popular storage technologies: Lustre and Amazon S3. Lustre is an opensource, high performance, parallel file system used by many of the largest supercomputers in the world. Amazon's Simple Storage Service, or S3, is part of the Amazon Web Services offering, and offers a scalable, distributed option to store and retrieve data from anywhere on the Internet. Parallel processing is essential for achieving high performance on modern storage systems. The performance tests used span the gamut of parallel I/O scenarios, ranging from single-client, single-node Amazon S3 and Lustre performance to a large-scale, multi-client test designed to demonstrate the capabilities of a modern storage appliance under heavy load. These results show that, when parallel I/O is used correctly (i.e., many simultaneous read or write processes), full network bandwidth performance is achievable and ranged from 10 gigabits/s over a 10 GigE S3 connection to 0.35 terabits/s using Lustre on a 1200 port 10 GigE switch. These results demonstrate that S3 is well-suited to sharing vast quantities of data over the Internet, while Lustre is well-suited to processing large quantities of data locally.
READ LESS

Summary

Increasing amounts of data from varied sources, particularly in the fields of machine learning and graph analytics, are causing storage requirements to grow rapidly. A variety of technologies exist for storing and sharing these data, ranging from parallel file systems used by supercomputers to distributed block storage systems found in...

READ MORE

Static graph challenge: subgraph isomorphism

Summary

The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and developed methodologies for creating challenges to move these communities forward. The proposed Subgraph Isomorphism Graph Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a graph challenge that is reflective of many real-world graph analytics processing systems. The Subgraph Isomorphism Graph Challenge is a holistic specification with multiple integrated kernels that can be run together or independently. Each kernel is well defined mathematically and can be implemented in any programming environment. Subgraph isomorphism is amenable to both vertex-centric implementations and array-based implementations (e.g., using the Graph-BLAS.org standard). The computations are simple enough that performance predictions can be made based on simple computing hardware models. The surrounding kernels provide the context for each kernel that allows rigorous definition of both the input and the output for each kernel. Furthermore, since the proposed graph challenge is scalable in both problem size and hardware, it can be used to measure and quantitatively compare a wide range of present day and future systems. Serial implementations in C++, Python, Python with Pandas, Matlab, Octave, and Julia have been implemented and their single threaded performance have been measured. Specifications, data, and software are publicly available at GraphChallenge.org.
READ LESS

Summary

The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and developed methodologies for creating challenges...

READ MORE

Benchmarking data analysis and machine learning applications on the Intel KNL many-core processor

Summary

Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher levels of parallelism. At the Lincoln Laboratory Supercomputing Center (LLSC), the majority of users are running data analysis applications such as MATLAB and Octave. More recently, machine learning applications, such as the UC Berkeley Caffe deep learning framework, have become increasingly important to LLSC users. Thus, the performance of these applications on KNL systems is of high interest to LLSC users and the broader data analysis and machine learning communities. Our data analysis benchmarks of these application on the Intel KNL processor indicate that single-core double-precision generalized matrix multiply (DGEMM) performance on KNL systems has improved by ~3.5x compared to prior Intel Xeon technologies. Our data analysis applications also achieved ~60% of the theoretical peak performance. Also a performance comparison of a machine learning application, Caffe, between the two different Intel CPUs, Xeon E5 v3 and Xeon Phi 7210, demonstrated a 2.7x improvement on a KNL node.
READ LESS

Summary

Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher...

READ MORE

Development of a new inanimate class for the WSR-88D hydrometeor classification algorithm

Published in:
38th Conf. on Radar Meteorology, 27 August-1 September 2017.

Summary

The current implementation of the Hydrometeor Classification Algorithm (HCA) on the WSR-88D network contains two non-hydrometeor-based classes: ground clutter/anomalous propagation and biologicals. A number of commonly observed non-hydrometeor-based phenomena do not fall into either of these two HCA categories, but often are misclassified as ground clutter, biologicals, unknown, or worse yet, weather hydrometeors. Some of these phenomena include chaff, sea clutter, combustion debris and smoke, and radio frequency interference. In order to address this discrepancy, a new class (nominally named "inanimate") is being developed that encompasses many of these targets. Using this class, a distinction between non-biological and biological non-hydrometeor targets can be made and potentially separated into sub-classes for more direct identification. A discussion regarding the fuzzy logic membership functions, optimization of membership weights, and class restrictions is presented, with a focus on observations of highly stochastic differential phase estimates in all of the aforementioned targets. Recent attempts to separate the results into sub-classes using a support vector machine are presented, and examples of each target type are detailed. Details concerning eventual implementation into the WSR-88D radar product generator are addressed.
READ LESS

Summary

The current implementation of the Hydrometeor Classification Algorithm (HCA) on the WSR-88D network contains two non-hydrometeor-based classes: ground clutter/anomalous propagation and biologicals. A number of commonly observed non-hydrometeor-based phenomena do not fall into either of these two HCA categories, but often are misclassified as ground clutter, biologicals, unknown, or worse...

READ MORE

Wind information requirements for NextGen operations, phase 5 report

Published in:
MIT Lincoln Laboratory Report ATC-439

Summary

NextGen applications with time-based control elements, such as required time of arrival (RTA) at a meter fix under 4D trajectory-based operations (4D-TBO)/time of arrival control (TOAC) procedures or assigned spacing goal between aircraft under Interval Management (IM) procedures, are subject to the quality of the atmospheric forecast utilized by participating aircraft. The work described in this report summarizes the major activities conducted in the current phase of this program which builds upon prior work. The major objectives were: 1. Support RTCA Special Committee-206 Aeronautical Information and Meteorological Data Link Services and co-chair a sub-group responsible for developing the document "Guidance for Data Linking Forecast and Real-Time Wind Information to Aircraft." 2. Analyze the performance of publicly available forecast as compared to in-situ reported atmospheric conditions, specifically comparing Global Forecast System (GFS) and High Resolution Rapid Refresh (HRRR) forecast data to recorded in-flight weather Meteorological Data Collection and Reporting System (MDCRS) data. 3. Analyze current and future Flight Management Systems (FMSs) to conduct operations at significantly lower altitudes than previous studies. 4. Evaluate potential sources of aircraft-derived winds to better support 4D-TBO activities. 5. Provide recommendations for high-value future work.
READ LESS

Summary

NextGen applications with time-based control elements, such as required time of arrival (RTA) at a meter fix under 4D trajectory-based operations (4D-TBO)/time of arrival control (TOAC) procedures or assigned spacing goal between aircraft under Interval Management (IM) procedures, are subject to the quality of the atmospheric forecast utilized by participating...

READ MORE

Flexible glucose sensors and fuel cells for bioelectronic implants

Published in:
IEEE 60th Int. Midwest Symp. on Circuits and Systems, MWSCAS, 6-9 August 2017.

Summary

Microfabrication techniques were developed to create flexible 24 um thick glucose sensors on polyimide substrates. Measurements of the sensor performance, recorded as voltage potential, were carried out for a range of glucose concentrations (0 – 8 mM) in physiological saline (0.1 M NaCl, pH 7.4). The sensors show rapid response times (seconds to stable potential) and good sensitivity in the 0 – 4 mM range. Additionally, we demonstrate that the sensors can operate as fuel cells, generating peak power levels up to 0.94 uW/cm2. Such flexible devices, which can be rolled up to increase surface area within a fixed volume, may enable ultra-low-power bio-electronic implants for glucose sensing or glucose energy harvesting in the future.
READ LESS

Summary

Microfabrication techniques were developed to create flexible 24 um thick glucose sensors on polyimide substrates. Measurements of the sensor performance, recorded as voltage potential, were carried out for a range of glucose concentrations (0 – 8 mM) in physiological saline (0.1 M NaCl, pH 7.4). The sensors show rapid response...

READ MORE

Implicitly-defined neural networks for sequence labeling

Published in:
Annual Meeting of Assoc. of Computational Lingusitics, 31 July 2017.

Summary

In this work, we propose a novel, implicitly defined neural network architecture and describe a method to compute its components. The proposed architecture forgoes the causality assumption previously used to formulate recurrent neural networks and allow the hidden states of the network to coupled together, allowing potential improvement on problems with complex, long-distance dependencies. Initial experiments demonstrate the new architecture outperforms both the Stanford Parser and a baseline bidirectional network on the Penn Treebank Part-of-Speech tagging task and a baseline bidirectional network on an additional artificial random biased walk task.
READ LESS

Summary

In this work, we propose a novel, implicitly defined neural network architecture and describe a method to compute its components. The proposed architecture forgoes the causality assumption previously used to formulate recurrent neural networks and allow the hidden states of the network to coupled together, allowing potential improvement on problems...

READ MORE

Polarization ratio determination with two identical linearly polarized antennas

Published in:
2017 IEEE AP-S Symp. on Antennas and Propagation and USNC Radio Science Meeting, 9-14 July 2017.

Summary

This paper describes a method for determining the complex polarization ratio using two identical, linearly polarized antennas. By Fourier transform analysis of s21 measurements with one of the antennas rotating about its axis a circular polarization ratio is derived which can be transformed into an equivalent linear polarization ratio. A linearly polarized reference antenna is not required. The technique was verified by electromagnetic simulations and illustrated by measurements in an anechoic chamber with two 3.3 GHz square patch antennas.
READ LESS

Summary

This paper describes a method for determining the complex polarization ratio using two identical, linearly polarized antennas. By Fourier transform analysis of s21 measurements with one of the antennas rotating about its axis a circular polarization ratio is derived which can be transformed into an equivalent linear polarization ratio. A...

READ MORE

A fully integrated broadband sub-mmWave chip-to-chip interconnect

Published in:
IEEE Trans. Microw. Theory Tech., Vol. 65, No. 7, July 2017, pp. 2373-86.

Summary

A new type of broadband link enabling extremely high-speed chip-to-chip communication is presented. The link is composed of fully integrated sub-mmWave on-chip traveling wave power couplers and a low-cost planar dielectric waveguide. This structure is based on a differentially driven half-mode substrate integrated waveguide supporting the first higher order hybrid microstrip mode. The cross-sectional width of the coupler structure is tapered in the direction of wave propagation to increase the coupling efficiency and maintain a large coupling bandwidth while minimizing its on-die size. A rectangular dielectric waveguide, constructed from Rogers Corporation R3006 material, is codesigned with the on-chip coupler structure to minimize coupling loss. The coupling structure achieves an average insertion loss of 4.8 dB from 220 to 270 GHz, with end-to-end link measurements presented. This system provides a packaging-friendly, cost effective, and high performance planar integration solution for ultrabroadband chip-to-chip communication utilizing millimeter waves.
READ LESS

Summary

A new type of broadband link enabling extremely high-speed chip-to-chip communication is presented. The link is composed of fully integrated sub-mmWave on-chip traveling wave power couplers and a low-cost planar dielectric waveguide. This structure is based on a differentially driven half-mode substrate integrated waveguide supporting the first higher order hybrid...

READ MORE