Publications

Refine Results

(Filters Applied) Clear All

Bringing physical construction and real-world data collection into a massively open online course (MOOC)

Summary

This Work-In-Progress paper details the process and lessons learned when converting a hands-on engineering minicourse to a scalable, self-paced Massively Open Online Course (MOOC). Online courseware has been part of academic and industry training and learning for decades. Learning activities in online courses strive to mimic in-person delivery by including lectures, homework assignments, software exercises and exams. While these instructional activities provide "theory and practice" for many disciplines, engineering courses often require hands-on activities with physical tools, devices and equipment. To accommodate the need for this type of learning, MIT Lincoln Laboratory's "Build A Small Radar" (BSR) course was used to explore teaching and learning strategies that support the inclusion of physical construction and real world data collection in a MOOC. These tasks are encountered across a range of engineering disciplines and the methods illustrated here are easily generalized to the learning experiences in engineering and science disciplines.
READ LESS

Summary

This Work-In-Progress paper details the process and lessons learned when converting a hands-on engineering minicourse to a scalable, self-paced Massively Open Online Course (MOOC). Online courseware has been part of academic and industry training and learning for decades. Learning activities in online courses strive to mimic in-person delivery by including...

READ MORE

Streaming graph challenge: stochastic block partition

Summary

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of the real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets and source code for the algorithm as well as metrics, with detailed documentation are available at GraphChallenge.org.
READ LESS

Summary

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive...

READ MORE

Static graph challenge: subgraph isomorphism

Summary

The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and developed methodologies for creating challenges to move these communities forward. The proposed Subgraph Isomorphism Graph Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a graph challenge that is reflective of many real-world graph analytics processing systems. The Subgraph Isomorphism Graph Challenge is a holistic specification with multiple integrated kernels that can be run together or independently. Each kernel is well defined mathematically and can be implemented in any programming environment. Subgraph isomorphism is amenable to both vertex-centric implementations and array-based implementations (e.g., using the Graph-BLAS.org standard). The computations are simple enough that performance predictions can be made based on simple computing hardware models. The surrounding kernels provide the context for each kernel that allows rigorous definition of both the input and the output for each kernel. Furthermore, since the proposed graph challenge is scalable in both problem size and hardware, it can be used to measure and quantitatively compare a wide range of present day and future systems. Serial implementations in C++, Python, Python with Pandas, Matlab, Octave, and Julia have been implemented and their single threaded performance have been measured. Specifications, data, and software are publicly available at GraphChallenge.org.
READ LESS

Summary

The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and developed methodologies for creating challenges...

READ MORE

Performance measurements of supercomputing and cloud storage solutions

Summary

Increasing amounts of data from varied sources, particularly in the fields of machine learning and graph analytics, are causing storage requirements to grow rapidly. A variety of technologies exist for storing and sharing these data, ranging from parallel file systems used by supercomputers to distributed block storage systems found in clouds. Relatively few comparative measurements exist to inform decisions about which storage systems are best suited for particular tasks. This work provides these measurements for two of the most popular storage technologies: Lustre and Amazon S3. Lustre is an opensource, high performance, parallel file system used by many of the largest supercomputers in the world. Amazon's Simple Storage Service, or S3, is part of the Amazon Web Services offering, and offers a scalable, distributed option to store and retrieve data from anywhere on the Internet. Parallel processing is essential for achieving high performance on modern storage systems. The performance tests used span the gamut of parallel I/O scenarios, ranging from single-client, single-node Amazon S3 and Lustre performance to a large-scale, multi-client test designed to demonstrate the capabilities of a modern storage appliance under heavy load. These results show that, when parallel I/O is used correctly (i.e., many simultaneous read or write processes), full network bandwidth performance is achievable and ranged from 10 gigabits/s over a 10 GigE S3 connection to 0.35 terabits/s using Lustre on a 1200 port 10 GigE switch. These results demonstrate that S3 is well-suited to sharing vast quantities of data over the Internet, while Lustre is well-suited to processing large quantities of data locally.
READ LESS

Summary

Increasing amounts of data from varied sources, particularly in the fields of machine learning and graph analytics, are causing storage requirements to grow rapidly. A variety of technologies exist for storing and sharing these data, ranging from parallel file systems used by supercomputers to distributed block storage systems found in...

READ MORE

Fabrication security and trust of domain-specific ASIC processors

Summary

Application specific integrated circuits (ASICs) are commonly used to implement high-performance signal-processing systems for high-volume applications, but their high development costs and inflexible nature make ASICs inappropriate for algorithm development and low-volume DoD applications. In addition, the intellectual property (IP) embedded in the ASIC is at risk when fabricated in an untrusted foundry. Lincoln Laboratory has developed a flexible signal-processing architecture to implement a wide range of algorithms within one application domain, for example radar signal processing. In this design methodology, common signal processing kernels such as digital filters, fast Fourier transforms (FFTs), and matrix transformations are implemented as optimized modules, which are interconnected by a programmable wiring fabric that is similar to the interconnect in a field programmable gate array (FPGA). One or more programmable microcontrollers are also embedded in the fabric to sequence the operations. This design methodology, which has been termed a coarse-grained FPGA, has been shown to achieve a near ASIC level of performance. In addition, since the signal processing algorithms are expressed in firmware that is loaded at runtime, the important application details are protected from an unscrupulous foundry.
READ LESS

Summary

Application specific integrated circuits (ASICs) are commonly used to implement high-performance signal-processing systems for high-volume applications, but their high development costs and inflexible nature make ASICs inappropriate for algorithm development and low-volume DoD applications. In addition, the intellectual property (IP) embedded in the ASIC is at risk when fabricated in...

READ MORE

Wind turbine interference mitigation using a waveform diversity radar

Summary

Interference from the proliferation of wind turbines is becoming a problem for ground-based medium-to-high pulse repetition frequency (PRF) pulsed–Doppler air surveillance radars. This paper demonstrates that randomizing some parameters of the transmit waveform from pulse to pulse, a filter can be designed to suppress both the wind turbine interference and the ground clutter. Furthermore, a single coherent processing interval (CPI) is sufficient to make an unambiguous range measurement. Therefore, multiple CPIs are not needed for range disambiguation, as in the staggered PRFs techniques. First, we consider a waveform with fixed PRF but diverse (random) initial phase applied to each transmit pulse. Second, we consider a waveform with diverse (random) PRF. The theoretical results are validated through simulations and analysis of experimental data. Clutter-plus-interference suppression and range disambiguation in a single CPI may be attractive to the Federal Aviation Administration and coastal radars.
READ LESS

Summary

Interference from the proliferation of wind turbines is becoming a problem for ground-based medium-to-high pulse repetition frequency (PRF) pulsed–Doppler air surveillance radars. This paper demonstrates that randomizing some parameters of the transmit waveform from pulse to pulse, a filter can be designed to suppress both the wind turbine interference and...

READ MORE

In-storage embedded accelerator for sparse pattern processing

Published in:
HPEC 2016: IEEE Conf. on High Performance Extreme Computing, 13-15 September 2016.

Summary

We present a novel architecture for sparse pattern processing, using flash storage with embedded accelerators. Sparse pattern processing on large data sets is the essence of applications such as document search, natural language processing, bioinformatics, subgraph matching, machine learning, and graph processing. One slice of our prototype accelerator is capable of handling up to 1TB of data, and experiments show that it can outperform C/C++ software solutions on a 16-core system at a fraction of the power and cost; an optimized version of the accelerator can match the performance of a 48-core server.
READ LESS

Summary

We present a novel architecture for sparse pattern processing, using flash storage with embedded accelerators. Sparse pattern processing on large data sets is the essence of applications such as document search, natural language processing, bioinformatics, subgraph matching, machine learning, and graph processing. One slice of our prototype accelerator is capable...

READ MORE

Novel graph processor architecture, prototype system, and results

Published in:
HPEC 2016: IEEE Conf. on High Performance Extreme Computing, 13-15 September 2016.

Summary

Graph algorithms are increasingly used in applications that exploit large databases. However, conventional processor architectures are inadequate for handling the throughput and memory requirements of graph computation. Lincoln Laboratory's graph-processor architecture represents a rethinking of parallel architectures for graph problems. Our processor utilizes innovations that include a sparse matrix-based graph instruction set, a cacheless memory system, accelerator-based architecture, a systolic sorter, high-bandwidth multidimensional toroidal communication network, and randomized communications. A field-programmable gate array (FPGA) prototype of the new graph processor has been developed with significant performance enhancement over conventional processors in graph computational throughput.
READ LESS

Summary

Graph algorithms are increasingly used in applications that exploit large databases. However, conventional processor architectures are inadequate for handling the throughput and memory requirements of graph computation. Lincoln Laboratory's graph-processor architecture represents a rethinking of parallel architectures for graph problems. Our processor utilizes innovations that include a sparse matrix-based graph...

READ MORE

Application of the Fornasini-Marchesini first model to data collected on a complex target model

Summary

This work describes the computation of scatterers that lay on the body of a real target which are depicted in radar images. A novelty of the approach is the target echoes collected by the radar are formulated into the first Fornasini-Marchesini (F-M) state space model to compute poles that give rise to the scatterer locations in the two-dimensional (2-D) space. Singular value decomposition carried out on the data provides state matrices that capture the dynamics of the target. Furthermore, eigenvalues computed from the state transition matrices provide range and cross-range locations of the scatterers that exhibit the target silhouette in 2-D space. The maximum likelihood function is formulated with the state matrices to obtain an iterative expression for the Fisher information matrix (FIM) from which posterior Cramer-Rao bounds associated with the various scatterers are derived. Effectiveness of the 2-D state-space technique is tested on static range data collected on a complex conical target model; its accuracy to extract target length is judged and compared with the physical measurements. Validity of the proposed 2-D state-space technique and the Cramer-Rao bounds are demonstrated through data collected on the target model.
READ LESS

Summary

This work describes the computation of scatterers that lay on the body of a real target which are depicted in radar images. A novelty of the approach is the target echoes collected by the radar are formulated into the first Fornasini-Marchesini (F-M) state space model to compute poles that give...

READ MORE

Very large graphs for information extraction (VLG) - summary of first-year proof-of-concept study

Summary

In numerous application domains relevant to the Department of Defense and the Intelligence Community, data of interest take the form of entities and the relationships between them, and these data are commonly represented as graphs. Under the Very Large Graphs for Information Extraction effort--a one-year proof-of-concept study--MIT LL developed novel techniques for anomalous subgraph detection, building on tools in the signal processing research literature. This report documents the technical results of this effort. Two datasets--a snapshot of Thompson Reuters? Web of Science database and a stream of web proxy logs--were parsed, and graphs were constructed from the raw data. From the phenomena in these datasets, several algorithms were developed to model the dynamic graph behavior, including a preferential attachment mechanism with memory, a streaming filter to model a graph as a weighted average of its past connections, and a generalized linear model for graphs where connection probabilities are determined by additional side information or metadata. A set of metrics was also constructed to facilitate comparison of techniques. The study culminated in a demonstration of the algorithms on the datasets of interest, in addition to simulated data. Performance in terms of detection, estimation, and computational burden was measured according to the metrics. Among the highlights of this demonstration were the detection of emerging coauthor clusters in the Web of Science data, detection of botnet activity in the web proxy data after 15 minutes (which took 10 days to detect using state-of-the-practice techniques), and demonstration of the core algorithm on a simulated 1-billion-vertex graph using a commodity computing cluster.
READ LESS

Summary

In numerous application domains relevant to the Department of Defense and the Intelligence Community, data of interest take the form of entities and the relationships between them, and these data are commonly represented as graphs. Under the Very Large Graphs for Information Extraction effort--a one-year proof-of-concept study--MIT LL developed novel...

READ MORE