Publications

Refine Results

(Filters Applied) Clear All

Hyperscaling internet graph analysis with D4M on the MIT SuperCloud

Summary

Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Development of novel computer network traffic analytics requires: high level programming environments, massive amount of packet capture (PCAP) data, and diverse data products for "at scale" algorithm pipeline development. D4M (Dynamic Distributed Dimensional Data Model) combines the power of sparse linear algebra, associative arrays, parallel processing, and distributed databases (such as SciDB and Apache Accumulo) to provide a scalable data and computation system that addresses the big data problems associated with network analytics development. Combining D4M with the MIT SuperCloud manycore processors and parallel storage system enables network analysts to interactively process massive amounts of data in minutes. To demonstrate these capabilities, we have implemented a representative analytics pipeline in D4M and benchmarked it on 96 hours of Gigabit PCAP data with MIT SuperCloud. The entire pipeline from uncompressing the raw files to database ingest was implemented in 135 lines of D4M code and achieved speedups of over 20,000.
READ LESS

Summary

Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Development of...

READ MORE

Large-scale Bayesian kinship analysis

Summary

Kinship prediction in forensics is limited to first degree relatives due to the small number of short tandem repeat loci characterized. The Genetic Chain Rule for Probabilistic Kinship Estimation can leverage large panels of single nucleotide polymorphisms (SNPs) or sets of sequence linked SNPs, called haploblocks, to estimate more distant relationships between individuals. This method uses allele frequencies and Markov Chain Monte Carlo methods to determine kinship probabilities. Allele frequencies are a crucial input to this method. Since these frequencies are estimated from finite populations and many alleles are rare, a Bayesian extension to the algorithm has been developed to determine credible intervals for kinship estimates as a function of the certainty in allele frequency estimates. Generation of sufficiently large samples to accurately estimate credible intervals can take significant computational resources. In this paper, we leverage hundreds of compute cores to generate large numbers of Dirichlet random samples for Bayesian kinship prediction. We show that it is possible to generate 2,097,152 random samples on 32,768 cores at a rate of 29.68 samples per second. The ability to generate extremely large number of samples enables the computation of more statistically significant results from a Bayesian approach to kinship analysis.
READ LESS

Summary

Kinship prediction in forensics is limited to first degree relatives due to the small number of short tandem repeat loci characterized. The Genetic Chain Rule for Probabilistic Kinship Estimation can leverage large panels of single nucleotide polymorphisms (SNPs) or sets of sequence linked SNPs, called haploblocks, to estimate more distant...

READ MORE

Interactive supercomputing on 40,000 cores for machine learning and data analysis

Summary

Interactive massively parallel computations are critical for machine learning and data analysis. These computations are a staple of the MIT Lincoln Laboratory Supercomputing Center (LLSC) and has required the LLSC to develop unique interactive supercomputing capabilities. Scaling interactive machine learning frameworks, such as TensorFlow, and data analysis environments, such as MATLAB/Octave, to tens of thousands of cores presents many technical challenges – in particular, rapidly dispatching many tasks through a scheduler, such as Slurm, and starting many instances of applications with thousands of dependencies. Careful tuning of launches and prepositioning of applications overcome these challenges and allow the launching of thousands of tasks in seconds on a 40,000-core supercomputer. Specifically, this work demonstrates launching 32,000 TensorFlow processes in 4 seconds and launching 262,000 Octave processes in 40 seconds. These capabilities allow researchers to rapidly explore novel machine learning architecture and data analysis algorithms.
READ LESS

Summary

Interactive massively parallel computations are critical for machine learning and data analysis. These computations are a staple of the MIT Lincoln Laboratory Supercomputing Center (LLSC) and has required the LLSC to develop unique interactive supercomputing capabilities. Scaling interactive machine learning frameworks, such as TensorFlow, and data analysis environments, such as...

READ MORE

GraphChallenge.org: raising the bar on graph analytic performance

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems. GraphChallenge.org provides a wide range of preparsed graph data sets, graph generators, mathematically defined graph algorithms, example serial implementations in a variety of languages, and specific metrics for measuring performance. Graph Challenge 2017 received 22 submissions by 111 authors from 36 organizations. The submissions highlighted graph analytic innovations in hardware, software, algorithms, systems, and visualization. These submissions produced many comparable performance measurements that can be used for assessing the current state of the art of the field. There were numerous submissions that implemented the triangle counting challenge and resulted in over 350 distinct measurements. Analysis of these submissions show that their execution time is a strong function of the number of edges in the graph, Ne, and is typically proportional to N4=3 e for large values of Ne. Combining the model fits of the submissions presents a picture of the current state of the art of graph analysis, which is typically 108 edges processed per second for graphs with 108 edges. These results are 30 times faster than serial implementations commonly used by many graph analysts and underscore the importance of making these performance benefits available to the broader community. Graph Challenge provides a clear picture of current graph analysis systems and underscores the need for new innovations to achieve high performance on very large graphs.
READ LESS

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems...

READ MORE

A parallel implementation of FANO using OpenMP and MPI

Published in:
IEEE High Performance Extreme Computing Conf., HPEC, 25-27 September 2018.

Summary

We present a parallel implementation of the Fast Accurate NURBS Optimization (FANO) program using OpenMP and MPI. The software is used for designing imaging freeform optical systems comprised of NURBS surfaces. An important step in the design process is the optimization of the shape and position of the optical surfaces within the optical system. FANO uses the Levenberg-Marquardt (LM) algorithm for minimization of the merit function. The parallelization of the code is achieved without modifying readily available commercial or open source implementations of the LM algorithm. Instead, MPI instructions are being used to distribute the computation of the Jacobian over multiple nodes, each of which performs the computationally intensive task of raytracing. The results from the raytracing are collected on the master and used for calculating the values of the variable parameters for the next iteration. Speed increases of ~100x and more are possible when running on the cluster of the MIT Lincoln Laboratory Super Computing Center (LLSC).
READ LESS

Summary

We present a parallel implementation of the Fast Accurate NURBS Optimization (FANO) program using OpenMP and MPI. The software is used for designing imaging freeform optical systems comprised of NURBS surfaces. An important step in the design process is the optimization of the shape and position of the optical surfaces...

READ MORE

TabulaROSA: tabular operating system architecture for massively parallel heterogeneous compute engines

Summary

The rise in computing hardware choices is driving a reevaluation of operating systems. The traditional role of an operating system controlling the execution of its own hardware is evolving toward a model whereby the controlling processor is distinct from the compute engines that are performing most of the computations. In this context, an operating system can be viewed as software that brokers and tracks the resources of the compute engines and is akin to a database management system. To explore the idea of using a database in an operating system role, this work defines key operating system functions in terms of rigorous mathematical semantics (associative array algebra) that are directly translatable into database operations. These operations possess a number of mathematical properties that are ideal for parallel operating systems by guaranteeing correctness over a wide range of parallel operations. The resulting operating system equations provide a mathematical specification for a Tabular Operating System Architecture (TabulaROSA) that can be implemented on any platform. Simulations of forking in TabularROSA are performed using an associative array implementation and compared to Linux on a 32,000+ core supercomputer. Using over 262,000 forkers managing over 68,000,000,000 processes, the simulations show that TabulaROSA has the potential to perform operating system functions on a massively parallel scale. The TabulaROSA simulations show 20x higher performance as compared to Linux while managing 2000x more processes in fully searchable tables.
READ LESS

Summary

The rise in computing hardware choices is driving a reevaluation of operating systems. The traditional role of an operating system controlling the execution of its own hardware is evolving toward a model whereby the controlling processor is distinct from the compute engines that are performing most of the computations. In...

READ MORE

Measuring the impact of Spectre and Meltdown

Summary

The Spectre and Meltdown flaws in modern microprocessors represent a new class of attacks that have been difficult to mitigate. The mitigations that have been proposed have known performance impacts. The reported magnitude of these impacts varies depending on the industry sector and expected workload characteristics. In this paper, we measure the performance impact on several workloads relevant to HPC systems. We show that the impact can be significant on both synthetic and realistic workloads. We also show that the performance penalties are difficult to avoid even in dedicated systems where security is a lesser concern.
READ LESS

Summary

The Spectre and Meltdown flaws in modern microprocessors represent a new class of attacks that have been difficult to mitigate. The mitigations that have been proposed have known performance impacts. The reported magnitude of these impacts varies depending on the industry sector and expected workload characteristics. In this paper, we...

READ MORE

Simulation approach to sensor placement using Unity3D

Summary

3D game simulation engines have demonstrated utility in the areas of training, scientific analysis, and knowledge solicitation. This paper will make the case for the use of 3D game simulation engines in the field of sensor placement optimization. Our study used a series of parallel simulations in the Unity3D simulation framework to answer the questions: how many sensors of various modalities are required and where they should be placed to meet a desired threat detection threshold? The result is a framework that not only answers this sensor placement question, but can be easily expanded to differing optimization criteria as well as answer how a particular configuration responds to differing crowd flows or informed/non-informed adversaries. Additionally, we demonstrate the scalability of this framework by running parallel instances on a supercomputing grid and illustrate the processing speed gained.
READ LESS

Summary

3D game simulation engines have demonstrated utility in the areas of training, scientific analysis, and knowledge solicitation. This paper will make the case for the use of 3D game simulation engines in the field of sensor placement optimization. Our study used a series of parallel simulations in the Unity3D simulation...

READ MORE

Colorization of H&E stained tissue using deep learning

Published in:
40th Int. Conf. of the IEEE Engineering in Medicine and Biology Society, EMBC, 17-21 July 2018.

Summary

Histopathology is a critical tool in the diagnosis and stratification of cancer. Digital Pathology involves the scanning of stained and fixed tissue samples to produce high-resolution images that can be used for computer-aided diagnosis and research. A common challenge in digital pathology related to the quality and characteristics of staining, which can vary widely from center to center and also within the same institution depending on the age of the stain and other human factors. In this paper we examine the use of deep learning models for colorizing H&E stained tissue images and compare the results with traditional image processing/statistical approaches that have been developed for standardizing or normalizing histopathology images. We adapt existing deep learning models that have been developed for colorizing natural images and compare the results with models developed specifically for digital pathology. Our results show that deep learning approaches can standardize the colorization of H&E images. The performance as measured by the chi-square statistic shows that the deep learning approach can be nearly as good as current state-of-the art normalization methods.
READ LESS

Summary

Histopathology is a critical tool in the diagnosis and stratification of cancer. Digital Pathology involves the scanning of stained and fixed tissue samples to produce high-resolution images that can be used for computer-aided diagnosis and research. A common challenge in digital pathology related to the quality and characteristics of staining...

READ MORE

Lessons learned from a decade of providing interactive, on-demand high performance computing to scientists and engineers

Summary

For decades, the use of HPC systems was limited to those in the physical sciences who had mastered their domain in conjunction with a deep understanding of HPC architectures and algorithms. During these same decades, consumer computing device advances produced tablets and smartphones that allow millions of children to interactively develop and share code projects across the globe. As the HPC community faces the challenges associated with guiding researchers from disciplines using high productivity interactive tools to effective use of HPC systems, it seems appropriate to revisit the assumptions surrounding the necessary skills required for access to large computational systems. For over a decade, MIT Lincoln Laboratory has been supporting interactive, on demand high performance computing by seamlessly integrating familiar high productivity tools to provide users with an increased number of design turns, rapid prototyping capability, and faster time to insight. In this paper, we discuss the lessons learned while supporting interactive, on-demand high performance computing from the perspectives of the users and the team supporting the users and the system. Building on these lessons, we present an overview of current needs and the technical solutions we are building to lower the barrier to entry for new users from the humanities, social, and biological sciences.
READ LESS

Summary

For decades, the use of HPC systems was limited to those in the physical sciences who had mastered their domain in conjunction with a deep understanding of HPC architectures and algorithms. During these same decades, consumer computing device advances produced tablets and smartphones that allow millions of children to interactively...

READ MORE