Publications

Refine Results

(Filters Applied) Clear All

HPC-VMs: virtual machines in high performance computing systems

Published in:
HPEC 2012: IEEE Conf. on High Performance Extreme Computing, 10-12 September 2012.

Summary

The concept of virtual machines dates back to the 1960s. Both IBM and MIT developed operating system features that enabled user and peripheral time sharing, the underpinnings of which were early virtual machines. Modern virtual machines present a translation layer of system devices between a guest operating system and the host operating system executing on a computer system, while isolating each of the guest operating systems from each other. In the past several years, enterprise computing has embraced virtual machines to deploy a wide variety of capabilities from business management systems to email server farms. Those who have adopted virtual deployment environments have capitalized on a variety of advantages including server consolidation, service migration, and higher service reliability. But they have also ended up with some challenges including a sacrifice in performance and more complex system management. Some of these advantages and challenges also apply to HPC in virtualized environments. In this paper, we analyze the effectiveness of using virtual machines in a high performance computing (HPC) environment. We propose adding some virtual machine capability to already robust HPC environments for specific scenarios where the productivity gained outweighs the performance lost for using virtual machines. Finally, we discuss an implementation of augmenting virtual machines into the software stack of a HPC cluster, and we analyze the affect on job launch time of this implementation.
READ LESS

Summary

The concept of virtual machines dates back to the 1960s. Both IBM and MIT developed operating system features that enabled user and peripheral time sharing, the underpinnings of which were early virtual machines. Modern virtual machines present a translation layer of system devices between a guest operating system and the...

READ MORE

Scalable cryptographic authentication for high performance computing

Summary

High performance computing (HPC) uses supercomputers and computing clusters to solve large computational problems. Frequently HPC resources are shared systems and access to restricted data sets or resources must be authenticated. These authentication needs can take multiple forms, both internal and external to the HPC cluster. A computational stack that uses web services among nodes in the HPC may need to perform authentication between nodes of the same job or a job may need to reach out to data sources outside the HPC. Traditional authentication mechanisms such as passwords or digital certificates encounter issues with the distributed and potentially disconnected nature of HPC systems. Distributing and storing plain-text passwords or cryptographic keys among nodes in a HPC system without special protection is a poor security practice. Systems that reach back to the user's terminal for access to the authenticator are possible, but only in fully interactive supercomputing where connectivity to the user's terminal can be guaranteed. Point solutions can be enabled for these use cases, such as software-based role or self-signed certificates, however they require significant expertise in digital certificates to configure. A more general solution is called for that is both secure and easy to use. This paper presents an overview of a solution implemented on the interactive, on-demand LLGrid computing system at MIT Lincoln Laboratory and its use to solve one such authentication problem.
READ LESS

Summary

High performance computing (HPC) uses supercomputers and computing clusters to solve large computational problems. Frequently HPC resources are shared systems and access to restricted data sets or resources must be authenticated. These authentication needs can take multiple forms, both internal and external to the HPC cluster. A computational stack that...

READ MORE

Driving big data with big compute

Summary

Big Data (as embodied by Hadoop clusters) and Big Compute (as embodied by MPI clusters) provide unique capabilities for storing and processing large volumes of data. Hadoop clusters make distributed computing readily accessible to the Java community and MPI clusters provide high parallel efficiency for compute intensive workloads. Bringing the big data and big compute communities together is an active area of research. The LLGrid team has developed and deployed a number of technologies that aim to provide the best of both worlds. LLGrid MapReduce allows the map/reduce parallel programming model to be used quickly and efficiently in any language on any compute cluster. D4M (Dynamic Distributed Dimensional Data Model) provided a high level distributed arrays interface to the Apache Accumulo database. The accessibility of these technologies is assessed by measuring the effort to use these tools and is typically a few lines of code. The performance is assessed by measuring the insert rate into the Accumulo database. Using these tools a database insert rate of 4M inserts/second has been achieved on an 8 node cluster.
READ LESS

Summary

Big Data (as embodied by Hadoop clusters) and Big Compute (as embodied by MPI clusters) provide unique capabilities for storing and processing large volumes of data. Hadoop clusters make distributed computing readily accessible to the Java community and MPI clusters provide high parallel efficiency for compute intensive workloads. Bringing the...

READ MORE

Dynamic Distributed Dimensional Data Model (D4M) database and computation system

Summary

A crucial element of large web companies is their ability to collect and analyze massive amounts of data. Tuple store databases are a key enabling technology employed by many of these companies (e.g., Google Big Table and Amazon Dynamo). Tuple stores are highly scalable and run on commodity clusters, but lack interfaces to support efficient development of mathematically based analytics. D4M (Dynamic Distributed Dimensional Data Model) has been developed to provide a mathematically rich interface to tuple stores (and structured query language "SQL" databases). D4M allows linear algebra to be readily applied to databases. Using D4M, it is possible to create composable analytics with significantly less effort than using traditional approaches. This work describes the D4M technology and its application and performance.
READ LESS

Summary

A crucial element of large web companies is their ability to collect and analyze massive amounts of data. Tuple store databases are a key enabling technology employed by many of these companies (e.g., Google Big Table and Amazon Dynamo). Tuple stores are highly scalable and run on commodity clusters, but...

READ MORE

Rapid prototyping of radar algorithms

Author:
Published in:
IEEE Sig. Proc. Mag., Vol. 26, No. 6, November 2009, pp. 158-162.

Summary

Rapid prototyping of advanced signal processing algorithms is critical to developing new radars. Signal processing engineers usually use high level languages like MATLAB, IDL, or Python to develop advanced algorithms and to determine the optimal parameters for these algorithms. Many of these algorithms have very long execution times due to computational complexity and/or very large data sets, which hinders an efficient engineering development workflow. That is, signal processing engineers must wait hours, or even days, to get the results of the current algorithm, parameters, and data set before making changes and refinements for the next iteration. In the meantime, the engineer may have thought of several more permutations that he or she wants to test.
READ LESS

Summary

Rapid prototyping of advanced signal processing algorithms is critical to developing new radars. Signal processing engineers usually use high level languages like MATLAB, IDL, or Python to develop advanced algorithms and to determine the optimal parameters for these algorithms. Many of these algorithms have very long execution times due to...

READ MORE

High-productivity software development with pMATLAB

Published in:
Comput. Sci. Eng., Vol. 11, No. 1, January/February 2009, pp. 75-79.

Summary

In this paper, we explore the ease of tackling a communication-intensive parallel computing task - namely, the 2D fast Fourier transform (FFT). We start with a simple serial Matlab code, explore in detail a ID parallel FFT, and illustrate how it can be extended to multidimensional FFTs.
READ LESS

Summary

In this paper, we explore the ease of tackling a communication-intensive parallel computing task - namely, the 2D fast Fourier transform (FFT). We start with a simple serial Matlab code, explore in detail a ID parallel FFT, and illustrate how it can be extended to multidimensional FFTs.

READ MORE

Parallel and Distributed Processing

Author:
Published in:
High Performance Embedded Computing Handbook, Chapter 18

Summary

This chapter discusses parallel and distributed programming technologies for high performance embedded systems. Computational or memory constraints can be overcome with parallel processing. The primary goal of parallel processing is to improve performance by distributing computation across multiple processors or increasing dataset sizes by distributing data across multiple processors’ memory. The typical programmer has little to no experience writing programs that run on multiple processors. The transition from serial to parallel programming requires significant changes in the programmer’s way of thinking. For example, the programmer must worry about how to distribute data and computation across multiple processors to maximize performance and how to synchronize and communicate between processors. Although most programmers will likely admit to having no experience with parallel programming, many have indeed had exposure to a rudimentary type in the form of threads. A typical threaded program starts execution as a single thread.
READ LESS

Summary

This chapter discusses parallel and distributed programming technologies for high performance embedded systems. Computational or memory constraints can be overcome with parallel processing. The primary goal of parallel processing is to improve performance by distributing computation across multiple processors or increasing dataset sizes by distributing data across multiple processors’ memory...

READ MORE

Radar Signal Processing: An Example of High Performance Embedded Computing

Published in:
High Performance Embedded Computing Handbook, Chapter 6

Summary

This chapter focuses on the computational complexity of the front-end of the surface moving-target indication (SMTI) radar application. SMTI radars can require over one trillion operations per second of computation for wideband systems. The adaptive beamforming performed in SMTI radars is one of the major computational complexity drivers. The goal of the SMTI radar is to process the received signals to detect targets while rejecting clutter returns and noise. The radar must also mitigate interference from unintentional sources such as RF systems transmitting in the same band and from jammers that may be intentionally trying to mask targets. The pulse compression stage filters the data to concentrate the signal energy of a relatively long transmitted radar pulse into a short pulse response. The relative range rate between the radar and the ground along the line of sight of the sidelobe may be the same as range rate of the target detected in the mainbeam.
READ LESS

Summary

This chapter focuses on the computational complexity of the front-end of the surface moving-target indication (SMTI) radar application. SMTI radars can require over one trillion operations per second of computation for wideband systems. The adaptive beamforming performed in SMTI radars is one of the major computational complexity drivers. The goal...

READ MORE

Benchmarking the MIT LL HPCMP DHPI system

Published in:
Annual High Performance Computer Modernization Program Users Group Conf., 19-21 June 2007.

Summary

The Massachusetts Institute of Technology Lincoln Laboratory (MIT LL) High Performance Computing Modernization Program (HPCMP) Dedicated High Performance Computing Project Investment (DHPI) system was designed to address interactive algorithm development for Department of Defense (DoD) sensor processing systems. The results of the system acceptance test provide a clear quantitative picture of the capabilities of the system. The system acceptance test for MIT LL HPCMP DHPI hardware involved an array of benchmarks that exercised each of the components of the memory hierarchy, the scheduler, and the disk arrays. These benchmarks isolated the components to verify the functionality and performance of the system, and several system issues were discovered and rectified by using these benchmarks. The memory hierarchy was evaluated using the HPC Challenge benchmark suite, which is comprised of the following benchmarks: High Performance Linpack (HPL, also known as Top 500), Fast Fourier Transform (FFT), STREAM, RandomAccess, and Effective Bandwidth. The compute nodes' Random Array of Independent Disks (RAID) arrays were evaluated with the Iozone benchmark. Finally, the scheduler and the reliability of the entire system were tested using both the HPC Challenge suite and the Iozone benchmark. For example executing the HPC Challenge benchmark suite on 416 processors, the system was able to achieve 1.42 TFlops (HPL), 34.7 GFlops (FFT), 1.24 TBytes/sec (STREAM Triad), and 0.16 GUPS (RandomAccess). This paper describes the components of the MIT Lincoln Laboratory HPCMP DHPI system, including its memory hierarchy. We present the HPC Challenge benchmark suite and Iozone benchmark and describe how each of the component benchmarks stress various components of the TX-2500 system. The results of the benchmarks are discussed, and the implications they have on the performance of the system. We conclude with a presentation of the findings.
READ LESS

Summary

The Massachusetts Institute of Technology Lincoln Laboratory (MIT LL) High Performance Computing Modernization Program (HPCMP) Dedicated High Performance Computing Project Investment (DHPI) system was designed to address interactive algorithm development for Department of Defense (DoD) sensor processing systems. The results of the system acceptance test provide a clear quantitative picture...

READ MORE

Technical challenges of supporting interactive HPC

Published in:
Ann. High Performance Computer Modernization Program Users Group Conf., 19-21 June 2007.

Summary

Users' demand for interactive, on-demand access to a large pool of high performance computing (HPC) resources is increasing. The majority of users at Massachusetts Institute of Technology Lincoln Laboratory (MIT LL) are involved in the interactive development of sensor processing algorithms. This development often requires a large amount of computation due to the complexity of the algorithms being explored and/or the size of the data set being analyzed. These researchers also require rapid turnaround of their jobs because each iteration directly influences code changes made for the following iteration. Historically, batch queue systems have not been a good match for this kind of user. The Lincoln Laboratory Grid (LLGrid) system at MIT-LL is the largest dedicated interactive, on-demand HPC system in the world. While the system also accommodates some batch queue jobs, the vast majority of jobs submitted are interactive, on-demand jobs. Choosing between running a system with a batch queue or in an interactive, on-demand manner involves tradeoffs. This paper discusses the tradeoffs between operating a cluster as a batch system, an interactive, ondemand system, or a hybrid system. The LLGrid system has been operational for over three years, and now serves over 200 users from across Lincoln. The system has run over 100,000 interactive jobs. It has become an integral part of many researchers' algorithm development workflows. For instance, in batch queue systems, an individual user commonly can gain access to 25% of the processors in the system after the job has waited in the queue; in our experience with on-demand, interactive operation, individual users often can also gain access to 20-25% of the cluster processors. This paper will share a variety of the new data on our experiences with running an interactive, on-demand system that also provides some batch queue access. Keywords: grid computing, on-demand, interactive high performance computing, cluster computing, parallel MATLAB.
READ LESS

Summary

Users' demand for interactive, on-demand access to a large pool of high performance computing (HPC) resources is increasing. The majority of users at Massachusetts Institute of Technology Lincoln Laboratory (MIT LL) are involved in the interactive development of sensor processing algorithms. This development often requires a large amount of computation...

READ MORE