Publications

pMATLAB parallel MATLAB library

September 1, 2007

Journal Article

Author:

Nadya T. Bliss

…

Jeremy Kepner

Published in:

Int. J. High Perform. Comp. Appl., Vol. 21, No. 3, Fall 2007, pp. 336-359.

Topic:

high performance computing

R&D area:

R&D group:

Embedded and Open Systems

Summary

MATLAB has emerged as one of the languages most commonly used by scientists and engineers for technical computing, with approximately one million users worldwide. The primary benefits of MATLAB are reduced code development time via high levels of abstraction (e.g., first-class multi-dimensional arrays and thousands of built-in functions), interpretive, interactive programming, and powerful mathematical graphics. The compute-intensive nature of technical computing means that many MATLAB users have codes that can benefit significantly from the increased performance offered by parallel computing. pMatlab provides this capability by implementing parallel global array semantics using standard operator overloading techniques. The core data structure in pMatlab is a distributed numerical array whose distribution onto multiple processors is specified with a "map" construct. Communication operations between distributed arrays are abstracted away from the user, and pMatlab transparently supports redistribution between any block-cyclic-overlapped distributions up to four dimensions. pMatlab is built on top of the MatlabMPI communication library and runs on any combination of heterogeneous systems that support MATLAB, which includes Windows, Linux, MacOS X, and SunOS. This paper describes the overall design and architecture of the pMatlab implementation. Performance is validated by implementing the HPC Challenge benchmark suite and comparing pMatlab performance with the equivalent C+MPI codes. These results indicate that pMatlab can often achieve performance comparable to C+MPI, usually at one tenth the code size. Finally, we present implementation data collected from a sample of real pMatlab applications drawn from the approximately one hundred users at MIT Lincoln Laboratory. These data indicate that users are typically able to go from a serial code to an efficient pMatlab code in about 3 hours while changing less than 1% of their code.
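The block-cyclic distribution mentioned in the summary can be illustrated with a minimal Python sketch. This is not pMatlab code and does not reproduce the pMatlab map API; the function names `block_cyclic_owner` and `local_indices` are hypothetical, and the sketch covers only the one-dimensional case without overlap.

```python
def block_cyclic_owner(index, block_size, num_procs):
    """Return the processor owning a global index in a 1-D block-cyclic layout.

    Hypothetical helper for illustration; not part of pMatlab.
    """
    if index < 0 or block_size <= 0 or num_procs <= 0:
        raise ValueError("index must be >= 0; block_size and num_procs must be > 0")
    # Consecutive blocks of block_size elements are dealt out round-robin.
    return (index // block_size) % num_procs


def local_indices(proc, length, block_size, num_procs):
    """List the global indices that a given processor holds locally."""
    return [
        i for i in range(length)
        if block_cyclic_owner(i, block_size, num_procs) == proc
    ]


if __name__ == "__main__":
    # 10 elements, blocks of 2, dealt across 3 processors.
    for p in range(3):
        print(p, local_indices(p, length=10, block_size=2, num_procs=3))
```

With a block size of 1 the layout becomes purely cyclic, and with a block size of at least `length / num_procs` (rounded up) it becomes a plain block distribution, which is why a single block-cyclic rule can describe both.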

Benchmarking the MIT LL HPCMP DHPI system

June 19, 2007

Conference Paper

Author:

Albert I. Reuther

…

Published in:

Annual High Performance Computer Modernization Program Users Group Conf., 19-21 June 2007.

Topic:

high performance computing

R&D area:

R&D group:

Embedded and Open Systems

Summary

The Massachusetts Institute of Technology Lincoln Laboratory (MIT LL) High Performance Computing Modernization Program (HPCMP) Dedicated High Performance Computing Project Investment (DHPI) system was designed to address interactive algorithm development for Department of Defense (DoD) sensor processing systems. The results of the system acceptance test provide a clear quantitative picture of the capabilities of the system. The system acceptance test for MIT LL HPCMP DHPI hardware involved an array of benchmarks that exercised each of the components of the memory hierarchy, the scheduler, and the disk arrays. These benchmarks isolated the components to verify the functionality and performance of the system, and several system issues were discovered and rectified by using these benchmarks. The memory hierarchy was evaluated using the HPC Challenge benchmark suite, which comprises the following benchmarks: High Performance Linpack (HPL, also known as Top 500), Fast Fourier Transform (FFT), STREAM, RandomAccess, and Effective Bandwidth. The compute nodes' Redundant Array of Independent Disks (RAID) arrays were evaluated with the Iozone benchmark. Finally, the scheduler and the reliability of the entire system were tested using both the HPC Challenge suite and the Iozone benchmark. For example, executing the HPC Challenge benchmark suite on 416 processors, the system achieved 1.42 TFlops (HPL), 34.7 GFlops (FFT), 1.24 TBytes/sec (STREAM Triad), and 0.16 GUPS (RandomAccess). This paper describes the components of the MIT Lincoln Laboratory HPCMP DHPI system, including its memory hierarchy. We present the HPC Challenge benchmark suite and Iozone benchmark and describe how each of the component benchmarks stresses various components of the TX-2500 system. We then discuss the benchmark results and their implications for the performance of the system, and conclude with a presentation of the findings.
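As a rough aid to reading the aggregate figures quoted in the summary, the short Python sketch below divides each one by the 416 processors to get a per-processor average. It assumes decimal prefixes (1 T = 10^12, 1 G = 10^9) and an even split across processors; neither assumption is stated in the summary, and the dictionary keys are labels chosen here for illustration.

```python
PROCESSORS = 416  # processor count quoted in the summary

# Aggregate results quoted in the summary, converted to base units
# under the assumption of decimal prefixes.
aggregate = {
    "HPL (flop/s)": 1.42e12,
    "FFT (flop/s)": 34.7e9,
    "STREAM Triad (bytes/s)": 1.24e12,
    "RandomAccess (updates/s)": 0.16e9,
}


def per_processor(total, processors=PROCESSORS):
    """Average rate per processor, assuming the load is spread evenly."""
    return total / processors


rates = {name: per_processor(value) for name, value in aggregate.items()}

if __name__ == "__main__":
    for name, value in rates.items():
        print(f"{name}: {value:.3e} per processor")
```

Under these assumptions the HPL figure works out to roughly 3.4 GFlops per processor and the STREAM Triad figure to roughly 3.0 GBytes/sec per processor.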

Technical challenges of supporting interactive HPC

June 19, 2007

Conference Paper

Author:

Albert I. Reuther

…

Published in:

Ann. High Performance Computer Modernization Program Users Group Conf., 19-21 June 2007.

Topic:

high performance computing

R&D area:

R&D group:

Embedded and Open Systems

Summary

Users' demand for interactive, on-demand access to a large pool of high performance computing (HPC) resources is increasing. The majority of users at Massachusetts Institute of Technology Lincoln Laboratory (MIT LL) are involved in the interactive development of sensor processing algorithms. This development often requires a large amount of computation due to the complexity of the algorithms being explored and/or the size of the data set being analyzed. These researchers also require rapid turnaround of their jobs because each iteration directly influences code changes made for the following iteration. Historically, batch queue systems have not been a good match for this kind of user. The Lincoln Laboratory Grid (LLGrid) system at MIT LL is the largest dedicated interactive, on-demand HPC system in the world. While the system also accommodates some batch queue jobs, the vast majority of jobs submitted are interactive, on-demand jobs. Choosing between running a system with a batch queue or in an interactive, on-demand manner involves tradeoffs. This paper discusses the tradeoffs among operating a cluster as a batch system, an interactive, on-demand system, or a hybrid system. The LLGrid system has been operational for over three years and now serves over 200 users from across Lincoln. The system has run over 100,000 interactive jobs. It has become an integral part of many researchers' algorithm development workflows. For instance, in batch queue systems, an individual user commonly can gain access to 25% of the processors in the system after the job has waited in the queue; in our experience with on-demand, interactive operation, individual users often can also gain access to 20-25% of the cluster processors. This paper shares new data on our experiences with running an interactive, on-demand system that also provides some batch queue access.
Keywords: grid computing, on-demand, interactive high performance computing, cluster computing, parallel MATLAB.

PMatlab: parallel Matlab library for signal processing applications

April 1, 2007

Conference Paper

Author:

Nadya T. Bliss

…

Published in:

ICASSP, 32nd IEEE Int. Conf. on Acoustics Speech and Signal Processing, April 2007, pp. IV-1189 - IV-1192.

Topic:

high performance computing

R&D area:

R&D group:

Embedded and Open Systems

Summary

MATLAB is one of the most commonly used languages for scientific computing, with approximately one million users worldwide. At MIT Lincoln Laboratory, MATLAB is used by technical staff to develop sensor processing algorithms. MATLAB's popularity is based on the availability of high-level abstractions leading to reduced code development time. Due to the compute-intensive nature of scientific computing, these applications often require long running times and would benefit greatly from the increased performance offered by parallel computing. pMatlab implements partitioned global address space (PGAS) support via standard operator overloading techniques. The core data structures in pMatlab are distributed arrays and maps, which simplify parallel programming by removing the need for explicit message passing. This paper presents the pMatlab design and results for the HPC Challenge benchmark suite. Additionally, two case studies of pMatlab use are described.

pMapper: automatic mapping of parallel Matlab programs

January 1, 2006

Conference Paper

Author:

Nadya T. Bliss

…

Published in:

Proc. of the HPCM (High Performance Computing Modernization), Users Group Conf., 2005, 27-30 June 2005, pp. 254-261.

Topic:

signal processing

R&D area:

R&D group:

Embedded and Open Systems

Summary

Algorithm implementation efficiency is key to delivering high-performance computing capabilities to demanding, high throughput DoD signal and image processing applications and simulations. Significant progress has been made in compiler optimization of serial programs, but many applications require parallel processing, which brings with it the difficult task of determining efficient mappings of algorithms to multiprocessor computers. The pMapper infrastructure addresses the problem of performance optimization of multistage MATLAB applications on parallel architectures. pMapper is an automatic performance tuning library written as a layer on top of pMatlab. pMatlab is a parallel Matlab toolbox that provides MATLAB users with global array semantics. While pMatlab abstracts the message-passing interface, the responsibility of generating maps for numerical arrays still falls on the user. A processor map for a numerical array is defined as an assignment of blocks of data to processing elements. Choosing the best mapping for a set of numerical arrays in a program is a nontrivial task that requires significant knowledge of programming languages, parallel computing, and processor architecture. pMapper automates the task of map generation, increasing the ease of programming and productivity. In addition to automating the mapping of parallel Matlab programs, pMapper could be used as a mapping tool for embedded systems. This paper addresses the design details of the pMapper infrastructure and presents preliminary results.
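The summary defines a processor map as an assignment of blocks of data to processing elements. The Python sketch below shows one such assignment for a two-dimensional array on a rectangular processor grid. It is an illustration only: `block_bounds` and `block_map` are hypothetical names, the row-major processor numbering is an assumption, and nothing here reflects how pMapper actually searches for or represents maps.

```python
def block_bounds(length, parts, part):
    """Half-open [start, stop) range of block `part` when `length` items
    are split into `parts` nearly equal contiguous blocks."""
    base, extra = divmod(length, parts)
    # The first `extra` blocks each take one additional element.
    start = part * base + min(part, extra)
    stop = start + base + (1 if part < extra else 0)
    return start, stop


def block_map(shape, grid):
    """Assign each processor of a grid_rows x grid_cols grid a rectangular
    block of a rows x cols array. Processors are numbered row-major
    (an assumption made for this sketch)."""
    rows, cols = shape
    grid_rows, grid_cols = grid
    mapping = {}
    for pr in range(grid_rows):
        for pc in range(grid_cols):
            proc = pr * grid_cols + pc
            mapping[proc] = (
                block_bounds(rows, grid_rows, pr),
                block_bounds(cols, grid_cols, pc),
            )
    return mapping


if __name__ == "__main__":
    for proc, (row_range, col_range) in block_map((6, 8), (2, 2)).items():
        print(proc, "rows", row_range, "cols", col_range)
```

Even in this toy form, the same four processors could be arranged as 1x4, 2x2, or 4x1, and each array in a multistage program could use a different arrangement, which gives a sense of why choosing maps by hand becomes difficult.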

Multi-function phased array radar for U.S. civil-sector surveillance needs

October 24, 2005

Conference Paper

Author:

Mark E. Weber

…

Published in:

32nd Conf. on Radar Meteorology, 24-29 October 2005.

Topic:

aviation weather

R&D area:

Air Traffic Control

R&D group:

Summary

This paper is a concept study for possible future utilization of active electronically scanned radars to provide weather and aircraft surveillance functions in U.S. airspace. If critical technology costs decrease sufficiently, multi-function phased array radars might prove to be a cost-effective alternative to current surveillance radars, since the number of required radars would be reduced, and maintenance and logistics infrastructure would be consolidated. A radar configuration that provides terminal-area and long-range aircraft surveillance and weather measurement capability is described, and a radar network design that replicates or exceeds current airspace coverage is presented. Key technology issues are examined, including transmit-receive elements, overlapped sub-arrays, the digital beamformer, and weather and aircraft post-processing algorithms. We conclude by discussing implications relative to future national weather and non-cooperative aircraft target surveillance needs.

The U.S. Government currently operates four separate ground-based surveillance radar networks supporting public and aviation-specific weather warnings and advisories, and primary or "skin paint" aircraft surveillance. The separate networks are:

(i) The 10-cm wavelength NEXRAD or WSR-88D (Serafin and Wilson, 2000) national-scale weather radar network. This is managed jointly by the National Weather Service (NWS), the Federal Aviation Administration (FAA), and the Department of Defense (DoD).

(ii) The 5-cm wavelength Terminal Doppler Weather Radars (TDWR) (Evans and Turnbull, 1989) deployed at large airports to detect low-altitude wind-shear phenomena.

(iii) The 10-cm wavelength Airport Surveillance Radars (ASR-9 and ASR-11) (Taylor and Brunins, 1985) providing terminal area primary aircraft surveillance and vertically averaged precipitation reflectivity measurements.

(iv) The 30-cm wavelength Air Route Surveillance Radars (ARSR-1, 2, 3 and 4) (Weber, 2005) that provide national-scale primary aircraft surveillance.

The latter three networks are managed primarily by the FAA, although the DoD operates a limited number of ASRs and has partial responsibility for maintenance of the ARSR network. In total there are 513 of these radars in the contiguous United States (CONUS), Alaska, and Hawaii. The agencies that maintain these radars conduct various "life extension" activities that are projected to extend their operational life to approximately 2020. At this time, there are no defined programs to acquire replacement radars.

The NWS and FAA have recently begun exploratory research on the capabilities and technology issues related to the use of multi-function phased array radar (MPAR) as a possible replacement approach. A key concept is that the MPAR network could provide both weather and primary aircraft surveillance, thereby reducing the total number of ground-based radars. In addition, MPAR surveillance capabilities would likely exceed those of current operational radars, for example, by providing more frequent weather volume scans and by providing vertical resolution and height estimates for primary aircraft targets.

Table 1 summarizes the capabilities of current U.S. surveillance radars. These are approximations and do not fully capture variations in capability as a function, for example, of range or operating mode. A key observation is that the significant variation in update rates between the aircraft and weather surveillance functions is currently achieved by using fundamentally different antenna patterns: low-gain vertical "fan beams" for aircraft surveillance that are scanned in azimuth only, versus high-gain weather radar "pencil beams" that are scanned volumetrically at much lower update rates. Note also that, if expressed in consistent units, the power-aperture products of the weather radars significantly exceed those of the ASRs and ARSRs.

In the next section, we present a concept design for MPAR and demonstrate that it can simultaneously provide the measurement capabilities summarized in Table 1. In Section 3 we present an MPAR network concept that duplicates the airspace coverage provided by the current multiple radar networks. Section 4 discusses technology issues and associated cost considerations. We conclude in Section 5 by discussing implications relative to future national weather and non-cooperative aircraft target surveillance needs.

Automatic parallelization with pMapper

September 27, 2005

Conference Paper

Author:

Nadya T. Bliss

…

Published in:

2005 IEEE Int. Conf. on Cluster Computing, 27-30 September 2005, 46-51.

Topic:

high performance computing

R&D area:

R&D group:

Embedded and Open Systems

Summary

Algorithm implementation efficiency is key to delivering high-performance computing capabilities to demanding, high throughput signal and image processing applications and simulations. Significant progress has been made in optimization of serial programs, but many applications require parallel processing, which brings with it the difficult task of determining efficient mappings of algorithms. The pMapper infrastructure addresses the problem of performance optimization of multistage MATLAB applications on parallel architectures. pMapper is an automatic performance tuning library written as a layer on top of pMatlab, a parallel Matlab toolbox. While pMatlab abstracts the message-passing interface, the responsibility of mapping numerical arrays falls on the user. Choosing the best mapping for a set of numerical arrays is a nontrivial task that requires significant knowledge of programming languages, parallel computing, and processor architecture. pMapper automates the task of map generation. This abstract addresses the design details of pMapper and presents preliminary results.

Parallel out-of-core Matlab for extreme virtual memory (Abstract)

September 27, 2005

Conference Paper

Author:

Hahn G. Kim

…

Published in:

2005 IEEE Int. Conf. on Cluster Computing, 27-30 September 2005, p. 482 [abstract only].

Topic:

computer architecture

R&D area:

R&D group:

Embedded and Open Systems

Summary

Large data sets that cannot fit in memory can be addressed with out-of-core methods, which use memory as a "window" to view a section of the data stored on disk at a time. The Parallel Matlab for eXtreme Virtual Memory (pMatlab XVM) library adds out-of-core extensions to the Parallel Matlab (pMatlab) library. We have applied pMatlab XVM to the DARPA High Productivity Computing Systems' HPCchallenge FFT benchmark. The benchmark was run using several different implementations: C+MPI, pMatlab, pMatlab hand coded for out-of-core, and pMatlab XVM. These experiments found that 1) the performance of the C+MPI and pMatlab versions was comparable; 2) the out-of-core versions deliver 80% of the performance of the in-core versions; 3) the out-of-core versions were able to perform a 1 terabyte (64 billion point) FFT; and 4) the pMatlab XVM program was smaller, easier to implement and verify, and more efficient than its hand coded equivalent. We are transitioning this technology to several DoD signal processing applications and plan to apply pMatlab XVM to the full HPCchallenge benchmark suite. Using next generation hardware, problem sizes a factor of 100 to 1000 times larger should be feasible.
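The "window" idea in the summary can be sketched in a few lines of Python using only the standard library. This is not pMatlab XVM and it does not perform an FFT; it substitutes a much simpler reduction (a sum of squares) to show the access pattern, in which only a fixed-size slice of the on-disk data is held in memory at any moment. The function names and the file layout (raw float64 values) are assumptions made for the example.

```python
import os
import tempfile
from array import array


def write_samples(path, values):
    """Write values to disk as raw float64 so they can be streamed back."""
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def windowed_sum_of_squares(path, window_items):
    """Reduce a file of float64 values while holding at most
    `window_items` of them in memory at a time."""
    item_size = array("d").itemsize
    total = 0.0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(window_items * item_size)
            if not chunk:
                break
            window = array("d")
            window.frombytes(chunk)  # the in-memory "window" onto the file
            total += sum(x * x for x in window)
    return total


def demo():
    """Round-trip 1000 samples through a temporary file, 64 at a time."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "samples.bin")
        write_samples(path, [float(i) for i in range(1000)])
        return windowed_sum_of_squares(path, window_items=64)


if __name__ == "__main__":
    print(demo())
```

A reduction needs only one pass over the data; an out-of-core FFT needs several passes with data reordering between them, which is part of what makes a hand-coded version harder to write and verify.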

Introduction to parallel programming and pMatlab v2.0

September 12, 2005

Journal Article

Author:

Hahn G. Kim

…

Published in:

Lincoln Laboratory external web site, [2005].

Topic:

supercomputing

R&D area:

R&D group:

Embedded and Open Systems

Summary

The computational demands of software continue to outpace the capacities of processor and memory technologies, especially in scientific and engineering programs. One option to improve performance is parallel processing. However, despite decades of research and development, writing parallel programs continues to be difficult. This is especially the case for scientists and engineers who have limited backgrounds in computer science. MATLAB®, due to its ease of use compared to other programming languages like C and Fortran, is one of the most popular languages for implementing numerical computations, thus making it an excellent platform for developing an accessible parallel computing framework. MIT Lincoln Laboratory has developed two libraries, pMatlab and MatlabMPI, that enable parallel programming with MATLAB in a simple fashion accessible to non-computer scientists. This document gives an overview of basic concepts in parallel programming and introduces pMatlab.

Writing parallel parameter sweep applications with pMATLAB

September 12, 2005

Conference Paper

Author:

Hahn G. Kim

…

Published in:

Lincoln Laboratory external web site [2005].

Topic:

supercomputing

R&D area:

R&D group:

Embedded and Open Systems

Summary

Parameter sweep applications execute the same piece of code multiple times with unique sets of input parameters. This type of application is extremely amenable to parallelization. This document describes how to parallelize parameter sweep applications with pMATLAB by introducing a simple serial parameter sweep application written in MATLAB, then parallelizing the application using pMATLAB.
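The serial-then-parallel progression described in the summary can be sketched in Python as follows. This is not the pMATLAB example from the document: `simulate` is a hypothetical stand-in for the per-parameter computation, the cyclic split of work across ranks is one common choice assumed here, and the ranks are simulated in a single process rather than run in parallel.

```python
def simulate(gain, offset):
    """Hypothetical stand-in for the code run once per parameter set."""
    return gain * gain + offset


def parameter_grid():
    """All (gain, offset) combinations to sweep over."""
    return [(g, o) for g in range(4) for o in range(3)]


def serial_sweep():
    """Serial version: evaluate every parameter set in order."""
    return [simulate(g, o) for g, o in parameter_grid()]


def rank_sweep(rank, num_ranks):
    """Work for one rank: every num_ranks-th parameter set, starting at
    `rank`. Results are keyed by global index so they can be merged."""
    grid = parameter_grid()
    return {i: simulate(*grid[i]) for i in range(rank, len(grid), num_ranks)}


def gather(num_ranks):
    """Merge per-rank results back into serial order. Here the ranks run
    one after another; in a real parallel run each would be a process."""
    merged = {}
    for rank in range(num_ranks):
        merged.update(rank_sweep(rank, num_ranks))
    return [merged[i] for i in range(len(merged))]


if __name__ == "__main__":
    print(serial_sweep())
    print(gather(num_ranks=5))
```

Because each parameter set is independent, the only change between the two versions is which indices each rank iterates over, which is what makes this class of application straightforward to parallelize.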
