Publications

PVTOL: providing productivity, performance, and portability to DoD signal processing applications on multicore processors

Published in:
DoD HPCMP 2008, High Performance Computing Modernization Program Users Group Conf., 14 July 2008, pp. 327-333.

Summary

PVTOL provides an object-oriented C++ API that hides the complexity of multicore architectures within a PGAS programming model, improving programmer productivity. Tasks and conduits enable data flow patterns such as pipelining and round-robining. Hierarchical maps concisely describe how to allocate hierarchical arrays across processor and memory hierarchies and provide a simple API for moving data across these hierarchies. Functors encapsulate computational kernels; new functors can easily be developed using the PVTOL API, and functors can be fused for more efficient computation. Existing computation and communication technologies that are optimized for various architectures are used to achieve high performance, and PVTOL abstracts the details of the underlying processor architectures to provide portability. We are actively developing PVTOL for Intel, PowerPC, and Cell architectures and intend to add support for more computational kernels on these architectures. FPGAs are becoming popular for accelerating computation in both the high performance computing (HPC) and high performance embedded computing (HPEC) communities, and integrated processor-FPGA technologies are now available from both HPC and HPEC vendors, e.g., Cray and Mercury Computer Systems; we plan to support FPGAs as co-processors in PVTOL. Finally, automated mapping technology has been demonstrated with pMatlab, and we plan to begin implementing automated mapping in PVTOL next year. As with PVL, as PVTOL matures and is used in more projects at Lincoln, we plan to propose concepts demonstrated in PVTOL to HPEC-SI for adoption into future versions of VSIPL++.
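
To make the functor idea concrete, here is a minimal, self-contained C++ sketch of kernel encapsulation and fusion. The names Scale, Magnitude, and Fused are hypothetical illustrations, not PVTOL types; the sketch only shows how two encapsulated kernels can be fused so a pipeline makes a single pass over the data.

```cpp
// Illustrative sketch only: Scale, Magnitude, and Fused are hypothetical
// names, not PVTOL types. The point is how kernels wrapped as functors
// can be fused so a pipeline makes one pass over the data instead of two.
#include <cmath>
#include <cstdio>
#include <vector>

// A functor wraps a computational kernel behind operator().
struct Scale {
    float gain;
    float operator()(float x) const { return gain * x; }
};

struct Magnitude {
    float operator()(float x) const { return std::fabs(x); }
};

// Fusing two functors yields a single kernel, applied in one loop.
template <typename F, typename G>
struct Fused {
    F f;
    G g;
    float operator()(float x) const { return g(f(x)); }
};

int main() {
    std::vector<float> data = {-1.0f, 2.0f, -3.0f};
    Fused<Scale, Magnitude> kernel{Scale{2.0f}, Magnitude{}};
    for (float& x : data) x = kernel(x);          // single fused pass
    for (float x : data) std::printf("%g ", x);   // prints: 2 4 6
    std::printf("\n");
    return 0;
}
```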

pMATLAB parallel MATLAB library

Published in:
Int. J. High Perform. Comp. Appl., Vol. 21, No. 3, Fall 2007, pp. 336-359.

Summary

MATLAB has emerged as one of the languages most commonly used by scientists and engineers for technical computing, with approximately one million users worldwide. The primary benefits of MATLAB are reduced code development time via high levels of abstraction (e.g., first-class multi-dimensional arrays and thousands of built-in functions), interpreted, interactive programming, and powerful mathematical graphics. The compute-intensive nature of technical computing means that many MATLAB users have codes that can significantly benefit from the increased performance offered by parallel computing. pMatlab provides this capability by implementing parallel global array semantics using standard operator overloading techniques. The core data structure in pMatlab is a distributed numerical array whose distribution onto multiple processors is specified with a "map" construct. Communication operations between distributed arrays are abstracted away from the user, and pMatlab transparently supports redistribution between any block-cyclic-overlapped distributions in up to four dimensions. pMatlab is built on top of the MatlabMPI communication library and runs on any combination of heterogeneous systems that support MATLAB, including Windows, Linux, Mac OS X, and SunOS. This paper describes the overall design and architecture of the pMatlab implementation. Performance is validated by implementing the HPC Challenge benchmark suite and comparing pMatlab performance with that of the equivalent C+MPI codes. These results indicate that pMatlab can often achieve comparable performance to C+MPI, usually at one-tenth the code size. Finally, we present implementation data collected from a sample of real pMatlab applications drawn from the approximately one hundred users at MIT Lincoln Laboratory. These data indicate that users are typically able to go from a serial code to an efficient pMatlab code in about 3 hours while changing less than 1% of their code.
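
As a rough illustration of what a block "map" encodes, the following sketch (written in C++ for concreteness rather than MATLAB, with hypothetical helper names that are not part of the pMatlab API) computes which processor owns each global index of a one-dimensional block distribution and where that index lands in the owner's local storage.

```cpp
// Conceptual sketch of what a block "map" records, written in C++ for
// concreteness rather than MATLAB; these helper names are hypothetical
// and are not part of the pMatlab API.
#include <cstdio>

// Owner of global index i when N elements are block-distributed over P
// processors (block size = ceiling(N / P); the last block may be short).
int block_owner(int i, int N, int P) {
    int block = (N + P - 1) / P;
    return i / block;
}

// Local index of global index i on its owning processor.
int local_index(int i, int N, int P) {
    int block = (N + P - 1) / P;
    return i % block;
}

int main() {
    const int N = 10, P = 4;   // e.g., 10 rows distributed over 4 processors
    for (int i = 0; i < N; ++i)
        std::printf("global %d -> processor %d, local %d\n",
                    i, block_owner(i, N, P), local_index(i, N, P));
    return 0;
}
```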

Benchmarking the MIT LL HPCMP DHPI system

Published in:
Annual High Performance Computing Modernization Program Users Group Conf., 19-21 June 2007.

Summary

The Massachusetts Institute of Technology Lincoln Laboratory (MIT LL) High Performance Computing Modernization Program (HPCMP) Dedicated High Performance Computing Project Investment (DHPI) system was designed to address interactive algorithm development for Department of Defense (DoD) sensor processing systems. The results of the system acceptance test provide a clear quantitative picture of the capabilities of the system. The system acceptance test for the MIT LL HPCMP DHPI hardware involved an array of benchmarks that exercised each of the components of the memory hierarchy, the scheduler, and the disk arrays. These benchmarks isolated the components to verify the functionality and performance of the system, and several system issues were discovered and rectified by using these benchmarks. The memory hierarchy was evaluated using the HPC Challenge benchmark suite, which comprises the following benchmarks: High Performance Linpack (HPL, also known as Top 500), Fast Fourier Transform (FFT), STREAM, RandomAccess, and Effective Bandwidth. The compute nodes' Redundant Array of Independent Disks (RAID) arrays were evaluated with the Iozone benchmark. Finally, the scheduler and the reliability of the entire system were tested using both the HPC Challenge suite and the Iozone benchmark. For example, executing the HPC Challenge benchmark suite on 416 processors, the system was able to achieve 1.42 TFlops (HPL), 34.7 GFlops (FFT), 1.24 TBytes/sec (STREAM Triad), and 0.16 GUPS (RandomAccess). This paper describes the components of the MIT Lincoln Laboratory HPCMP DHPI system, including its memory hierarchy. We present the HPC Challenge benchmark suite and the Iozone benchmark and describe how each of the component benchmarks stresses various components of the TX-2500 system. The results of the benchmarks are discussed, along with their implications for the performance of the system. We conclude with a presentation of the findings.
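
For scale, the aggregate figures above can be reduced to per-processor values using only the processor count and results quoted in the abstract:

1.42 TFlop/s ÷ 416 processors ≈ 3.4 GFlop/s per processor (HPL)
1.24 TByte/s ÷ 416 processors ≈ 3.0 GByte/s per processor (STREAM Triad)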

Technical challenges of supporting interactive HPC

Published in:
Ann. High Performance Computing Modernization Program Users Group Conf., 19-21 June 2007.

Summary

Users' demand for interactive, on-demand access to a large pool of high performance computing (HPC) resources is increasing. The majority of users at Massachusetts Institute of Technology Lincoln Laboratory (MIT LL) are involved in the interactive development of sensor processing algorithms. This development often requires a large amount of computation due to the complexity of the algorithms being explored and/or the size of the data set being analyzed. These researchers also require rapid turnaround of their jobs because each iteration directly influences code changes made for the following iteration. Historically, batch queue systems have not been a good match for this kind of user. The Lincoln Laboratory Grid (LLGrid) system at MIT LL is the largest dedicated interactive, on-demand HPC system in the world. While the system also accommodates some batch queue jobs, the vast majority of jobs submitted are interactive, on-demand jobs. Choosing between running a system with a batch queue or in an interactive, on-demand manner involves tradeoffs. This paper discusses the tradeoffs between operating a cluster as a batch system, an interactive, on-demand system, or a hybrid system. The LLGrid system has been operational for over three years and now serves over 200 users from across Lincoln. The system has run over 100,000 interactive jobs and has become an integral part of many researchers' algorithm development workflows. For instance, in batch queue systems, an individual user commonly can gain access to 25% of the processors in the system after the job has waited in the queue; in our experience with on-demand, interactive operation, individual users often can also gain access to 20-25% of the cluster processors. This paper shares a variety of new data on our experiences with running an interactive, on-demand system that also provides some batch queue access. Keywords: grid computing, on-demand, interactive high performance computing, cluster computing, parallel MATLAB.

Integrated compensation network for low mutual coupling of planar microstrip antenna arrays

Published in:
IEEE Antennas and Propagation Society Int. Symp., 2007 Digest, 9-15 June 2007, pp. 1273-6.

Summary

The unavoidable presence of mutual coupling between antenna elements in an array limits the ability to transmit and receive signals concurrently [1]. In the absence of mutual coupling, it is conceivable, although still difficult, to transmit and receive at the same frequency at the same time, e.g., in FM-CW radars. The reflection from the antenna, leakage through the circulator, and any other possible deleterious paths from the high power amplifier to the low noise amplifier must be cancelled or compensated for in some manner to keep the receiver linear. With a single antenna, the signal and noise paths are correlated, and therefore cancellation of the signal inherently eliminates the noise. However, in an array environment the mutual coupling of antenna elements causes noise from neighboring high power amplifiers to couple into each channel's receiver. While the signal coupling is coherent, the noise is uncorrelated to a degree that depends on the amplifier gain and noise figure. The use of a low mutual coupling antenna array is a critical element in operating systems in this manner.

Ultra-wideband step notch array using stripline feed

Published in:
IEEE Antennas and Propagation Society Int. Symp., 2007 Digest, 9-15 June 2007, pp. 3361-4.

Summary

Electronically scanned array (ESA) antennas capable of efficiently radiating over an octave of bandwidth provide system designs with more flexibility in multiple-mode operation. Communication and radar bands occupy different frequency allocations, and the growing research in ultra-wideband (UWB) communications makes the use of a single ESA to cover these frequencies an area of interest. Array antennas constructed of tapered-slot antennas and TEM horns have been investigated and shown to operate successfully over an octave bandwidth. These antennas use vertical feeds, which make them optimal for brick architectures but less than desirable for tile architectures. Conventional notch antennas require a feed extending vertically away from the notch antenna, which makes a flat 2-D connection between antennas difficult. In this work an Ultra-Wideband Step Notch Array (UWSNA) was designed for ESA applications. The array operates over a 6-12 GHz range using a flat, tile-based 2-D feed network, making this array optimal for conformal applications with a minimum of vertical distance. Simulation results and measurements on a small prototype demonstrate the concept.

Design of overlapped subarrays using an RFIC beamformer

Published in:
IEEE Antennas and Propagation Society Int. Symp., 2007 Digest, 9-15 June 2007, pp. 1791-4.

Summary

Electronically scanned arrays require a minimum number of controls, Nmin, given by the number of orthogonal beams that fill a prescribed scan sector. Most practical antenna arrays require considerably more than Nmin control elements, but overlapped subarray architectures can approach this theoretical limit. Figure 1 shows a block diagram of an overlapped subarray architecture. The overlapped subarray network produces a flat-topped sector pattern with low sidelobes that suppress grating lobes outside of the main beam of the subarray pattern. Each radiating element of the array is connected to multiple subarrays, creating an overlapping geometry. It is possible to scan one beam, or a fixed set of contiguous beams, over the main sector of the subarray with a set of Nmin phase shifters. Alternatively, digital receivers can be connected to the Nmin subarrays and multiple simultaneous beams can be formed digitally. Digital subarray architectures using a combination of element-level phase shifters and subarray-level receivers make it possible to scan multiple beam clusters over all space. The conventional approach to the design and manufacturing of the overlapped subarray network shown in Figure 1 is challenging and costly due to the complexity of the microwave network. However, designing the overlapped subarray beamformer with Radio Frequency Integrated Circuits (RFICs) represents a novel approach that offers an efficient trade-off between the agility and capability of fully digital arrays and the cost effectiveness of analog arrays.
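
The abstract does not give a formula for Nmin, but a standard approximation (offered here only as context, not taken from the paper) counts the orthogonal beams that fit in the scan sector: for a linear aperture of length $D$ at wavelength $\lambda$, orthogonal beams are spaced $\lambda/D$ apart in sine space, so

N_{\min} \approx \frac{D}{\lambda}\left(\sin\theta_{\max} - \sin\theta_{\min}\right)

where $\theta_{\min}$ and $\theta_{\max}$ bound the prescribed scan sector.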

SiGe IC-based mm-wave imager

Published in:
2007 IEEE Int. Symp. on Circuits and Systems, 27-30 May 2007, pp. 1975-1978.

Summary

Millimeter-wave radiation and detection offer the possibility of detecting concealed weapons. Passive imaging measures the mm-wave radiation emitted from target objects. A passive mm-wave imager and the design choices affecting the overall system performance are discussed. With a low-power receiver architecture and SiGe ICs, a focal-plane-based full staring array is feasible and can provide high thermal resolution, ~1.1 K at a >10 Hz frame rate.
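
The quoted thermal resolution can be related to receiver parameters through the standard total-power radiometer relation (a textbook result, not a figure from the paper):

\Delta T \approx \frac{T_{sys}}{\sqrt{B\,\tau}}

where $T_{sys}$ is the system noise temperature, $B$ the pre-detection bandwidth, and $\tau$ the per-pixel integration time. A staring focal-plane array helps here because every pixel integrates in parallel for essentially the full frame period, maximizing $\tau$ at a given frame rate.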

pMatlab: parallel Matlab library for signal processing applications

Published in:
ICASSP, 32nd IEEE Int. Conf. on Acoustics, Speech and Signal Processing, April 2007, pp. IV-1189 - IV-1192.

Summary

MATLAB is one of the most commonly used languages for scientific computing, with approximately one million users worldwide. At MIT Lincoln Laboratory, MATLAB is used by technical staff to develop sensor processing algorithms. MATLAB's popularity is based on the availability of high-level abstractions leading to reduced code development time. Due to the compute-intensive nature of scientific computing, these applications often require long running times and would benefit greatly from the increased performance offered by parallel computing. pMatlab implements partitioned global address space (PGAS) support via standard operator overloading techniques. The core data structures in pMatlab are distributed arrays and maps, which simplify parallel programming by removing the need for explicit message passing. This paper presents the pMatlab design and results for the HPC Challenge benchmark suite. Additionally, two case studies of pMatlab use are described.
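
The operator overloading idea can be sketched in a few lines of C++ (used here for concreteness rather than MATLAB); DistVector and its overloaded operator+ are hypothetical stand-ins, not the pMatlab API, showing how a distributed array can keep serial-looking syntax while each process touches only its own block.

```cpp
// Hypothetical sketch (not the pMatlab API, and in C++ rather than MATLAB)
// of how operator overloading can hide data distribution: each rank holds
// only its local block, yet the user-level expression is still "a + b".
#include <cstddef>
#include <cstdio>
#include <vector>

struct DistVector {
    int global_offset;          // where this rank's block starts globally
    std::vector<double> local;  // the block owned by this rank
};

// Element-wise add touches only locally owned data; no communication is
// needed because both operands share the same distribution (map).
DistVector operator+(const DistVector& a, const DistVector& b) {
    DistVector c{a.global_offset, a.local};
    for (std::size_t i = 0; i < c.local.size(); ++i) c.local[i] += b.local[i];
    return c;
}

int main() {
    // Pretend this process owns global indices 4..7 of two arrays.
    DistVector a{4, {1, 2, 3, 4}}, b{4, {10, 20, 30, 40}};
    DistVector c = a + b;                            // same syntax as serial code
    for (double x : c.local) std::printf("%g ", x);  // prints: 11 22 33 44
    std::printf("\n");
    return 0;
}
```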

pMapper: automatic mapping of parallel Matlab programs

Published in:
Proc. of the HPCMP (High Performance Computing Modernization Program) Users Group Conf. 2005, 27-30 June 2005, pp. 254-261.

Summary

Algorithm implementation efficiency is key to delivering high-performance computing capabilities to demanding, high-throughput DoD signal and image processing applications and simulations. Significant progress has been made in compiler optimization of serial programs, but many applications require parallel processing, which brings with it the difficult task of determining efficient mappings of algorithms to multiprocessor computers. The pMapper infrastructure addresses the problem of performance optimization of multistage MATLAB applications on parallel architectures. pMapper is an automatic performance tuning library written as a layer on top of pMatlab. pMatlab is a parallel MATLAB toolbox that provides MATLAB users with global array semantics. While pMatlab abstracts the message-passing interface, the responsibility of generating maps for numerical arrays still falls on the user. A processor map for a numerical array is defined as an assignment of blocks of data to processing elements. Choosing the best mapping for a set of numerical arrays in a program is a nontrivial task that requires significant knowledge of programming languages, parallel computing, and processor architecture. pMapper automates the task of map generation, increasing ease of programming and productivity. In addition to automating the mapping of parallel MATLAB programs, pMapper could be used as a mapping tool for embedded systems. This paper addresses the design details of the pMapper infrastructure and presents preliminary results.
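
The core idea behind automatic map generation can be illustrated with a small, self-contained C++ sketch: enumerate candidate processor grids and keep the one with the lowest predicted cost. The cost model below is a toy stand-in, not pMapper's, and the function names are hypothetical.

```cpp
// Hypothetical sketch of the idea behind automatic mapping: enumerate
// candidate processor grids and keep the one with the lowest predicted
// cost. The cost model below is a stand-in, not pMapper's.
#include <cstdio>
#include <utility>

// Toy cost model: per-processor compute (N*N/P) plus a row/column surface
// term that grows when the processor grid is very elongated.
double predicted_cost(int N, int Pr, int Pc) {
    double compute = double(N) * N / (Pr * Pc);
    double comm    = double(N) / Pr + double(N) / Pc;  // halo/exchange proxy
    return compute + 100.0 * comm;                     // weight is arbitrary
}

// Search all exact factorizations Pr x Pc of P and return the cheapest grid.
std::pair<int, int> best_grid(int N, int P) {
    std::pair<int, int> best{1, P};
    double best_cost = predicted_cost(N, 1, P);
    for (int Pr = 1; Pr <= P; ++Pr) {
        if (P % Pr) continue;
        double c = predicted_cost(N, Pr, P / Pr);
        if (c < best_cost) { best_cost = c; best = {Pr, P / Pr}; }
    }
    return best;
}

int main() {
    auto g = best_grid(1024, 16);
    std::printf("chosen grid: %d x %d\n", g.first, g.second);  // 4 x 4 here
    return 0;
}
```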