Publications

Polymorphous computing architecture (PCA) kernel-level benchmarks [revision 1]

Published in:
MIT Lincoln Laboratory Report PCA-KERNEL-1, Rev. 1

Summary

This document describes a series of kernel benchmarks for the PCA program. Each kernel benchmark is an operation of importance to DoD sensor applications making use of a PCA architecture. Many of these operations are a part of the composite example applications described elsewhere. The kernel-level benchmarks have been chosen to stress both computation and communication aspects of the architecture. "Computation" aspects include floating-point and integer performance, as well as the memory hierarchy, while the "communication" aspects include the network, the memory hierarchy, and the I/O capabilities. The particular benchmarks chosen are based on the frequency of their use in current and future applications. They are drawn from the areas of signal processing, communication, and information and knowledge processing. The specification of the benchmarks in this document is meant to be high-level and largely independent of the implementation.
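To give a concrete sense of what a kernel-level benchmark measures, the sketch below is a hypothetical example in Python (not part of the PCA suite): it times an FFT kernel at sizes that move from cache-resident to memory-bound, stressing both floating-point throughput and the memory hierarchy.

```python
import time
import numpy as np

def fft_gflops(n, trials=10):
    """Time an n-point complex FFT and report sustained GFLOP/s."""
    x = (np.random.randn(n) + 1j * np.random.randn(n)).astype(np.complex64)
    best = float("inf")
    for _ in range(trials):
        t0 = time.perf_counter()
        np.fft.fft(x)
        best = min(best, time.perf_counter() - t0)
    flops = 5 * n * np.log2(n)   # conventional flop count for a complex FFT
    return flops / best / 1e9

# Small sizes stay in cache; large sizes exercise the memory hierarchy.
for n in (2**10, 2**16, 2**22):
    print(f"{n:>8} points: {fft_gflops(n):6.2f} GFLOP/s")
```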

Application of a development time productivity metric to parallel software development

Published in:
SE-HPCS '05, 2nd Int. Workshop on Software Engineering for High Performance Computing System Applications, 15 May 2005, pp. 8-12.

Summary

Evaluation of High Performance Computing (HPC) systems should take into account software development time productivity in addition to hardware performance, cost, and other factors. We propose a new metric for HPC software development time productivity, defined as the ratio of relative runtime performance to relative programmer effort. This formula has been used to analyze several HPC benchmark codes and classroom programming assignments. The results of this analysis show consistent trends for various programming models. This method enables a high-level evaluation of development time productivity for a given code implementation, which is essential to the task of estimating cost associated with HPC software development.
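In symbols (the notation is my own; the paper defines both ratios against a baseline serial implementation), the proposed metric can be written as:

```latex
% Development time productivity: relative speedup over relative effort.
% T = execution time, E = programmer effort (e.g., development time or SLOC);
% subscript names are illustrative.
\Psi \;=\; \frac{\text{relative speedup}}{\text{relative effort}}
     \;=\; \frac{T_{\text{serial}} / T_{\text{parallel}}}
                {E_{\text{parallel}} / E_{\text{serial}}}
```

A code that doubles its speedup at the cost of doubled programmer effort thus scores Psi = 1, the break-even point.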

Parallel MATLAB for extreme virtual memory

Published in:
Proc. of the HPCMP Users Group Conf., 27-30 June 2005, pp. 381-387.

Summary

Many DoD applications have extreme memory requirements, often with data sets larger than memory on a single computer. Such data sets can be addressed with out-of-core methods, which use memory as a "window" to view a section of the data stored on disk at a time. The Parallel Matlab for eXtreme Virtual Memory (pMatlab XVM) library adds out-of-core extensions to the Parallel Matlab (pMatlab) library. The DARPA High Productivity Computing Systems' HPC challenge FFT benchmark has been implemented in C+MPI, pMatlab, hand-coded out-of-core pMatlab, and pMatlab XVM. We found that 1) the performance of the C+MPI and pMatlab versions was comparable; 2) the out-of-core versions deliver 80% of the performance of the in-core versions; 3) the out-of-core versions were able to perform a 1 TB (64 billion point) FFT; and 4) the pMatlab XVM program was smaller, easier to implement and verify, and more efficient than its hand-coded equivalent. We plan to apply pMatlab XVM to the full HPC challenge benchmark suite. Using next-generation hardware, problem sizes a factor of 100 to 1000 larger should be feasible. We are also transitioning this technology to several DoD signal processing applications. Finally, the flexibility of pMatlab XVM allows hardware designers to experiment with FFT parameters in software before designing hardware for a real-time, ultra-long FFT.
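The out-of-core "window" idea can be sketched in a few lines. The Python fragment below is illustrative only, not the pMatlab XVM interface: the full array lives on disk, and only one window of it is in memory at any time, which is the access pattern an out-of-core FFT stage must follow.

```python
import numpy as np

# Illustrative out-of-core pattern (not the pMatlab XVM API): the full array
# lives on disk; only one "window" of it is in memory at a time.
N, WINDOW = 1 << 22, 1 << 18                   # total points, window size
data = np.memmap("bigdata.bin", dtype=np.complex64, mode="w+", shape=(N,))
data[:] = np.random.randn(N).astype(np.complex64)   # stand-in data set

for start in range(0, N, WINDOW):
    chunk = np.asarray(data[start:start + WINDOW])   # read one window into RAM
    data[start:start + WINDOW] = np.fft.fft(chunk)   # write results back out
# A complete out-of-core FFT combines these per-window transforms with
# twiddle-factor multiplies and a transpose pass; only the windowed disk
# I/O pattern is shown here.
```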

Technology requirements for supporting on-demand interactive grid computing

Summary

It is increasingly being recognized that a large pool of High Performance Computing (HPC) users requires interactive, on-demand access to HPC resources. How to provide these resources is a significant technical challenge that can be addressed from two directions. The first approach is to adapt existing batch queue based HPC systems to make them more interactive. The second approach is to start with existing interactive desktop environments (e.g., MATLAB) and design a system from the ground up that allows interactive parallel computing. The Lincoln Laboratory Grid (LLGrid) project has taken the latter approach. The LLGrid system has been operational for over a year with a few hundred processors and roughly 70 users, having run over 13,000 interactive jobs and consumed approximately 10,000 processor days of computation. This paper compares the on-demand and interactive computing features of four prominent batch queuing systems: openPBS, Sun GridEngine, Condor, and LSF. It goes on to briefly describe the LLGrid system, and how interactive, on-demand computing was achieved on it by binding to a resource management system. Finally, usage characteristics of the LLGrid system are discussed.

Parallel VSIPL++: an open standard software library for high-performance parallel signal processing

Published in:
Proc. IEEE, Vol. 93, No. 2, February 2005, pp. 313-330.

Summary

Real-time signal processing consumes the majority of the world's computing power. Increasingly, programmable parallel processors are used to address a wide variety of signal processing applications (e.g., scientific, video, wireless, medical, communication, encoding, radar, sonar, and imaging). In programmable systems, the major challenge is no longer hardware but software. Specifically, the key technical hurdle lies in allowing the user to write programs at a high level, while still achieving performance and preserving the portability of the code across parallel computing hardware platforms. The Parallel Vector, Signal, and Image Processing Library (Parallel VSIPL++) addresses this hurdle by providing high-level C++ array constructs, a simple mechanism for mapping data and functions onto parallel hardware, and a community-defined portable interface. This paper presents an overview of the Parallel VSIPL++ standard as well as a deeper description of the technical foundations and expected performance of the library. Parallel VSIPL++ supports adaptive optimization at many levels. The C++ arrays are designed to support automatic hardware specialization by the compiler. The computation objects (e.g., fast Fourier transforms) are built with explicit setup and run stages to allow for runtime optimization. Parallel arrays and functions in Parallel VSIPL++ also support explicit setup and run stages, which are used to accelerate communication operations. The parallel mapping mechanism provides an external interface that allows optimal mappings to be generated offline and read into the system at runtime. Finally, the standard has been developed in collaboration with high-performance embedded computing vendors and is compatible with their proprietary approaches to achieving performance.
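The setup/run split mentioned above is the load-bearing design idea. The Python sketch below is illustrative (the actual standard is a C++ API): one-time planning and allocation happen in the setup stage, so the run stage executed per data frame does no allocation at all.

```python
import numpy as np

class FftObject:
    """Two-stage computation object in the style of VSIPL++ (a sketch,
    not the standard's C++ interface)."""
    def __init__(self, n):
        # Setup stage: allocate workspace, choose an algorithm, plan once.
        self.n = n
        self.workspace = np.empty(n, dtype=np.complex64)
    def run(self, frame):
        # Run stage: no allocation, no planning -- just the transform.
        np.copyto(self.workspace, frame)
        return np.fft.fft(self.workspace)

fft = FftObject(1024)                 # setup once, at application start
for _ in range(100):                  # run repeatedly on streaming data
    frame = np.random.randn(1024).astype(np.complex64)
    spectrum = fft.run(frame)
```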

HPC productivity: an overarching view

Published in:
Int. J. High Perform. Comp. App., Vol. 18, No. 4, Winter 2004, pp. 393-397.

Summary

The Defense Advanced Research Projects Agency (DARPA) High Productivity Computing Systems (HPCS) program is focused on providing a new generation of economically viable high productivity computing systems for national security and for the industrial user community. The value of a high performance computing (HPC) system to a user includes many factors, such as execution time on a particular problem, software development time, direct hardware costs, and indirect administrative and maintenance costs. This special issue, which focuses on HPC productivity, brings together, for the first time, a series of novel papers written by several distinguished authors who share their views on this topic. The topic of productivity in HPC is very new and the authors have been encouraged to speculate. The goal of this first paper is to present an overarching context and framework for the other papers and to define some common ideas that have emerged in considering the problem of HPC productivity. In addition, this paper defines several characteristic HPC workflows that are useful for understanding how users exploit HPC systems, and discusses the role of activity and purpose benchmarks in establishing an empirical basis for HPC productivity.

A multi-threaded fast convolver for dynamically parallel image filtering

Published in:
J. Parallel Distrib. Comput., Vol. 63, No. 3, March 2003, pp. 360-372.

Summary

2D convolution is a staple of digital image processing. The advent of large-format imagers makes it possible to literally "pave" with silicon the focal plane of an optical sensor, which results in very large images that can require a significant amount of computation to process. Filtering of large images via 2D convolutions is often complicated by a variety of effects (e.g., non-uniformities found in wide field of view instruments) which must be compensated for in the filtering process by changing the filter across the image. This paper describes a fast (FFT-based) method for convolving images with slowly varying filters. A parallel version of the method is implemented using a multi-threaded approach, which allows more efficient load balancing and a simpler software architecture. The method has been implemented within a high-level interpreted language (IDL), while also exploiting open standards vector libraries (VSIPL) and open standards parallel directives (OpenMP). The parallel approach and software architecture are generally applicable to a variety of algorithms and have the advantage of enabling users to obtain the convenience of an easy operating environment while also delivering high performance using a fully portable code.
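As a rough illustration of the idea (a simplified sketch, not the paper's implementation, and in Python rather than IDL/VSIPL): a slowly varying filter can be approximated by tiling the image and FFT-convolving each tile with the filter sampled at its center.

```python
import numpy as np

def varying_fft_filter(img, filt_at, tile=128):
    """Approximate a slowly varying filter: FFT-convolve each tile with
    the kernel sampled at its center. filt_at(row, col) -> 2D kernel."""
    out = np.zeros_like(img, dtype=float)
    for r in range(0, img.shape[0], tile):
        for c in range(0, img.shape[1], tile):
            block = img[r:r + tile, c:c + tile]
            k = filt_at(r + tile // 2, c + tile // 2)
            # Circular FFT convolution of the block (real code would pad and
            # overlap blocks to avoid wrap-around at tile edges).
            K = np.fft.rfft2(k, s=block.shape)
            B = np.fft.rfft2(block)
            out[r:r + tile, c:c + tile] = np.fft.irfft2(B * K, s=block.shape)
    return out

# Example: a Gaussian blur whose width grows across the field of view.
def gauss(r, c, size=15):
    sigma = 1.0 + 2.0 * r / 1024
    y, x = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
    k = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return k / k.sum()

image = np.random.rand(1024, 1024)
filtered = varying_fft_filter(image, gauss)
```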

Cluster detection in databases: the adaptive matched filter algorithm and implementation

Published in:
Data Mining and Knowledge Discovery, Vol. 7, No. 1, January 2003, pp. 57-79.

Summary

Matched filter techniques are a staple of modern signal and image processing. They provide a firm foundation (both theoretical and empirical) for detecting and classifying patterns in statistically described backgrounds. Application of these methods to databases has become increasingly common in certain fields (e.g. astronomy). This paper describes an algorithm (based on statistical signal processing methods), a software architecture (based on a hybrid layered approach) and a parallelization scheme (based on a client/server model) for finding clusters in large astronomical databases. The method has proved successful in identifying clusters in real and simulated data. The implementation is flexible and readily executed in parallel on a network of workstations.
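For concreteness, the classical matched filter statistic that underlies this family of methods takes a few lines of Python (a textbook form, not necessarily the paper's exact variant): whiten each record by the estimated background covariance, then correlate it against the target signature.

```python
import numpy as np

def matched_filter_score(x, s, cov):
    """Score record x against signature s in background covariance cov:
    (s' C^-1 x)^2 / (s' C^-1 s), the standard adaptive matched filter form."""
    cinv = np.linalg.inv(cov)
    return (s @ cinv @ x) ** 2 / (s @ cinv @ s)

rng = np.random.default_rng(0)
records = rng.standard_normal((1000, 5))    # database records as vectors
cov = np.cov(records, rowvar=False)         # background estimated from the data
signature = np.ones(5)                      # hypothesized cluster signature
scores = np.array([matched_filter_score(r, signature, cov) for r in records])
candidates = np.flatnonzero(scores > np.quantile(scores, 0.99))  # top 1%
```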

A constrained joint optimization approach to dynamic sensor configuration

Published in:
36th Asilomar Conf. on Signals, Systems, and Computers, Vol. 2, 3-6 November 2002, pp. 1179-1183.

Summary

Through intelligent integration of sensing and processing functions, the sensing technology of the future is evolving towards networks of configurable sensors acting in concert. Realizing the potential of collaborative real-time configurable sensor systems presents a number of challenges, including the need to address the massive global optimization problem that results from incorporating a large array of control parameters. This paper proposes a systematic approach to addressing complex global optimization problems by constraining the problem to a set of key control parameters and recasting a mission-oriented goal into a tractable joint optimization formulation. Using idealized but realistic physical models, this systematic methodology for complex multi-dimensional joint optimization problems is used to compute system performance bounds for dynamic sensor configurations.
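In spirit (the paper's physical models are not reproduced here; the objective, parameters, and constraint below are invented for illustration), constraining to a few key control parameters turns the mission goal into a standard constrained optimization that off-the-shelf solvers handle:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative only: jointly choose two sensor control parameters -- dwell
# time t and bandwidth b -- to maximize a detection-quality surrogate
# under a shared resource budget.
def neg_quality(p):
    t, b = p
    return -(np.log1p(t) + 0.5 * np.log1p(b))   # toy mission-oriented objective

res = minimize(
    neg_quality,
    x0=[1.0, 1.0],
    bounds=[(0.1, 10.0), (0.1, 10.0)],          # per-parameter limits
    constraints=[{"type": "ineq", "fun": lambda p: 8.0 - (p[0] + 2 * p[1])}],
)
print("optimal dwell, bandwidth:", res.x)
```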

300x faster Matlab using MatlabMPI

Published in:
https://arxiv.org/abs/astro-ph/0207389

Summary

The true costs of high performance computing are currently dominated by software. Addressing these costs requires shifting to high productivity languages such as Matlab. MatlabMPI is a Matlab implementation of the Message Passing Interface (MPI) standard and allows any Matlab program to exploit multiple processors. MatlabMPI currently implements the basic six functions that are the core of the MPI point-to-point communications standard. The key technical innovation of MatlabMPI is that it implements the widely used MPI "look and feel" on top of standard Matlab file I/O, resulting in an extremely compact (~250 lines of code) and "pure" implementation which runs anywhere Matlab runs, and on any heterogeneous combination of computers. The performance has been tested on both shared- and distributed-memory parallel computers (e.g. Sun, SGI, HP, IBM and Linux). MatlabMPI can match the bandwidth of C-based MPI at large message sizes. A test image filtering application using MatlabMPI achieved a speedup of ~300 using 304 CPUs and ~15% of the theoretical peak (450 Gigaflops) on an IBM SP2 at the Maui High Performance Computing Center. In addition, this entire parallel benchmark application was implemented in 70 software-lines-of-code (SLOC) yielding 0.85 Gigaflop/SLOC or 4.4 CPUs/SLOC, which are the highest values of these software price performance metrics ever achieved for any application. The MatlabMPI software will be made available for download.
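The file-I/O trick is simple enough to sketch. The Python fragment below is an analogue of the idea, not MatlabMPI's actual API: a send writes the message to a file in a directory visible to all ranks, and the matching receive polls for that file, so no daemons or sockets are required.

```python
import os
import pickle
import time

COMM_DIR = "/tmp/mpi_files"              # assumed visible to every rank
os.makedirs(COMM_DIR, exist_ok=True)

def send(dest, tag, data):
    """Write the message, then rename so it appears atomically."""
    path = os.path.join(COMM_DIR, f"msg_{dest}_{tag}.pkl")
    with open(path + ".tmp", "wb") as f:
        pickle.dump(data, f)
    os.rename(path + ".tmp", path)

def recv(me, tag, poll=0.05):
    """Block until the message file exists, then read and delete it."""
    path = os.path.join(COMM_DIR, f"msg_{me}_{tag}.pkl")
    while not os.path.exists(path):
        time.sleep(poll)
    with open(path, "rb") as f:
        data = pickle.load(f)
    os.remove(path)
    return data
```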