Publications

High performance computing productivity model synthesis

Published in:
Int. J. High Perform. Comp. App., Vol. 18, No. 4, Winter 2004, pp. 505-516.

Summary

The Defense Advanced Research Projects Agency (DARPA) High Productivity Computing Systems (HPCS) program is developing systems that deliver increased value to users at a rate commensurate with the rate of improvement in the underlying technologies. For example, if the relevant technology were silicon, the goal of such a system would be to double in productivity (or value) every 18 months, following Moore's law. The key questions are how we define and measure productivity, and which underlying technologies affect it. The goal of this paper is to synthesize from several different productivity models a single model that captures the main features of all of them. In addition, we start the process of putting the model on an empirical foundation by incorporating selected results from the software engineering and high performance computing (HPC) communities. An asymptotic analysis of the model is conducted to check that it makes sense in certain special cases. The model is then extrapolated to an HPC context and several examples are explored, including HPC centers, HPC users, and interactive grid computing. Finally, the model hints at a profoundly different way of viewing HPC systems, in which the user must be included in the equation and innovative hardware is key to lowering the very high costs of HPC software.
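
As a rough, back-of-the-envelope illustration of the doubling argument above (not taken from the paper's model), the short Python sketch below treats productivity as a simple utility-over-cost ratio and applies a Moore's-law growth factor; the function names and numbers are hypothetical.

    # Illustrative only: a simple utility-over-cost view of productivity with a
    # Moore's-law style doubling every 18 months. All numbers are hypothetical.
    def productivity(utility, cost):
        """Value delivered to users per unit of total cost (assumed definition)."""
        return utility / cost

    def moores_law_growth(months, doubling_period=18.0):
        """Growth factor for a technology that doubles every doubling_period months."""
        return 2.0 ** (months / doubling_period)

    base = productivity(utility=100.0, cost=50.0)    # hypothetical baseline system
    print(base)                                      # 2.0
    print(base * moores_law_growth(36))              # two doublings in 3 years -> 8.0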

HPC productivity: an overarching view

Published in:
Int. J. High Perform. Comp. App., Vol. 18, No. 4, Winter 2004, pp. 393-397.

Summary

The Defense Advanced Research Projects Agency (DARPA) High Productivity Computing Systems (HPCS) program is focused on providing a new generation of economically viable high productivity computing systems for national security and for the industrial user community. The value of a high performance computing (HPC) system to a user includes many factors, such as execution time on a particular problem, software development time, direct hardware costs, and indirect administrative and maintenance costs. This special issue, which focuses on HPC productivity, brings together, for the first time, a series of novel papers written by several distinguished authors who share their views on this topic. The topic of productivity in HPC is very new and the authors have been encouraged to speculate. The goal of this first paper is to present an overarching context and framework for the other papers and to define some common ideas that have emerged in considering the problem of HPC productivity. In addition, this paper defines several characteristic HPC workflows that are useful for understanding how users exploit HPC systems, and discusses the role of activity and purpose benchmarks in establishing an empirical basis for HPC productivity.

A multi-threaded fast convolver for dynamically parallel image filtering

Published in:
J. Parallel Distrib. Comput., Vol. 63, No. 3, March 2003, pp. 360-372.

Summary

2D convolution is a staple of digital image processing. The advent of large format imagers makes it possible to literally "pave" the focal plane of an optical sensor with silicon, which results in very large images that can require a significant amount of computation to process. Filtering of large images via 2D convolutions is often complicated by a variety of effects (e.g., non-uniformities found in wide field of view instruments) which must be compensated for in the filtering process by changing the filter across the image. This paper describes a fast (FFT based) method for convolving images with slowly varying filters. A parallel version of the method is implemented using a multi-threaded approach, which allows more efficient load balancing and a simpler software architecture. The method has been implemented within a high level interpreted language (IDL), while also exploiting open standards vector libraries (VSIPL) and open standards parallel directives (OpenMP). The parallel approach and software architecture are generally applicable to a variety of algorithms and give users the convenience of an easy operating environment while also delivering high performance using fully portable code.
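
To make the approach concrete, here is a minimal Python/NumPy sketch of the same idea: FFT-based convolution applied block by block so the filter can vary slowly across the image, with blocks processed by a thread pool. It is not the paper's IDL/VSIPL/OpenMP implementation, and edge handling between blocks (e.g. overlap-save) is omitted for brevity.

    # Sketch only: block-wise FFT convolution with a slowly varying filter,
    # parallelized with threads. Edge effects between blocks are ignored here.
    import numpy as np
    from numpy.fft import fft2, ifft2
    from concurrent.futures import ThreadPoolExecutor

    def fft_convolve(block, kernel):
        """Circular FFT convolution of a block with a kernel padded to the block size."""
        return np.real(ifft2(fft2(block) * fft2(kernel, s=block.shape)))

    def filter_image(image, kernel_for_block, block=256, workers=4):
        """kernel_for_block(i, j) returns the filter to use for the block at (i, j)."""
        out = np.zeros_like(image, dtype=float)
        tiles = [(i, j) for i in range(0, image.shape[0], block)
                        for j in range(0, image.shape[1], block)]

        def work(tile):
            i, j = tile
            sub = image[i:i + block, j:j + block]
            out[i:i + block, j:j + block] = fft_convolve(sub, kernel_for_block(i, j))

        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(work, tiles))
        return out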

Cluster detection in databases: the adaptive matched filter algorithm and implementation

Published in:
Data Mining and Knowledge Discovery, Vol. 7, No. 1, January 2003, pp. 57-79.

Summary

Matched filter techniques are a staple of modern signal and image processing. They provide a firm foundation (both theoretical and empirical) for detecting and classifying patterns in statistically described backgrounds. Application of these methods to databases has become increasingly common in certain fields (e.g. astronomy). This paper describes an algorithm (based on statistical signal processing methods), a software architecture (based on a hybrid layered approach) and a parallelization scheme (based on a client/server model) for finding clusters in large astronomical databases. The method has proved successful in identifying clusters in real and simulated data. The implementation is flexible and readily executed in parallel on a network of workstations.
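
For readers unfamiliar with the underlying statistic, the sketch below shows the basic (non-adaptive) matched filter test for a known template in Gaussian background noise; the paper's adaptive matched filter for galaxy catalogs is considerably more elaborate, and the profile and covariance used here are hypothetical.

    # Sketch only: the classical matched filter detection statistic
    #   s = h^T Sigma^{-1} x / sqrt(h^T Sigma^{-1} h)
    # for data x, template h, and background covariance Sigma.
    import numpy as np

    def matched_filter_stat(x, h, sigma):
        """Matched filter output, normalized to unit variance under noise only."""
        w = np.linalg.solve(sigma, h)        # Sigma^{-1} h
        return (w @ x) / np.sqrt(w @ h)

    rng = np.random.default_rng(0)
    h = np.exp(-0.5 * (np.arange(32) - 16.0) ** 2 / 9.0)  # hypothetical cluster profile
    sigma = np.eye(32)                                     # hypothetical white background
    x = 0.5 * h + rng.standard_normal(32)                  # signal buried in noise
    print(matched_filter_stat(x, h, sigma))                # compare against a threshold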

A constrained joint optimization approach to dynamic sensor configuration

Published in:
36th Asilomar Conf. on Signals, Systems, and Computers, Vol. 2, 3-6 November 2002, pp. 1179-1183.

Summary

Through intelligent integration of sensing and processing functions, the sensing technology of the future is evolving towards networks of configurable sensors acting in concert. Realizing the potential of collaborative real-time configurable sensor systems presents a number of challenges, including the need to address the massive global optimization problem that results from incorporating a large array of control parameters. This paper proposes a systematic approach to such complex global optimization problems by constraining the problem to a set of key control parameters and recasting a mission-oriented goal into a tractable joint optimization formulation. Using idealized but realistic physical models, this systematic methodology for complex multi-dimensional joint optimization problems is used to compute system performance bounds for dynamic sensor configurations.
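
As a toy version of the constrained-joint-optimization idea (not the paper's formulation), the sketch below restricts the search to two hypothetical control parameters and grid-searches a joint objective subject to a simple resource constraint; the objective, constraint, and parameter names are all illustrative.

    # Illustrative sketch: constrain the sensor configuration problem to two control
    # parameters (dwell time, bandwidth) and jointly optimize under a resource limit.
    import itertools

    def mission_utility(dwell, bandwidth):
        """Hypothetical mission-oriented objective (higher is better)."""
        return dwell * bandwidth / (1.0 + dwell)   # stand-in for an idealized physical model

    def feasible(dwell, bandwidth, power_budget=10.0):
        """Hypothetical constraint coupling the two control parameters."""
        return dwell * bandwidth <= power_budget

    dwells = [0.5 * k for k in range(1, 11)]
    bandwidths = [1.0 * k for k in range(1, 11)]
    best = max((c for c in itertools.product(dwells, bandwidths) if feasible(*c)),
               key=lambda c: mission_utility(*c))
    print(best, mission_utility(*best))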

300x faster Matlab using MatlabMPI

Published in:
https://arxiv.org/abs/astro-ph/0207389

Summary

The true costs of high performance computing are currently dominated by software. Addressing these costs requires shifting to high productivity languages such as Matlab. MatlabMPI is a Matlab implementation of the Message Passing Interface (MPI) standard and allows any Matlab program to exploit multiple processors. MatlabMPI currently implements the basic six functions that are the core of the MPI point-to-point communications standard. The key technical innovation of MatlabMPI is that it implements the widely used MPI "look and feel" on top of standard Matlab file I/O, resulting in an extremely compact (~250 lines of code) and "pure" implementation which runs anywhere Matlab runs, and on any heterogeneous combination of computers. The performance has been tested on both shared and distributed memory parallel computers (e.g. Sun, SGI, HP, IBM and Linux). MatlabMPI can match the bandwidth of C based MPI at large message sizes. A test image filtering application using MatlabMPI achieved a speedup of ~300 using 304 CPUs and ~15% of the theoretical peak (450 Gigaflops) on an IBM SP2 at the Maui High Performance Computing Center. In addition, this entire parallel benchmark application was implemented in 70 software-lines-of-code (SLOC) yielding 0.85 Gigaflop/SLOC or 4.4 CPUs/SLOC, which are the highest values of these software price performance metrics ever achieved for any application. The MatlabMPI software will be made available for download.
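
The core trick described above, MPI-style point-to-point messaging built entirely on file I/O over a shared file system, can be suggested in a few lines. The sketch below is in Python rather than Matlab and its function names are illustrative, not the MatlabMPI API; a lock file signals that a message file is complete and safe to read.

    # Sketch of the idea only (the real MatlabMPI is written in Matlab): send/receive
    # implemented purely with ordinary file I/O on a directory visible to all hosts.
    import os
    import pickle
    import time

    COMM_DIR = "/tmp/filempi"    # hypothetical shared directory

    def send(dest, tag, obj):
        """Write the message to a data file, then create a lock file to publish it."""
        os.makedirs(COMM_DIR, exist_ok=True)
        data = os.path.join(COMM_DIR, f"msg_{dest}_{tag}.pkl")
        with open(data, "wb") as f:
            pickle.dump(obj, f)
        open(data + ".lock", "w").close()

    def recv(dest, tag, poll=0.05):
        """Spin until the lock file appears, then read and remove the message."""
        data = os.path.join(COMM_DIR, f"msg_{dest}_{tag}.pkl")
        while not os.path.exists(data + ".lock"):
            time.sleep(poll)
        with open(data, "rb") as f:
            obj = pickle.load(f)
        os.remove(data)
        os.remove(data + ".lock")
        return obj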

Detecting clusters of galaxies in the Sloan Digital Sky Survey. I. Monte Carlo comparison of cluster detection algorithms

Summary

We present a comparison of three cluster-finding algorithms from imaging data using Monte Carlo simulations of clusters embedded in a 25 deg^2 region of Sloan Digital Sky Survey (SDSS) imaging data: the matched filter (MF), the adaptive matched filter (AMF), and a color-magnitude filtered Voronoi tessellation technique (VTT). Among the two matched filters, we find that the MF is more efficient in detecting faint clusters, whereas the AMF evaluates the redshifts and richnesses more accurately, therefore suggesting a hybrid method (HMF) that combines the two. The HMF outperforms the VTT when using a uniform background, but it is more sensitive to the presence of a nonuniform galaxy background than is the VTT; this is due to the assumption of a uniform background in the HMF model. We thus find that for the detection thresholds we determine to be appropriate for the SDSS data, the performance of both algorithms is similar; we present the selection function for each method evaluated with these thresholds as a function of redshift and richness. For simulated clusters generated with a Schechter luminosity function (M*_r = -21.5 and α = -1.1), both algorithms are complete for Abell richness >~ 1 clusters up to z ~ 0.4 for a sample magnitude limited to r = 21. While the cluster parameter evaluation shows a mild correlation with the local background density, the detection efficiency is not significantly affected by the background fluctuations, unlike previous shallower surveys.
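
As a small, self-contained example of the kind of ingredient such Monte Carlo simulations need (not the paper's pipeline), the sketch below rejection-samples galaxy luminosities from a Schechter function with the quoted faint-end slope; the sampling range and helper name are assumptions.

    # Sketch only: draw L/L* values from a Schechter form phi(x) ~ x**alpha * exp(-x)
    # over a bounded range by rejection sampling, with alpha = -1.1 as quoted above.
    import numpy as np

    def sample_schechter(n, alpha=-1.1, lmin=0.1, lmax=10.0, rng=None):
        rng = rng or np.random.default_rng()
        ceiling = lmin ** alpha * np.exp(-lmin)   # phi is decreasing for alpha < 0
        out = []
        while len(out) < n:
            x = rng.uniform(lmin, lmax, size=n)
            keep = rng.uniform(0.0, ceiling, size=n) < x ** alpha * np.exp(-x)
            out.extend(x[keep].tolist())
        return np.array(out[:n])

    print(sample_schechter(1000).mean())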

Discrete optimization using decision-directed learning for distributed networked computing

Summary

Decision-directed learning (DDL) is an iterative discrete approach to finding a feasible solution for large-scale combinatorial optimization problems. DDL is capable of efficiently formulating a solution to network scheduling problems that involve load-limiting device utilization, selecting parallel configurations for software applications and host hardware using a minimum set of resources, and meeting time-to-result performance requirements in a dynamic network environment. This paper quantifies the algorithms that constitute DDL and compares its performance to other popular combinatorial optimization approaches for self-directed, real-time networked resource configuration, that is, for dynamically building a mission-specific signal processor for real-time distributed and parallel applications.
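
Purely as an illustration of the flavor of problem described, and not the DDL algorithm itself, the sketch below makes one greedy pass that assigns tasks to hosts while limiting per-host utilization; all names, loads, and the load limit are hypothetical.

    # Illustrative sketch (not DDL): a single greedy assignment of tasks to hosts
    # under a per-host utilization limit, the kind of feasible solution an
    # iterative scheduler would refine.
    def assign(tasks, hosts, load_limit=0.8):
        """tasks: {name: load}; hosts: {name: capacity}. Returns {task: host}."""
        used = {h: 0.0 for h in hosts}
        placement = {}
        for task, load in sorted(tasks.items(), key=lambda kv: -kv[1]):  # largest first
            best = min(hosts, key=lambda h: (used[h] + load) / hosts[h])
            if (used[best] + load) / hosts[best] <= load_limit:
                used[best] += load
                placement[task] = best
        return placement

    print(assign({"fft": 3.0, "beamform": 2.0, "detect": 1.0},
                 {"nodeA": 8.0, "nodeB": 4.0}))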

The effect of personality type on the usage of a multimedia engineering education system

Published in:
32nd Annual ASEE/IEEE Frontiers in Education Conf., 6-9 November 2002, pp. T3A-7 - T3A-12.

Summary

Multimedia education has quickly entered our classrooms and offices, providing tutorials and lessons on many different topics. It is easy to assume that most people interact with these multimedia systems in similar ways, but is this assumption valid? What factors determine whether students will embrace computer-based multimedia-augmented learning? One factor may be the student's personality type. This paper explores the reasons why some students may enjoy learning using computer-based educational delivery systems while others may have absolutely no enthusiasm for this type of learning, and how that enthusiasm may relate to the students' personality types.

PVL: An Object Oriented Software Library for Parallel Signal Processing (Abstract)

Published in:
CLUSTER '01, 2001 IEEE Int. Conf. on Cluster Computing, 8-11 October 2001, p. 74.

Summary

Real-time signal processing consumes the majority of the world's computing power. Increasingly, programmable parallel microprocessors are used to address a wide variety of signal processing applications (e.g. scientific, video, wireless, medical, communication, encoding, radar, sonar and imaging). In programmable systems the major challenge is no longer hardware but software. Specifically, the key technical hurdle lies in mapping (i.e., placement and routing) of an algorithm onto a parallel computer in a general manner that preserves software portability. We have developed the Parallel Vector Library (PVL) to allow signal processing algorithms to be written using high-level, Matlab-like constructs that are independent of the underlying parallel mapping. Programs written using PVL can be ported to a wide range of parallel computers without sacrificing performance. Furthermore, the mapping concepts in PVL provide the infrastructure for enabling new capabilities such as fault tolerance, dynamic scheduling and self-optimization. This presentation discusses PVL with particular emphasis on quantitative comparisons with standard parallel signal programming practices.
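
The map concept at the heart of this approach can be suggested with a short sketch: a map object describes which processors own which block of a distributed vector, so the algorithm code is written against the map rather than against an explicit data layout. The class and function names below are illustrative, not the PVL API.

    # Sketch of the map idea only (names are not the PVL API): the mapping is a
    # separate object, so the same algorithm code runs unchanged when the mapping changes.
    import numpy as np

    class BlockMap:
        """A 1-D block distribution of `length` elements over the given ranks."""
        def __init__(self, length, ranks):
            edges = np.linspace(0, length, len(ranks) + 1).astype(int)
            self.blocks = {r: (edges[i], edges[i + 1]) for i, r in enumerate(ranks)}

        def local_slice(self, rank):
            lo, hi = self.blocks[rank]
            return slice(lo, hi)

    def local_scale(vec, my_rank, vmap, alpha):
        """Mapping-independent kernel: each rank scales only the elements it owns."""
        vec[vmap.local_slice(my_rank)] *= alpha

    v = np.ones(10)
    vmap = BlockMap(10, ranks=[0, 1, 2])   # hypothetical three-processor mapping
    local_scale(v, my_rank=1, vmap=vmap, alpha=2.0)
    print(v)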