Publications

Refine Results

(Filters Applied) Clear All

Scalable cryptographic authentication for high performance computing

Summary

High performance computing (HPC) uses supercomputers and computing clusters to solve large computational problems. Frequently HPC resources are shared systems and access to restricted data sets or resources must be authenticated. These authentication needs can take multiple forms, both internal and external to the HPC cluster. A computational stack that uses web services among nodes in the HPC may need to perform authentication between nodes of the same job or a job may need to reach out to data sources outside the HPC. Traditional authentication mechanisms such as passwords or digital certificates encounter issues with the distributed and potentially disconnected nature of HPC systems. Distributing and storing plain-text passwords or cryptographic keys among nodes in a HPC system without special protection is a poor security practice. Systems that reach back to the user's terminal for access to the authenticator are possible, but only in fully interactive supercomputing where connectivity to the user's terminal can be guaranteed. Point solutions can be enabled for these use cases, such as software-based role or self-signed certificates, however they require significant expertise in digital certificates to configure. A more general solution is called for that is both secure and easy to use. This paper presents an overview of a solution implemented on the interactive, on-demand LLGrid computing system at MIT Lincoln Laboratory and its use to solve one such authentication problem.
READ LESS

Summary

High performance computing (HPC) uses supercomputers and computing clusters to solve large computational problems. Frequently HPC resources are shared systems and access to restricted data sets or resources must be authenticated. These authentication needs can take multiple forms, both internal and external to the HPC cluster. A computational stack that...

READ MORE

Scalable cryptographic authentication for high performance computing

Summary

High performance computing (HPC) uses supercomputers and computing clusters to solve large computational problems. Frequently HPC resources are shared systems and access to restricted data sets or resources must be authenticated. These authentication needs can take multiple forms, both internal and external to the HPC cluster. A computational stack that uses web services among nodes in the HPC may need to perform authentication between nodes of the same job or a job may need to reach out to data sources outside the HPC. Traditional authentication mechanisms such as passwords or digital certificates encounter issues with the distributed and potentially disconnected nature of HPC systems. Distributing and storing plain-text passwords or cryptographic keys among nodes in a HPC system without special protection is a poor security practice. Systems that reach back to the user's terminal for access to the authenticator are possible, but only in fully interactive supercomputing where connectivity to the user's terminal can be guaranteed. Point solutions can be enabled for these use cases, such as software-based role or self-signed certificates, however they require significant expertise in digital certificates to configure. A more general solution is called for that is both secure and easy to use. This paper presents an overview of a solution implemented on the interactive, on-demand LLGrid computing system at MIT Lincoln Laboratory and its use to solve one such authentication problem.
READ LESS

Summary

High performance computing (HPC) uses supercomputers and computing clusters to solve large computational problems. Frequently HPC resources are shared systems and access to restricted data sets or resources must be authenticated. These authentication needs can take multiple forms, both internal and external to the HPC cluster. A computational stack that...

READ MORE

Driving big data with big compute

Summary

Big Data (as embodied by Hadoop clusters) and Big Compute (as embodied by MPI clusters) provide unique capabilities for storing and processing large volumes of data. Hadoop clusters make distributed computing readily accessible to the Java community and MPI clusters provide high parallel efficiency for compute intensive workloads. Bringing the big data and big compute communities together is an active area of research. The LLGrid team has developed and deployed a number of technologies that aim to provide the best of both worlds. LLGrid MapReduce allows the map/reduce parallel programming model to be used quickly and efficiently in any language on any compute cluster. D4M (Dynamic Distributed Dimensional Data Model) provided a high level distributed arrays interface to the Apache Accumulo database. The accessibility of these technologies is assessed by measuring the effort to use these tools and is typically a few lines of code. The performance is assessed by measuring the insert rate into the Accumulo database. Using these tools a database insert rate of 4M inserts/second has been achieved on an 8 node cluster.
READ LESS

Summary

Big Data (as embodied by Hadoop clusters) and Big Compute (as embodied by MPI clusters) provide unique capabilities for storing and processing large volumes of data. Hadoop clusters make distributed computing readily accessible to the Java community and MPI clusters provide high parallel efficiency for compute intensive workloads. Bringing the...

READ MORE

A scalable signal processing architecture for massive graph analysis

Published in:
ICASSP 2012, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 25-30 March 2012, pp. 5329-32.

Summary

In many applications, it is convenient to represent data as a graph, and often these datasets will be quite large. This paper presents an architecture for analyzing massive graphs, with a focus on signal processing applications such as modeling, filtering, and signal detection. We describe the architecture, which covers the entire processing chain, from data storage to graph construction to graph analysis and subgraph detection. The data are stored in a new format that allows easy extraction of graphs representing any relationship existing in the data. The principal analysis algorithm is the partial eigendecomposition of the modularity matrix, whose running time is discussed. A large document dataset is analyzed, and we present subgraphs that stand out in the principal eigenspace of the time varying graphs, including behavior we regard as clutter as well as small, tightly-connected clusters that emerge over time.
READ LESS

Summary

In many applications, it is convenient to represent data as a graph, and often these datasets will be quite large. This paper presents an architecture for analyzing massive graphs, with a focus on signal processing applications such as modeling, filtering, and signal detection. We describe the architecture, which covers the...

READ MORE

Dynamic Distributed Dimensional Data Model (D4M) database and computation system

Summary

A crucial element of large web companies is their ability to collect and analyze massive amounts of data. Tuple store databases are a key enabling technology employed by many of these companies (e.g., Google Big Table and Amazon Dynamo). Tuple stores are highly scalable and run on commodity clusters, but lack interfaces to support efficient development of mathematically based analytics. D4M (Dynamic Distributed Dimensional Data Model) has been developed to provide a mathematically rich interface to tuple stores (and structured query language "SQL" databases). D4M allows linear algebra to be readily applied to databases. Using D4M, it is possible to create composable analytics with significantly less effort than using traditional approaches. This work describes the D4M technology and its application and performance.
READ LESS

Summary

A crucial element of large web companies is their ability to collect and analyze massive amounts of data. Tuple store databases are a key enabling technology employed by many of these companies (e.g., Google Big Table and Amazon Dynamo). Tuple stores are highly scalable and run on commodity clusters, but...

READ MORE

Linear algebraic notation and definitions

Published in:
Graph Algorithms in the Language of Linear Algebra, pp. 13-18.

Summary

This chapter presents notation, definitions, and conventions for graphs, matrices, arrays, and operations upon them.
READ LESS

Summary

This chapter presents notation, definitions, and conventions for graphs, matrices, arrays, and operations upon them.

READ MORE

Fundamental Questions in the Analysis of Large Graphs

Summary

Graphs are a general approach for representing information that spans the widest possible range of computing applications. They are particularly important to computational biology, web search, and knowledge discovery. As the sizes of graphs increase, the need to apply advanced mathematical and computational techniques to solve these problems is growing dramatically. Examining the mathematical and computational foundations of the analysis of large graphs generally leads to more questions than answers. This book concludes with a discussion of some of these questions.
READ LESS

Summary

Graphs are a general approach for representing information that spans the widest possible range of computing applications. They are particularly important to computational biology, web search, and knowledge discovery. As the sizes of graphs increase, the need to apply advanced mathematical and computational techniques to solve these problems is growing...

READ MORE

Visualizing Large Kronecker Graphs

Published in:
Graph Algorithms in the Language of Linear Algebra, pp. 241-250.

Summary

Kronecker graphs have been shown to be one of the most promising models for real-world networks. Visualization of Kronecker graphs is an important challenge. This chapter describes an interactive framework to assist scientists and engineers in generating, analyzing, and visualizing Kronecker graphs with as little effort as possible.
READ LESS

Summary

Kronecker graphs have been shown to be one of the most promising models for real-world networks. Visualization of Kronecker graphs is an important challenge. This chapter describes an interactive framework to assist scientists and engineers in generating, analyzing, and visualizing Kronecker graphs with as little effort as possible.

READ MORE

Subgraph Detection

Author:
Published in:
Graph Algorithms in the Language of Linear Algebra, pp. 115-133.

Summary

Detecting subgraphs of interest in larger graphs is the goal of many graph analysis techniques. The basis of detection theory is computing the probability of a “foreground” with respect to a model of the “background” data. Hidden Markov Models represent one possible foreground model for patterns of interaction in a graph. Likewise, Kronecker graphs are one possible model for power law background graphs. Combining these models allows estimates of the signal to noise ratio, probability of detection, and probability of false alarm for different classes of vertices in the foreground. These estimates can then be used to construct filters for computing the probability that a background graph contains a particular foreground graph. This approach is applied to the problem of detecting a partially labeled tree graph in a power law background graph. One feature of this method is the ability to a priori estimate the number of vertices that will be detected via the filter.
READ LESS

Summary

Detecting subgraphs of interest in larger graphs is the goal of many graph analysis techniques. The basis of detection theory is computing the probability of a “foreground” with respect to a model of the “background” data. Hidden Markov Models represent one possible foreground model for patterns of interaction in a...

READ MORE

The Kronecker theory of power law graphs

Author:
Published in:
Graph Algorithms in the Language of Linear Algebra, pp. 205-220.

Summary

An analytical theory of power law graphs is presented based on the Kronecker graph generation technique. Explicit, stochastic, and instance Kronecker graphs are used to highlight different properties. The analysis uses Kronecker exponentials of complete bipartite graphs to formulate the substructure of such graphs. The Kronecker theory allows various high-level quantities (e.g., degree distribution, betweenness centrality, diameter, eigenvalues, and iso-parametric ratio) to be computed directly from the model parameters.
READ LESS

Summary

An analytical theory of power law graphs is presented based on the Kronecker graph generation technique. Explicit, stochastic, and instance Kronecker graphs are used to highlight different properties. The analysis uses Kronecker exponentials of complete bipartite graphs to formulate the substructure of such graphs. The Kronecker theory allows various high-level...

READ MORE