Publications

Refine Results

(Filters Applied) Clear All

Improving big data visual analytics with interactive virtual reality

Published in:
HPEC 2015: IEEE Conf. on High Performance Extreme Computing, 15-17 September 2015.

Summary

For decades, the growth and volume of digital data collection has made it challenging to digest large volumes of information and extract underlying structure. Coined 'Big Data', massive amounts of information has quite often been gathered inconsistently (e.g from many sources, of various forms, at different rates, etc.). These factors impede the practices of not only processing data, but also analyzing and displaying it in an efficient manner to the user. Many efforts have been completed in the data mining and visual analytics community to create effective ways to further improve analysis and achieve the knowledge desired for better understanding. Our approach for improved big data visual analytics is two-fold, focusing on both visualization and interaction. Given geo-tagged information, we are exploring the benefits of visualizing datasets in the original geospatial domain by utilizing a virtual reality platform. After running proven analytics on the data, we intend to represent the information in a more realistic 3D setting, where analysts can achieve an enhanced situational awareness and rely on familiar perceptions to draw in-depth conclusions on the dataset. In addition, developing a human-computer interface that responds to natural user actions and inputs creates a more intuitive environment. Tasks can be performed to manipulate the dataset and allow users to dive deeper upon request, adhering to desired demands and intentions. Due to the volume and popularity of social media, we developed a 3D tool visualizing Twitter on MIT's campus for analysis. Utilizing emerging technologies of today to create a fully immersive tool that promotes visualization and interaction can help ease the process of understanding and representing big data.
READ LESS

Summary

For decades, the growth and volume of digital data collection has made it challenging to digest large volumes of information and extract underlying structure. Coined 'Big Data', massive amounts of information has quite often been gathered inconsistently (e.g from many sources, of various forms, at different rates, etc.). These factors...

READ MORE

Enabling on-demand database computing with MIT SuperCloud database management system

Summary

The MIT SuperCloud database management system allows for rapid creation and flexible execution of a variety of the latest scientific databases, including Apache Accumulo and SciDB. It is designed to permit these databases to run on a High Performance Computing Cluster (HPCC) platform as seamlessly as any other HPCC job. It ensures the seamless migration of the databases to the resources assigned by the HPCC scheduler and centralized storage of the database files when not running. It also permits snapshotting of databases to allow researchers to experiment and push the limits of the technology without concerns for data or productivity loss if the database becomes unstable.
READ LESS

Summary

The MIT SuperCloud database management system allows for rapid creation and flexible execution of a variety of the latest scientific databases, including Apache Accumulo and SciDB. It is designed to permit these databases to run on a High Performance Computing Cluster (HPCC) platform as seamlessly as any other HPCC job...

READ MORE

Big data strategies for data center infrastructure management using a 3D gaming platform

Summary

High Performance Computing (HPC) is intrinsically linked to effective Data Center Infrastructure Management (DCIM). Cloud services and HPC have become key components in Department of Defense and corporate Information Technology competitive strategies in the global and commercial spaces. As a result, the reliance on consistent, reliable Data Center space is more critical than ever. The costs and complexity of providing quality DCIM are constantly being tested and evaluated by the United States Government and companies such as Google, Microsoft and Facebook. This paper will demonstrate a system where Big Data strategies and 3D gaming technology is leveraged to successfully monitor and analyze multiple HPC systems and a lights-out modular HP EcoPOD 240a Data Center on a singular platform. Big Data technology and a 3D gaming platform enables the relative real time monitoring of 5000 environmental sensors, more than 3500 IT data points and display visual analytics of the overall operating condition of the Data Center from a command center over 100 miles away. In addition, the Big Data model allows for in depth analysis of historical trends and conditions to optimize operations achieving even greater efficiencies and reliability.
READ LESS

Summary

High Performance Computing (HPC) is intrinsically linked to effective Data Center Infrastructure Management (DCIM). Cloud services and HPC have become key components in Department of Defense and corporate Information Technology competitive strategies in the global and commercial spaces. As a result, the reliance on consistent, reliable Data Center space is...

READ MORE

Portable Map-Reduce utility for MIT SuperCloud environment

Summary

The MIT Map-Reduce utility has been developed and deployed on the MIT SuperCloud to support scientists and engineers at MIT Lincoln Laboratory. With the MIT Map-Reduce utility, users can deploy their applications quickly onto the MIT SuperCloud infrastructure. The MIT Map-Reduce utility can work with any applications without the need for any modifications. For improved performance, the MIT Map-Reduce utility provides an option to consolidate multiple input data files per compute task as a single stream of input with minimal changes to the target application. This enables users to reduce the computational overhead associated with the cost of multiple application starting up when dealing with more than one piece of input data per compute task. With a small change in a sample MATLAB image processing application, we have observed approximately 12x speed up by reducing the application startup overhead. Currently the MIT Map-Reduce utility can work with several schedulers such as SLURM, Grid Engine and LSF.
READ LESS

Summary

The MIT Map-Reduce utility has been developed and deployed on the MIT SuperCloud to support scientists and engineers at MIT Lincoln Laboratory. With the MIT Map-Reduce utility, users can deploy their applications quickly onto the MIT SuperCloud infrastructure. The MIT Map-Reduce utility can work with any applications without the need...

READ MORE

Parallel vectorized algebraic AES in MATLAB for rapid prototyping of encrypted sensor processing algorithms and database analytics

Published in:
HPEC 2015: IEEE Conf. on High Performance Extreme Computing, 15-17 September 2015.

Summary

The increasing use of networked sensor systems and networked databases has led to an increased interest in incorporating encryption directly into sensor algorithms and database analytics. MATLAB is the dominant tool for rapid prototyping of sensor algorithms and has extensive database analytics capabilities. The advent of high level and high performance Galois Field mathematical environments allows encryption algorithms to be expressed succinctly and efficiently. This work leverages the Galois Field primitives found the MATLAB Communication Toolbox to implement a mode of the Advanced Encrypted Standard (AES) based on first principals mathematics. The resulting implementation requires 100x less code than standard AES implementations and delivers speed that is effective for many design purposes. The parallel version achieves speed comparable to native OpenSSL on a single node and is sufficient for real-time prototyping of many sensor processing algorithms and database analytics.
READ LESS

Summary

The increasing use of networked sensor systems and networked databases has led to an increased interest in incorporating encryption directly into sensor algorithms and database analytics. MATLAB is the dominant tool for rapid prototyping of sensor algorithms and has extensive database analytics capabilities. The advent of high level and high...

READ MORE

Using a power law distribution to describe big data

Published in:
HPEC 2015: IEEE Conf. on High Performance Extreme Computing, 15-17 September 2015.

Summary

The gap between data production and user ability to access, compute and produce meaningful results calls for tools that address the challenges associated with big data volume, velocity and variety. One of the key hurdles is the inability to methodically remove expected or uninteresting elements from large data sets. This difficulty often wastes valuable researcher and computational time by expending resources on uninteresting parts of data. Social sensors, or sensors which produce data based on human activity, such as Wikipedia, Twitter, and Facebook have an underlying structure which can be thought of as having a Power Law distribution. Such a distribution implies that few nodes generate large amounts of data. In this article, we propose a technique to take an arbitrary dataset and compute a power law distributed background model that bases its parameters on observed statistics. This model can be used to determine the suitability of using a power law or automatically identify high degree nodes for filtering and can be scaled to work with big data.
READ LESS

Summary

The gap between data production and user ability to access, compute and produce meaningful results calls for tools that address the challenges associated with big data volume, velocity and variety. One of the key hurdles is the inability to methodically remove expected or uninteresting elements from large data sets. This...

READ MORE

Iris biometric security challenges and possible solutions: for your eyes only? Using the iris as a key

Summary

Biometrics were originally developed for identification, such as for criminal investigations. More recently, biometrics have been also utilized for authentication. Most biometric authentication systems today match a user's biometric reading against a stored reference template generated during enrollment. If the reading and the template are sufficiently close, the authentication is considered successful and the user is authorized to access protected resources. This binary matching approach has major inherent vulnerabilities. An alternative approach to biometric authentication proposes to use fuzzy extractors (also known as biometric cryptosystems), which derive cryptographic keys from noisy sources, such as biometrics. In theory, this approach is much more robust and can enable cryptographic authorization. Unfortunately, for many biometrics that provide high-quality identification, fuzzy extractors provide no security guarantees. This gap arises in part because of an objective mismatch. The quality of a biometric identification is typically measured using false match rate (FMR) versus false nonmatch rate (FNMR). As a result, biometrics have been extensively optimized for this metric. However, this metric says little about the suitability of a biometric for key derivation. In this article, we illustrate a metric that can be used to optimize biometrics for authentication. Using iris biometrics as an example, we explore possible directions for improving processing and representation according to this metric. Finally, we discuss why strong biometric authentication remains a challenging problem and propose some possible future directions for addressing these challenges.
READ LESS

Summary

Biometrics were originally developed for identification, such as for criminal investigations. More recently, biometrics have been also utilized for authentication. Most biometric authentication systems today match a user's biometric reading against a stored reference template generated during enrollment. If the reading and the template are sufficiently close, the authentication is...

READ MORE

Missing the point(er): on the effectiveness of code pointer integrity

Summary

Memory corruption attacks continue to be a major vector of attack for compromising modern systems. Numerous defenses have been proposed against memory corruption attacks, but they all have their limitations and weaknesses. Stronger defenses such as complete memory safety for legacy languages (C/C++) incur a large overhead, while weaker ones such as practical control flow integrity have been shown to be ineffective. A recent technique called code pointer integrity (CPI) promises to balance security and performance by focusing memory safety on code pointers thus preventing most control-hijacking attacks while maintaining low overhead. CPI protects access to code pointers by storing them in a safe region that is protected by instruction level isolation. On x86-32, this isolation is enforced by hardware; on x86-64 and ARM, isolation is enforced by information hiding. We show that, for architectures that do not support segmentation in which CPI relies on information hiding, CPI's safe region can be leaked and then maliciously modified by using data pointer overwrites. We implement a proof-of-concept exploit against Nginx and successfully bypass CPI implementations that rely on information hiding in 6 seconds with 13 observed crashes. We also present an attack that generates no crashes and is able to bypass CPI in 98 hours. Our attack demonstrates the importance of adequately protecting secrets in security mechanisms and the dangers of relying on difficulty of guessing without guaranteeing the absence of memory leaks.
READ LESS

Summary

Memory corruption attacks continue to be a major vector of attack for compromising modern systems. Numerous defenses have been proposed against memory corruption attacks, but they all have their limitations and weaknesses. Stronger defenses such as complete memory safety for legacy languages (C/C++) incur a large overhead, while weaker ones...

READ MORE

Unifying leakage classes: simulatable leakage and pseudoentropy

Published in:
8th Int. Conf. Information-Theoretic Security (ICITS 2015), 2-5 May 2015 in Lecture Notes in Computer Science (LNCS), Vol. 9063, 2015, pp. 69-86.

Summary

Leakage resilient cryptography designs systems to withstand partial adversary knowledge of secret state. Ideally, leakage-resilient systems withstand current and future attacks; restoring confidence in the security of implemented cryptographic systems. Understanding the relation between classes of leakage functions is an important aspect. In this work, we consider the memory leakage model, where the leakage class contains functions over the system's entire secret state. Standard limitations include functions over the system's entire secret state. Standard limitations include functions with bounded output length, functions that retain (pseudo) entropy in the secret, and functions that leave the secret computationally unpredictable. Standaert, Pereira, and Yu (Crypto, 2013) introduced a new class of leakage functions they call simulatable leakage. A leakage function is simulatable if a simulator can produce indistinguishable leakage without access to the true secret state. We extend their notion to general applications and consider two versions. For weak simulatability: the simulated leakage must be indistinguishable from the true leakage in the presence of public information. For strong simulatability, this requirement must also hold when the distinguisher has access to the true secret state. We show the following: --Weakly simulatable functions retain computational unpredictability. --Strongly simulatability functions retain pseudoentropy. --There are bounded length functions that are not weakly simulatable. --There are weakly simulatable functions that remove pseudoentropy. --There are leakage functions that retain computational unpredictability are not weakly simulatable.
READ LESS

Summary

Leakage resilient cryptography designs systems to withstand partial adversary knowledge of secret state. Ideally, leakage-resilient systems withstand current and future attacks; restoring confidence in the security of implemented cryptographic systems. Understanding the relation between classes of leakage functions is an important aspect. In this work, we consider the memory leakage...

READ MORE

Computing on Masked Data to improve the security of big data

Published in:
HST 2015, IEEE Int. Conf. on Technologies for Homeland Security, 14-16 April 2015.

Summary

Organizations that make use of large quantities of information require the ability to store and process data from central locations so that the product can be shared or distributed across a heterogeneous group of users. However, recent events underscore the need for improving the security of data stored in such untrusted servers or databases. Advances in cryptographic techniques and database technologies provide the necessary security functionality but rely on a computational model in which the cloud is used solely for storage and retrieval. Much of big data computation and analytics make use of signal processing fundamentals for computation. As the trend of moving data storage and computation to the cloud increases, homeland security missions should understand the impact of security on key signal processing kernels such as correlation or thresholding. In this article, we propose a tool called Computing on Masked Data (CMD), which combines advances in database technologies and cryptographic tools to provide a low overhead mechanism to offload certain mathematical operations securely to the cloud. This article describes the design and development of the CMD tool.
READ LESS

Summary

Organizations that make use of large quantities of information require the ability to store and process data from central locations so that the product can be shared or distributed across a heterogeneous group of users. However, recent events underscore the need for improving the security of data stored in such...

READ MORE