Publications
Enforced sparse non-negative matrix factorization
Summary
Non-negative matrix factorization (NMF) is a dimensionality reduction algorithm for data that can be represented as an undirected bipartite graph. It has become a common method for generating topic models of text data because it is known to produce good results, despite its relative simplicity of implementation and ease of...
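As a reference point for this entry, here is a minimal sketch of plain NMF, not the enforced sparse variant the paper proposes. It assumes a dense non-negative matrix V, such as a term-document count matrix whose rows (terms) and columns (documents) form the two sides of the bipartite graph. It uses the standard Lee-Seung multiplicative updates for the squared Frobenius error. The function name `nmf` and its parameters are illustrative and hypothetical, not taken from the paper.

```python
import numpy as np


def nmf(V, k, n_iter=200, eps=1e-10, seed=0):
    """Hypothetical helper: factor non-negative V (m x n) as W @ H.

    W is m x k and H is k x n, both non-negative. This is plain NMF with
    Lee-Seung multiplicative updates, not the paper's enforced sparse method.
    """
    if np.any(V < 0):
        raise ValueError("V must be non-negative")
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Random non-negative initialization.
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # Update H; multiplying by a non-negative ratio keeps H >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Update W using the refreshed H.
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


if __name__ == "__main__":
    # Toy term-document counts: 4 terms (rows) x 3 documents (columns).
    V = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]], dtype=float)
    W, H = nmf(V, k=2)
    print(np.round(W @ H, 2))
```

With terms as rows, each column of W can be read as a topic (a weighting over terms) and each column of H as one document's mixture over the k topics. Adding a sparsity constraint on top of this is the subject of the paper and is not attempted here.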
LLMapReduce: multi-level map-reduce for high performance data analysis
Summary
The map-reduce parallel programming model has become extremely popular in the big data community. Many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce provides the familiar map-reduce parallel programming model to big data users running on a supercomputer. LLMapReduce dramatically simplifies map-reduce programming by providing...
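To make the map-reduce model concrete, here is a generic, single-machine word-count sketch in Python. It is not LLMapReduce and does not reflect its interface; the function names `map_count`, `reduce_counts`, and `map_reduce` are hypothetical. The map step runs independently on each input chunk (here in a thread pool), and the reduce step merges the partial results. A cluster deployment would typically distribute the map tasks across compute nodes instead of threads.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_count(chunk: str) -> Counter:
    """Map step (hypothetical): emit word counts for one input chunk."""
    return Counter(chunk.lower().split())


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step (hypothetical): merge two partial count tables."""
    return left + right  # Counter addition sums the counts key by key


def map_reduce(chunks, workers=4):
    """Run the map step on every chunk in parallel, then fold the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_count, chunks))
    # Start from an empty Counter so an empty input yields an empty result.
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    docs = ["big data on a supercomputer", "map reduce for big data"]
    print(map_reduce(docs).most_common(2))
```

Because merging positive counts is associative and commutative, partial results can be combined in any grouping, which is what allows a map-reduce system to run reductions in stages.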
Storage and Database Management for Big Data
Summary
The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges of big data volume, velocity, and variety. While there has been great progress in the world...
D4M and large array databases for management and analysis of large biomedical imaging data
Summary
Advances in medical imaging technologies have enabled the acquisition of increasingly large datasets. Current state-of-the-art confocal or multi-photon imaging technology can produce biomedical datasets in excess of 1 TB per dataset. Typical approaches for analyzing large datasets rely on downsampling the original datasets or leveraging distributed computing resources where small...
Scalability of VM provisioning systems
Summary
Virtual machines and virtualized hardware have been around for over half a century. The commoditization of the x86 platform and its rapidly growing hardware capabilities have led to recent exponential growth in the use of virtualization, both in the enterprise and in high performance computing (HPC). The startup time of a...
Percolation model of insider threats to assess the optimum number of rules
Summary
Rules, regulations, and policies are the basis of civilized society and are used to coordinate the activities of individuals who have a variety of goals and purposes. History has taught that over-regulation (too many rules) makes it difficult to compete and under-regulation (too few rules) can lead to crisis. This...
Improving big data visual analytics with interactive virtual reality
Summary
For decades, the growth and volume of digital data collection have made it challenging to digest large volumes of information and extract underlying structure. Termed 'Big Data', these massive amounts of information have quite often been gathered inconsistently (e.g., from many sources, in various forms, at different rates, etc.). These factors...
Enabling on-demand database computing with MIT SuperCloud database management system
Summary
The MIT SuperCloud database management system allows for rapid creation and flexible execution of a variety of the latest scientific databases, including Apache Accumulo and SciDB. It is designed to permit these databases to run on a High Performance Computing Cluster (HPCC) platform as seamlessly as any other HPCC job...
Big data strategies for data center infrastructure management using a 3D gaming platform
Summary
High Performance Computing (HPC) is intrinsically linked to effective Data Center Infrastructure Management (DCIM). Cloud services and HPC have become key components in Department of Defense and corporate Information Technology competitive strategies in the global and commercial spaces. As a result, the reliance on consistent, reliable Data Center space is...
Portable Map-Reduce utility for MIT SuperCloud environment
Summary
The MIT Map-Reduce utility has been developed and deployed on the MIT SuperCloud to support scientists and engineers at MIT Lincoln Laboratory. With the MIT Map-Reduce utility, users can deploy their applications quickly onto the MIT SuperCloud infrastructure. The MIT Map-Reduce utility can work with any application without the need...