Publications

Genetic sequence matching using D4M big data approaches

Published in:
HPEC 2014: IEEE Conf. on High Performance Extreme Computing, 9-11 September 2014.

Summary

Recent technological advances in Next Generation Sequencing tools have led to increasing speeds of DNA sample collection, preparation, and sequencing. One instrument can produce over 600 Gb of genetic sequence data in a single run. This creates new opportunities to efficiently handle the increasing workload. We propose a new method of fast genetic sequence analysis using the Dynamic Distributed Dimensional Data Model (D4M) - an associative array environment for MATLAB developed at MIT Lincoln Laboratory. Based on mathematical and statistical properties, the method leverages big data techniques and the implementation of an Apache Accumulo database to accelerate computations one hundred-fold over other methods. Comparisons of the D4M method with the current gold standard for sequence analysis, BLAST, show the two are comparable in the alignments they find. This paper presents an overview of the D4M genetic sequence algorithm and statistical comparisons with BLAST.
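
As a rough illustration of the associative-array idea (a minimal sketch, not the authors' D4M/MATLAB implementation; the k-mer length and toy sequences below are assumptions), sequences can be decomposed into fixed-length k-mers, arranged into a sparse sequence-by-k-mer incidence matrix, and compared in bulk with a single sparse matrix multiply:

```python
# Illustrative sketch of associative-array sequence matching (not the
# authors' D4M code): a sparse incidence matrix (sequence x k-mer) is
# built, and a sparse matrix multiply counts shared k-mers between
# every query/reference pair. Toy data and k are assumptions.
import numpy as np
from scipy.sparse import csr_matrix

def kmer_matrix(seqs, k):
    """Build a sparse sequence-by-k-mer incidence matrix."""
    vocab = {}                          # k-mer -> column index
    rows, cols = [], []
    for i, s in enumerate(seqs):
        for j in range(len(s) - k + 1):
            cols.append(vocab.setdefault(s[j:j + k], len(vocab)))
            rows.append(i)
    data = np.ones(len(rows))
    return csr_matrix((data, (rows, cols)), shape=(len(seqs), len(vocab)))

reference = ["ACGTACGTACGTTGCA", "TTGCATTGCAACGTAC"]    # toy reference set
queries = ["ACGTACGTACGT"]                              # toy query

A = kmer_matrix(reference + queries, k=8)
ref, qry = A[:len(reference)], A[len(reference):]
shared = (qry @ ref.T).toarray()    # shared k-mer counts, query x reference
print(shared)                       # high counts flag candidate alignments
```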

Big Data dimensional analysis

Published in:
HPEC 2014: IEEE Conf. on High Performance Extreme Computing, 9-11 September 2014.

Summary

The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges posed by big data volume, velocity, and variety. One of the main challenges associated with big data variety is automatically understanding the underlying structures and patterns of the data. Such an understanding is a prerequisite to applying advanced analytics to the data. Further, big data sets often contain anomalies and errors that are difficult to know a priori. Current approaches to understanding data structure are drawn from traditional database ontology design. These approaches are effective but often require too much human involvement to be practical for the volume, velocity, and variety of data encountered by big data systems. Dimensional Data Analysis (DDA) is a proposed technique that allows big data analysts to quickly understand the overall structure of a big data set and determine anomalies. DDA exploits structures that exist in a wide class of data to quickly determine the nature of the data and its statistical anomalies. DDA leverages existing schemas that are employed in big data databases today. This paper presents DDA, applies it to a number of data sets, and measures its performance. The overhead of DDA is low, and it can be applied to existing big data systems without greatly impacting their computing requirements.
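
A toy illustration of per-dimension statistics in this spirit (not the paper's algorithm; the records and the singleton rule below are placeholders) tallies value counts for each column of a tabular data set and flags rare values as candidate anomalies:

```python
# For each dimension (column), count distinct values and flag
# singletons; sharp deviations from the bulk of a dimension's value
# distribution are candidate anomalies or errors.
from collections import Counter

records = [
    {"src": "10.0.0.1", "dst": "10.0.0.9", "proto": "tcp"},
    {"src": "10.0.0.1", "dst": "10.0.0.9", "proto": "tcp"},
    {"src": "10.0.0.2", "dst": "10.0.0.9", "proto": "udp"},
    {"src": "10.0.0.3", "dst": "10.0.0.9", "proto": "tcq"},  # likely an error
]

for dim in records[0]:
    counts = Counter(r[dim] for r in records)
    rare = [v for v, c in counts.items() if c == 1]
    print(f"{dim}: {len(counts)} unique / {len(records)} rows, "
          f"singletons: {rare}")
```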

Adaptive optics program at TMT

Summary

The TMT first light Adaptive Optics (AO) facility consists of the Narrow Field Infra-Red AO System (NFIRAOS) and the associated Laser Guide Star Facility (LGSF). NFIRAOS is a 60 x 60 laser guide star (LGS) multi-conjugate AO (MCAO) system, which provides uniform, diffraction-limited performance in the J, H, and K bands over 17-30 arc sec diameter fields with 50 per cent sky coverage at the galactic pole, as required to support the TMT science cases. NFIRAOS includes two deformable mirrors, six laser guide star wavefront sensors, and three low-order, infrared, natural guide star wavefront sensors within each client instrument. The first light LGSF system includes six sodium lasers required to generate the NFIRAOS laser guide stars. In this paper, we will provide an update on the progress in designing, modeling, and validating the TMT first light AO systems and their components over the last two years. This will include pre-final design and prototyping for the deformable mirrors, fabrication and tests for the visible detectors, benchmarking and comparison of different algorithms and processing architectures for the Real Time Controller (RTC), and development tests of prototype candidate lasers. Comprehensive and detailed AO modeling is continuing to support the design and development of the first light AO facility. Main modeling topics studied during the last two years include further studies of the wavefront error budget, sky coverage, high-precision astrometry for the galactic center and other observations, high-contrast imaging with NFIRAOS and its first light instruments, Point Spread Function (PSF) reconstruction for LGS MCAO, LGS photon return, and sophisticated low-order mode temporal filtering.
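
For a sense of scale, the diffraction-limited resolution NFIRAOS targets follows from the standard formula below (the 30 m aperture is the TMT primary; the 2.2 um K-band wavelength is a typical value, not a figure quoted from the paper):

```latex
% Angular resolution at the diffraction limit, K band on a 30 m aperture:
\theta \approx \frac{\lambda}{D}
       = \frac{2.2 \times 10^{-6}\,\mathrm{m}}{30\,\mathrm{m}}
       \approx 7.3 \times 10^{-8}\,\mathrm{rad}
       \approx 15\,\mathrm{mas}
```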

Detecting small asteroids with the Space Surveillance Telescope

Summary

The ability of the Space Surveillance Telescope (SST) to find small (2-15 m diameter) near-Earth asteroids (NEAs) suitable for the NASA asteroid retrieval mission is investigated. Orbits from a simulated population of targetable small asteroids were propagated, and observations with the SST were simulated. Different search patterns and telescope time allocation cases were considered, as well as losses due to field-of-view gaps and weather. It is concluded that a full-time, dedicated survey at the SST is likely necessary to find a useful population of these NEAs within the mission launch timeframe, especially if an object must be observed on more than one night at the SST to qualify as a detection. The simulations were also performed for an identical telescope in the southern hemisphere, which is found to produce results very similar to those of the SST in New Mexico due to significant (~80%) overlap in the population of objects detected at each site. In addition to considering the SST's ability to detect small NEAs, a parallel study was performed focusing on objects larger than 100 m in diameter. This work shows that even with limited telescope time (3 nights per month), a substantial number of these larger objects would be detected.
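
The flavor of this bookkeeping can be sketched with a toy Monte Carlo (all rates below are placeholders, not the paper's models): each object gets nightly detection opportunities, weather and field-of-view losses reduce the per-night probability, and a detection may be required on more than one night to qualify.

```python
# Toy survey Monte Carlo (placeholder rates, not the paper's models).
import random

random.seed(1)
N_OBJECTS = 10_000
NIGHTS = 365                # one year of dedicated survey (assumption)
P_IN_FIELD = 0.05           # chance per night of crossing the search pattern
P_CLEAR = 0.7               # weather availability (assumption)
NIGHTS_REQUIRED = 2         # detections needed to qualify (the >1-night case)

found = 0
for _ in range(N_OBJECTS):
    hits = sum(random.random() < P_IN_FIELD * P_CLEAR for _ in range(NIGHTS))
    found += hits >= NIGHTS_REQUIRED
print(f"fraction qualifying as detected: {found / N_OBJECTS:.3f}")
```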

Comparisons between the extended Kalman filter and the state-dependent Riccati estimator

Summary

The state-dependent Riccati equation-based estimator is becoming a popular estimation tool for nonlinear systems since it does not use system linearization. In this paper, the state-dependent Riccati equation-based estimator is compared with the widely used extended Kalman filter for three simple examples that appear in the open literature. It is demonstrated by simulation that the state-dependent Riccati equation-based estimator at best has results comparable to the extended Kalman filter and is often worse. In some cases, the state-dependent Riccati equation-based estimator does not converge, even though the system considered satisfies all the mathematical constraints on controllability and observability. Sufficient detail is presented in the paper so that the interested reader can not only duplicate the results but perhaps also make suggestions on how to get the state-dependent Riccati equation-based estimator to perform better.
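
For reference, the standard continuous-time form of the state-dependent Riccati equation filter (textbook notation, not reproduced from the paper) factors the dynamics into state-dependent coefficient form and solves an algebraic Riccati equation pointwise at the current estimate, whereas the extended Kalman filter linearizes via the Jacobian of the dynamics:

```latex
% Standard SDRE filter (state-dependent coefficient form).
\begin{align*}
  \dot{x} &= A(x)\,x + B(x)\,u, \qquad y = C(x)\,x, \\
  \dot{\hat{x}} &= A(\hat{x})\,\hat{x} + B(\hat{x})\,u
                   + K(\hat{x})\bigl(y - C(\hat{x})\,\hat{x}\bigr), \\
  K(\hat{x}) &= P\,C(\hat{x})^{\mathsf{T}} R^{-1}, \\
  0 &= A(\hat{x})\,P + P\,A(\hat{x})^{\mathsf{T}}
       - P\,C(\hat{x})^{\mathsf{T}} R^{-1} C(\hat{x})\,P + Q
\end{align*}
```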

VizLinc: integrating information extraction, search, graph analysis, and geo-location for the visual exploration of large data sets

Published in:
Proc. KDD 2014 Workshop on Interactive Data Exploration and Analytics, IDEA, 24 August 2014, pp. 10-18.

Summary

In this demo paper we introduce VizLinc, an open-source software suite that integrates automatic information extraction, search, graph analysis, and geo-location for interactive visualization and exploration of large data sets. VizLinc helps users in: 1) understanding the type of information the data set under study might contain, 2) finding patterns and connections between entities, and 3) narrowing down the corpus to a small fraction of relevant documents that users can quickly read. We apply the tools offered by VizLinc to a subset of the New York Times Annotated Corpus and present use cases that demonstrate VizLinc's search and visualization features.
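
The extraction-to-graph step at the core of such a tool can be sketched as follows (an illustrative toy, not VizLinc's own implementation): entities mentioned in the same document are linked, and graph centrality surfaces well-connected entities worth exploring.

```python
# Build a co-mention graph from (assumed) information-extraction output
# and rank entities by degree centrality.
import itertools
import networkx as nx

doc_entities = {                     # toy extraction results (assumed)
    "doc1": ["Alice Smith", "Acme Corp", "Boston"],
    "doc2": ["Alice Smith", "Bob Jones"],
    "doc3": ["Bob Jones", "Acme Corp", "Boston"],
}

G = nx.Graph()
for doc, ents in doc_entities.items():
    for a, b in itertools.combinations(ents, 2):
        w = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)   # co-mention count as edge weight

for name, score in sorted(nx.degree_centrality(G).items(),
                          key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```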

Content+context=classification: examining the roles of social interactions and linguistic content in Twitter user classification

Published in:
Proc. Second Workshop on Natural Language Processing for Social Media, SocialNLP, 24 August 2014, pp. 59-65.

Summary

Twitter users demonstrate many characteristics via their online presence. Connections, community memberships, and communication patterns reveal both idiosyncratic and general properties of users. In addition, the content of tweets can be critical for distinguishing the role and importance of a user. In this work, we explore Twitter user classification using context and content cues. We construct a rich graph structure induced by hashtags and social communications in Twitter. We derive features from this graph structure - centrality, communities, and local flow of information. In addition, we perform detailed content analysis on tweets looking at offensiveness and topics. We then examine user classification and the role of feature types (context, content) and learning methods (propositional, relational) through a series of experiments on annotated data. Our work contrasts with prior approaches in that we use relational learning and alternative, non-specialized feature sets. Our goal is to understand how both content and context are predictive of user characteristics. Experiments demonstrate that the best performance for user classification uses relational learning with varying content and context features.
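
A minimal propositional version of this setup (toy data throughout; the paper's relational learner and full feature set are not reproduced here) concatenates a context feature from the social graph with content scores from tweet analysis and trains an ordinary classifier:

```python
# Combine context (graph centrality) and content (tweet-derived scores)
# features for user classification. All data here is toy.
import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression

G = nx.Graph([("u1", "u2"), ("u1", "u3"), ("u2", "u3"), ("u4", "u1")])
centrality = nx.degree_centrality(G)     # context feature

content = {                              # e.g., offensiveness, topic weight
    "u1": (0.1, 0.8), "u2": (0.0, 0.3),
    "u3": (0.6, 0.1), "u4": (0.5, 0.2),
}

users = sorted(content)
X = np.array([[centrality[u], *content[u]] for u in users])
y = np.array([1, 0, 1, 1])               # toy class labels

clf = LogisticRegression().fit(X, y)
print(clf.predict(X))                    # in practice: held-out evaluation
```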

Effective Entropy: security-centric metric for memory randomization techniques

Published in:
Proc. 7th USENIX Conf. on Cyber Security Experimentation and Test, CSET, 20 August 2014.

Summary

User space memory randomization is an emerging cyber defense technique that attempts to protect computing systems by randomizing the layout of memory. Quantitative metrics are needed to evaluate the effectiveness of such techniques at securing systems against modern adversaries and to compare between randomization technologies. We introduce Effective Entropy, a measure of entropy in user space memory that quantitatively considers an adversary's ability to leverage low-entropy regions of memory via absolute and dynamic intersection connections. Effective Entropy is indicative of adversary workload and enables comparison between different randomization techniques. Using Effective Entropy, we present a comparison of static Address Space Layout Randomization (ASLR), Position Independent Executable (PIE) ASLR, and a theoretical fine-grain randomization technique.
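
The intuition can be shown with back-of-the-envelope arithmetic (the region sizes and the min-over-connections rule below are a simplification for illustration; the paper defines the metric precisely): a pointer from a low-entropy region into a high-entropy region lets an adversary recover the latter's base address, collapsing its usable entropy.

```python
# Entropy of a region with n equally likely placements is log2(n) bits;
# an absolute pointer from region A into region B reduces B's effective
# entropy to A's. Numbers below are placeholders.
import math

def bits(n_positions):
    return math.log2(n_positions)

regions = {"stack": bits(2**19), "heap": bits(2**13), "exe": bits(2**8)}
connections = [("exe", "heap")]   # executable stores a pointer into the heap

effective = dict(regions)
for src, dst in connections:
    effective[dst] = min(effective[dst], effective[src])
print(effective)                  # heap collapses from 13 to 8 bits
```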

Chemical aerosol detection and identification using Raman scattering

Published in:
J. Raman Spectrosc., Vol. 45, No. 8, August 2014, pp. 677-679.

Summary

Early warning of the presence of chemical agent aerosols is an important component in the defense against such agents. A Raman spectrometer has been constructed for the purpose of detecting and identifying chemical aerosols. We report the detection and identification of a low-concentration chemical aerosol in atmospheric air using 532-nm continuous-wave laser Raman scattering. We have demonstrated the Raman scattering detection and identification of an isovanillin aerosol at a mass concentration of 1.8 ng/cm^3 with a signal-to-noise ratio of about 19 in 30 s for the 116-cm^-1 mode, which has a Raman cross section of 3.3 x 10^-28 cm^2, using 8-W double-pass laser power.
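
For orientation, the expected signal photon count in such a measurement follows a generic Raman photon-budget relation of the form below (a textbook expression, not the paper's detailed radiometric model), where P is the laser power, t the integration time, n the molecular number density, dσ/dΩ the differential Raman cross section, ΔΩ the collection solid angle, L the probed path length, and η the overall optical and detector efficiency:

```latex
% Generic Raman signal photon budget (textbook form).
N_s \;=\; \frac{P\,\lambda\,t}{h\,c}\;
          n\,\frac{d\sigma}{d\Omega}\,\Delta\Omega\,L\,\eta
```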

Silicon photonics devices for integrated analog signal processing and sampling

Published in:
Nanophotonics, Vol. 3, No. 4-5, 1 August 2014, pp. 313-327.

Summary

Silicon photonics offers the possibility of a reduction in size, weight, and power for many optical systems, and could open up the ability to build optical systems with complexities that would otherwise be impossible to achieve. Silicon photonics is an emerging technology that has already been inserted into commercial communication products, and it has also been applied to analog signal processing applications. MIT Lincoln Laboratory, in collaboration with groups at MIT, has developed a toolkit of silicon photonic devices with a focus on the needs of analog systems. This toolkit includes low-loss waveguides, a high-speed modulator, a ring-resonator-based filter bank, and all-silicon photodiodes. These components are integrated together into a hybrid photonic and electronic analog-to-digital converter. The development and performance of these devices are discussed, and their linear performance, which is important for analog systems, is also investigated.
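
The sampling front end of such a converter can be sketched conceptually (a generic optically sampled ADC scheme, not Lincoln Laboratory's design; all parameter values below are assumptions): a short optical pulse train samples the RF waveform through a Mach-Zehnder modulator's sin^2 transfer function, and the detected pulse energies are quantized electronically.

```python
# Conceptual optically sampled ADC: pulses sample an RF tone through a
# quadrature-biased Mach-Zehnder modulator, then an 8-bit quantizer.
import numpy as np

f_rf, f_samp, v_pi = 1.0e9, 4.1e9, 3.0       # RF tone, pulse rate, Vpi (assumed)
t = np.arange(64) / f_samp                   # optical pulse arrival times
v_rf = 0.5 * np.sin(2 * np.pi * f_rf * t)    # RF drive seen by each pulse

# Quadrature bias puts the sin^2 transfer near its linear midpoint.
transmission = np.sin(np.pi / 4 + np.pi * v_rf / (2 * v_pi)) ** 2

codes = np.round((2**8 - 1) * transmission).astype(int)   # 8-bit quantizer
print(codes[:8])
```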