Publications
Neural network topologies for sparse training
Summary
The sizes of deep neural networks (DNNs) are rapidly outgrowing the capacity of hardware to store and train them. Research over the past few decades has explored the prospect of sparsifying DNNs before, during, and after training by pruning edges from the underlying topology. The resulting neural network is known...
Colorization of H&E stained tissue using deep learning
Summary
Histopathology is a critical tool in the diagnosis and stratification of cancer. Digital pathology involves the scanning of stained and fixed tissue samples to produce high-resolution images that can be used for computer-aided diagnosis and research. A common challenge in digital pathology related to the quality and characteristics of staining...
Lessons learned from a decade of providing interactive, on-demand high performance computing to scientists and engineers
Summary
For decades, the use of HPC systems was limited to those in the physical sciences who had mastered their domain in conjunction with a deep understanding of HPC architectures and algorithms. During these same decades, consumer computing device advances produced tablets and smartphones that allow millions of children to interactively...
On large-scale graph generation with validation of diverse triangle statistics at edges and vertices
Summary
Researchers developing implementations of distributed graph analytic algorithms require graph generators that yield graphs sharing the challenging characteristics of real-world graphs (small-world, scale-free, heavy-tailed degree distribution) with efficiently calculable ground-truth solutions to the desired output. Reproducibility for current generators used in benchmarking is somewhat lacking in this respect due to...
Dynamically correlating network terrain to organizational missions
Summary
A precondition for assessing mission resilience in a cyber context is identifying which cyber assets support the mission. However, determining the asset dependencies of a mission is typically a manual process that is time-consuming, labor-intensive, and error-prone. Automating the process of mapping between network assets and organizational missions...
Streaming graph challenge: stochastic block partition
Summary
An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive...
A linear algebra approach to fast DNA mixture analysis using GPUs
Summary
Analysis of DNA samples is an important step in forensics, and the speed of analysis can impact investigations. Comparison of DNA sequences is based on the analysis of short tandem repeats (STRs), which are short DNA sequences of 2-5 base pairs. Current forensics approaches use 20 STR loci for analysis...
Benchmarking data analysis and machine learning applications on the Intel KNL many-core processor
Summary
Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher...
Static graph challenge: subgraph isomorphism
Summary
The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and developed methodologies for creating challenges...
Performance measurements of supercomputing and cloud storage solutions
Summary
Increasing amounts of data from varied sources, particularly in the fields of machine learning and graph analytics, are causing storage requirements to grow rapidly. A variety of technologies exist for storing and sharing these data, ranging from parallel file systems used by supercomputers to distributed block storage systems found in...