Publications
Topological effects on attacks against vertex classification
Summary
Vertex classification is vulnerable to perturbations of both graph topology and vertex attributes, as shown in recent research. As in other machine learning domains, concerns about robustness to adversarial manipulation can prevent potential users from adopting proposed methods when the consequence of action is very high. This paper considers two...
Survey and benchmarking of machine learning accelerators
Summary
Advances in multicore processors and accelerators have opened the flood gates to greater exploration and application of machine learning techniques to a variety of applications. These advances, along with breakdowns of several trends including Moore's Law, have prompted an explosion of processors and accelerators that promise even greater computational and...
Detecting food safety risks and human trafficking using interpretable machine learning methods
Summary
Black box machine learning methods have allowed researchers to design accurate models using large amounts of data at the cost of interpretability. Model interpretability not only improves user buy-in, but in many cases provides users with important information. Especially in the case of the classification problems addressed in this thesis...
Detection and characterization of human trafficking networks using unsupervised scalable text template matching
Summary
Human trafficking is a form of modern-day slavery affecting an estimated 40 million victims worldwide, primarily through the commercial sexual exploitation of women and children. In the last decade, the advertising of victims has moved from the streets to websites on the Internet, providing greater efficiency and anonymity for sex...
Classifier performance estimation with unbalanced, partially labeled data
Summary
Class imbalance and lack of ground truth are two significant problems in modern machine learning research. These problems are especially pressing in operational contexts where the total number of data points is extremely large and the cost of obtaining labels is very high. In the face of these issues, accurate...
Benchmarking data analysis and machine learning applications on the Intel KNL many-core processor
Summary
Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher...
Twitter language identification of similar languages and dialects without ground truth
Summary
We present a new method to bootstrap filter Twitter language ID labels in our dataset for automatic language identification (LID). Our method combines geolocation, original Twitter LID labels, and Amazon Mechanical Turk to resolve missing and unreliable labels. We are the first to compare LID classification performance using the MIRA...
WSR-88D chaff detection and characterization using an optimized hydrometeor classification algorithm
Summary
Chaff presents multiple issues for aviation, air traffic controllers, and the FAA, including false weather identification and areas where flight paths may need to be altered. Chaff is a radar countermeasure commonly released from aircraft across the United States and is comprised of individual metallic strands designed to reflect certain...
Predicting and analyzing factors in patent litigation
Summary
Patent litigation is an expensive and time-consuming process. To minimize its impact on the participants in the patent lifecycle, automatic determination of litigation potential is a compelling machine learning application. In this paper, we consider preliminary methods for the prediction of a patent being involved in litigation using metadata, content...
Making #sense of #unstructured text data
Summary
Automatic extraction of intelligent and useful information from data is one of the main goals in data science. Traditional approaches have focused on learning from structured features, i.e., information in a relational database. However, most of the data encountered in practice are unstructured (e.g., social media posts, forums, emails and...