Currently, cybersecurity analysts manually sift through vast quantities of online data to find conversations that hint at a cyber threat. Our CHARIOT* software seeks to eliminate the time analysts spend on this search by automatically flagging content that likely contains cyber discussions and discarding the vast majority of conversations that don't.
To interpret online data, such as posts on Reddit, Twitter, and other online forums, CHARIOT uses a text classifier that our researchers trained to recognize cyber-related terms. The classifier considers both how many times a word appears and how that word is distributed throughout an online discussion. Discussions found to contain significant amounts of cyber-related content are flagged and prioritized for a human analyst, who can then determine whether they indicate a potential cyber attack.
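The idea of scoring a discussion by both how often cyber terms occur and how widely they are spread across its posts can be illustrated with a toy example. This is a minimal sketch under loudly stated assumptions, not CHARIOT's classifier: CHARIOT uses a trained model, whereas here the term list (`CYBER_TERMS`), the linear scoring weights, the `threshold`, and the helper names `score_discussion` and `flag_and_rank` are all hypothetical stand-ins chosen for illustration.

```python
# Hypothetical vocabulary; a trained classifier would learn its own features.
CYBER_TERMS = {"exploit", "malware", "ddos", "botnet", "phishing",
               "ransomware", "vulnerability"}

def tokenize(text: str) -> list[str]:
    # Lowercase words and strip simple punctuation; drop empty tokens.
    words = (w.strip(".,!?;:()\"'").lower() for w in text.split())
    return [w for w in words if w]

def score_discussion(posts: list[str], w_freq: float = 1.0,
                     w_spread: float = 1.0) -> float:
    """Hypothetical score combining term frequency and term spread."""
    tokens_per_post = [tokenize(p) for p in posts]
    total_tokens = sum(len(toks) for toks in tokens_per_post)
    if total_tokens == 0:
        return 0.0
    # Number of cyber-term occurrences in each post.
    hits_per_post = [sum(1 for t in toks if t in CYBER_TERMS)
                     for toks in tokens_per_post]
    # Frequency: share of all tokens in the discussion that are cyber terms.
    frequency = sum(hits_per_post) / total_tokens
    # Spread: share of posts that mention at least one cyber term.
    spread = sum(1 for h in hits_per_post if h > 0) / len(posts)
    return w_freq * frequency + w_spread * spread

def flag_and_rank(discussions: dict[str, list[str]],
                  threshold: float = 0.3) -> list[tuple[str, float]]:
    # Keep discussions scoring at or above the (assumed) threshold,
    # then order them so the most cyber-heavy ones reach the analyst first.
    scored = [(name, score_discussion(posts)) for name, posts in discussions.items()]
    flagged = [(name, s) for name, s in scored if s >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

discussions = {
    "thread_a": ["New ransomware strain spreading via phishing emails",
                 "Patch the vulnerability now"],
    "thread_b": ["Anyone watch the game last night?",
                 "Great goal in the second half"],
    "thread_c": ["Botnet traffic spiked", "Weather is nice today",
                 "Could be a DDoS test"],
}

ranking = flag_and_rank(discussions)
for name, s in ranking:
    print(f"{name}: {s:.3f}")
```

In this toy run, `thread_b` contains no listed terms and is discarded, while `thread_a` (cyber terms in every post) ranks above `thread_c` (terms in two of three posts). Weighting spread separately from raw counts is one simple way to favor discussions that stay on a cyber topic over ones that mention a term once in passing.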
CHARIOT software greatly reduces the time analysts need to find data of interest. For example, when reviewing a dataset of 100,000 posts without CHARIOT, an analyst might find one relevant document per day; when CHARIOT filters the same dataset first, an analyst could find 74 relevant documents per day.
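To make the scale of that difference concrete, the figures above can be turned into a speedup and an approximate analyst time per relevant document. The 1 and 74 documents per day come from the text; the 8-hour workday is an assumption added only for this illustration.

```python
# Relevant documents found per analyst-day (figures from the text).
WITHOUT_CHARIOT = 1
WITH_CHARIOT = 74
# Assumption, not from the source: an 8-hour analyst workday.
WORKDAY_MINUTES = 8 * 60

speedup = WITH_CHARIOT / WITHOUT_CHARIOT
minutes_per_doc_without = WORKDAY_MINUTES / WITHOUT_CHARIOT
minutes_per_doc_with = WORKDAY_MINUTES / WITH_CHARIOT

print(f"{speedup:.0f}x more relevant documents per day")
print(f"{minutes_per_doc_without:.0f} min vs {minutes_per_doc_with:.1f} min "
      "of analyst time per relevant document")
```

Under that workday assumption, the time per relevant document drops from a full day to roughly six and a half minutes.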
*Cyber HLT (Human Language Technology) Analysis, Reasoning, and Inference for Online Threats