Summary
Previous methods of network anomaly detection have focused on defining a temporal model of what is "normal," and flagging the "abnormal" activity that does not fit into this pre-trained construct. When monitoring traffic to and from IP addresses on a large network, this problem can become computationally complex, and potentially intractable, as a state model must be maintained for each address. In this paper, we present a method of detecting anomalous network activity without providing any historical context. By exploiting the size of the network along with the minimal overhead of NetFlow data, we are able to model groups of hosts performing similar functions to discover anomalous behavior. As a collection, these anomalies can be further described with a few high-level characterizations and we provide a means for creating and labeling these categories. We demonstrate our method on a very large-scale network consisting of 30 million unique addresses, focusing specifically on traffic related to web servers.