Publication Abstract

Lippmann, R. P., R. K. Cunningham, D. J. Fried, S. L. Garfinkel, A. S. Gorton, I. Graf, K. R. Kendall, D. J. McClung, D. J. Weber, S. E. Webster, D. Wyschogrod, M. A. Zissman, The 1998 DARPA/AFRL Off-Line Intrusion Detection Evaluation, First International Workshop on Recent Advances in Intrusion Detection, Louvain-la-Neuve, Belgium, 1998.

Abstract

The 1998 intrusion detection off-line evaluation is the first of an ongoing series of yearly evaluations being conducted by MIT Lincoln Laboratory under DARPA ITO and Air Force Research Laboratory sponsorship. These evaluations will contribute significantly to the intrusion detection research field by providing direction for research efforts and calibration of current technical capabilities.  The evaluation is designed to be simple, to focus on core technology issues, and to encourage the widest possible participation by eliminating security and privacy concerns and by providing data types that are used by the majority of intrusion detection systems. Together with the real-time intrusion detection evaluation being coordinated by the Air Force Research Laboratory directly, this evaluation is designed to foster research progress, with the following four goals:
 
  1. Exploring promising new ideas in intrusion detection.
  2. Developing advanced technology incorporating these ideas.
  3. Measuring the performance of this technology.
  4. Comparing the performance of various newly developed and
     existing systems in a systematic, careful way.

Evaluations measure the ability of intrusion detection systems to detect attacks on computer systems and networks.  This year's task focuses on UNIX workstations, and the goal is to determine whether any of the following attack events occurred or were attempted during a given network session:

  1. Denial of service
  2. Unauthorized access from a remote machine
  3. Unauthorized access to local superuser privileges by a local
     unprivileged user
  4. Surveillance and probing
  5. Anomalous user behavior

Network sessions used for scoring are complete TCP/IP connections that correspond to interactions using many services, including telnet, HTTP, SMTP, FTP, finger, rlogin, and others.  Sessions are generated automatically using a simulation network with more than 120 simulated hosts, more than 1,000 simulated users, and realistic traffic patterns similar to those seen at a military base. Hundreds of attacks of more than 25 different types, with different levels of stealthiness and different actions following break-ins, are injected into normal traffic at known times and locations.

This evaluation is carefully designed to measure false alarm rates as well as detection rates for recent attacks. For each session, an intrusion detection system will be required to produce a score indicating the relative likelihood that an attack occurred during the session. Thus, it will be possible to generate receiver operating characteristic (ROC) curves, which plot detection versus false alarm probabilities. ROC curves can be used to determine performance for any possible operating point of an intrusion detection system. Statistics based on these curves will be used to compare systems for different services and different types of attacks.
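
As an illustration only, the sketch below shows one common way per-session scores can be turned into ROC points and a summary statistic. It is not the evaluation's official scoring procedure: the function names `roc_points` and `area_under_curve`, the choice of area under the curve as the statistic, and the example scores and labels are all assumptions made for this sketch.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[bool]) -> List[Tuple[float, float]]:
    """Return (false alarm probability, detection probability) pairs.

    scores: one score per session, higher meaning "more likely an attack".
    labels: True if the session actually contained an attack (ground truth).
    Hypothetical helper for illustration; not part of the evaluation itself.
    """
    n_attack = sum(labels)
    n_normal = len(labels) - n_attack
    if n_attack == 0 or n_normal == 0:
        raise ValueError("need at least one attack and one normal session")

    # Rank sessions from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    detections = 0
    false_alarms = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Sessions with tied scores are flagged together at this threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                detections += 1
            else:
                false_alarms += 1
            i += 1
        points.append((false_alarms / n_normal, detections / n_attack))
    return points


def area_under_curve(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC points (one possible summary statistic)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Made-up scores and ground truth for five sessions (illustrative only).
    example_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
    example_labels = [True, False, True, False, False]
    curve = roc_points(example_scores, example_labels)
    for false_alarm, detection in curve:
        print(f"false alarm = {false_alarm:.2f}, detection = {detection:.2f}")
    print(f"area under curve = {area_under_curve(curve):.3f}")
```

Each point on the resulting curve corresponds to one possible score threshold, which is why a single set of session scores is enough to examine every operating point of a system.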

Prior to the evaluation, a set of training data has been made available to the participating sites. These data are being used to configure intrusion detection systems and train free parameters. Generally, the types of training data provided are those used by most of today's commercial and research intrusion detection systems, e.g., TCP network packets and audit logs produced by Sun's Basic Security Module.  These data are generated on a simulation network and contain both normal use and attack sessions.  A separate set of test data will be used to measure the performance of each intrusion detection system being evaluated.

Some intrusion detection systems are designed specifically to detect anomalous user, system, and network behavior. We have inserted such anomalous behavior into the test and training data to evaluate such systems.