Performance Evaluation for Automated Threat Detection

Volume Number: 2
Issue Number: 2
Pages: 77-98
Publication Date: 7 December 2007
Author(s): R. Schrag, M. Takikawa, P. Goger, J. Eilbert

Abstract

We have developed a performance evaluation laboratory (PE Lab) to assess automated technologies that fuse fragmentary, partial information about individuals’ activities to detect modeled terrorist threat individuals, groups, and events. The evidence traces of these threats are embedded in a background dominated by evidence from similarly modeled non-threat phenomena. The PE Lab’s two main components, a test dataset generator and a hypothesis scorer, address two key challenges of counter-terrorism threat detection performance evaluation:

  • Acquiring adequate test data to support systematic experimentation; and
  • Scoring structured hypotheses that reflect modeled threat objects’ attribute values and inter-relationships.
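To make the second challenge concrete, here is a minimal Python sketch of one way to score a single structured hypothesis against a single ground-truth case object by weighted partial matching. The representation (a dict from attribute name to a set of values), the function name `score_case`, the default weight of 1.0, and the example attributes, people, and targets are all hypothetical. The abstract does not specify the PE Lab scorer's internals, and a full scorer would also need to pair many hypotheses with many reference cases and account for inter-object relationships.

```python
# Hypothetical representation: a case object maps each attribute name
# to the set of values asserted for it.
Case = dict[str, set[str]]


def score_case(hyp: Case, ref: Case, weights: dict[str, float]) -> tuple[float, float, float]:
    """Weighted precision, recall, and F-value of one hypothesis against one reference case.

    Each hypothesized value that also appears in the reference earns its attribute's
    weight. Precision divides the earned weight by the weight of all hypothesized
    values; recall divides it by the weight of all reference values.
    """
    earned = hyp_total = ref_total = 0.0
    for attr in hyp.keys() | ref.keys():
        w = weights.get(attr, 1.0)  # assumed default weight for unlisted attributes
        h = hyp.get(attr, set())
        r = ref.get(attr, set())
        earned += w * len(h & r)
        hyp_total += w * len(h)
        ref_total += w * len(r)
    precision = earned / hyp_total if hyp_total else 0.0
    recall = earned / ref_total if ref_total else 0.0
    f_value = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_value


# Hypothetical threat-group case: reference (ground truth) vs. a partially correct hypothesis.
ref = {"leader": {"P7"}, "members": {"P7", "P12", "P31"}, "target": {"Bridge-4"}}
hyp = {"leader": {"P7"}, "members": {"P7", "P12", "P40"}, "target": {"Bridge-9"}}
weights = {"leader": 2.0, "members": 1.0, "target": 3.0}

print(score_case(hyp, ref, weights))  # → (0.5, 0.5, 0.5)
```

Weighting lets a correct leader or target identification count for more than a correct rank-and-file member; with every weight set to 1.0, the scores reduce to ordinary set-based precision and recall over (attribute, value) pairs.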

The generator is parameterized so that the difficulty of the threat detection problem can be varied along multiple dimensions (e.g., dataset size, signal-to-noise ratio, evidence corruption level). Using a case study, we describe and illustrate our methodology for constraint-based experiment design and non-parametric statistical analysis, which identifies the varied dataset characteristics that most influence a given technology’s performance on a given detection task. The scorer generalizes metrics traditional in information retrieval (precision, recall, F-value, and area under the curve) to accommodate partial matching over structured case hypothesis objects with weighted attributes. Threat detection technologies may process time-stamped evidence either in batch, forensic mode (to tender threat event hypotheses retrospectively) or in incremental, warning mode (to tender event hypotheses prospectively, as “alerts”). Alerts raise additional scoring issues (e.g., timeliness), for which we discuss practical techniques. We expect PE Lab technology to be similarly effective for evaluating information fusion or situation assessment technologies in domains other than counter-terrorism where performance evaluation presents similar challenges.
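The abstract does not describe how the generator works internally. The sketch below only illustrates what it means to parameterize difficulty along the three dimensions named above. The record format, the activity vocabularies, the corruption model (replacing an observed activity at random), and the names `GeneratorParams` and `generate` are all hypothetical, and the grid at the end is a plain full-factorial sweep, not the constraint-based experiment design the paper describes.

```python
import itertools
import random
from dataclasses import dataclass

# Hypothetical activity vocabularies; "transfer_funds" is deliberately shared,
# so some threat evidence looks like background evidence.
THREAT_ACTIVITIES = ["acquire_materials", "surveil_target", "transfer_funds"]
BENIGN_ACTIVITIES = ["commute", "shop", "visit_family", "transfer_funds"]


@dataclass(frozen=True)
class GeneratorParams:
    dataset_size: int        # total number of evidence records
    signal_to_noise: float   # threat records per non-threat record (must be > 0)
    corruption_level: float  # probability in [0, 1] of replacing a record's activity
                             # with a uniformly random one (possibly the same)
    seed: int = 0            # makes each generated dataset reproducible


def generate(params: GeneratorParams) -> list[dict]:
    """Embed threat evidence in a non-threat background, then corrupt some records."""
    rng = random.Random(params.seed)
    snr = params.signal_to_noise
    n_threat = round(params.dataset_size * snr / (1 + snr))
    all_activities = sorted(set(THREAT_ACTIVITIES) | set(BENIGN_ACTIVITIES))
    records = []
    for i in range(params.dataset_size):
        is_threat = i < n_threat
        activity = rng.choice(THREAT_ACTIVITIES if is_threat else BENIGN_ACTIVITIES)
        if rng.random() < params.corruption_level:
            activity = rng.choice(all_activities)  # corrupted observation
        records.append({"id": i, "threat": is_threat, "activity": activity})
    rng.shuffle(records)  # hide threat records among the background records
    return records


# A plain full-factorial grid over hypothetical levels of each difficulty dimension.
grid = [
    GeneratorParams(size, snr, corruption)
    for size, snr, corruption in itertools.product([1_000, 10_000], [0.01, 0.1], [0.0, 0.2])
]

data = generate(GeneratorParams(1_000, 0.25, 0.1, seed=42))
print(sum(r["threat"] for r in data))  # → 200
```

Because the seed is one of the parameters, every cell of such a grid can be regenerated exactly, so the same datasets can be replayed against several detection technologies and their scores compared.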