The basic idea of active testing (see *Active Testing: An Efficient and Robust Framework for Estimating Accuracy*) is to intelligently update the labels of a noisily-labeled test set in order to improve the estimate of a performance metric for a particular system. The noisy labels may come, for example, from crowdsourced workers and are therefore not expected to be highly accurate. The required inputs to use active testing are listed below (a toy setup in code follows the list):

- A system under test,
- A performance metric of interest (accuracy, precision, recall, etc.),
- A test dataset where each item has at least one noisy label and a score from the system under test,
- Access to a vetter that can provide (hopefully) high-quality labels.
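
As a concrete starting point, the inputs above might be assembled as follows. This is a minimal toy sketch: the array names, the synthetic data, and the `vetter` function are illustrative assumptions, not part of this package's API.

```python
import numpy as np

# Toy, illustrative setup -- none of these names come from the package's API.
rng = np.random.default_rng(0)
n_items = 1_000

# Scores from the system under test, one per test item.
scores = rng.uniform(size=n_items)

# Noisy (e.g. crowdsourced) binary labels for each item.
noisy_labels = rng.integers(0, 2, size=n_items)

# Hidden ground truth, used here only to simulate a high-quality vetter.
true_labels = rng.integers(0, 2, size=n_items)

def vetter(idx):
    """Stand-in for a high-quality labeler (expert annotator, gold data, etc.)."""
    return true_labels[idx]
```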

Active testing has two main steps. In the first step, items from the test dataset are selected (according to some `query_strategy`) and sent to the vetter to receive a high-quality label. In the second step, some combination of the system scores, the noisy labels, the vetted labels, and the item features is used to estimate the performance metric of interest.
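
Continuing the toy setup above, here is a minimal sketch of both steps, using the simplest possible choices: a uniform-random query strategy and a naive plug-in estimator that trusts vetted labels where they exist and falls back to the noisy labels elsewhere. The function name and signature are hypothetical; the query strategies and metric estimators in this package are more sophisticated than this.

```python
import numpy as np

def estimate_accuracy(scores, noisy_labels, vetter, budget=100, threshold=0.5, seed=0):
    """Toy two-step active-testing loop (illustrative, not the package's API)."""
    rng = np.random.default_rng(seed)
    n = len(scores)
    preds = (np.asarray(scores) >= threshold).astype(int)

    # Step 1: query items (here, uniformly at random) and send them to the vetter.
    queried = rng.choice(n, size=budget, replace=False)
    vetted = {i: vetter(i) for i in queried}

    # Step 2: combine vetted and noisy labels into a metric estimate,
    # trusting the vetted label wherever one exists.
    labels = np.array([vetted.get(i, noisy_labels[i]) for i in range(n)])
    return float((preds == labels).mean())

# Continuing the toy setup above:
print(estimate_accuracy(scores, noisy_labels, vetter, budget=200))
```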

This package implements a variety of query strategies and two metric estimation strategies. See the `notebooks` folder for examples and more detailed explanations of how the package works. We recommend reading them in this order:

1. Introduction and Basic Use
2. Metric Estimators
3. Query Strategies
4. DPP
5. Prototypical Vetting
6. Text Data
7. Image Data