Automated visual object detection is an important capability for reducing the burden on human operators in many DoD applications. To train modern deep learning algorithms to recognize desired objects, the algorithms must be "fed" more than 1000 labeled images of each object (for 55%–85% accuracy, according to Project Maven - Oct 2017 O6, Working Group slide 27). Labeling training data for machine learning algorithms is labor-intensive, requires special software, and takes a great deal of time. Estimates from ImageNet, a widely used and publicly available visual object detection dataset, indicate that humans generated four annotations per minute over the production of its annotations. DoD needs to reduce direct object-by-object human labeling, particularly in the video domain, where the quantity of data can be significant. The Augmented Annotations System addresses this need in two ways. First, it leverages a small amount of human annotation effort by propagating human-initiated annotations through video to build an initial labeled dataset for training an object detector. Second, it uses an automated object detector in an iterative loop to assist humans by pre-annotating new datasets.
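
To give a sense of scale, the two figures quoted above (more than 1000 labeled images per object, and four annotations per minute) can be combined into a rough estimate of manual labeling effort. The following is a minimal sketch, not part of the system itself. It assumes one annotation per image, and the `human_fraction` parameter is a hypothetical knob for the share of annotations a human still produces by hand when the rest are propagated or pre-annotated; its values are illustrative, not measured results.

```python
# Back-of-the-envelope labeling effort, using only the figures quoted in the text.
IMAGES_PER_OBJECT = 1000      # lower bound: "more than 1000 labeled images" per object
ANNOTATIONS_PER_MINUTE = 4    # ImageNet-derived human annotation rate


def labeling_hours(num_objects, annotations_per_image=1, human_fraction=1.0):
    """Estimate human labeling time in hours.

    num_objects           -- number of distinct objects the detector must learn
    annotations_per_image -- ASSUMPTION: one annotation per image by default
    human_fraction        -- HYPOTHETICAL: share of annotations made by hand
                             (1.0 means fully manual labeling)
    """
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must be between 0 and 1")
    total_annotations = num_objects * IMAGES_PER_OBJECT * annotations_per_image
    manual_annotations = total_annotations * human_fraction
    return manual_annotations / ANNOTATIONS_PER_MINUTE / 60


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(f"{n} object(s): {labeling_hours(n):.1f} hours of manual labeling")
```

Under these assumptions, a single object requires 250 minutes (a little over 4 hours) of fully manual labeling, and the cost grows linearly with the number of objects. Any reduction in the hand-labeled share reduces the estimate proportionally, which is the effect the Augmented Annotations System is aiming for.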