Audio-visual identity grounding for enabling cross media search

June 23, 2014

Conference Paper

Author:

Kevin Brady

Published in:

IEEE Computer Vision and Pattern Recognition Big Data Workshop, 23 June 2014.

R&D Area:

Cyber Security and Information Sciences

R&D Group:

Artificial Intelligence Technology and Systems

Summary

Automatically searching for media clips in large heterogeneous datasets is inherently difficult, and nearly impossible when the search spans distinct media types (e.g., finding audio clips that match an image). In this paper we introduce identity grounding as a way to enable this cross-media search and exploration capability. Grounding uses one media channel (e.g., visual identity) as a noisy label for training a model in a different channel (e.g., an audio speaker model). Finally, we demonstrate this search capability by using images from the Labeled Faces in the Wild (LFW) dataset to query audio files extracted from the YouTube Faces (YTF) dataset.
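
The grounding idea in the summary can be illustrated with a minimal sketch. It is not the paper's implementation: the summary does not state which face or speaker features, models, or similarity measures were used. Everything below is an assumption made for illustration, including the toy feature vectors, the names `alice` and `bob`, the clip identifiers, the use of cosine similarity, and the choice of a mean vector as the "speaker model". Each video clip contributes a face vector and an audio vector; the face vector is matched to a gallery identity, and that identity serves as a noisy label for the clip's audio.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest_identity(face_vec, gallery):
    """Return the gallery identity whose face vector is most similar."""
    return max(gallery, key=lambda name: cosine(face_vec, gallery[name]))


def train_speaker_models(clips, gallery):
    """Label each clip's audio with its face-derived identity (a noisy label),
    then average the audio vectors per identity (hypothetical speaker model)."""
    grouped = {}
    for clip in clips:
        label = nearest_identity(clip["face"], gallery)
        grouped.setdefault(label, []).append(clip["audio"])
    return {name: mean_vector(vecs) for name, vecs in grouped.items()}


def search_audio_by_image(query_face, gallery, speaker_models, audio_index, top_k=2):
    """Map a query image to an identity, then rank audio clips against
    that identity's speaker model."""
    identity = nearest_identity(query_face, gallery)
    model = speaker_models.get(identity)
    if model is None:
        return identity, []
    ranked = sorted(audio_index, key=lambda item: cosine(item[1], model), reverse=True)
    return identity, [clip_id for clip_id, _ in ranked[:top_k]]


# Toy data (all values invented for illustration).
gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
clips = [
    {"face": [0.9, 0.1], "audio": [1.0, 0.2, 0.0]},
    {"face": [0.8, 0.3], "audio": [0.9, 0.1, 0.1]},
    {"face": [0.2, 0.9], "audio": [0.0, 0.1, 1.0]},
    {"face": [0.1, 0.8], "audio": [0.1, 0.0, 0.9]},
    # Ambiguous face: this bob-like audio ends up labeled "alice" (label noise).
    {"face": [0.6, 0.5], "audio": [0.0, 0.2, 1.0]},
]
audio_index = [
    ("clip_a1", [1.0, 0.1, 0.0]),
    ("clip_b1", [0.0, 0.0, 1.0]),
    ("clip_a2", [0.8, 0.2, 0.1]),
]

speaker_models = train_speaker_models(clips, gallery)
identity, hits = search_audio_by_image([0.95, 0.05], gallery, speaker_models, audio_index)
print(identity, hits)
```

In this toy setup one clip is deliberately mislabeled, which pulls the `alice` model toward the wrong voice; the ranking still recovers the matching clips because the correctly labeled audio dominates the average. A real system would need far more robust features and models than these hand-written vectors.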