Human-Debugging of Machine Visual Recognition

Tuesday, January 10, 2012 - 3:47pm


Thursday, January 19, 2012
11:00 AM – 12:00 PM
Computer Science Conference Room, Harold Frank Hall Rm. 1132

HOST: B.S. Manjunath

SPEAKER: Devi Parikh
Research Assistant Professor, Toyota Technological Institute at Chicago

Title: Human-Debugging of Machine Visual Recognition


The problem of visual recognition is central to the goal of
automatic image understanding. While a wide range of efforts have been
made in the computer vision community addressing different aspects of
various recognition problems, machine performance remains
unsatisfactory. Fortunately, we have access to a working system whose
performance we wish to replicate – the human visual recognition system!
It only seems natural to leverage it towards the goal of reliable
machine visual recognition.

In this talk, I will give an overview of our recently introduced
“human-debugging” paradigm. It involves replacing various components of
a machine vision pipeline with human subjects, and examining the
resultant effect on recognition performance. Meaningful comparisons
identify the aspects of machine vision approaches that require future
research efforts. I will present several of our efforts within this
framework that address image classification (CVPR’10, ICCV’11), object
recognition (CVPR’08, PAMI’11, ICCV’11) and person detection (CVPR’11).
Besides computer vision, human-debugging is also broadly applicable to
other areas in AI such as speech recognition and machine translation.

For image classification, I will describe our work on evaluating the
relative importance of image representations, learning algorithms and
amounts of training data. We found image representation to be the most
important factor. We further evaluated the relative importance of local
and global information in images, and found that further advancement in
modeling global information in images is crucial. For object
recognition, we studied the roles of appearance and contextual
information for machine and human recognition. Inspired by our findings,
we proposed a novel contextual cue that exploits unlabeled regions in
images, which are often ignored by existing contextual models. Our
proposed cue significantly boosts performance of a slew of existing
object detectors. Finally, for person detection, we analyzed a
state-of-the-art parts-based person detection model and found part
detection to be the weakest link.


Devi Parikh is a Research Assistant Professor at TTI-Chicago, an
academic computer science institute affiliated with the University of
Chicago. She received her M.S. and Ph.D. degrees from the Electrical and
Computer Engineering department at Carnegie Mellon University in 2007
and 2009, respectively, advised by Tsuhan Chen. She received her B.S. in
Electrical and Computer Engineering from Rowan University in 2005.

Her research interests include computer vision and AI in general.
Recently, she has been involved in leveraging human-machine
collaborations for building smarter machines. She was a recipient of the
Carnegie Mellon Dean’s Fellowship, National Science Foundation Graduate
Research Fellowship, and the 2011 Marr Prize awarded at ICC