Visual recognition and reconstruction in the three-dimensional world

Tuesday, January 27, 2009 - 6:34pm

3:30 – 4:30
Computer Science Conference Room, Harold Frank Hall Rm. 1132


SPEAKER: Silvio Savarese
Electrical & Computer Engineering, University of Michigan

Title: Visual recognition and reconstruction in the three-dimensional world


The ability to interpret the semantics of objects and actions, their
individual geometric attributes as well as their spatial and temporal
relationships within the environment is essential for an intelligent
visual system and extremely valuable in numerous applications. In visual
recognition, the problem of categorizing generic objects is a highly
challenging one. Single objects vary in appearance and shape under
various photometric (e.g. illumination) and geometric (e.g. scale,
viewpoint, occlusion) transformations. Largely due to the difficulty
of this problem, most of the current research in object categorization
has focused on modeling object classes in single (or nearly single)
views. But our world is fundamentally 3D and it is crucial that we
design models and algorithms that can handle such appearance and pose
variability. In the first part of the talk, I introduce a novel framework
for learning and recognizing 3D object categories and their poses. Our
approach is to capture a compact model of an object category by linking
together diagnostic parts of the objects from different viewpoints.
The resulting model is a summarization of both the appearance and
geometry information of the object class. Unlike earlier attempts at 3D
object categorization, our framework requires minimal supervision and
has the ability to synthesize unseen views of an object category. Our
results on categorization show superior performance to state-of-the-art
algorithms on the largest dataset to date. In the second part, I
present a new framework for modeling the overall geometrical and
temporal organization of scenes. This is done by learning the typical
distribution of spatial and temporal relationships among elements in
scenes. Our model is extremely compact and can be learned in an
unsupervised fashion. Experiments demonstrate that the added ability of
modeling such spatial and temporal relationships is useful in several
recognition tasks, such as scene/object categorization and human action
classification. I will conclude the talk with final remarks on the
relevance of the proposed research for a number of applications in
mobile vision.


Silvio Savarese is an Assistant Professor of Electrical Engineering at
the University of Michigan, Ann Arbor. He earned his PhD in Electrical
Engineering from the California Institute of Technology in 2005. He
was at the University of Illinois at Urbana-Champaign from 2005 to 2008
as a Beckman Institute Fellow. In 2002 he was a recipient of the Walker
von Brimer Award for outstanding research initiative. His research
interests include computer vision, object and scene recognition, shape
representation and reconstruction, human visual perception and visual