WEDNESDAY, OCTOBER 29, 2008
Reception 3:00 – 3:30
Talk 3:30 – 4:30
PLACE: Marine Science Building Auditorium, Room 1302
HOST: DIVY AGRAWAL
SPEAKER: ALON HALEVY, Google
Title: Bringing (Web) Databases to the Masses
Abstract: The World-Wide Web is considered to be primarily a collection
of unstructured text documents. However, the Web also contains vast
collections of structured data. For example, there are millions of
databases that can only be accessed by posing queries to HTML forms, and
there are hundreds of millions of structured data tables embedded in Web
documents. Yet users have so far been unable to tap into these resources
effectively.
I will begin by describing two efforts at Google whose aim is to bring
relevant structured data to web users. In the first, I will describe how
we crawled the content of millions of databases behind forms and now
serve content from these databases in response to over 1,000 queries per second. In
the second, I will describe what can be done with a collection of 150
million high-quality data tables, a collection five orders of magnitude
larger than any previously managed.
I will then take a step back and consider how we need to change our
approach to database management in order to better handle data on the
Web and other collections of loosely coupled repositories of structured
and unstructured data. I will describe the principles of dataspace
systems that emphasize pay-as-you-go data management, rather than
requiring upfront data modeling and schema creation.
Bio: Alon Halevy heads the Structured Data Management Research group at
Google. Prior to that, he was a professor of Computer Science at the
University of Washington in Seattle, where he conducted research on data
integration, XML, personal information management, and peer-to-peer
databases. In 1999, Dr. Halevy co-founded Nimble Technology, one of the
first companies in the Enterprise Information Integration space, and in
2004, Dr. Halevy founded Transformic Inc., a company that created
search engines for the deep web and that was acquired by Google. Dr.
Halevy is a Fellow of the ACM, received the Presidential Early
Career Award for Scientists and Engineers (PECASE) in 2000, and was a
Sloan Fellow (1999-2000). In 2006 he received the 10-year VLDB Best
Paper Award for his work on the Information Manifold System. He received
his Ph.D. in Computer Science from Stanford University in 1993.