UCSB Computer Science Department Presents Distinguished Lecture:

Friday, September 12, 2008 - 9:22am

Reception 3:00 – 3:30
Talk 3:30 – 4:30
PLACE: Marine Science Building Auditorium, Room 1302

    Title: Bringing (Web) Databases to the Masses

    Abstract: The World-Wide Web is considered to be primarily a collection
    of unstructured text documents. However, the Web also contains vast
    collections of structured data. For example, there are millions of
    databases that can only be accessed by posing queries to HTML forms, and
    there are hundreds of millions of structured data tables embedded in Web
    documents. Yet users have not been able to effectively tap into
    these incredible resources.

    I will begin by describing two efforts at Google whose aim is to bring
    relevant structured data to web users. In the first, I will describe how
    we crawled the content of millions of databases behind forms and now
    serve content from these databases to over 1000 queries per second. In
    the second, I will describe what can be done with a collection of 150
    million high-quality data tables, 5 orders of magnitude greater than any
    previous collection ever managed.

    I will then take a step back and consider how we need to change our
    approach to database management in order to better handle data on the
    Web and other collections of loosely coupled repositories of structured
    and unstructured data. I will describe the principles of dataspace
    systems that emphasize pay-as-you-go data management, rather than
    requiring upfront data modeling and schema creation.

    Bio: Alon Halevy heads the Structured Data Management Research group at
    Google. Prior to that, he was a professor of Computer Science at the
    University of Washington in Seattle, where he conducted research on data
    integration, XML, personal information management, and peer-to-peer
    databases. In 1999, Dr. Halevy co-founded Nimble Technology, one of the
    first companies in the Enterprise Information Integration space, and in
    2004, Dr. Halevy founded Transformic Inc., a company that created
    search engines for the deep web, which was acquired by Google. Dr.
    Halevy is a Fellow of the ACM, received the Presidential Early
    Career Award for Scientists and Engineers (PECASE) in 2000, and was a
    Sloan Fellow (1999-2000). In 2006 he received the 10-year VLDB Best
    Paper Award for his work on the Information Manifold System. He received
    his Ph.D. in Computer Science from Stanford University in 1993.