A Berkeley View of Big Data

Tuesday, April 12, 2011 - 4:16pm


10:30 AM – 11:00 AM Reception
11:00 AM – 12:00 PM Talk
Engineering Sciences Building, Room 1001

HOST: Fred Chong

SPEAKER: David Patterson
Professor, UC Berkeley

Title: “A Berkeley View of Big Data”

The compound annual growth rate of worldwide data is 60%, and the total
for 2010 is estimated at 1 zettabyte (1,000,000,000 terabytes). For
example, Facebook receives 200 to 400 terabytes of new data and records
130 terabytes of logs every day.

The problem of Big Data is not only that it is capacious, but also that it
is heterogeneous, dirty, and growing even faster than disk capacity
improves. One challenge, then, is to derive enough value from it, by
answering ad hoc questions in a timely fashion, to justify preserving Big
Data.

A group of us from databases, machine learning, networking, and systems
has just started a new lab at UC Berkeley to tackle this challenge. The
AMPLab is working at the intersection of three trends: statistical
machine learning (Algorithms), cloud computing (Machines), and
crowdsourcing (People) (see figure below).

We hope to develop a new generation of scalable machine learning
algorithms, data management tools for large-scale and heterogeneous
datasets, datacenter-friendly programming models, and an improved
computational infrastructure. The project will test these ideas on
several real-world applications. The founding sponsors of the AMPLab are
Google and SAP. Amazon Web Services, eBay, Huawei, IBM, Intel,
Microsoft, NEC, NetApp, VMware, and Cloudera are also sponsors.

This talk will cover our current definition of the Big Data problem,
what we perceive as the technical challenges and opportunities of Big
Data, and some initial directions and preliminary results of the AMPLab.

David Patterson is the Pardee Professor of Computer Science at the
University of California at Berkeley, which he joined after graduating
from UCLA in 1977. Dave’s research style is to identify critical
questions for the IT industry and gather interdisciplinary groups of
faculty and graduate students to answer them. The answers are typically
embodied in demonstration systems that are later mirrored in commercial
products. In addition to their research impact, these projects train
leaders of our field. The best-known projects were Reduced Instruction
Set Computers (RISC), Redundant Array of Inexpensive Disks (RAID), and
Networks of Workstations (NOW). One measure of the projects' success is
the list of awards won by Patterson and his teammates: the C&C Prize,
the IEEE von Neumann Medal, the IEEE Johnson Storage Award, the SIGMOD
Test of Time award, and the Katayanagi Prize.
He was also elected to the American Academy of Arts and Sciences,
National Academy of Engineering, National Academy of Sciences, and the
Silicon Valley Engineering Hall of Fame. Most recently, he was named a
Fellow of the Computer History Museum. The full list includes about 20
awards for research, teaching, and service.

In his spare time he has coauthored five books, including two with John
Hennessy, who is President of Stanford University. Patterson has also
served as Chair of the Computer Science Division at UC Berkeley, Chair
of the Computing Research Association, and President of ACM.