UCSB COMPUTER SCIENCE DEPARTMENT PRESENTS:

Wednesday, February 8, 2012

3:30 – 4:30 PM

Computer Science Conference Room, Harold Frank Hall Rm. 1132

HOST: Giovanni Vigna

SPEAKER: Paul Valiant

Postdoctoral Fellow, Department of Computer Science, UC Berkeley

Title: The Challenge of Data Efficiency: Vignettes from Statistics and

Evolution

Abstract:

As the amount of data that can be brought to bear on a task grows, so

too does the difficulty and importance of making the best use of our

data. In this talk we will consider two settings of this problem, one

from statistics and one from the natural world.

A basic question in statistics is to determine how much data is needed

to estimate various properties of one or more probability distributions.

We consider the fundamental properties of support size, entropy, and the

distance between two distributions. We present a unified new approach to

these problems which yields the first explicit sublinear algorithms:

given one or more probability distributions on a domain of size n, we

show how to estimate each of these properties using n/log n samples,

where n corresponds to the support size of the distribution. Our

approach leverages the techniques of linear programming to yield a

single algorithm to estimate all three properties, and, indeed, all

properties from a wider class that contains them. Further, we construct

a new discrete version of the multivariate central limit theorem to show

that n/log n samples are in fact needed to estimate these properties and

thus show that our algorithm uses the optimal number of samples, to

within constant factor.

In the second half of the talk, we take our inspiration from biological

evolution — a natural process that, from vast amounts of data and

complexity yields surprisingly effective results. In this talk we

formulate a notion of “evolvability” for functions with domain and range

that are real-valued vectors, an appropriate way of expressing many

natural biological processes such as protein expression regulation. We

show that linear and fixed-degree polynomial functions are evolvable in

the following dually robust sense: There is a single evolution algorithm

that for all convex loss functions converges for all distributions. We

further examine the scope of the evolvability model and discuss future

directions towards a computational understanding of evolution.

Bio:

Paul Valiant is a postdoctoral fellow at the University of California,

Berkeley. He earned his PhD from MIT in 2008 under the supervision of

Silvio Micali and previously graduated from Stanford University in 2004

with a BS in mathematics and physics. His research investigates

applications of the computational perspective to other areas of science,

particularly focusing on biology, statistics, and mathematics. He was a

co-winner of the Machtey award for best student paper at FOCS 2005,

winner of the best student paper award at the Theoretical Cryptography

Conference in 2008, and was awarded the NSF Mathematical Sciences

Postdoctoral Research Fellowship in 2009.