The Challenge of Data Efficiency: Vignettes from Statistics and Evolution

Thursday, February 2, 2012 - 11:30am


Wednesday, February 8, 2012
3:30 – 4:30 PM
Computer Science Conference Room, Harold Frank Hall Rm. 1132

HOST: Giovanni Vigna

SPEAKER: Paul Valiant
Postdoctoral Fellow, Department of Computer Science, UC Berkeley

Title: The Challenge of Data Efficiency: Vignettes from Statistics and


As the amount of data that can be brought to bear on a task grows, so
too does the difficulty and importance of making the best use of our
data. In this talk we will consider two settings of this problem, one
from statistics and one from the natural world.

A basic question in statistics is to determine how much data is needed
to estimate various properties of one or more probability distributions.
We consider the fundamental properties of support size, entropy, and the
distance between two distributions. We present a unified new approach to
these problems which yields the first explicit sublinear algorithms:
given one or more probability distributions on a domain of size n, we
show how to estimate each of these properties using n/log n samples,
where n corresponds to the support size of the distribution. Our
approach leverages the techniques of linear programming to yield a
single algorithm to estimate all three properties, and, indeed, all
properties from a wider class that contains them. Further, we construct
a new discrete version of the multivariate central limit theorem to show
that n/log n samples are in fact needed to estimate these properties and
thus show that our algorithm uses the optimal number of samples, to
within constant factor.

In the second half of the talk, we take our inspiration from biological
evolution — a natural process that, from vast amounts of data and
complexity yields surprisingly effective results. In this talk we
formulate a notion of “evolvability” for functions with domain and range
that are real-valued vectors, an appropriate way of expressing many
natural biological processes such as protein expression regulation. We
show that linear and fixed-degree polynomial functions are evolvable in
the following dually robust sense: There is a single evolution algorithm
that for all convex loss functions converges for all distributions. We
further examine the scope of the evolvability model and discuss future
directions towards a computational understanding of evolution.


Paul Valiant is a postdoctoral fellow at the University of California,
Berkeley. He earned his PhD from MIT in 2008 under the supervision of
Silvio Micali and previously graduated from Stanford University in 2004
with a BS in mathematics and physics. His research investigates
applications of the computational perspective to other areas of science,
particularly focusing on biology, statistics, and mathematics. He was a
co-winner of the Machtey award for best student paper at FOCS 2005,
winner of the best student paper award at the Theoretical Cryptography
Conference in 2008, and was awarded the NSF Mathematical Sciences
Postdoctoral Research Fellowship in 2009.