Facilitating Complex Scientific Analytics in the Cloud

Friday, April 29, 2011 - 9:37am


Wednesday, May 18, 2011
11:00 AM – 12:00 PM
Computer Science Conference Room, Harold Frank Hall Rm. 1132

HOST: Divy Agrawal

SPEAKER: Magdalena Balazinska
Computer Science and Engineering, U. of Washington



Scientists today have the ability to generate data at an unprecedented scale and rate. As a result, they must increasingly turn to parallel data processing engines to perform their analyses. However, the simple execution model of these engines can make it difficult to implement efficient algorithms for scientific analytics. In particular, many scientific analyses require the extraction of features from data represented as either a multidimensional array or points in a multidimensional space. These applications exhibit significant computational skew: the runtime of a partition depends on more than just its input size, so it can vary dramatically and unpredictably from one partition to the next. In this talk, we present SkewReduce, a new system implemented on top of Hadoop that enables users to easily express feature extraction analyses and execute them efficiently. At the heart of the SkewReduce system is an optimizer, parameterized by user-defined cost functions, that determines how best to partition the input data to minimize computational skew. Experiments on real data from two different science domains demonstrate that our approach can improve execution times by a factor of up to 8 compared to a naive MapReduce implementation.
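
To make the idea of cost-guided partitioning concrete, here is a minimal sketch in Python. It is not SkewReduce's optimizer or API: the function names, the one-dimensional data, and the pairwise cost model are all assumptions chosen for illustration. The sketch repeatedly splits whichever partition a user-supplied cost function estimates to be most expensive, and an equal-size split is included for comparison.

```python
from typing import Callable, List, Sequence

CostFn = Callable[[Sequence[float]], float]


def pairwise_cost(points: Sequence[float], radius: float = 1.0) -> float:
    """Hypothetical user-defined cost: one unit per point, plus one unit
    per pair of points closer than `radius`. It stands in for an analysis
    whose work depends on local density rather than on input size alone."""
    pts = sorted(points)
    pairs = 0
    lo = 0
    for hi in range(len(pts)):
        # Advance lo until pts[lo] is within radius of pts[hi].
        while pts[hi] - pts[lo] > radius:
            lo += 1
        pairs += hi - lo
    return float(pairs + len(pts))


def equal_size_partition(points: Sequence[float], k: int) -> List[List[float]]:
    """Baseline: k contiguous partitions with (nearly) equal point counts."""
    pts = sorted(points)
    n = len(pts)
    bounds = [i * n // k for i in range(k + 1)]
    return [pts[bounds[i]:bounds[i + 1]] for i in range(k)]


def cost_aware_partition(points: Sequence[float], k: int,
                         cost_fn: CostFn) -> List[List[float]]:
    """Greedy sketch: keep halving the partition with the highest
    estimated cost until there are k partitions."""
    parts: List[List[float]] = [sorted(points)]
    while len(parts) < k:
        # Partitions with fewer than two points cannot be split further.
        idx = max(
            range(len(parts)),
            key=lambda i: cost_fn(parts[i]) if len(parts[i]) > 1 else -1.0,
        )
        worst = parts[idx]
        if len(worst) < 2:
            break
        mid = len(worst) // 2
        parts[idx:idx + 1] = [worst[:mid], worst[mid:]]
    return parts


def max_cost(parts: Sequence[Sequence[float]], cost_fn: CostFn) -> float:
    """Estimated cost of the slowest partition, which bounds parallel runtime."""
    return max(cost_fn(p) for p in parts)
```

On input with one dense cluster and many widely spaced points, the equal-size split puts the whole cluster into a single partition, while the cost-guided split subdivides the cluster and lowers the estimated cost of the slowest partition. A real optimizer would also have to weigh scheduling and merge overheads, which this sketch ignores.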


Magdalena Balazinska is an Assistant Professor in the Department of Computer Science and Engineering at the University of Washington. Magdalena’s research interests are broadly in the fields of databases and distributed systems. Her current research focuses on data-intensive scalable computing, sensor and scientific data management, and cloud computing. Magdalena holds a PhD from the Massachusetts Institute of Technology (2006). She is a Microsoft Research New Faculty Fellow (2007) and has received an NSF CAREER Award (2009), a 10-year most influential paper award (2010), an HP Labs Research Innovation Award (2009-2011), a Rogel Faculty Support Award (2006), and a Microsoft Research Graduate Fellowship (2003-2005).