Scalable Search and Ranking for Scientific Data

Tuesday, February 16, 2010 - 10:07am


Monday, March 1, 2010
3:00 – 4:00pm
Computer Science Conference Room, Harold Frank Hall Rm. 1132

HOST: Amr El Abbadi

SPEAKER: Mirek Riedewald
College of Computer and Information Science
Associate Professor, Northeastern University



As the amount and complexity of data in many scientific disciplines increase rapidly, new tools are needed to support exploratory analysis and scientific discovery. Our work is motivated by a major challenge we experienced in collaborations with domain scientists: finding interesting relationships between the attributes (a.k.a. variables) of a complex process. Such relationships, which we generally refer to as patterns, form the basis for new hypotheses and hence facilitate discovery. We argue that data management research is essential for all aspects of scalable pattern search and ranking, ranging from an easy-to-use query language and a formal language for representing search preferences to a distributed implementation of the search process. In addition to a system vision and research challenges, we will discuss our current results, including a formal preference language and techniques for efficient generation of model summaries, which are the basis for pattern discovery.


Mirek Riedewald received a Ph.D. in computer science from the University of California at Santa Barbara in 2002. After spending some time as a researcher at Cornell University and as a visiting researcher at Microsoft Research, he is now an Associate Professor at Northeastern University. Dr. Riedewald’s research interests are in databases and data mining, with an emphasis on designing scalable techniques for data-driven science. Most sciences are already producing an abundance of data, and analyzing this data has become a major challenge. This creates exciting opportunities for developing novel approaches that will have an impact both in computer science and in the domain sciences. Dr. Riedewald is developing techniques for distributed data analysis, for mining observational data, and for real-time processing of massive data streams. He has a track record of successful collaborations with scientists from different domains, including ornithology, physics, mechanical and aerospace engineering, and astronomy. His work has been published in premier peer-reviewed data management research venues such as ACM SIGMOD, VLDB, IEEE ICDE, and IEEE TKDE, as well as in domain science journals. For more details, please see