PhD Defense: On Consistency and Performance in Very Large Scale Databases

Tuesday, May 25, 2010 - 2:42pm

PhD Defense
Shyam Antony
Tuesday, June 1st, 2010
2:00 – Room 4164 HFH

Committee: Divy Agrawal (co-chair), Amr El Abbadi (co-chair), Jianwen Su, Kevin Almeroth

Title: On Consistency and Performance in Very Large Scale Databases

Large-scale internet services serve a very large number of concurrent users and involve online access to petabytes of data. Such workloads were traditionally handled by OLTP database engines. However, these systems have proven inadequate for very large scale workloads. A primary scalability bottleneck of OLTP databases is their highly generalized concurrency control mechanisms, which incorporate advanced features such as cursor stabilized predicates, serializability and mixed isolation levels. Designers of internet-scale services have responded by designing systems that are scalable but support only very primitive notions of data consistency. However, emerging and potential applications need a reasonable balance between these extremes.

In this thesis, we present two distinct approaches to reconciling scalability and data consistency. We propose LSTP, a scalable transactional protocol that shows how to achieve relaxed isolation guarantees similar to snapshot isolation while still avoiding phantom problems over range queries. We also propose KGP, a distributed systems protocol that enhances consistency while retaining scalability by restricting consistent access to dynamically changing groups of small subsets of the entire dataset.

Internally, online internet services generate an enormous volume of auxiliary data. To maximize the utility of this rapidly accumulating data, it is necessary to formulate efficient and dynamic ways to analyze it quickly. We show how multidimensional range queries can be executed with high scalability and efficiency over such data sources. We also describe the design and implementation of a system that provides skyline queries over user-configurable aggregates as a rapid, progressive and useful way of highlighting interesting aggregates of the data.

Everyone Welcome.