Learning in Query Optimization

Thursday, November 15, 2007 - 9:28am

VOLKER MARKL, IBM Almaden Research Center
TIME: 3:30 – 4:30 p.m.
PLACE: Computer Science Conference Room, Harold Frank Hall Room 1132


Database systems let users specify queries in a declarative language like SQL. Most modern DBMS optimizers rely upon a cost model to choose the best query execution plan (QEP) for any given query. Cost estimates depend heavily on the optimizer’s estimates of the number of rows that will result at each step of the QEP for complex queries involving many predicates and/or operations. These estimates, in turn, rely upon statistics on the database and modeling assumptions that may or may not be true for a given database. In the first part of our talk, we present research on learning in query optimization that we have carried out at the IBM Almaden Research Center. We introduce LEO, DB2’s LEarning Optimizer, as a comprehensive way to repair incorrect statistics and cardinality estimates of a query execution plan. By monitoring executed queries, LEO compares the optimizer’s estimates with actuals at each step in a QEP, and computes adjustments to cost estimates and statistics that may be used during the current and future query optimizations. LEO introduces a feedback loop to query optimization that enhances the available information on the parts of the database where the most queries have occurred, allowing the optimizer to actually learn from its past mistakes. In the second part of the talk, we describe how the knowledge gleaned by LEO is exploited consistently in a query optimizer, by adjusting the optimizer’s model and by maximizing information entropy. In addition, Volker will briefly highlight the DAMIA project and IBM’s Mashup Starter Kit, his current research focusing on the creation of a Data Mashup Fabric for Intranet Applications using Web 2.0 technologies.
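The feedback loop described in the abstract can be illustrated with a minimal Python sketch. It is not LEO's or DB2's actual implementation: the `FeedbackStore` class, its method names, and the sample predicates and row counts are all hypothetical, and the adjustment is assumed to be a simple ratio of actual to estimated rows.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackStore:
    """Hypothetical store of per-predicate adjustment factors (not DB2's design)."""
    factors: dict = field(default_factory=dict)

    def record(self, predicate: str, estimated_rows: float, actual_rows: float) -> None:
        # Compare the optimizer's estimate with the observed row count.
        # A non-positive estimate gives no usable ratio, so it is skipped.
        if estimated_rows <= 0:
            return
        self.factors[predicate] = actual_rows / estimated_rows

    def adjust(self, predicate: str, estimated_rows: float) -> float:
        # Predicates never observed keep the original estimate (factor 1.0).
        return estimated_rows * self.factors.get(predicate, 1.0)


store = FeedbackStore()

# After running a query, the monitor reports estimate vs. actual for one plan step.
# The predicate text and numbers are made up for illustration.
store.record("make = 'Honda' AND model = 'Accord'", estimated_rows=100.0, actual_rows=1000.0)

# A later optimization of a query with the same predicate reuses the learned factor.
adjusted = store.adjust("make = 'Honda' AND model = 'Accord'", 100.0)
untouched = store.adjust("color = 'red'", 50.0)
print(adjusted, untouched)
```

In this sketch, only predicates that have actually been executed receive a correction, which mirrors the abstract's point that feedback enriches the optimizer's knowledge where the most queries have occurred.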


Dr. Markl has been working at IBM’s Almaden Research Center in San Jose, USA since 2001, conducting research in query optimization, indexing, and self-managing databases. Volker Markl is spearheading the LEO project, an effort on autonomic computing with the goal of creating a self-tuning optimizer for DB2 UDB. He is also the Almaden chair for the IBM Data Management Professional Interest Community (PIC). Volker Markl is a graduate of the Technische Universität München, where he earned a Master’s degree in Computer Science in 1995. He completed his PhD in 1999 under the supervision of Rudolf Bayer. His dissertation on “Relational Query Processing Using a Multidimensional Access Technique” was honored “with distinction” by the German Computer Society (Gesellschaft für Informatik). He also earned a degree in Business Administration from the University of Hagen, Germany in 1995. Since 1996, Volker Markl has published more than 30 reviewed papers at prestigious scientific conferences and in journals, filed more than 10 patents, and has been an invited speaker at many universities and companies. Dr. Markl is a member of the German Computer Society (GI) as well as the Special Interest Group on Management of Data of the Association for Computing Machinery (ACM SIGMOD). He also serves as a program committee member and reviewer for several international conferences and journals, including SIGMOD, ICDE, VLDB, TKDE, TODS, IS, and the Computer Journal. His main research interests are in autonomic computing, query processing, and query optimization, but also include applications like data warehousing, electronic commerce, and pervasive computing.