Predicting the Performance of Index Structures for High-Dimensional Datasets

Report ID

1999-34

Report Authors

C. Lang and A. Singh

Report Date

1999-11-01

Abstract

We present two new models for predicting the number of index page accessesduring nearest neighbor queries for high-dimensional datasets, a density-basedmodel that depends weakly on the underlying index structure and results incoarser predictions, and a sampling-based model that depends strongly on theunderlying index structure and gives more accurate predictions. We givedetailed evaluations of both models and validate them experimentally byconsidering a number of synthetic and real datasets. We analyze the error inthe models and illustrate how it can be minimized. As an application of ourmodels, we show how the sampling-based model can be used to determine theoptimal number of index dimensions for a high-dimensional dataset. Sinceexisting models are tailored either to uniform datasets or to a small number ofdimensions, this research makes a significant contribution to the understandingof high-dimensional multimedia data.

Document

1999-34.ps500.63 KB