Report ID
1999-34
Report Authors
C. Lang and A. Singh
Report Date
Abstract
We present two new models for predicting the number of index page accessesduring nearest neighbor queries for high-dimensional datasets, a density-basedmodel that depends weakly on the underlying index structure and results incoarser predictions, and a sampling-based model that depends strongly on theunderlying index structure and gives more accurate predictions. We givedetailed evaluations of both models and validate them experimentally byconsidering a number of synthetic and real datasets. We analyze the error inthe models and illustrate how it can be minimized. As an application of ourmodels, we show how the sampling-based model can be used to determine theoptimal number of index dimensions for a high-dimensional dataset. Sinceexisting models are tailored either to uniform datasets or to a small number ofdimensions, this research makes a significant contribution to the understandingof high-dimensional multimedia data.
Document
1999-34.ps500.63 KB