Report ID
2000-10
Report Authors
H. Ferhatosmanoglu, E. Tuncel, D. Agrawal, and A. El Abbadi
Report Date
Abstract
With the proliferation of multimedia data, there is an increasing need to support the indexing and searching of high dimensional data. Recently, a vector approximation based technique called the VA-file has been proposed for indexing high dimensional data. It has been shown that the VA-file is an effective technique compared to the current approaches based on space and data partitioning. The VA-file gives good performance especially when the data set is uniformly distributed. Real data sets, however, are not uniformly distributed, are often clustered, and the dimensions of the feature vectors in real data sets are usually correlated. More careful analysis for non-uniform or correlated data is needed for effectively indexing high dimensional data. We propose a solution to these problems in the form of a new technique for indexing high dimensional data sets based on vector approximations. We conclude with an evaluation of nearest neighbor queries and show that the proposed technique results in significant improvements over the current VA-file approach for several real data sets.
Document
2000-10.ps (1.64 MB)