Report ID
2003-24
Report Authors
Ahmet Bulut and Ambuj K. Singh
Report Date
Abstract
Monitoring thousands of data streams online poses a challenge in many data-centric applications such as telecommunications networks, traffic management, trend-related analysis, web-click streams, intrusion detection, and sensor networks. Stream mining techniques employed in these applications must be efficient in both space usage and per-item processing time, while still providing high-quality answers to similarity queries such as detecting correlations and finding similar patterns. We propose a new approach for summarizing a set of data streams and for constructing a composite index structure to answer similarity queries. Stream features are extracted incrementally, on the fly, at multiple resolutions, and inserted into a family of dynamic index structures for later querying. Through an extensive set of experiments on both real-world and synthetic data, we show the effectiveness of our method over existing techniques. For pattern analysis, our technique offers accuracy competitive with an exact algorithm while minimizing the space required for incremental computation. For correlation detection, our technique performs up to 60 times better in response time and up to 20 times better in the quality of answers provided. Its time and space complexity also scale well with the number of streams.
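The abstract does not say which features are extracted or how they are indexed. As a purely hypothetical illustration of what "incremental, multi-resolution" feature extraction can mean, the Python sketch below keeps, for each level j, the mean of the most recent 2^j values of a single stream and updates every level in constant time per arriving item. The names `MultiResolutionSummary` and `feature_distance` are introduced here for illustration only; windowed means are an assumed stand-in for whatever features the report actually uses.

```python
import math
from collections import deque


class MultiResolutionSummary:
    """Hypothetical multi-resolution summary of one stream (not the report's method).

    Level j holds the mean of the most recent 2**j values
    (or of all values seen so far, if fewer have arrived).
    """

    def __init__(self, levels: int) -> None:
        if levels < 1:
            raise ValueError("levels must be >= 1")
        self.levels = levels
        # Only the largest window is buffered; smaller windows are suffixes of it.
        self.window: deque = deque(maxlen=2 ** (levels - 1))
        self.sums = [0.0] * levels  # running sum for each window size
        self.count = 0

    def update(self, x: float) -> None:
        """Absorb one new value in O(levels) time."""
        for j in range(self.levels):
            size = 2 ** j
            self.sums[j] += x
            if self.count >= size:
                # The value `size` positions back now falls out of window j.
                self.sums[j] -= self.window[-size]
        self.window.append(x)
        self.count += 1

    def features(self) -> list:
        """Return the per-level window means, finest level first."""
        if self.count == 0:
            raise ValueError("no values seen yet")
        return [self.sums[j] / min(self.count, 2 ** j) for j in range(self.levels)]


def feature_distance(a: MultiResolutionSummary, b: MultiResolutionSummary) -> float:
    """Hypothetical comparison: Euclidean distance between feature vectors."""
    return math.dist(a.features(), b.features())


if __name__ == "__main__":
    s = MultiResolutionSummary(levels=3)  # windows of size 1, 2 and 4
    for x in [1, 2, 3, 4, 5]:
        s.update(x)
    print(s.features())  # [5.0, 4.5, 3.5]
```

In a complete system, the per-level features of many streams might be inserted into one index per resolution for later similarity queries; this sketch omits indexing entirely and uses a plain Euclidean distance only as an approximate way to compare two streams' summaries.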
Document
2003-24.pdf (303.25 KB)