Clustering Data Streams Based on Shared Density Between Micro-Clusters - 2016
As more and more applications produce streaming data, clustering data streams has become an important technique for data and knowledge engineering. A typical approach is to summarize the data stream in real time with an online process into a large number of so-called micro-clusters. Micro-clusters represent local density estimates by aggregating the information of many data points in a defined area. On demand, a (modified) conventional clustering algorithm is used in a second, offline step to recluster the micro-clusters into larger final clusters. For reclustering, the centers of the micro-clusters are used as pseudo points, with the density estimates used as their weights. However, information about the density in the area between micro-clusters is not preserved in the online process, and reclustering relies on possibly inaccurate assumptions about the distribution of data within and between micro-clusters (e.g., uniform or Gaussian). This paper describes DBSTREAM, the first micro-cluster-based online clustering component that explicitly captures the density between micro-clusters via a shared density graph. The density information in this graph is then exploited for reclustering based on the actual density between adjacent micro-clusters. We discuss the space and time complexity of maintaining the shared density graph. Experiments on a wide range of synthetic and real data sets highlight that using shared density improves clustering quality over other popular data stream clustering methods, which require the creation of a larger number of smaller micro-clusters to achieve comparable results.
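The two-phase idea described above can be illustrated with a minimal Python sketch. It is not the DBSTREAM implementation from the paper: the class name `SharedDensitySketch`, the parameters `radius` and `alpha`, and the rule that two micro-clusters are joined when their shared count divided by their mean weight reaches `alpha` are all assumptions made for illustration. The sketch also keeps centers fixed and never fades or removes old micro-clusters, which a streaming method would normally need to do.

```python
from itertools import combinations
from math import dist


class SharedDensitySketch:
    """Toy micro-cluster summary with a shared density graph (illustrative only)."""

    def __init__(self, radius, alpha):
        self.radius = radius   # assumed fixed radius of every micro-cluster
        self.alpha = alpha     # assumed connectivity threshold for reclustering
        self.centers = []      # micro-cluster centers (never moved in this sketch)
        self.weights = []      # number of points absorbed by each micro-cluster
        self.shared = {}       # (i, j) with i < j -> points seen by both i and j

    def update(self, point):
        """Online step: absorb one point from the stream."""
        hits = [i for i, c in enumerate(self.centers)
                if dist(c, point) <= self.radius]
        if not hits:
            # No micro-cluster covers the point, so start a new one at it.
            self.centers.append(tuple(point))
            self.weights.append(1.0)
            return
        for i in hits:
            self.weights[i] += 1.0
        # A point covered by two micro-clusters is evidence of density
        # in the area between them, so record it on the shared graph.
        for i, j in combinations(hits, 2):
            self.shared[(i, j)] = self.shared.get((i, j), 0.0) + 1.0

    def recluster(self):
        """Offline step: merge micro-clusters linked by enough shared density."""
        parent = list(range(len(self.centers)))

        def find(i):
            # Union-find root lookup with path halving.
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for (i, j), count in self.shared.items():
            mean_weight = (self.weights[i] + self.weights[j]) / 2.0
            if count / mean_weight >= self.alpha:
                parent[find(i)] = find(j)
        # One label per micro-cluster; equal labels mean the same final cluster.
        return [find(i) for i in range(len(self.centers))]


if __name__ == "__main__":
    model = SharedDensitySketch(radius=1.0, alpha=0.1)
    for p in [(0.0, 0.0), (1.5, 0.0), (0.75, 0.0), (10.0, 10.0)]:
        model.update(p)
    print(model.recluster())
```

In this toy stream, the third point falls inside both of the first two micro-clusters, so they end up in the same final cluster, while the distant fourth point stays separate. A reclustering that used only centers and weights would have no direct evidence that the first two micro-clusters are connected.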