Clustering Data Streams Based on Shared Density Between Micro-Clusters - 2016


As more and more applications produce streaming data, clustering data streams has become an important technique for data and knowledge engineering. A typical approach is to summarize the data stream in real time with an online process into a large number of so-called micro-clusters. Micro-clusters represent local density estimates by aggregating the information of many data points in a defined area. On demand, a (modified) conventional clustering algorithm is used in a second, offline step to recluster the micro-clusters into larger final clusters. For reclustering, the centers of the micro-clusters are used as pseudo points, with the density estimates used as their weights. However, information about the density in the area between micro-clusters is not preserved in the online process, and reclustering relies on possibly inaccurate assumptions about the distribution of data within and between micro-clusters (e.g., uniform or Gaussian). This paper describes DBSTREAM, the first micro-cluster-based online clustering component that explicitly captures the density between micro-clusters via a shared density graph. The density information in this graph is then exploited for reclustering based on the actual density between adjacent micro-clusters. We discuss the space and time complexity of maintaining the shared density graph. Experiments on a wide range of synthetic and real data sets highlight that using shared density improves clustering quality over other popular data stream clustering methods, which require the creation of a larger number of smaller micro-clusters to achieve comparable results.
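
The two-phase idea (an online step that maintains micro-clusters plus a shared density graph, then an offline reclustering step driven by that graph) can be illustrated with a small sketch. It is not DBSTREAM as published: the class name SharedDensityStream, the radius r, the threshold alpha, and the connectivity ratio (shared count divided by the pair's average weight) are assumptions chosen for illustration. Fading of old data and movement of micro-cluster centers, which a practical stream clusterer needs, are omitted to keep the example short.

```python
import math
from itertools import combinations


class SharedDensityStream:
    """Hypothetical toy clusterer: micro-clusters plus a shared density graph."""

    def __init__(self, r, alpha=0.3):
        self.r = r              # micro-cluster radius (assumed parameter)
        self.alpha = alpha      # connectivity threshold used when reclustering (assumed)
        self.centers = []       # micro-cluster centers; kept fixed in this sketch
        self.weights = []       # number of points each micro-cluster has absorbed
        self.shared = {}        # (i, j) with i < j -> points that fell into both i and j

    def update(self, x):
        """Online step: absorb one point x, given as a tuple of floats."""
        hits = [i for i, c in enumerate(self.centers) if math.dist(c, x) <= self.r]
        if not hits:
            # no micro-cluster covers x, so x becomes the center of a new one
            self.centers.append(tuple(x))
            self.weights.append(1.0)
            return
        for i in hits:
            self.weights[i] += 1.0
        # a point covered by several micro-clusters is evidence of density between them
        for i, j in combinations(hits, 2):
            self.shared[(i, j)] = self.shared.get((i, j), 0.0) + 1.0

    def recluster(self):
        """Offline step: label micro-clusters by joining pairs with enough shared density."""
        parent = list(range(len(self.centers)))

        def find(i):
            # union-find root lookup with path halving
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for (i, j), s in self.shared.items():
            # connectivity: shared density relative to the pair's average weight
            if s / ((self.weights[i] + self.weights[j]) / 2) > self.alpha:
                parent[find(i)] = find(j)
        # renumber union-find roots as consecutive cluster labels 0, 1, 2, ...
        labels = {}
        return [labels.setdefault(find(i), len(labels)) for i in range(len(self.centers))]


# Usage: two dense line segments on the x-axis, far apart, streamed three times
line_a = [(0.5 * k, 0.0) for k in range(7)]          # x = 0.0 .. 3.0
line_b = [(10.0 + 0.5 * k, 0.0) for k in range(7)]   # x = 10.0 .. 13.0
model = SharedDensityStream(r=1.1, alpha=0.3)
for _ in range(3):
    for p in line_a + line_b:
        model.update(p)
print(model.recluster())  # [0, 0, 0, 1, 1, 1]
```

In this run each segment ends up covered by three overlapping micro-clusters. Points that land in an overlap increment the shared count, so each segment is recovered as one final cluster, while the empty gap between the segments produces no shared density and keeps them apart. Raising alpha to 0.5 in this example leaves all six micro-clusters unconnected, which shows how the threshold trades off merging against splitting.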
