PROJECT TITLE:
Co-ClusterD: A Distributed Framework for Data Co-Clustering with Sequential Updates
Co-clustering has emerged as a powerful data mining tool for two-dimensional co-occurrence and dyadic data. However, co-clustering algorithms are often computationally expensive and are dismissed as impractical for large data sets. Existing studies have provided strong empirical evidence that expectation-maximization (EM) algorithms (e.g., the k-means algorithm) with sequential updates can significantly reduce the computational cost without degrading the resulting solution. Motivated by this observation, we introduce sequential updates for alternate minimization co-clustering (AMCC) algorithms, which are variants of EM algorithms, and show that AMCC algorithms with sequential updates converge. We then propose two approaches to parallelize AMCC algorithms with sequential updates in a distributed setting, and prove that both approaches preserve the convergence properties of AMCC algorithms. Based on these two approaches, we present a new distributed framework, Co-ClusterD, which supports efficient implementations of AMCC algorithms with sequential updates. We design and implement Co-ClusterD and demonstrate its efficiency through two AMCC algorithms: fast nonnegative matrix tri-factorization (FNMTF) and information-theoretic co-clustering (ITCC). We evaluate our framework on both a local cluster of machines and the Amazon EC2 cloud. Empirical results show that AMCC algorithms implemented in Co-ClusterD converge much faster and often obtain better results than their traditional concurrent counterparts.
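To illustrate the sequential-update idea the abstract builds on, here is a minimal sketch of k-means with sequential (per-point) updates: after each point is reassigned, the two affected centroids are refreshed immediately, so later points in the same pass already see the new centroids. This is an illustrative example of the general technique only, not the paper's AMCC algorithms or the Co-ClusterD framework; the function name, the `init` parameter, and the plain-Python representation are all assumptions made for this sketch.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples/lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_sequential(points, k, passes=10, seed=0, init=None):
    """k-means with sequential updates: centroids are refreshed
    immediately after each reassignment (hypothetical helper, for
    illustration of sequential vs. concurrent/batch updates)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in (init if init is not None else rng.sample(points, k))]
    dim = len(points[0])
    # Initial assignment, plus per-cluster running sums and counts so a
    # centroid can be updated in O(dim) after each single-point move.
    assign = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, a in zip(points, assign):
        counts[a] += 1
        for d, x in enumerate(p):
            sums[a][d] += x
    for j in range(k):
        if counts[j]:
            centroids[j] = [s / counts[j] for s in sums[j]]
    for _ in range(passes):
        moved = False
        for i, p in enumerate(points):
            old = assign[i]
            new = min(range(k), key=lambda j: dist2(p, centroids[j]))
            if new == old:
                continue
            moved = True
            # Sequential update: move the point and refresh BOTH affected
            # centroids now, before the next point is examined. A batch
            # (concurrent) variant would defer this to the end of the pass.
            for j, sign in ((old, -1), (new, +1)):
                counts[j] += sign
                for d, x in enumerate(p):
                    sums[j][d] += sign * x
                if counts[j]:
                    centroids[j] = [s / counts[j] for s in sums[j]]
            assign[i] = new
        if not moved:
            break
    return assign, centroids
```

Usage: on two well-separated groups, e.g. `kmeans_sequential(pts, 2, init=[pts[0], pts[-1]])`, the points of each group end up sharing one cluster label. The running-sum bookkeeping is what makes per-point centroid refreshes cheap, which is the property the paper exploits when parallelizing sequential updates.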