Understanding Errors in Approximate Distributed Latent Dirichlet Allocation


Latent Dirichlet allocation (LDA) is a popular algorithm for discovering semantic structure in large collections of text or other data. Although its complexity is linear in the data size, its use on increasingly massive collections has created considerable interest in parallel implementations. “Approximate distributed” LDA, or AD-LDA, approximates the popular collapsed Gibbs sampling algorithm for LDA models while running on a distributed architecture. Although this algorithm often appears to perform well in practice, its quality is not well understood theoretically or easily assessed on new data. In this work, we theoretically justify the approximation, and modify AD-LDA to track an error bound on performance. Specifically, we upper bound the probability of making a sampling error at each step of the algorithm (compared to an exact, sequential Gibbs sampler), given the samples drawn thus far. We show empirically that our bound is sufficiently tight to give a meaningful and intuitive measure of approximation error in AD-LDA, allowing the user to track the tradeoff between accuracy and efficiency while executing in parallel.
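To make the approximation concrete, the following is a minimal sketch of one AD-LDA iteration, simulated sequentially in pure Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`init_state`, `gibbs_sweep`, `adlda_iteration`), the symmetric hyperparameters `alpha` and `beta`, and the partitioning of documents across processors are all hypothetical choices. Each simulated processor sweeps its own documents against a stale private copy of the word-topic counts, and the copies are reconciled only at the end of the iteration. This staleness is the source of the sampling error relative to an exact sequential Gibbs sampler. The error bound described in the abstract is not implemented here.

```python
import random

def init_state(docs, num_topics, vocab_size, rng):
    """Randomly assign a topic to every token and build the count tables."""
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    n_dk = [[0] * num_topics for _ in docs]                # document-topic counts
    n_wk = [[0] * num_topics for _ in range(vocab_size)]   # word-topic counts
    n_k = [0] * num_topics                                 # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d][k] += 1
            n_wk[w][k] += 1
            n_k[k] += 1
    return z, n_dk, n_wk, n_k

def gibbs_sweep(doc_ids, docs, z, n_dk, n_wk, n_k, alpha, beta, rng):
    """One collapsed Gibbs sweep over doc_ids, updating the counts in place."""
    vocab_size, num_topics = len(n_wk), len(n_k)
    for d in doc_ids:
        for i, w in enumerate(docs[d]):
            old = z[d][i]
            # Remove the token's current assignment from the counts.
            n_dk[d][old] -= 1
            n_wk[w][old] -= 1
            n_k[old] -= 1
            # Collapsed conditional p(z = k | everything else), unnormalized.
            weights = [
                (n_dk[d][k] + alpha) * (n_wk[w][k] + beta)
                / (n_k[k] + vocab_size * beta)
                for k in range(num_topics)
            ]
            new = rng.choices(range(num_topics), weights=weights)[0]
            z[d][i] = new
            n_dk[d][new] += 1
            n_wk[w][new] += 1
            n_k[new] += 1

def adlda_iteration(partitions, docs, z, n_dk, n_wk, n_k, alpha, beta, rng):
    """One AD-LDA iteration; the processors are simulated one after another."""
    old_wk = [row[:] for row in n_wk]
    old_k = n_k[:]
    local_tables = []
    for doc_ids in partitions:
        # Each processor starts from the same stale snapshot of global counts.
        local_wk = [row[:] for row in old_wk]
        local_k = old_k[:]
        gibbs_sweep(doc_ids, docs, z, n_dk, local_wk, local_k, alpha, beta, rng)
        local_tables.append((local_wk, local_k))
    # Merge step: add every processor's change to the global counts.
    for local_wk, local_k in local_tables:
        for w in range(len(n_wk)):
            for k in range(len(n_k)):
                n_wk[w][k] += local_wk[w][k] - old_wk[w][k]
        for k in range(len(n_k)):
            n_k[k] += local_k[k] - old_k[k]
```

Because every token is resampled by exactly one processor, the merged global counts remain consistent with the topic assignments after each iteration, even though the conditionals used during the sweep were computed from out-of-date counts. With a single partition, the sketch reduces to an ordinary sequential collapsed Gibbs sweep.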


