Supervised Topic Modeling Using Hierarchical Dirichlet Process-Based Inverse Regression: Experiments on E-Commerce Applications - 2018


The proliferation of e-commerce calls for mining customer preferences and opinions from user-generated text. To this end, topic models are widely adopted to discover the underlying semantic themes (i.e., topics). Supervised topic models have emerged that use the discovered topics to predict a response of interest (e.g., product quality or sales). However, supervised topic modeling remains a challenging problem for three reasons: the number of topics must be prespecified, the topics may lack predictive information, and existing methods scale poorly. In this paper, we propose a novel supervised topic model, Hierarchical Dirichlet Process-based Inverse Regression (HDP-IR). HDP-IR characterizes the corpus with a flexible number of topics, which are shown to retain as much predictive information as the original corpus. Moreover, we develop an efficient inference algorithm capable of analyzing large-scale corpora containing very large numbers of documents. Three experiments were conducted to evaluate predictive performance on major e-commerce benchmark testbeds of online reviews. Overall, HDP-IR outperformed existing state-of-the-art supervised topic models. Notably, retaining sufficient predictive information improved predictive R-squared by over 17.6%, and topic-structure flexibility contributed at least 4.1% to predictive R-squared. HDP-IR provides an important step toward future study of user-generated text from a topic perspective.
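The gains above are reported in predictive R-squared. As a minimal sketch, assuming the common held-out definition pR² = 1 − Σ(y − ŷ)² / Σ(y − ȳ)², with ȳ taken as the mean of the held-out responses (the paper may use a different convention), the metric can be computed as follows. The function name `predictive_r2` and the example ratings are hypothetical and for illustration only.

```python
def predictive_r2(y_true, y_pred):
    """Predictive R-squared on held-out data: 1 - SSE / SST.

    Assumption: SST is measured around the mean of the held-out responses.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_y = sum(y_true) / len(y_true)
    # Squared prediction error of the model on held-out documents.
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total variation of the held-out responses around their mean.
    sst = sum((y - mean_y) ** 2 for y in y_true)
    if sst == 0:
        raise ValueError("held-out responses have zero variance")
    return 1.0 - sse / sst


# Hypothetical star ratings for four held-out reviews and a model's predictions.
ratings = [4, 5, 3, 1]
predicted = [3.5, 4.5, 3.0, 2.0]
print(round(predictive_r2(ratings, predicted), 4))  # prints 0.8286
```

A score of 1.0 means perfect prediction, while 0.0 means the model does no better than always predicting the held-out mean; negative values are possible for models worse than that baseline.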

