Online Censoring for Large-Scale Regressions with Application to Streaming Big Data - 2016
On par with data-intensive applications, the sheer volume of modern linear regression problems creates an ever-growing demand for efficient solvers. Fortunately, a significant share of the data accrued can be omitted while maintaining a guaranteed quality of statistical inference at an affordable computational budget. This work introduces means of identifying and omitting less informative observations in an online and data-adaptive fashion. Given streaming data, the associated maximum-likelihood estimator is found sequentially using first- and second-order stochastic approximation algorithms. These schemes are well suited when data are inherently censored, or when the aim is to reduce communication overhead in decentralized learning setups. In a different operational scenario, the task of joint censoring and estimation is put forth to solve large-scale linear regressions in a centralized setup. Novel online algorithms are developed enjoying simple closed-form updates and provable (non)asymptotic convergence guarantees. To attain desired censoring patterns and levels of dimensionality reduction, thresholding rules are investigated as well. Numerical tests on real and synthetic datasets corroborate the efficacy of the proposed data-adaptive methods compared to data-agnostic random-projection-based alternatives.
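The adaptive-censoring idea described above can be illustrated with a minimal censored LMS-style recursion: each streaming observation is kept only if its normalized prediction error exceeds a threshold, and the estimate is updated only on kept observations. This is an illustrative sketch, not the paper's exact algorithm; the function name, step size, and threshold value are assumptions made here for demonstration.

```python
import numpy as np

def censored_lms(stream, dim, step=0.01, tau=1.0, sigma=0.1):
    """Sketch of online regression with data-adaptive censoring.

    Observations whose normalized prediction error |e|/sigma falls at or
    below the threshold tau are deemed uninformative and skipped
    (censored); only informative observations trigger an LMS update.
    Returns the final estimate and the number of observations used.
    """
    w = np.zeros(dim)
    n_used = 0
    for x, y in stream:
        e = y - x @ w                # innovation (prediction error)
        if abs(e) / sigma > tau:     # informative: perform the update
            w = w + step * e * x
            n_used += 1
        # otherwise: censor the observation, keep w unchanged
    return w, n_used

# Hypothetical usage on a synthetic stream y = x'w_true + noise:
rng = np.random.default_rng(0)
dim = 5
w_true = rng.standard_normal(dim)
stream = [(x, x @ w_true + 0.1 * rng.standard_normal())
          for x in rng.standard_normal((2000, dim))]
w_hat, n_used = censored_lms(stream, dim)
```

In this sketch, raising tau censors more data (saving computation or communication) at the cost of estimation accuracy; choosing tau to hit a target censoring fraction is one way to realize the thresholding rules the abstract alludes to.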