Constructing Query-Driven Dynamic Machine Learning Model With Application to Protein-Ligand Binding Sites Prediction - 2015
We are facing an era in which annotated biological knowledge is generated rapidly and continuously. How to effectively incorporate newly annotated data into the training step is crucial for enhancing the performance of a bioinformatics prediction model. Although machine-learning-based methods have been extensively used to tackle various biological problems, existing approaches usually train static prediction models on fixed training datasets. Such static approaches have several disadvantages: they scale poorly and become impractical when the training dataset is large. In view of this, we propose a dynamic learning framework for constructing query-driven prediction models. The key difference between the proposed framework and existing approaches is that the training set for the machine learning algorithm is generated dynamically according to the query input, rather than a general model being trained regardless of queries as in traditional static methods. Accordingly, a query-driven predictor trained on a smaller set of data specifically selected from the entire annotated base dataset can be applied to the query. This way of constructing the dynamic model allows the annotated base dataset to be updated flexibly, and using the most relevant core subset as the training set gives the constructed model better generalization ability on the query, showing a “part may be better than all” phenomenon. Based on the new framework, we have implemented a dynamic protein-ligand binding sites predictor called OSML (On-site model for ligand binding sites prediction). Computer experiments on ten different ligand types from three hierarchically organized levels show that OSML outperforms most existing predictors.
The results indicate that the present dynamic framework may be a promising future direction for bridging the gap between rapidly accumulating annotated biological data and effective machine-learning-based predictors. The OSML web server and datasets are freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/OSML/ for academic use.
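The abstract does not say how OSML measures relevance between a query and the annotated data, or which learner it trains. The Python below is only a minimal sketch of the general query-driven idea under stated assumptions. Each sample is a plain numeric feature vector. Relevance is Euclidean distance, so the "core subset" is simply the k nearest annotated samples. The per-query learner is a small logistic-regression model. The names `QueryDrivenPredictor`, `select_core_subset`, `train_logistic` and the parameter `k` are hypothetical and are not part of OSML.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    features: list[float]  # hypothetical per-residue feature vector
    label: int             # hypothetical encoding: 1 = binding, 0 = non-binding


def distance(a, b):
    """Euclidean distance, used here as an assumed stand-in relevance measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def select_core_subset(base, query, k):
    """Pick the k annotated samples closest to the query (assumed selection rule)."""
    return sorted(base, key=lambda s: distance(s.features, query))[:k]


def train_logistic(samples, epochs=200, lr=0.1):
    """Fit a small logistic-regression model with per-sample gradient descent."""
    w = [0.0] * len(samples[0].features)
    b = 0.0
    for _ in range(epochs):
        for s in samples:
            # Gradient of the log loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, s.features)) + b) - s.label
            w = [wi - lr * err * xi for wi, xi in zip(w, s.features)]
            b -= lr * err
    return w, b


class QueryDrivenPredictor:
    """Trains a fresh model for every query on a query-specific core subset."""

    def __init__(self, k):
        self.k = k
        self.base = []  # annotated base dataset; may grow at any time

    def add_annotations(self, samples):
        # New annotations are appended; there is no global model to retrain.
        self.base.extend(samples)

    def predict(self, query):
        core = select_core_subset(self.base, query, self.k)
        w, b = train_logistic(core)
        return sigmoid(sum(wi * xi for wi, xi in zip(w, query)) + b)


predictor = QueryDrivenPredictor(k=4)
predictor.add_annotations([
    Sample([1.0, 1.0], 1), Sample([1.2, 0.9], 1), Sample([0.8, 1.1], 1),
    Sample([-1.0, -1.0], 0), Sample([-1.1, -0.9], 0), Sample([-0.9, -1.2], 0),
])
print(predictor.predict([0.9, 1.1]))    # query near the "binding" examples
print(predictor.predict([-1.0, -0.8]))  # query near the "non-binding" examples
```

The toy data only illustrates the control flow. Because the model is rebuilt per query, calling `add_annotations` is enough to make new data available to the next prediction. The trade-off is that training cost moves from a one-off offline step to every query.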