Constructing Query-Driven Dynamic Machine Learning Model With Application to Protein-Ligand Binding Sites Prediction - 2015
We face an era in which annotated biological data are generated rapidly and continuously. Effectively incorporating newly annotated data into the learning step is crucial for improving the performance of a bioinformatics prediction model. Although machine-learning-based methods have been used extensively for a wide range of biological problems, existing approaches usually train static prediction models on fixed training datasets. Static approaches have several disadvantages, such as low scalability and impracticality when the training dataset is large. In view of this, we propose a dynamic learning framework for constructing query-driven prediction models. The key difference between the proposed framework and existing approaches is that the training set for the machine learning algorithm is generated dynamically according to the query input, rather than training one general model regardless of the query, as traditional static methods do. Accordingly, a query-driven predictor based on a smaller set of data specifically selected from the whole annotated base dataset is applied to the query. This way of constructing the model allows the annotated base dataset to be updated flexibly, and using the most relevant core subset as the training set gives the constructed model better generalization ability on the query, a “part could be better than all” phenomenon. Following the new framework, we have implemented a dynamic protein-ligand binding sites predictor called OSML (On-site model for ligand binding sites prediction). Computer experiments on 10 different ligand types of 3 hierarchically organized levels show that OSML outperforms most existing predictors. The results indicate that the dynamic framework is a promising future direction for bridging the gap between rapidly accumulating annotated biological data and effective machine-learning-based predictors.
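The query-driven idea in the abstract can be illustrated with a small sketch. This is not the OSML implementation: the Euclidean relevance measure, the subset size `k`, the nearest-centroid classifier, the function names, and the toy two-dimensional feature vectors are all assumptions chosen only to show the flow of selecting a query-specific training subset, fitting a model on it, and predicting.

```python
import math
from collections import defaultdict

def select_core_subset(base, query, k):
    """Pick the k annotated samples closest to the query (assumed relevance: Euclidean distance)."""
    return sorted(base, key=lambda sample: math.dist(sample[0], query))[:k]

def train_centroid_model(samples):
    """Fit a nearest-centroid classifier (an assumed stand-in learner): one mean vector per label."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in groups.items()
    }

def predict(model, query):
    """Return the label whose centroid lies closest to the query."""
    return min(model, key=lambda label: math.dist(model[label], query))

def query_driven_predict(base, query, k=3):
    """Build a model for this query only, from its most relevant base samples."""
    subset = select_core_subset(base, query, k)
    model = train_centroid_model(subset)
    return predict(model, query)

# Hypothetical annotated base dataset: (feature vector, label) pairs.
base = [
    ((0.0, 0.0), "non-binding"),
    ((0.2, 0.1), "non-binding"),
    ((1.0, 1.0), "binding"),
    ((0.9, 1.1), "binding"),
    ((5.0, 5.0), "non-binding"),
]

# A new annotation is simply appended; there is no global model to retrain.
base.append(((1.1, 0.9), "binding"))

query = (0.95, 1.0)
subset = select_core_subset(base, query, 3)
result = query_driven_predict(base, query, k=3)
print(result)
```

Because the model is fitted at query time, updating the base dataset costs only an append, while each prediction pays for subset selection and training on `k` samples.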