PROJECT TITLE:
Ontology-Based Search of Genomic Metadata
The Encyclopedia of DNA Elements (ENCODE) is a huge and still expanding public repository of more than 4,000 experiments and 25,000 data files, assembled by a large international consortium since 2007; unknown biological knowledge can be extracted from these huge and largely unexplored data, leading to data-driven genomic, transcriptomic, and epigenomic discoveries. Yet search for relevant datasets for knowledge discovery is poorly supported: the metadata describing ENCODE datasets are quite simple and incomplete, and are not described by a coherent underlying ontology. Here, we show how to overcome this limitation by adopting an ENCODE metadata search approach that uses high-quality ontological knowledge and state-of-the-art indexing technologies. Specifically, we developed S.O.S. GeM (http://www.bioinformatics.deib.polimi.it/SOSGeM/), a system supporting effective semantic search and retrieval of ENCODE datasets. First, we constructed a Semantic Knowledge Base by starting with concepts extracted from ENCODE metadata, matched to and expanded on biomedical ontologies integrated in the well-established Unified Medical Language System. We prove that this inference process is sound and complete. Then, we leveraged the Semantic Knowledge Base to semantically search ENCODE data from arbitrary biologists' queries. This allows correctly finding more datasets than those retrieved by a purely syntactic search, as supported by the other available systems. We empirically show the relevance of the found datasets to the biologists' queries.
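To illustrate why expanding a query over ontology relations can retrieve more datasets than a purely syntactic match, here is a minimal Python sketch. It is not the S.O.S. GeM implementation: the toy ontology, the synonym table, the dataset descriptions, and the function names (`expand`, `syntactic_search`, `semantic_search`) are all hypothetical, and simple substring matching stands in for the indexing technology the abstract mentions.

```python
from collections import deque

# Hypothetical toy ontology: parent concept -> more specific child concepts.
# A real system would draw such relations from UMLS-integrated ontologies.
IS_A_CHILDREN = {
    "cancer": ["leukemia", "carcinoma"],
    "leukemia": ["chronic myeloid leukemia"],
}

# Hypothetical synonyms for individual concepts.
SYNONYMS = {
    "chronic myeloid leukemia": ["cml"],
}

# Hypothetical free-text metadata, not real ENCODE records.
DATASETS = {
    "ds1": "ChIP-seq on K562 cell line, chronic myeloid leukemia",
    "ds2": "RNA-seq on HepG2 cell line, hepatocellular carcinoma",
    "ds3": "DNase-seq on GM12878 cell line, lymphoblastoid",
    "ds4": "RNA-seq, cancer tissue sample",
}


def expand(term):
    """Return the term plus all of its descendants and their synonyms."""
    seen = {term}
    queue = deque([term])
    while queue:
        concept = queue.popleft()
        related = IS_A_CHILDREN.get(concept, []) + SYNONYMS.get(concept, [])
        for other in related:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


def syntactic_search(term):
    """Match only the literal query string in the metadata text."""
    return sorted(d for d, text in DATASETS.items() if term in text.lower())


def semantic_search(term):
    """Match the query string or any concept it expands to."""
    terms = expand(term)
    return sorted(
        d for d, text in DATASETS.items()
        if any(t in text.lower() for t in terms)
    )


if __name__ == "__main__":
    print(syntactic_search("cancer"))
    print(semantic_search("cancer"))
```

In this toy setting, a literal search for "cancer" matches only the dataset whose description contains that word, while the expanded search also reaches the leukemia and carcinoma datasets through the is-a relations.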