PROJECT TITLE:
Efficient and Accurate OTU Clustering with GPU-Based Sequence Alignment and Dynamic Dendrogram Cutting
ABSTRACT:
De novo clustering is a widely used technique for the taxonomic profiling of a microbial community: 16S rRNA amplicon reads are grouped into operational taxonomic units (OTUs). In this work, we introduce a new dendrogram-based OTU clustering pipeline called CRiSPy. The key idea CRiSPy uses to improve clustering accuracy is to apply an anomaly detection technique that derives a dynamic distance cutoff, instead of using the de facto value of 97% sequence similarity as most existing OTU clustering pipelines do. This technique works by detecting an abrupt change in the merging heights of a dendrogram. To produce the dendrograms, CRiSPy applies hierarchical clustering to a genetic distance matrix derived from an all-against-all comparison of reads by pairwise sequence alignment. However, most existing dendrogram-based tools have difficulty processing datasets larger than 10,000 unique reads because of their high computational complexity. We address this difficulty by developing two efficient algorithms for CRiSPy: a compute-efficient, GPU-accelerated parallel algorithm for computing the pairwise distance matrix, and a memory-efficient hierarchical clustering algorithm. Our experiments on multiple datasets with distinct characteristics show that CRiSPy produces more accurate OTU groupings than most other OTU clustering applications.
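The abstract describes three stages: pairwise alignment distances, hierarchical clustering, and a dendrogram cut placed at an abrupt jump in merge heights. The following is a minimal single-threaded Python sketch of that flow, not CRiSPy's GPU implementation. Several choices are assumptions, not taken from the source. Distance is 1 minus the identity of a Needleman-Wunsch global alignment (match 1, mismatch -1, gap -2). Average linkage (UPGMA) serves as the clustering criterion. The anomaly detector is replaced by a simple hypothetical rule that cuts just below the largest jump between consecutive merge heights, falling back to a fixed 0.03 distance (97% similarity) when there are too few merges. All function names are hypothetical.

```python
from itertools import combinations

# Assumed scoring scheme (hypothetical, not from the source): linear gap penalty.
MATCH, MISMATCH, GAP = 1, -1, -2
FALLBACK_CUTOFF = 0.03  # fixed 97% similarity, used only when too few merges exist


def nw_identity(a: str, b: str) -> float:
    """Needleman-Wunsch global alignment; returns identical columns / alignment length."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * GAP
    for j in range(1, m + 1):
        S[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            S[i][j] = max(S[i - 1][j - 1] + sub, S[i - 1][j] + GAP, S[i][j - 1] + GAP)
    # Trace back one optimal path, counting matches and alignment columns.
    i, j, matches, cols = n, m, 0, 0
    while i > 0 or j > 0:
        sub = MATCH if i > 0 and j > 0 and a[i - 1] == b[j - 1] else MISMATCH
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1
        cols += 1
    return matches / cols if cols else 1.0


def distance_matrix(seqs):
    """All-against-all distances d = 1 - identity (the step CRiSPy runs on the GPU)."""
    n = len(seqs)
    D = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        D[i][j] = D[j][i] = 1.0 - nw_identity(seqs[i], seqs[j])
    return D


def average_linkage(D):
    """Naive average-linkage clustering; returns merges as (height, child_a, child_b).
    Leaves are 0..n-1 and the k-th merge creates cluster id n + k."""
    n = len(D)
    size = {i: 1 for i in range(n)}
    dist = {(i, j): D[i][j] for i, j in combinations(range(n), 2)}
    merges, next_id = [], n
    while len(size) > 1:
        (a, b), h = min(dist.items(), key=lambda kv: kv[1])
        merges.append((h, a, b))
        new, next_id = next_id, next_id + 1
        for k in size:
            if k in (a, b):
                continue
            dka = dist.pop((min(k, a), max(k, a)))
            dkb = dist.pop((min(k, b), max(k, b)))
            # Lance-Williams update for average linkage (UPGMA).
            dist[(k, new)] = (size[a] * dka + size[b] * dkb) / (size[a] + size[b])
        del dist[(a, b)]
        size[new] = size.pop(a) + size.pop(b)
    return merges


def dynamic_cutoff(merges):
    """Hypothetical stand-in for CRiSPy's anomaly detector: cut just below the
    largest jump between consecutive merge heights."""
    heights = [h for h, _, _ in merges]
    if len(heights) < 2:
        return FALLBACK_CUTOFF
    jumps = [heights[k + 1] - heights[k] for k in range(len(heights) - 1)]
    k = max(range(len(jumps)), key=lambda idx: jumps[idx])
    return heights[k]


def cut_dendrogram(merges, n, cutoff):
    """Replay merges in order up to the cutoff; return an OTU label per read."""
    parent = list(range(max(2 * n - 1, 0)))
    for new, (h, a, b) in enumerate(merges, start=n):
        if h > cutoff:
            break  # stop at the first merge above the cutoff (keeps a valid prefix)
        parent[a] = parent[b] = new

    def root(x):
        while parent[x] != x:
            x = parent[x]
        return x

    roots = [root(i) for i in range(n)]
    relabel = {r: idx for idx, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]


def otu_labels(seqs):
    merges = average_linkage(distance_matrix(seqs))
    return cut_dendrogram(merges, len(seqs), dynamic_cutoff(merges))


if __name__ == "__main__":
    reads = [
        "ACGTACGTACGTACGTACGT",
        "ACGTACGTACGAACGTACGT",
        "ACGTACGTACGTACGTACGA",
        "TTGGCCAATTGGCCAATTGG",
        "TTGGCCAATTGGCCAATTGC",
    ]
    print(otu_labels(reads))  # [0, 0, 0, 1, 1]
```

On these toy reads, the merge heights are about 0.05, 0.05, and 0.075 within each group, followed by a much larger cross-group merge. The largest jump therefore falls just above 0.075, which yields two OTUs. The sketch keeps every pairwise distance in memory and aligns all pairs serially. Those are the compute and memory costs that, according to the abstract, CRiSPy's GPU distance computation and memory-efficient clustering are designed to reduce.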