In:
The International Journal of High Performance Computing Applications, SAGE Publications, Vol. 30, No. 2 (2016-05), p. 200-211
Abstract:
Clustering of molecular systems according to their three-dimensional structure is an important step in many bioinformatics workflows. In applications such as docking or structure prediction, many algorithms initially generate large numbers of candidate poses (or decoys), which are then clustered to allow for subsequent computationally expensive evaluations of reasonable representatives. Since the number of such candidates can easily range from thousands to millions, performing the clustering on standard central processing units (CPUs) is highly time consuming. In this paper, we analyse and evaluate different approaches to parallelize the nearest neighbour chain algorithm to perform hierarchical Ward clustering of protein structures, using both atom-based root mean square deviation (RMSD) and rigid-body RMSD molecular distances on a graphics processing unit (GPU). This leads to a speedup of around one order of magnitude of our CUDA implementation on a GeForce Titan GPU compared to a multi-threaded CPU implementation on a Core-i7 2700. Furthermore, the runtimes compare favourably with ClusCo, another state-of-the-art CUDA-enabled protein structure clustering method, while achieving similar accuracy on the iTasser benchmark dataset. Our implementation has also been incorporated into the Biochemical Algorithms library to allow easy integration into biologists’ workflows.
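The nearest neighbour chain approach named in the abstract can be illustrated with a minimal serial sketch. This is not the authors' CUDA implementation: it assumes a precomputed symmetric matrix of pairwise distances (for example RMSD values), runs on the CPU in plain Python, and uses the standard Lance-Williams update for Ward's criterion on squared distances. The function name `ward_nn_chain` and its return format are hypothetical, chosen only for this illustration.

```python
import math


def ward_nn_chain(dist):
    """Ward clustering via the nearest neighbour chain algorithm.

    dist: symmetric n x n matrix of pairwise distances (assumed to be
          precomputed, e.g. RMSD between structures).
    Returns a list of (members, dissimilarity) pairs, one per merge, in the
    order the chain produces them (not sorted by dissimilarity).
    """
    n = len(dist)
    # Work on squared distances, as required by the Ward update below.
    d = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    size = [1] * n
    members = {i: (i,) for i in range(n)}
    active = set(range(n))
    chain = []
    merges = []

    while len(active) > 1:
        if not chain:
            chain.append(min(active))
        # Extend the chain until its last two entries are mutual nearest neighbours.
        while True:
            a = chain[-1]
            prev = chain[-2] if len(chain) > 1 else None
            best = prev
            best_d = d[a][prev] if prev is not None else math.inf
            for c in active:
                # Strict '<' keeps the predecessor on ties, which guarantees termination.
                if c != a and d[a][c] < best_d:
                    best, best_d = c, d[a][c]
            if best == prev:
                break
            chain.append(best)

        a = chain.pop()
        b = chain.pop()
        d_ab = d[a][b]
        # Lance-Williams update for Ward's criterion; the merged cluster reuses slot a.
        for k in active:
            if k == a or k == b:
                continue
            total = size[a] + size[b] + size[k]
            new = ((size[a] + size[k]) * d[a][k]
                   + (size[b] + size[k]) * d[b][k]
                   - size[k] * d_ab) / total
            d[a][k] = d[k][a] = new
        active.remove(b)
        size[a] += size[b]
        members[a] = tuple(sorted(members[a] + members.pop(b)))
        merges.append((members[a], d_ab))
    return merges


# Toy input: four points on a line stand in for structures with known distances.
points = [0.0, 1.0, 10.0, 11.0]
toy = [[abs(p - q) for q in points] for p in points]
merges = ward_nn_chain(toy)
```

Because Ward's criterion is reducible, the entries left on the chain after a merge remain valid, so the chain does not have to be rebuilt from scratch. The nearest-neighbour search in the inner loop dominates the cost, which makes it a natural candidate for parallelization.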
Type of Medium:
Online Resource
ISSN:
1094-3420, 1741-2846
DOI:
10.1177/1094342015597988
Language:
English
Publisher:
SAGE Publications
Publication Date:
2016
ZDB-ID:
2017480-9