aaHash: recursive amino acid sequence hashing.

Bibliographic Details
Title: aaHash: recursive amino acid sequence hashing.
Authors: Wong, Johnathan (jowong@bcgsc.ca); Kazemi, Parham; Coombe, Lauren; Warren, René L.; Birol, Inanç
Source: Bioinformatics Advances. 2023, Vol. 3 Issue 1, p1-5. 5p.
Subject Terms: *AMINO acid sequence, *BIOINFORMATICS, *BIOCHEMISTRY databases, *COMPUTATIONAL biology, *HASHING
Abstract: Motivation: k-mer hashing is a common operation in many foundational bioinformatics problems. However, generic string hashing algorithms are not optimized for this application. Strings in bioinformatics use specific alphabets, a trait leveraged for nucleic acid sequences in earlier work. We note that amino acid sequences, with complexities and context that cannot be captured by generic hashing algorithms, can also benefit from a domain-specific hashing algorithm. Such a hashing algorithm can accelerate and improve the sensitivity of bioinformatics applications developed for protein sequences. Results: Here, we present aaHash, a recursive hashing algorithm tailored for amino acid sequences. This algorithm utilizes multiple hash levels to represent biochemical similarities between amino acids. aaHash performs ∼10× faster than generic string hashing algorithms in hashing adjacent k-mers. Availability and implementation: aaHash is available online at https://github.com/bcgsc/btllib and is free for academic use. [ABSTRACT FROM AUTHOR]
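Illustrative note: the abstract describes recursive hashing of adjacent k-mers. The sketch below shows the general idea with a generic cyclic-polynomial (rotate-and-XOR) rolling hash over the 20 standard amino acids. It is not the aaHash implementation and does not use the btllib API: the seed table, the function names (`hash_first`, `hash_next`, `rolling_hashes`), and the example sequence are all hypothetical, and the multiple hash levels that encode biochemical similarity are not modeled.

```python
import random

MASK64 = (1 << 64) - 1
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical seed table: one fixed pseudo-random 64-bit value per residue.
# These are NOT the seeds used by aaHash.
_rng = random.Random(0)
SEEDS = {aa: _rng.getrandbits(64) for aa in AMINO_ACIDS}


def rol(x, r):
    """Rotate a 64-bit integer left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64


def hash_first(kmer):
    """Hash a k-mer from scratch in O(k)."""
    h = 0
    for aa in kmer:
        h = rol(h, 1) ^ SEEDS[aa]
    return h


def hash_next(h, k, out_aa, in_aa):
    """Update the hash in O(1) when the window slides by one residue."""
    return rol(h, 1) ^ rol(SEEDS[out_aa], k) ^ SEEDS[in_aa]


def rolling_hashes(seq, k):
    """Yield one hash per k-mer of seq, reusing the previous hash each step."""
    if k <= 0 or len(seq) < k:
        return
    h = hash_first(seq[:k])
    yield h
    for i in range(k, len(seq)):
        h = hash_next(h, k, seq[i - k], seq[i])
        yield h


# Example with a made-up peptide sequence.
seq = "MKTAYIAKQRQISFVKSHFSRQ"
k = 5
hashes = list(rolling_hashes(seq, k))
```

Each update removes the contribution of the residue leaving the window and adds the one entering it, so the cost per adjacent k-mer is constant rather than proportional to k. That is the general reason a recursive scheme can outpace rehashing every k-mer with a generic string hash.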
Copyright of Bioinformatics Advances is the property of Oxford University Press / USA.
Database: Academic Search Complete
DOI: 10.1093/bioadv/vbad162
Published in: Bioinformatics Advances
Language: English