LISA: Towards Learned DNA Sequence Search #1015

Open
evancofer opened this issue Apr 28, 2020 · 1 comment

@evancofer
Collaborator

Next-generation sequencing (NGS) technologies have enabled affordable sequencing of billions of short DNA fragments at high throughput, paving the way for population-scale genomics. Genomics data analytics at this scale requires overcoming performance bottlenecks, such as searching for short DNA sequences over long reference sequences. In this paper, we introduce LISA (Learned Indexes for Sequence Analysis), a novel learning-based approach to DNA sequence search. As a first proof of concept, we focus on accelerating one of the most essential flavors of the problem, called exact search. LISA builds on and extends FM-index, which is the state-of-the-art technique widely deployed in genomics tool-chains. Initial experiments with human genome datasets indicate that LISA achieves up to a factor of 4X performance speedup against its traditional counterpart.

https://arxiv.org/abs/1910.04728
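
For readers unfamiliar with the baseline that the abstract refers to, here is a minimal sketch of exact search with a textbook FM-index (backward search). It is not LISA and not code from the paper. The function names `build_fm_index` and `exact_search` are made up for illustration, and the naive construction assumes a small reference over A/C/G/T with `$` as the sentinel.

```python
def build_fm_index(text):
    """Build a toy FM-index for `text`: suffix array, C table, Occ table."""
    text += "$"  # sentinel; assumed to sort before A, C, G, T
    # Naive suffix array construction; fine for a toy example only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch]: number of characters in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def exact_search(pattern, sa, c_table, occ):
    """Return the sorted start positions of `pattern` in the reference."""
    lo, hi = 0, len(sa)  # half-open interval of matching suffix array rows
    for ch in reversed(pattern):  # consume the pattern right to left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:  # interval became empty: no occurrence
            return []
    return sorted(sa[lo:hi])


sa, c_table, occ = build_fm_index("ACGTACGTTACG")
print(exact_search("ACG", sa, c_table, occ))
```

Each pattern character costs one interval update, so a query takes one step per base. That per-base loop is the cost that the abstract says LISA targets.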

@evancofer
Collaborator Author

Although not strictly deep learning, this paper presents an interesting application of machine learning to improve the running time of DNA sequence search algorithms. I think it is notable because algorithms and data structures are important components of bioinformatics research, but they have not yet seen significant applications of ML. This paper, which is inspired by The Case for Learned Index Structures, signals a change in that regard. Along with the Sapling paper, it is among the first applications of ML to this aspect of bioinformatics, and it is possible that we will continue to see similar work. As such, the review might benefit from careful speculation about whether deep learning could similarly benefit these areas, or whether it is outperformed by other ML methods here. It is entirely possible that deep learning is a poor fit due to running time constraints, and that might be worth noting.
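
To make the "learned index" idea concrete, here is a minimal sketch, assuming a single least-squares line over sorted integer keys with a bounded local search to correct the prediction. This is far simpler than the models in the cited papers and is not their implementation. `LearnedIndex`, `encode_kmer`, and the 2-bit k-mer encoding are hypothetical names and choices made only for this example, and the key list is assumed to be non-empty.

```python
from bisect import bisect_left


def encode_kmer(kmer):
    """Encode a k-mer over A/C/G/T as an integer, 2 bits per base."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    value = 0
    for ch in kmer:
        value = value * 4 + code[ch]
    return value


class LearnedIndex:
    """Toy learned index: a fitted line predicts a key's position."""

    def __init__(self, sorted_keys):
        self.keys = sorted_keys
        n = len(sorted_keys)
        mean_k = sum(sorted_keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in sorted_keys)
        cov = sum((k - mean_k) * (p - mean_p)
                  for p, k in enumerate(sorted_keys))
        # Ordinary least-squares fit of position as a function of key.
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst prediction error over the stored keys bounds the search window.
        self.max_err = max(abs(p - self._predict(k))
                           for p, k in enumerate(sorted_keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of `key` in the sorted keys, or -1 if absent."""
        n = len(self.keys)
        pred = self._predict(key)
        lo = min(max(0, pred - self.max_err), n)
        hi = max(lo, min(n, pred + self.max_err + 1))
        i = bisect_left(self.keys, key, lo, hi)  # search only the window
        return i if i < n and self.keys[i] == key else -1


reference = "ACGTACGTTACG"
keys = sorted({encode_kmer(reference[i:i + 3])
               for i in range(len(reference) - 2)})
index = LearnedIndex(keys)
print(index.lookup(encode_kmer("GTT")), index.lookup(encode_kmer("AAA")))
```

The model here is two floating-point numbers, and a lookup is one multiply-add plus a short binary search. A deep network would have to beat that per-query cost to pay off, which is the running time concern raised above.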
