Implementation of Context Binning and Model Clustering for Compression of Genetic Data

m4tx/masters-thesis

My master's thesis written as part of the computer science course at Jagiellonian University.

Abstract

In recent years, DNA sequencing methods have become dramatically faster, making it possible to quickly sequence the DNA of complex organisms such as humans. However, this increases the demand for disk storage, as databases containing such data can easily reach dozens of terabytes. In his article "Context binning, model clustering and adaptivity for data compression of genetic data", Jarek Duda proposes promising compression techniques that should help build a compressor better than the current state of the art. This thesis describes a compressor built to evaluate those techniques, tests it on real-world data, and compares it to other genetic data compression tools.

Download

The PDF file can be downloaded from the GitHub Releases page.

Building

Make sure you have Inkscape and a LaTeX distribution installed on your system, then run:

make

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
