GitHub - fanavarro/DnaCompress: Dna sequence compressor.

ejemplos
.gitignore
LICENSE
Makefile
coder.c
coder.h
decode
decode.c
encode
encode.c
huffman.c
huffman.h
huffman_table.c
huffman_table.h
huffman_tree.c
huffman_tree.h
number_utils.c
number_utils.h
probabilidad.c
probabilidad.h
readme.txt
simprob_utils.c
simprob_utils.h
string_analyzer.c
string_analyzer.h
string_list.c
string_list.h
string_utils.c
string_utils.h
string_value_list.c
string_value_list.h
testN
testN.c
testN.txt
testN2.txt


To encode a file, run the following command:
encode -f fileToCompress

fileToCompress is a file that contains a DNA sequence, for example "AATCCGCTGACGT".

If fileToCompress is named foo, the application creates the following new files:
foo_bin -> A text file that contains the compressed DNA sequence in bits.
foo_table -> A text file that contains the Huffman Table generated by the algorithm to perform the compression.
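
The naming rule above can be sketched in C as follows. This is only an illustration of the "<input>_bin" / "<input>_table" convention described in this readme; output_name is a hypothetical helper, not a function taken from the repository.

```c
#include <stdio.h>

/* Hypothetical helper: writes "<input>_<suffix>" into out.
 * Returns 0 on success, -1 if out is too small to hold the name. */
int output_name(const char *input, const char *suffix,
                char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s_%s", input, suffix);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Calling output_name("foo", "bin", buf, sizeof buf) fills buf with "foo_bin", and the same call with "table" gives "foo_table".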

The encode operation prints the following information to the screen:
Huffman Table generated without grouped symbols (this table is not used to encode; it is shown for comparison only).
Huffman Table generated with grouped symbols (this table is used to encode).
Compression rate that would have been obtained with the Huffman Table without grouped symbols.
Compression rate obtained with the Huffman Table with grouped symbols.
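
To see why the two tables can give different compression rates, here is a minimal C sketch that computes the total Huffman-encoded size of a sequence with single bases as symbols and with grouped symbols. It rests on assumptions this readme does not confirm: a "grouped symbol" is taken to be a pair of consecutive bases (a trailing unpaired base is ignored), the compression rate is taken to be compressed bits divided by 8 bits per original character, and the size of the stored table is not counted. All function names are hypothetical and not taken from the repository.

```c
#include <stddef.h>

#define MAX_SYMBOLS 16  /* 4 single bases, or 4 * 4 pairs of bases */

/* Map a nucleotide to 0..3; returns -1 for any other character. */
int base_index(char c)
{
    switch (c) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;
    }
}

/* Total bits of a Huffman encoding for the given symbol counts
 * (n <= MAX_SYMBOLS). Repeatedly merges the two smallest weights;
 * the sum of all merged weights equals the encoded length in bits. */
long huffman_total_bits(const long counts[], int n)
{
    long w[MAX_SYMBOLS];
    int m = 0;
    for (int i = 0; i < n; i++)
        if (counts[i] > 0)
            w[m++] = counts[i];
    if (m == 0) return 0;
    if (m == 1) return w[0];  /* one distinct symbol: 1 bit each */

    long total = 0;
    while (m > 1) {
        int a = 0, b = 1;     /* a: smallest, b: second smallest */
        if (w[b] < w[a]) { a = 1; b = 0; }
        for (int i = 2; i < m; i++) {
            if (w[i] < w[a]) { b = a; a = i; }
            else if (w[i] < w[b]) { b = i; }
        }
        long merged = w[a] + w[b];
        total += merged;
        w[a] = merged;        /* keep the merged node */
        w[b] = w[m - 1];      /* drop the other by moving the last one in */
        m--;
    }
    return total;
}

/* Encoded size in bits when every base is its own symbol. */
long bits_ungrouped(const char *seq)
{
    long counts[MAX_SYMBOLS] = {0};
    for (size_t i = 0; seq[i] != '\0'; i++) {
        int b = base_index(seq[i]);
        if (b >= 0) counts[b]++;
    }
    return huffman_total_bits(counts, 4);
}

/* Encoded size in bits when pairs of bases are the symbols (assumption). */
long bits_grouped(const char *seq)
{
    long counts[MAX_SYMBOLS] = {0};
    for (size_t i = 0; seq[i] != '\0' && seq[i + 1] != '\0'; i += 2) {
        int hi = base_index(seq[i]), lo = base_index(seq[i + 1]);
        if (hi >= 0 && lo >= 0) counts[4 * hi + lo]++;
    }
    return huffman_total_bits(counts, MAX_SYMBOLS);
}

/* Assumed definition: compressed bits over 8 bits per input character. */
double compression_rate(long compressed_bits, size_t n_chars)
{
    return n_chars ? (double)compressed_bits / (8.0 * (double)n_chars) : 0.0;
}
```

For the 16-base sequence "AAAAAAAACCCCGGTT", this sketch gives 28 bits without grouping and 14 bits with pairs, that is, rates of 0.21875 and 0.109375 under the assumed definition.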

To decode a file, run the following command:
decode -f binaryFile -t huffmanTableFile -o outputFile

binaryFile is a text file that contains the compressed DNA sequence in bits.
huffmanTableFile is a text file that contains the Huffman Table generated by the algorithm during compression.
outputFile is the file where the original DNA sequence will be written.
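
The decoding step can be sketched in C as a prefix-code lookup. This is a sketch under stated assumptions, not the repository's decoder: the on-disk format of the table file is not described in this readme, so the table is assumed to be already loaded into memory, and the bits are assumed to be '0' and '1' characters. struct code_entry and decode_bits are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory table entry: a symbol (or group of symbols)
 * and its prefix-free code written with '0' and '1' characters. */
struct code_entry {
    const char *symbol;  /* e.g. "AA" */
    const char *code;    /* e.g. "10" */
};

/* Decode the bit string `bits` into `out` using `table` (n entries).
 * Returns 0 on success, -1 if the bits do not match any code or if
 * `out` is too small. */
int decode_bits(const char *bits, const struct code_entry *table, size_t n,
                char *out, size_t out_size)
{
    size_t pos = 0, used = 0;
    size_t len = strlen(bits);

    if (out_size == 0) return -1;

    while (pos < len) {
        int matched = 0;
        for (size_t i = 0; i < n; i++) {
            size_t clen = strlen(table[i].code);
            /* In a prefix-free code at most one entry can match here. */
            if (clen > 0 && strncmp(bits + pos, table[i].code, clen) == 0) {
                size_t slen = strlen(table[i].symbol);
                if (used + slen + 1 > out_size) return -1;
                memcpy(out + used, table[i].symbol, slen);
                used += slen;
                pos += clen;
                matched = 1;
                break;
            }
        }
        if (!matched) return -1;  /* truncated or corrupted input */
    }
    out[used] = '\0';
    return 0;
}
```

With the illustrative table AA -> 0, CC -> 10, GG -> 110, TT -> 111, the bit string "0010110111" decodes to "AAAACCGGTT".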

There is a folder called "ejemplos" that contains three output examples.
To encode:
encode -f ejemplos/ej1
encode -f ejemplos/ej2
encode -f ejemplos/ej3

To decode (output will be written to ejemplos/ejX_recompuesto):
decode -f ejemplos/ej1_bin -t ejemplos/ej1_table -o ejemplos/ej1_recompuesto
decode -f ejemplos/ej2_bin -t ejemplos/ej2_table -o ejemplos/ej2_recompuesto
decode -f ejemplos/ej3_bin -t ejemplos/ej3_table -o ejemplos/ej3_recompuesto