dna2vec

Dna2vec is an open-source library to train distributed representations of variable-length k-mers.

For more information, please refer to the paper: dna2vec: Consistent vector representations of variable-length k-mers

This repo is a fork of the original pnpnpn/dna2vec repo. The upgrades are:

runs inside a Docker container
uses the latest versions of its packages
adds an nworkers argument, so you can use more than 4 workers
includes OpenBLAS, although whether NumPy or SciPy actually uses it is untested

No Installation Required

There is no need to install packages or clone the repository. To run the script, type the following command in a terminal:

docker run --rm -v $(pwd):/app/data --user 1000 alperyilmaz/dna2vec train_dna2vec.py -c sample_config.yml

The container mounts the current working directory (at /app/data), so files are read from and written to folders inside it. Put your FASTA files in a folder, edit the config file, and start training.

The sample config file can be as simple as:

inputs: inputs/chr*.fa
k-low: 3
k-high: 5
out-dir: results/

Please refer to hg38-20161219-0153.yml for a full-blown example. For usage details, see the pnpnpn/dna2vec repo.
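
Since the inputs value in the sample config is a glob pattern, it can be worth checking which files it matches before starting a long training run. The following is a minimal sketch, not part of dna2vec: matching_inputs is a hypothetical helper, and it assumes the pattern is resolved relative to the working directory that the container mounts.

```python
import glob
import os


def matching_inputs(pattern, workdir="."):
    """Return the sorted list of files under workdir that match a glob pattern.

    Hypothetical helper for illustration; not part of dna2vec.
    """
    return sorted(glob.glob(os.path.join(workdir, pattern)))


if __name__ == "__main__":
    # Same pattern as the `inputs` line of the sample config.
    files = matching_inputs("inputs/chr*.fa")
    if not files:
        print("No FASTA files match inputs/chr*.fa; check the folder and pattern.")
    for path in files:
        print(path)
```

Run it from the same directory in which you would run the docker command. An empty result suggests that the folder name or the pattern in the config file needs adjusting.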

New: You can add the nworkers argument to take full advantage of the available CPUs.
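
To choose a value for nworkers, it helps to know how many CPUs the machine exposes. The sketch below uses only the Python standard library. suggested_nworkers is a hypothetical name, and treating the CPU count as a sensible worker count is an assumption rather than a tested recommendation.

```python
import os


def suggested_nworkers():
    """Return the number of CPUs visible to this process (at least 1).

    Hypothetical helper for illustration; not part of dna2vec.
    """
    try:
        # Available on Linux; respects CPU affinity and container CPU sets.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Fallback for other platforms; cpu_count() may return None.
        return os.cpu_count() or 1


if __name__ == "__main__":
    print(suggested_nworkers())
```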

About speed

These are not systematic test results, merely my observations on different platforms.

Hardware	Number of workers	Words/s
i7-4700HQ CPU @ 2.40GHz (Laptop)	8	242,794
c5.4xlarge	16

License

This software is licensed under the MIT License.
