mccortex unitigs: "Fatal Error: Not enough kmers in hash" or "Fatal Error: Hash table is full" #90

Open
karel-brinda opened this issue Sep 11, 2020 · 2 comments

Comments

@karel-brinda

  • OS: OS X
  • Version: mccortex=v0.0.3-610-g400c0e3 zlib=1.2.11 htslib=1.8-17-g699ed53 ASSERTS=ON hash=Lookup3 CHECKS=ON k=3..31

Preparation:

$ wget http://ftp.ebi.ac.uk/pub/software/bigsi/nat_biotech_2018/ctx/ERR189/ERR189737/cleaned/ERR189737.ctx.bz2

$ bzip2 -d -k ERR189737.ctx.bz2

Failure mode 1:

$ mccortex31 unitigs ERR189737.ctx
[10 Sep 2020 23:21:08-DAF][cmd] mccortex31 unitigs ERR189737.ctx
[10 Sep 2020 23:21:08-DAF][cwd] /private/tmp/~20200910223252
[10 Sep 2020 23:21:08-DAF][version] mccortex=v0.0.3-610-g400c0e3 zlib=1.2.11 htslib=1.8-17-g699ed53 ASSERTS=ON hash=Lookup3 CHECKS=ON k=3..31
[10 Sep 2020 23:21:08-DAF][memory] 73 bits per kmer
[10 Sep 2020 23:21:08-DAF][cmd_mem.c:98] Fatal Error: Not enough kmers in hash: require at least 70,540,096 kmers (min memory: 624.5MB)
$ mccortex31 unitigs ERR189737.ctx
[10 Sep 2020 23:21:18-fOD][cmd] mccortex31 unitigs ERR189737.ctx
[10 Sep 2020 23:21:18-fOD][cwd] /private/tmp/~20200910223252
[10 Sep 2020 23:21:18-fOD][version] mccortex=v0.0.3-610-g400c0e3 zlib=1.2.11 htslib=1.8-17-g699ed53 ASSERTS=ON hash=Lookup3 CHECKS=ON k=3..31
[10 Sep 2020 23:21:18-fOD][memory] 73 bits per kmer
[10 Sep 2020 23:21:18-fOD][cmd_mem.c:98] Fatal Error: Not enough kmers in hash: require at least 70,540,096 kmers (min memory: 624.5MB)

Failure mode 2:

$ bzcat -f ERR189737.ctx.bz2 | mccortex31 unitigs -
[11 Sep 2020 12:28:09-fIt][cmd] mccortex31 unitigs -
[11 Sep 2020 12:28:09-fIt][cwd] /private/tmp/~20200910223252
[11 Sep 2020 12:28:09-fIt][version] mccortex=v0.0.3-610-g400c0e3 zlib=1.2.11 htslib=1.8-17-g699ed53 ASSERTS=ON hash=Lookup3 CHECKS=ON k=3..31
[11 Sep 2020 12:28:09-fIt][memory] 73 bits per kmer
[11 Sep 2020 12:28:09-fIt][memory] graph: 496.8MB
[11 Sep 2020 12:28:09-fIt][memory] total: 496.8MB of 40GB RAM
[11 Sep 2020 12:28:09-fIt] Output in FASTA format to STDOUT
[11 Sep 2020 12:28:09-fIt][hasht] Allocating table with 56,623,104 entries, using 436MB
[11 Sep 2020 12:28:09-fIt][hasht]  number of buckets: 2,097,152, bucket size: 27
[11 Sep 2020 12:28:09-fIt][graph] kmer-size: 31; colours: 1; capacity: 56,623,104
[11 Sep 2020 12:28:09-fIt][FileFilter] Reading file - [1 src colour]
[11 Sep 2020 12:28:09-fIt][GReader] 18,446,744,073,709,551,615 kmers, 16EB filesize
[11 Sep 2020 12:28:50-fIt][hasht] buckets: 2,097,152 [2^21]; bucket size: 27;
[11 Sep 2020 12:28:50-fIt][hasht] memory: 436MB; filled: 51,626,922 / 56,623,104 (91.18%)
[11 Sep 2020 12:28:50-fIt][hasht]  collisions  0: 49009867
[11 Sep 2020 12:28:50-fIt][hasht]  collisions  1: 1927184
[11 Sep 2020 12:28:50-fIt][hasht]  collisions  2: 462390
[11 Sep 2020 12:28:50-fIt][hasht]  collisions  3: 144851
[11 Sep 2020 12:28:50-fIt][hasht]  collisions  4: 50724
[11 Sep 2020 12:28:50-fIt][hasht]  collisions  5: 19183
[11 Sep 2020 12:28:50-fIt][hasht]  collisions  6: 7551
[11 Sep 2020 12:28:50-fIt][hasht]  collisions  7: 2960
[11 Sep 2020 12:28:50-fIt][hasht]  collisions  8: 1266
[11 Sep 2020 12:28:50-fIt][hasht]  collisions  9: 497
[11 Sep 2020 12:28:50-fIt][hasht]  collisions 10: 276
[11 Sep 2020 12:28:50-fIt][hasht]  collisions 11: 102
[11 Sep 2020 12:28:50-fIt][hasht]  collisions 12: 38
[11 Sep 2020 12:28:50-fIt][hasht]  collisions 13: 21
[11 Sep 2020 12:28:50-fIt][hasht]  collisions 14: 9
[11 Sep 2020 12:28:50-fIt][hasht]  collisions 15: 2
[11 Sep 2020 12:28:50-fIt][hasht]  collisions 16: 1
[11 Sep 2020 12:28:50-fIt][hash_table.c:247] Fatal Error: Hash table is full
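
A quick sanity check of the numbers in the two logs, as a rough sketch rather than a description of McCortex internals. All values are copied from the output above; the interpretation is an assumption: the kmer count printed by GReader equals 2^64 - 1, which presumably means the count is unknown when reading from stdin, so the table was allocated with a default capacity that is smaller than the 70,540,096 kmers requested in failure mode 1. The function name `summarize` is made up for this example.

```python
# Values copied verbatim from the log output in this issue.
buckets = 2_097_152
bucket_size = 27
capacity = 56_623_104          # "Allocating table with 56,623,104 entries"
filled = 51_626_922            # "filled: 51,626,922 / 56,623,104 (91.18%)"
required_kmers = 70_540_096    # "require at least 70,540,096 kmers"
reported_kmers = 18_446_744_073_709_551_615  # GReader line when reading from stdin

# Per-level collision counts from the [hasht] lines, levels 0 through 16.
collisions = [
    49009867, 1927184, 462390, 144851, 50724, 19183, 7551, 2960, 1266,
    497, 276, 102, 38, 21, 9, 2, 1,
]


def summarize():
    """Cross-check the logged figures against each other."""
    return {
        "capacity_matches": buckets * bucket_size == capacity,
        "buckets_is_2_pow_21": buckets == 2**21,
        "fill_percent": round(100 * filled / capacity, 2),
        "collisions_sum_matches": sum(collisions) == filled,
        "kmer_count_is_uint64_max": reported_kmers == 2**64 - 1,
        # How many entries the allocated table falls short of the requirement.
        "shortfall": required_kmers - capacity,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

If the assumption holds, the allocated table is about 13.9 million entries short of what failure mode 1 asks for, which would be consistent with it filling up before the whole graph is loaded.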
@karel-brinda
Author

It might be related to #89.

@karel-brinda
Author

Further experiments showed that adding -m 20G helps; I hadn't realized that this parameter needs to be set for the unitigs subcommand too.

Maybe changing the error message "Fatal Error: Hash table is full" to something more informative would help?
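
To make the suggestion concrete, here is one possible wording, sketched in Python purely to illustrate the message text. It is not McCortex code (the project is written in C), the function name `full_table_message` is hypothetical, and the only option it mentions is the -m flag from the comment above.

```python
def full_table_message(filled: int, capacity: int) -> str:
    """Build a more descriptive 'hash table is full' message (illustrative wording only)."""
    percent = 100 * filled / capacity
    return (
        f"Hash table is full ({filled:,} / {capacity:,} entries, {percent:.2f}%); "
        "try increasing memory with -m, e.g. -m 20G"
    )


if __name__ == "__main__":
    # Figures taken from the [hasht] lines in failure mode 2.
    print(full_table_message(51_626_922, 56_623_104))
```

With the figures from failure mode 2, a message along these lines would point directly at the fix that ended up working here.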
