Terminate called after throwing an instance of 'std::runtime_error'; Unable to open superkmers/xjin_AB_P0R2c/skp.90 #33

Open
bsmith89 opened this issue Mar 22, 2024 · 2 comments

@bsmith89

I'm getting the following error when I run kmtricks with 276 samples:

$ kmtricks pipeline --kmer-size 111 --hard-min 0 --share-min 1 --soft-min 2 --recurrence-min 3 --file data/group/xjin/r.proc.kmtricks_input.txt --run-dir <DIR> --mode kmer:count:text --threads 24
[2024-03-22 09:26:07.016] [info] Run with Kmer<128> - uint64_t[4] implementation
[2024-03-22 09:26:07.069] [info] Compute configuration...
[2024-03-22 09:26:07.069] [info] 276 samples found (552 read files).
[2024-03-22 09:26:47.828] [info] Use 156 partitions.
[2024-03-22 09:26:48.117] [info] Compute minimizer repartition...
Compute SuperK   [>                                                 ] [00m:00s]                                    
Count partitions [>                                                 ] [00:00s]                                     
terminate called after throwing an instance of 'std::runtime_error'
  what():  Unable to open <DIR>/superkmers/xjin_AB_P0R2c/skp.90
terminate called recursively
[2024-03-22 09:30:47.682] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log

./kmtricks_backtrace.log:

Backtrace:
1 0x00007f8d88e41090 (null) + 140245863764112
2 0x00007f8d88e4100b gsignal + 203
3 0x00007f8d88e20859 abort + 299
4 0x00007f8d89213f00 __gnu_cxx::__verbose_terminate_handler() + 192
5 0x00007f8d8921243c (null) + 140245867766844
6 0x00007f8d8921248e (null) + 140245867766926
7 0x000055a817e6c705 (null) + 94180443866885
8 0x00007f8d88e448a7 (null) + 140245863778471
9 0x00007f8d88e44a60 on_exit + 0
10 0x000055a817f2e98a (null) + 94180444662154
11 0x00007f8d88e41090 (null) + 140245863764112
12 0x00007f8d88edb23f clock_nanosleep + 223
13 0x00007f8d88ee0ec7 nanosleep + 23
14 0x000055a817f3f95b (null) + 94180444731739
15 0x000055a8180b14d0 (null) + 94180446246096
16 0x000055a8180b2165 (null) + 94180446249317
17 0x000055a8180b9443 (null) + 94180446278723
18 0x000055a817e52afb main + 1419
19 0x00007f8d88e22083 __libc_start_main + 243
20 0x000055a817e555e5 (null) + 94180443772389

infos:

$ kmtricks infos
kmtricks v1.4.0

- HOST -
build host: Linux-6.1.3
run host: Linux 4.18.0-513.11.1.el8_9.x86_64

- BUILD -
c compiler: GNU 11.2.0
cxx compiler: GNU 11.2.0
conda: ON
static: OFF
native: OFF
modules: ON
socks: ON
howde: ON
dev: OFF
kmer: 32,64,96,128,160,192,224,256
max_c: 4294967295

- GIT SHA1 / VERSION -
kmtricks: 7dc4d18
sdsl: c32874c
bcli: 3e4f493
fmt: 0544a227
kff: 97d135e
lz4: 4de56b3
spdlog: v1.2.1-1811-g5b4c4f3f
xxhash: 6853ddc
gtest: release-1.8.0-2774-g96f4ce02
croaring: v0.3.3-17-g2d5c927
robin-hood-hasing: 24b3f50
turbop: 4ab9f5b
cfrcat: 2f9da97
indicators: v1.9-36-gcdcff01

Contact: teo.lemane@inria.fr

I don't get this error when I use a subset of 8 samples (instead of 276), nor when I use only 12 threads (instead of 24).
This looks closely related to Issue #15, where kmtricks opens too many files.
lsof supports this: the crash occurs just as the number of open files ramps up. With 12 threads, fewer than 1000 files are open and it doesn't crash.
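
For anyone wanting to reproduce this measurement without lsof, here is a minimal sketch that counts a process's open file descriptors through /proc. It assumes a Linux host with /proc mounted; the function name count_open_fds is an illustrative choice and not part of kmtricks.

```shell
# Count the file descriptors currently open in a process (Linux /proc only).
# Usage: count_open_fds <pid>
count_open_fds() {
    # Each entry in /proc/<pid>/fd corresponds to one open descriptor.
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Example: descriptors held by the current shell.
count_open_fds "$$"
```

Calling the function in a loop with the PID of the running kmtricks process should show how close the run gets to the per-process limit reported by ulimit -n.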

While using fewer threads works, I'd love a solution that maintains the high parallelization during other steps. Can you suggest a way to run the kmtricks pipeline so that the superkmers computation step doesn't open too many files, but I get as much parallelization as possible?

Thanks for your help, and for building a valuable tool!

@bsmith89

Just to further confirm that the number of open files is the problem:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 4075957
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4075957
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

With 12 threads, I still haven't exceeded 1000 open files (currently running, but it's gotten past the point where the 24-thread run consistently fails).
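
One possible workaround on the shell side is to raise the soft open-file limit before launching the pipeline. The sketch below assumes the hard limit on this host is higher than the soft limit of 1024 shown above, which has not been verified here (ulimit -Hn reports it). The function name raise_nofile_limit is hypothetical, and nothing in it is specific to kmtricks.

```shell
# Raise the soft open-file limit of this shell session up to the hard limit.
raise_nofile_limit() {
    hard=$(ulimit -Hn)   # ceiling that an unprivileged user may raise to
    soft=$(ulimit -Sn)   # limit currently in effect (1024 in the output above)
    if [ "$soft" != "$hard" ]; then
        # -S changes only the soft limit and leaves the hard limit untouched.
        ulimit -Sn "$hard" || return 1
    fi
    echo "open files: soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}

raise_nofile_limit
```

Child processes inherit the limit, so the kmtricks pipeline command would need to be launched from the same shell session afterwards. If the hard limit is also 1024, raising it requires an administrator or a scheduler setting, which this sketch does not cover.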

@bsmith89

The run with 12 threads did eventually fail (4 hours later), this time at the start of the Merge partitions step:

$ kmtricks pipeline --kmer-size 111 --hard-min 0 --share-min 1 --soft-min 2 --recurrence-min 3 --file data/group/xjin/r.proc.kmtricks_input.txt --run-dir $tmpdir --mode kmer:count:text --threads 12
[2024-03-22 09:32:04.718] [info] Run with Kmer<128> - uint64_t[4] implementation
[2024-03-22 09:32:04.781] [info] Compute configuration...
[2024-03-22 09:32:04.781] [info] 276 samples found (552 read files).
[2024-03-22 09:32:44.950] [info] Use 156 partitions.
[2024-03-22 09:32:45.632] [info] Compute minimizer repartition...
Compute SuperK   [==================================================] [02h:22m:42s]
Count partitions [==================================================] [02h:22m:44s]
Merge partitions [>                                                 ] [00:00s]
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called after throwing an instance of 'std::runtime_error'
  what():  <DIR>/counts/partition_2/xjin_AS_P2R2c.kmer
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
[2024-03-22 12:00:21.549] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
[2024-03-22 12:00:21.549] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
[2024-03-22 12:00:21.550] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log

backtrace:

Backtrace:
1 0x00007f8d88e41090 (null) + 140245863764112
2 0x00007f8d88e4100b gsignal + 203
3 0x00007f8d88e20859 abort + 299
4 0x00007f8d89213f00 __gnu_cxx::__verbose_terminate_handler() + 192
5 0x00007f8d8921243c (null) + 140245867766844
6 0x00007f8d8921248e (null) + 140245867766926
7 0x000055a817e6c705 (null) + 94180443866885
8 0x00007f8d88e448a7 (null) + 140245863778471
9 0x00007f8d88e44a60 on_exit + 0
10 0x000055a817f2e98a (null) + 94180444662154
11 0x00007f8d88e41090 (null) + 140245863764112
12 0x00007f8d88edb23f clock_nanosleep + 223
13 0x00007f8d88ee0ec7 nanosleep + 23
14 0x000055a817f3f95b (null) + 94180444731739
15 0x000055a8180b14d0 (null) + 94180446246096
16 0x000055a8180b2165 (null) + 94180446249317
17 0x000055a8180b9443 (null) + 94180446278723
18 0x000055a817e52afb main + 1419
19 0x00007f8d88e22083 __libc_start_main + 243
20 0x000055a817e555e5 (null) + 94180443772389
