GenotypeGVCFs reports java.lang.OutOfMemoryError: Java heap space when calling from an incrementally imported GenomicsDB #8777

Open
LYOKOIIIYYR opened this issue Apr 16, 2024 · 3 comments


@LYOKOIIIYYR

Bug Report

Affected tool(s) or class(es)

gatk GenomicsDBImport GenotypeGVCFs

Affected version(s)

The Genome Analysis Toolkit (GATK) v4.5.0.0

Description

Hi,
I'm testing the feasibility of an incremental GenomicsDB workflow. I have 400 samples in total for joint calling. Running GenomicsDBImport and GenotypeGVCFs directly on all 400 samples works without problems; the configuration is 4c32g for GenomicsDBImport and 2c16g for GenotypeGVCFs. However, when I first build a GenomicsDB from 200 samples with GenomicsDBImport (which succeeds), then add the other 200 samples with GenomicsDBImport --genomicsdb-update-workspace-path, and finally run GenotypeGVCFs on this incrementally imported GenomicsDB, GenotypeGVCFs fails. The log reports GENOMICSDB_TIMER followed by: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
Here is my code:

gatk --java-options "-Xms8000m -Xmx~{max_mem}m" \
            GenomicsDBImport \
            --tmp-dir $PWD \
            --genomicsdb-workspace-path ~{workspace_dir_name}~{prefix}.~{index} \
            --batch-size 50 \
            -L ~{intervals} \
            --reader-threads 5 \
            --merge-input-intervals \
            --consolidate \
            -V ~{sep = " -V " single_sample_gvcfs}

gatk --java-options "-Xms8000m -Xmx~{max_mem}m" \
            GenomicsDBImport \
            --tmp-dir $PWD \
            --genomicsdb-update-workspace-path ~{workspace_dir_name} \
            --batch-size 50 \
            --reader-threads 5 \
            --merge-input-intervals \
            --consolidate \
            -V ~{sep = " -V " single_sample_gvcfs}

gatk --java-options "-Xms8000m -Xmx~{max_mem}m" \
            GenotypeGVCFs \
            --tmp-dir $PWD \
            -R ~{ref} \
            -O ~{workspace_dir_name}.vcf.gz \
            -G StandardAnnotation \
            --only-output-calls-starting-in-intervals \
            -V gendb://~{workspace_dir_name} \
            -L ~{intervals} \
            --merge-input-intervals \
            -all-sites

I also noticed that the number of threads used by GATK increased before the error was reported, but memory usage did not exceed the server's maximum limit.
I also changed --max-alternate-alleles and --genomicsdb-max-alternate-alleles to smaller values, but still got the same error.

I would appreciate some insight into why this happens.

Thanks,
Yang

@gokalpcelik
Contributor

Hi @LYOKOIIIYYR
You seem to be setting the heap size to the maximum memory available on your machine, which we do not recommend. As far as I recall, GenotypeGVCFs does not need that much memory. Can you set the heap size to a more moderate value, such as 8 GB or 12 GB, and try again?

@droazen
Collaborator

droazen commented Apr 16, 2024

Yes, it's important to realize that GenomicsDB is implemented in native C/C++ code (not Java), so the memory available to GenomicsDB is whatever is NOT allocated to Java (i.e., whatever is left over after -Xmx). So -Xmx should never claim all of the memory on the machine, and should leave enough free memory for GenomicsDB to use.
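
To make the arithmetic in the comment above concrete, here is a minimal sketch that derives a -Xmx value by subtracting a reserve for GenomicsDB from the task's total memory. The function name `compute_xmx_mb` and the 6000 MB reserve are assumptions chosen for illustration, not values recommended anywhere in this thread; the 16000 MB total mirrors the 2c16g GenotypeGVCFs configuration from the report.

```shell
#!/usr/bin/env bash
# Sketch only: choose a Java heap size that leaves headroom for
# GenomicsDB's native (non-Java) allocations.
# The reserve size is an illustrative assumption, not a GATK recommendation.

# compute_xmx_mb TOTAL_MB NATIVE_RESERVE_MB
# Prints the heap size in MB; fails if nothing would be left for the heap.
compute_xmx_mb() {
    local total_mb=$1
    local reserve_mb=$2
    local xmx_mb=$(( total_mb - reserve_mb ))
    if [ "$xmx_mb" -le 0 ]; then
        echo "reserve (${reserve_mb} MB) leaves no room for the Java heap" >&2
        return 1
    fi
    echo "$xmx_mb"
}

# Example: a 16 GB task that keeps 6 GB outside the Java heap.
total_mb=16000
reserve_mb=6000
xmx_mb=$(compute_xmx_mb "$total_mb" "$reserve_mb")
java_options="-Xmx${xmx_mb}m"
echo "$java_options"
```

The resulting string could be passed to `gatk --java-options` in place of a value that claims the whole machine. How much to reserve depends on the workspace and intervals, so the number would need to be tuned by experiment.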

@LYOKOIIIYYR
Author

There is no problem running GenomicsDBImport, and @gokalpcelik, I have already tried -Xmx values from 10G to 14G and got the same error.
What I'm most curious about is why running GenotypeGVCFs on a GenomicsDB imported directly from 400 samples succeeds, while running it on an incrementally imported GenomicsDB of 200 + 200 samples fails on the same computational resources.
