Bug in NIO library using public reference files ending in .fa.gz #8751

rickymagner opened this issue Mar 25, 2024 · 0 comments

Bug Report

Affected tool(s) or class(es)

SelectVariants

Affected version(s)

  • Latest public release version [4.5.0.0]

Description

When streaming a reference file from a public URL, GATK fails to resolve the path if the file name ends with .fa.gz. The failure occurs while locating the FASTA index (.fai).

Steps to reproduce

This was discovered while debugging a separate Picard issue. To reproduce, run this command:

gatk SelectVariants -L chr17:22477226-22477227 -V https://jmorp.megabank.tohoku.ac.jp/datasets/tommo-54kjpn-20230626-af_snvindelall/files/tommo-54kjpn-20230626r3-GRCh38-af-autosome.vcf.gz -R https://jmorp.megabank.tohoku.ac.jp/datasets/tommo-jg2.1.0-20211208/files/jg2.1.0.fa.gz -O subset.vcf.gz

Here we try to stream a small region from a VCF using a public reference file.

Expected behavior

The file subset.vcf.gz should be written, containing only the variants in the requested interval.

Actual behavior

The command fails with this stack trace:

org.broadinstitute.http.nio.HttpPath$CantDealWithThisException: Attempting to resolve this against a path which is relatve but looks like it has a scheme.
This: https://jmorp.megabank.tohoku.ac.jp/datasets/tommo-jg2.1.0-20211208/files
Other: https:/jmorp.megabank.tohoku.ac.jp/jg2.1.0.fa.gz.fai
Other interpretted as URI: https:/jmorp.megabank.tohoku.ac.jp/jg2.1.0.fa.gz.fai
This is a limitatation of the current implementation of resolve.
Please use choose a less horrible file name or get in touch with the developers to complain.
	at org.broadinstitute.http.nio.HttpPath.resolve(HttpPath.java:381)
	at org.broadinstitute.http.nio.HttpPath.resolve(HttpPath.java:53)
	at java.base/java.nio.file.Path.resolveSibling(Path.java:549)
	at org.broadinstitute.http.nio.HttpPath.resolveSibling(HttpPath.java:418)
	at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getFastaIndexFileName(ReferenceSequenceFileFactory.java:262)
	at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.checkFastaPath(CachingIndexedFastaSequenceFile.java:181)
	at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:147)
	at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:129)
	at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:114)
	at org.broadinstitute.hellbender.engine.ReferenceFileSource.<init>(ReferenceFileSource.java:35)
	at org.broadinstitute.hellbender.engine.ReferenceDataSource.of(ReferenceDataSource.java:27)
	at org.broadinstitute.hellbender.engine.GATKTool.initializeReference(GATKTool.java:439)
	at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:722)
	at org.broadinstitute.hellbender.engine.VariantWalker.onStartup(VariantWalker.java:45)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:147)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
	at org.broadinstitute.hellbender.Main.main(Main.java:306)
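
One detail in the message may help with triage: the "Other" value has a single slash after "https:" and no longer contains the /datasets/tommo-jg2.1.0-20211208/files directory. The sketch below uses only java.net.URI from the JDK, not the http-nio code itself, to show how such a string parses. The class name OtherAsUri and its helper methods are hypothetical and exist only for illustration.

```java
import java.net.URI;

final class OtherAsUri {
    // Value copied verbatim from the "Other:" line of the error message.
    static final String OTHER = "https:/jmorp.megabank.tohoku.ac.jp/jg2.1.0.fa.gz.fai";

    /** True when the string parses as a URI that carries a scheme. */
    static boolean hasScheme(String s) {
        return URI.create(s).getScheme() != null;
    }

    /** True when the string parses as a URI with an authority (host) component. */
    static boolean hasAuthority(String s) {
        return URI.create(s).getAuthority() != null;
    }

    public static void main(String[] args) {
        URI u = URI.create(OTHER);
        // With only one slash after the scheme, the host name is parsed
        // as the first segment of the path rather than as the authority.
        System.out.println("scheme    = " + u.getScheme());
        System.out.println("authority = " + u.getAuthority());
        System.out.println("path      = " + u.getPath());
    }
}
```

With java.net.URI the string has a scheme but no authority, which matches the "looks like it has a scheme" wording of the exception. Whether HttpPath performs the same check internally is an assumption I have not verified.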

After chatting with Louis, this looks like a bug in our NIO library: an edge case in file extensions that may not have been handled properly.
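
For reference, here is a minimal sketch of the sibling resolution I would expect for the index, written with java.net.URI from the JDK only. It is not how http-nio or htsjdk implement it. The class name FastaIndexSibling and the method expectedIndexUri are hypothetical, and the ".fai" suffix is taken from the file name in the error message.

```java
import java.net.URI;

final class FastaIndexSibling {
    /** Appends ".fai" to the last path segment and resolves it next to the FASTA. */
    static URI expectedIndexUri(URI fasta) {
        String path = fasta.getPath();
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        // Relative resolution replaces the last segment of the base path,
        // so the scheme, host, and parent directory are all preserved.
        return fasta.resolve(fileName + ".fai");
    }

    public static void main(String[] args) {
        URI fasta = URI.create(
            "https://jmorp.megabank.tohoku.ac.jp/datasets/tommo-jg2.1.0-20211208/files/jg2.1.0.fa.gz");
        System.out.println(expectedIndexUri(fasta));
    }
}
```

The result keeps the "https://" prefix and the full parent directory, unlike the "Other" value reported in the exception.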
