Can't process gzipped fastq #35

ohthetrees · 2018-05-17T22:22:58Z

Hi, I'm just getting started with Nonpareil, thanks for your work.

I'm unable to process my gzipped fastq. If I first uncompress the file, it processes as expected. The error:

$ nonpareil -s ETNP_120m_R2.name.fastq.gz -t 4 -T kmer -f fastq -b ETNP_120m_R2.nonpareil.k
Nonpareil v3.301
Fatal error:
The file provided does not have the proper fastq format
 [      0.0] Fatal error: The file provided does not have the proper fastq format

lmrodriguezr · 2019-08-28T16:42:22Z

Sorry for the loooong delay; I'm now back to tending to the issues.

I believe this is an issue in the kmer kernel, which doesn't allow gzipped input because of the random-access function it uses (@gunturus please comment if I'm wrong).

Unfortunately, I don't think this can be easily resolved. I'll leave this issue open until I add a corresponding comment to the documentation, but you'll have to unzip the fastq file prior to using nonpareil.
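
For scripts that handle both compressed and uncompressed inputs, one way to decide whether that unzip step is needed is to check for the gzip magic number before calling nonpareil. The following is a minimal sketch, not part of Nonpareil; the helper name `is_gzipped` is hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number.

    Hypothetical helper: it only inspects the first two bytes, so it does not
    verify that the rest of the file is a valid gzip stream.
    """
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC
```

Checking the content rather than the `.gz` extension also catches compressed files that were renamed.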

jfy133 · 2020-11-06T10:02:02Z

I'm starting to investigate nonpareil, and also had the same issue.

Gzipped input support would be very useful, because I have >100 sequencing files, all in the >1 GB size range, so having to decompress each one would be a bit nasty when trying to parallelise processing of all the files at once.

So I would like to give support to this, if a solution is feasible (even if there is an internal temporary decompression)!

lmrodriguezr · 2020-11-06T18:11:15Z

@gunturus Do you have an update on this issue? I know you were looking into it. Thanks!

jfy133 · 2021-02-25T11:15:25Z

@gunturus do you have any more news? I'm interested in potentially adding nonpareil to the nf-core/eager pipeline, but the lack of gzip support is unfortunately a deal breaker...

gunturus · 2021-02-25T12:53:14Z

@jfy133 unfortunately gzip is not supported. @lmrodriguezr do you have any suggestions to provide gzip support? I have no idea.

jfy133 · 2021-06-09T11:29:22Z

Do you think this is in any way on a roadmap @lmrodriguezr? Just to know if I should look for different solutions instead.

VGalata · 2021-08-24T08:05:40Z

I would also like to add that having support for compressed FASTQ files would be good.

lmrodriguezr · 2022-02-09T16:56:09Z

Hello. We're finally back at this issue, and it's top of the roadmap. An initial not-so-clean solution would be to unzip the files into a temporary directory, launch nonpareil, and then remove the directory. Would this work as a temporary solution? If yes, I can implement it into a bash wrapper so you could use it out of the box.
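
A minimal sketch of that temporary-directory approach, written in Python rather than bash and not the maintainers' wrapper. The helper names `decompressed_copy` and `nonpareil_command` are hypothetical; the nonpareil flags are the ones from the command at the top of this thread.

```python
import contextlib
import gzip
import os
import shutil
import tempfile


@contextlib.contextmanager
def decompressed_copy(gz_path):
    """Yield the path of a temporary uncompressed copy of `gz_path`.

    The temporary directory is removed on exit, even if the body raises.
    """
    tmp_dir = tempfile.mkdtemp(prefix="nonpareil_")
    try:
        name = os.path.basename(gz_path)
        if name.endswith(".gz"):
            name = name[:-3]  # reads.fastq.gz -> reads.fastq
        plain_path = os.path.join(tmp_dir, name)
        # Stream the decompressed bytes to disk without loading the file in memory.
        with gzip.open(gz_path, "rb") as src, open(plain_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        yield plain_path
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)


def nonpareil_command(fastq_path, out_prefix, threads=4):
    """Build the argument list, mirroring the flags used earlier in this thread."""
    return ["nonpareil", "-s", fastq_path, "-t", str(threads),
            "-T", "kmer", "-f", "fastq", "-b", out_prefix]


# Example use (requires nonpareil on PATH):
#   import subprocess
#   with decompressed_copy("ETNP_120m_R2.name.fastq.gz") as plain:
#       subprocess.run(nonpareil_command(plain, "ETNP_120m_R2.nonpareil.k"), check=True)
```

Using Python's built-in `gzip` module avoids depending on an external `gunzip` binary, at the cost of needing enough temporary disk space for the uncompressed file.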

A more robust solution is to read directly from the zipped file, but this will take some heavy lifting because we will need to replace the random file access with another method. It's also doable, but it'll take us a bit longer, so hopefully the first option works in the meantime?
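
To illustrate what a sequential replacement for random access could look like in general, the sketch below draws a uniform random subsample of reads from a gzipped FASTQ in a single pass using reservoir sampling (Algorithm R). This is an assumption made for illustration only: it is not how Nonpareil's kernels work, and the function names are hypothetical. It also assumes strict four-line FASTQ records.

```python
import gzip
import random


def fastq_records(handle):
    """Yield (header, sequence, plus, quality) tuples from a text handle.

    Assumes strict four-line FASTQ records (no wrapped sequences).
    """
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:  # end of file (or a blank line) ends the stream
            return
        yield tuple(record)


def sample_reads(gz_path, k, seed=None):
    """Pick up to k records uniformly at random in one sequential pass."""
    rng = random.Random(seed)
    reservoir = []
    with gzip.open(gz_path, "rt") as handle:
        for i, record in enumerate(fastq_records(handle)):
            if i < k:
                # Fill the reservoir with the first k records.
                reservoir.append(record)
            else:
                # Keep record i with probability k / (i + 1).
                j = rng.randint(0, i)  # inclusive on both ends
                if j < k:
                    reservoir[j] = record
    return reservoir
```

Because the file is only ever read front to back, no seeking inside the compressed stream is needed; the trade-off is that every sample requires a full pass over the input.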

VGalata · 2022-02-10T06:32:20Z

Dear @lmrodriguezr,

Thank you very much for looking into this!

For our purposes, having the second option implemented would be better. We use nonpareil in a snakemake workflow where we want to move away from using unzipped FASTQ files, and we would like to avoid unnecessary unzipping if possible. And, as you say yourself, that would also be a more robust solution, and I think it would be worth waiting for.

jfy133 · 2022-02-10T07:45:36Z

@lmrodriguezr we are in the same situation as @VGalata, as we would like to add it to a nextflow pipeline ;).

However, I think unzipping to a /tmp location with automatic cleanup afterwards might be an OK temporary workaround, as then at least we don't have to deal with the unzipping ourselves. On the other hand, this depends on the implementation: whether you rely on an internal unzipping library within the bash script, or on a tool already installed on the user's machine (which is much more flaky, unfortunately, as this is often frustratingly not very portable).

But depending on the time it takes for the more robust solution, I guess I would prefer to wait a bit longer so that the time investment goes into an 'inbuilt' solution.

jfy133 · 2022-03-14T09:04:24Z

@lmrodriguezr just another thought... would it be easier to refactor input to allow stdin?

Then one could simply do zcat <fastq>.gz | nonpareil <additional params>

Just sayin', as that would also be fine with me as a way of accepting gzipped input, in terms of usability.

davidecarlson · 2024-01-26T14:24:02Z

Just wanted to chime in with more support for enabling compressed fastq files!

lmrodriguezr self-assigned this Aug 28, 2019

lmrodriguezr added the docs label Aug 28, 2019

lmrodriguezr added the enhancement label Feb 9, 2022

lmrodriguezr assigned gunturus Feb 9, 2022
