Not reproducible results #48

VGalata · 2021-08-24T10:18:50Z

Hello,

I have a question regarding the reproducibility of the results: I ran nonpareil twice on the same input using the same command line and got slightly different results between the two runs.
Is that something to be expected? Do you know what the source of this randomness is and whether the analysis could be made deterministic in the future?

Used version: nonpareil=3.3.3=r341h470a237_0 installed via conda

Thank you in advance!

Best,
Valentina

cjfields · 2022-01-27T18:40:35Z

Coming into this a bit late, but there is a random seed setting as one of the parameters and sampling is mentioned in the documentation, so I think this is both completely expected and possible to make reproducible by setting -r to the same seed between runs:

-r <int> | Random generator seed. By default current time.
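To illustrate why a time-based default seed gives different output on every run while a fixed seed does not, here is a minimal Python sketch. It is not Nonpareil's code: the `subsample` function and its parameters are hypothetical, and it only assumes that the tool draws random subsamples of reads from a seeded generator.

```python
import random
import time


def subsample(n_reads, k, seed=None):
    """Return sorted indices of k reads drawn without replacement from n_reads."""
    if seed is None:
        # Mimics a "current time" default: a fresh seed on every call.
        seed = time.time_ns()
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_reads), k))


if __name__ == "__main__":
    # Same explicit seed: the two subsamples are identical.
    print(subsample(1000, 5, seed=23) == subsample(1000, 5, seed=23))
    # No seed given: the two subsamples will usually differ.
    print(subsample(1000, 5) == subsample(1000, 5))
```

With a fixed seed the first comparison holds on every run; the second depends on the clock, which is the behavior the default describes.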

VGalata · 2022-01-28T10:49:35Z

Dear @cjfields,

Could you clarify how you use the -r option and with which version of the tool?
Also, I do not see this option listed when running nonpareil -h - neither in the mentioned version 3.3.3 nor in the latest one (3.3.4).

I tried to run version v3.303 (3.3.3, r341h470a237_0) with the option -r set using the same command two times. The output from the two runs has different md5sums and different content as well - except for the *.npl files which don't contain any relevant output anyway.

Here are the commands I executed:

nonpareil -s some.reads.fq -T kmer -f fastq -r 23 -b test.1
nonpareil -s some.reads.fq -T kmer -f fastq -r 23 -b test.2
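For anyone who wants to repeat the comparison, the checksum check can be scripted. This is a minimal sketch that assumes both runs wrote their files into the same directory under the prefixes given to -b (test.1 and test.2 above); the helper names `md5_of` and `compare_runs` are made up for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_runs(prefix_a, prefix_b, directory="."):
    """Map each output suffix of run A to True if run B has an identical file."""
    directory = Path(directory)
    report = {}
    for file_a in sorted(directory.glob(prefix_a + ".*")):
        suffix = file_a.name[len(prefix_a):]
        file_b = directory / (prefix_b + suffix)
        report[suffix] = file_b.is_file() and md5_of(file_a) == md5_of(file_b)
    return report


if __name__ == "__main__":
    for suffix, same in compare_runs("test.1", "test.2").items():
        print(suffix, "identical" if same else "DIFFERENT")
```

A suffix reported as DIFFERENT means the two runs disagree on that output file, or that the second run did not produce it.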

lmrodriguezr · 2022-01-28T17:52:26Z

Thanks for bringing this to our attention! I have now implemented consistency with -r when using -T alignment. Note that it may still produce slightly different results with different numbers of threads (-t).
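As a general illustration of how thread count can change seeded results, the Python sketch below gives each worker its own random stream and splits the items into contiguous chunks. This is only one common pattern in parallel sampling code and is an assumption here, not a description of how Nonpareil distributes work; `simulate` and the `seed + worker` scheme are hypothetical.

```python
import random


def simulate(seed, n_workers, n_items=1000):
    """Assign one random value per item, using one RNG stream per worker."""
    values = [0.0] * n_items
    chunk = -(-n_items // n_workers)  # ceiling division
    for worker in range(n_workers):
        # Hypothetical scheme: each worker derives its stream from seed + id.
        rng = random.Random(seed + worker)
        start = worker * chunk
        stop = min(start + chunk, n_items)
        for item in range(start, stop):
            values[item] = rng.random()
    return values


if __name__ == "__main__":
    # Same seed and same worker count: identical values.
    print(simulate(23, 2) == simulate(23, 2))
    # Same seed, different worker count: items land on different streams.
    print(simulate(23, 2) == simulate(23, 4))
```

Under this scheme, changing the number of workers changes which stream serves which item, so the seed alone no longer fixes the result.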

For -T kmer, we use an implementation of random_device, so it needs a little more work.
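The difference between an entropy-backed source and a seedable generator can be sketched in Python, used here as an analogue: `random.SystemRandom` stands in for a random_device-style source and `random.Random` for a seedable engine. The helper names are hypothetical and this is not the kmer kernel's code.

```python
import random


def seeded_draws(seed, n=4):
    """Draw n 64-bit integers from a seedable pseudo-random generator."""
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(n)]


def entropy_draws(seed, n=4):
    """Draw n 64-bit integers from the operating system's entropy source."""
    rng = random.SystemRandom()
    rng.seed(seed)  # accepted, but has no effect on an OS-backed source
    return [rng.getrandbits(64) for _ in range(n)]


if __name__ == "__main__":
    print(seeded_draws(23) == seeded_draws(23))    # reproducible
    print(entropy_draws(23) == entropy_draws(23))  # not reproducible
```

Any code path that draws from the entropy source stays non-reproducible no matter what seed the user passes, which matches what was observed with -T kmer and -r 23.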

@gunturus Do you think the kmer kernel could be migrated to a deterministic implementation instead?

cjfields · 2022-01-28T18:12:01Z

> Dear @cjfields,
>
> Could you clarify how you use the -r option and with which version of the tool? Also, I do not see this option listed when running nonpareil -h - neither in the mentioned version 3.3.3 nor in the latest one (3.3.4).
>
> I tried to run version v3.303 (3.3.3, r341h470a237_0) with the option -r set using the same command two times. The output from the two runs has different md5sums and different content as well - except for the *.npl files which don't contain any relevant output anyway.
>
> Here are the commands I executed:
> nonpareil -s some.reads.fq -T kmer -f fastq -r 23 -b test.1
> nonpareil -s some.reads.fq -T kmer -f fastq -r 23 -b test.2

Happy to see @lmrodriguezr's answer (and I agree that it's good you raised it); I had planned on replying that this sounds like a definite bug.

VGalata · 2022-01-31T06:41:16Z

Dear @lmrodriguezr and @cjfields,

Thank you both for looking into this!

lmrodriguezr added the enhancement label Jan 28, 2022
