Not reproducible results #48

Open
VGalata opened this issue Aug 24, 2021 · 5 comments
@VGalata

VGalata commented Aug 24, 2021

Hello,

I have a question regarding the reproducibility of the results: I ran nonpareil on the same input using the same command line and got slightly different results for both runs.
Is that something to be expected? Do you know what the source of this randomness is and whether the analysis could be made deterministic in the future?

Used version: nonpareil=3.3.3=r341h470a237_0 installed via conda

Thank you in advance!

Best,
Valentina

@cjfields

Coming into this a bit late, but there is a random seed setting as one of the parameters and sampling is mentioned in the documentation, so I think this is both completely expected and possible to make reproducible by setting -r to the same seed between runs:

-r <int> | Random generator seed. By default current time.
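
The behaviour that help line describes (an explicit seed, falling back to the current time) can be sketched in Python. This only illustrates the general pattern and is not Nonpareil's actual code; `make_rng` is a hypothetical helper.

```python
import random
import time


def make_rng(seed=None):
    """Return (generator, seed_used).

    Hypothetical helper: if no seed is given, fall back to the current
    time, so two runs started at different moments draw different samples.
    """
    if seed is None:
        seed = int(time.time())
    return random.Random(seed), seed


# Two generators built from the same explicit seed yield the same stream.
rng_a, _ = make_rng(23)
rng_b, _ = make_rng(23)
draws_a = [rng_a.randrange(1000) for _ in range(5)]
draws_b = [rng_b.randrange(1000) for _ in range(5)]
```

Returning the seed that was actually used makes it possible to log it and repeat a run later, even when the default was taken.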

@VGalata
Author

VGalata commented Jan 28, 2022

Dear @cjfields,

Could you clarify how you use the -r option and with which version of the tool?
Also, I do not see this option listed when running nonpareil -h - neither in the mentioned version 3.3.3 nor in the latest one (3.3.4).

I tried to run version v3.303 (3.3.3, r341h470a237_0) with the option -r set using the same command two times. The output from the two runs has different md5sums and different content as well - except for the *.npl files which don't contain any relevant output anyway.

Here are the commands I executed:

nonpareil -s some.reads.fq -T kmer -f fastq -r 23 -b test.1
nonpareil -s some.reads.fq -T kmer -f fastq -r 23 -b test.2
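
A small script can make the comparison of the two runs explicit. The sketch below hashes every file that shares each `-b` prefix and reports which suffixes match. `md5_of` and `compare_runs` are hypothetical helper names, and the sketch assumes that each run's output files are named `<prefix>.<suffix>` and live in the same directory as the prefix.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_for(prefix):
    """Map each suffix (e.g. '.npo') to the file '<prefix><suffix>'."""
    prefix = Path(prefix)
    stem = prefix.name + "."
    return {
        p.name[len(prefix.name):]: p
        for p in prefix.parent.iterdir()
        if p.is_file() and p.name.startswith(stem)
    }


def compare_runs(prefix_a, prefix_b):
    """Map each output suffix to True if both runs wrote identical bytes."""
    files_a, files_b = outputs_for(prefix_a), outputs_for(prefix_b)
    report = {}
    for suffix in sorted(files_a.keys() | files_b.keys()):
        if suffix in files_a and suffix in files_b:
            report[suffix] = md5_of(files_a[suffix]) == md5_of(files_b[suffix])
        else:
            report[suffix] = False  # file exists for only one of the runs
    return report
```

Calling `compare_runs("test.1", "test.2")` after the two commands above would return one entry per output suffix; any `False` value marks a file that differs between the runs.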

@lmrodriguezr
Owner

Thanks for bringing this to our attention! I have now implemented consistency with -r when using -T alignment. Note that it may still produce slightly different results with different numbers of threads (-t).

For -T kmer, we use an implementation of random_device, so it needs a little more work.

@gunturus Do you think the kmer kernel could be migrated to a deterministic implementation instead?
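
The difference between a seeded generator and a `random_device`-style source can be illustrated with a Python analogy. This is a sketch, not Nonpareil's C++ code: `random.Random(seed)` stands in for a seeded pseudo-random generator, `random.SystemRandom` stands in for an operating-system entropy source that ignores any seed, and both function names are hypothetical.

```python
import random


def subsample_seeded(items, k, seed):
    """Draw k items with a seeded PRNG: same seed, same sample."""
    rng = random.Random(seed)
    return rng.sample(items, k)


def subsample_entropy(items, k, seed):
    """Draw k items from OS entropy; the seed argument has no effect."""
    rng = random.SystemRandom()  # cannot be seeded, analogous to random_device
    return rng.sample(items, k)


reads = list(range(100_000))  # stand-in for read indices

seeded_1 = subsample_seeded(reads, 10, seed=23)
seeded_2 = subsample_seeded(reads, 10, seed=23)

entropy_1 = subsample_entropy(reads, 10, seed=23)
entropy_2 = subsample_entropy(reads, 10, seed=23)
```

Here `seeded_1` and `seeded_2` are always identical, while `entropy_1` and `entropy_2` will almost certainly differ even though the same seed value was passed. Under this analogy, making the kmer kernel honour -r would mean drawing from a seeded generator rather than from the entropy source.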

@cjfields

> Dear @cjfields,
>
> Could you clarify how you use the -r option and with which version of the tool? Also, I do not see this option listed when running nonpareil -h - neither in the mentioned version 3.3.3 nor in the latest one (3.3.4).
>
> I tried to run version v3.303 (3.3.3, r341h470a237_0) with the option -r set using the same command two times. The output from the two runs has different md5sums and different content as well - except for the *.npl files which don't contain any relevant output anyway.
>
> Here are the commands I executed:
>
> nonpareil -s some.reads.fq -T kmer -f fastq -r 23 -b test.1
> nonpareil -s some.reads.fq -T kmer -f fastq -r 23 -b test.2

Happy to see @lmrodriguezr 's answer (and agree that it's good you raised it); I planned on replying that this sounds like a definite bug.

@VGalata
Author

VGalata commented Jan 31, 2022

Dear @lmrodriguezr and @cjfields,

Thank you both for looking into this!
