Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly convert counts to cf when doing an autocorr #201

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Christopher-Bradshaw
Copy link
Contributor

@Christopher-Bradshaw Christopher-Bradshaw commented Oct 16, 2019

I think that the current way that Corrfunc converts counts to a correlation function isn't quite right for autocorrelations. I think the normalization is slightly off (see equation 3 in the LS paper).

The effect of this is totally negligible when the number of pairs is high, and very small even when it isn't so this isn't a big deal... But I was confused why things weren't quite as I expected and it seems worth a fix.

Please let me know I am right about this - there's always a good change I've misunderstood something! This change is also pretty rough at the moment - I just wanted to have something to show. I'm happy to clean it up, write tests, etc later but didn't want to do that before checking this made sense!

Thanks!

@pep8speaks
Copy link

pep8speaks commented Oct 16, 2019

Hello @Christopher-Bradshaw! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-10-18 20:38:01 UTC

@lgarrison
Copy link
Collaborator

Thanks! I think this is consistent with what we're doing in Corrfunc.theory.xi(), so I agree with this change. Would you mind checking that Corrfunc.theory.xi() and convert_3d_counts_to_cf() now agree?

@Christopher-Bradshaw
Copy link
Contributor Author

Hey @lgarrison I might be missing something (or just looking in entirely the wrong place) but it looks like Corrfunc.theory.xi() doesn't use the LS estimator, just the basic DD/RR.

results->xi[i] = (weight0/weightrandom-1.0);

@lgarrison
Copy link
Collaborator

lgarrison commented Oct 16, 2019

No, you're absolutely right! I was mis-remembering that we also supported the natural estimator in convert_3d_counts_to_cf(). Is there another consistency check we could do? Such as, supply analytic DR and RR to the LS estimator, and check that it asymptotes to the natural estimator (i.e. the result of Corrfunc.theory.xi()) for a large number of randoms? I'm not entirely sure that makes sense... but maybe something along those lines? Basically, I want to make sure that we got the -1 correction right in all of our estimators. One can use a small number of data points (e.g. 10) when making this comparison to exaggerate the importance of the ND1 - 1 correction.

@manodeep
Copy link
Owner

@Christopher-Bradshaw Thanks!

@lgarrison Since the pair-counts are designed to be the same for an auto-corr of D1 and a cross-corr of D1 with D1, shouldn't the conversion from pair-counts to the cf be the same?

@manodeep
Copy link
Owner

@lgarrison Would your gist to demonstrate how to correctly account for pair-counts in a Rmin=0.0 bin work?

@lgarrison
Copy link
Collaborator

It's definitely relevant, as it was designed to show that (ND-1)/V is the right density to use for the randoms in the natural estimator: https://gist.github.com/lgarrison/1efabe4430429996733a9d29397423d2

That's why I was thinking that some kind of consistency check or limit should be possible to ensure that LS and the natural estimator are behaving the same with respect to the ND-1 factor.

@Christopher-Bradshaw
Copy link
Contributor Author

That gist is actually really useful for what I am doing - I was using N instead of N - 1 in some places too, so thanks!

The test I had locally that was failing basically just constructed data and randoms with the same clustering and then asserted that the auto-correlation was 0. With this change that is now true. We could maybe just do the same thing for all the places where counts are converted to a cf?

Also, just checking that by natural estimator you mean DD/RR?

@manodeep
Copy link
Owner

manodeep commented Oct 18, 2019

Also, just checking that by natural estimator you mean DD/RR?

@Christopher-Bradshaw Yup!

@manodeep
Copy link
Owner

@Christopher-Bradshaw @lgarrison Does anyone recollect what needs to happen with this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants