[WIP] Local (aspatial) Correlation #89

Open
wants to merge 7 commits into
base: main

ljwolf
Member

@ljwolf ljwolf commented Sep 17, 2019

Correlation coefficients can also be "localized" as a LISA:

r[i] = (x[i]*y[i]) / numpy.sqrt((x**2).sum() * (y**2).sum())

for any two variates x,y. This provides the "contribution" each site makes to the global correlation between two variables. When the statistic is large (close to one/negative one), the site contributes to the correlation in the direction of the sign of this local statistic. When this is small (close to zero), the site isn't as important to the correlation.
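As a minimal sketch of the statistic above, the following assumes `x` and `y` are first expressed as deviations from their means (the formula as written does not say so, but that is what makes the local values sum to Pearson's r). The function name `local_pearson` is hypothetical and not part of esda.

```python
import numpy as np

def local_pearson(x, y):
    """Per-site contributions to Pearson's r (sketch, not esda API)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumption: the formula is applied to deviations from the mean.
    zx = x - x.mean()
    zy = y - y.mean()
    return (zx * zy) / np.sqrt((zx**2).sum() * (zy**2).sum())

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)

r_local = local_pearson(x, y)
r_global = np.corrcoef(x, y)[0, 1]
# The local values add up to the global correlation coefficient.
print(np.isclose(r_local.sum(), r_global))
```

Because the local values sum to the global coefficient, each `r_local[i]` can be read as that site's signed share of the overall correlation.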

  1. I believe this also gives us local Spearman, if x,y are ranked.
  2. This'd also be helped along by abstracting the permutational inference machinery (both global/unconditioned and local/conditional) into a mixin class.
  3. If this goes here, should Tau_Local go here, too? Or, at least, cross-list them by importing giddy & adding them to an esda.correlation namespace?
  4. Before this PR gets merged, we will need to:
  • actually implement the statistic
  • run the input checks
  • clarify/disclaim the null in the local permutations for the docstring.
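Point 1 in the list above can be checked with a small sketch: rank both variates, then apply the same local formula to the centered ranks. This assumes no tied values (ties would need average ranks), and both helper names are hypothetical.

```python
import numpy as np

def ranks(values):
    """Ranks 1..n; assumes no ties."""
    values = np.asarray(values)
    order = np.argsort(values)
    out = np.empty(len(values), dtype=float)
    out[order] = np.arange(1, len(values) + 1)
    return out

def local_spearman(x, y):
    """Local correlation statistic applied to centered ranks (sketch)."""
    rx, ry = ranks(x), ranks(y)
    zx, zy = rx - rx.mean(), ry - ry.mean()
    return (zx * zy) / np.sqrt((zx**2).sum() * (zy**2).sum())

x = np.array([0.1, 2.3, 1.7, 5.0, 3.2])
y = np.array([1.0, 2.0, 4.0, 3.0, 5.0])

rho_local = local_spearman(x, y)
# Classic formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1)) = 1 - 60 / 120 = 0.5
print(rho_local.sum())
```

The local values sum to Spearman's rho for this toy data, which is consistent with the idea that ranking the inputs yields a local Spearman statistic.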

@ljwolf ljwolf added enhancement new-estimator WIP work in progress (for discussion) labels Sep 17, 2019
@ljwolf
Member Author

ljwolf commented Sep 18, 2019

hey @weikang9009 @sjsrey, with giddy.rank.Tau_Local, I don't see an explicit permutational inference strategy. Thinking about it, I'm not sure how to do this for a correlation coefficient: an observation with a small X but outlying Y would have strongly different reference distributions for their local statistic if you permute X and hold Y fixed vs. permuting Y holding X fixed. I'm not yet sure if randomly picking which variate to permute each iteration will work, either.

Do you (A) have a permutation strategy in the works for giddy.rank.Tau_Local, or (B) know of any relevant lit that might suggest a permutation test for this?
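The asymmetry described above can be illustrated with a small simulation. This is only a sketch under assumptions: site `i` is constructed to have an x value near the mean and an outlying y value, and each "permutation" is reduced to the single value that lands on site `i` (valid here because the means and sums of squares do not change under permutation).

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_perm = 200, 999
x = rng.normal(size=n)
y = rng.normal(size=n)

i = 0
x[i] = x[1:].mean() + 0.05              # small x: close to the mean
y[i] = y[1:].mean() + 4 * y[1:].std()   # outlying y

zx, zy = x - x.mean(), y - y.mean()
# The denominator is invariant under any permutation of x or y.
denom = np.sqrt((zx**2).sum() * (zy**2).sum())

others = np.arange(1, n)
# Hold x[i] fixed; site i receives a y value from a randomly chosen other site.
ref_fix_x = zx[i] * zy[rng.choice(others, size=n_perm)] / denom
# Hold y[i] fixed; site i receives an x value from a randomly chosen other site.
ref_fix_y = zx[rng.choice(others, size=n_perm)] * zy[i] / denom

print(ref_fix_x.std(), ref_fix_y.std())
```

In this construction the reference distribution obtained by holding the outlying y fixed is far wider than the one obtained by holding the small x fixed, so the same observed value would be judged very differently under the two schemes.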

@weikang9009
Member

weikang9009 commented Dec 19, 2019

> If this goes here, should Tau_Local go here, too? or, at least cross-listing them by importing giddy & adding them to an esda.correlation namespace?

We all think it makes sense to (1) move the source code of Kendall's tau, as well as its spatial counterpart and local decomposition, to esda; (2) in giddy, import these functions from esda; (3) keep the current APIs of these functions in giddy intact. An esda.correlation namespace sounds very reasonable!

@ljwolf if you can start an esda.correlation namespace, I will move the source code from giddy to this namespace and adjust giddy accordingly.

@weikang9009
Member

weikang9009 commented Dec 19, 2019

> hey @weikang9009 @sjsrey, with giddy.rank.Tau_Local, I don't see an explicit permutational inference strategy. Thinking about it, I'm not sure how to do this for a correlation coefficient: an observation with a small X but outlying Y would have strongly different reference distributions for their local statistic if you permute X and hold Y fixed vs. permuting Y holding X fixed. I'm not yet sure if randomly picking which variate to permute each iteration will work, either.
>
> Do you (A) have a permutation strategy in the works for giddy.rank.Tau_Local, or (B) know of any relevant lit that might suggest a permutation test for this?

This is a very reasonable point. We do not have inference for local Tau in giddy at the moment; permutation-based inference seems pretty tricky here. I completely agree with you: permuting the values of X would very possibly give quite different results from permuting Y. We have done some simulation experiments to examine the sampling distributions of local Tau sorted by starting ranks (like ranks in X here). These distributions vary to a great extent: starting with a middle rank gives a much narrower distribution than starting with a more extreme rank.

Since for investigating the dynamics (or exchanges) with tau or local tau, there is a temporal dimension, we can potentially use the starting rank as the reference point to build the sampling distribution. I guess we can try considering a certain variable (X or Y) as the conditioning variable and build sampling distributions starting from there?
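The dependence on starting rank can be reproduced with a rough simulation. This sketch assumes a simple local tau of the form (concordant - discordant) / (n - 1) with no ties, which may differ in detail from giddy's `Tau_Local`. Starting ranks are held fixed and ending ranks are drawn as random permutations.

```python
import numpy as np

def local_tau(rx, ry):
    """(concordant - discordant) / (n - 1) per observation; assumes no ties."""
    dx = np.sign(rx[:, None] - rx[None, :])
    dy = np.sign(ry[:, None] - ry[None, :])
    # Pairs (i, i) contribute sign(0) * sign(0) = 0.
    return (dx * dy).sum(axis=1) / (len(rx) - 1)

rng = np.random.default_rng(1)
n, n_sim = 51, 2000
start = np.arange(n)  # starting ranks, held fixed (the conditioning variable)

# Each row holds the local tau values for one random set of ending ranks.
sims = np.array([local_tau(start, rng.permutation(n)) for _ in range(n_sim)])
spread = sims.std(axis=0)  # spread of local tau for each starting rank

print(spread[0], spread[n // 2], spread[-1])
```

In this setup the spread at the middle starting rank is much smaller than at either extreme, which is the pattern described above and the reason a single pooled reference distribution would be misleading.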

@ljwolf
Member Author

ljwolf commented Apr 9, 2020

Revisiting this with @weikang9009's advice, I've started to implement the following strategy for inference.
Pearson_Local would take a conditional_inference keyword with four options:

  • conditional_inference=False. For each observation, fix their site values. Randomize both the x & the y of remaining sites independently in all k local permutations. It should correctly condition on the local value of xi*yi relative to the space of all remaining cross-products.
  • conditional_inference='x' conditions on site i's x[i] value. For each observation, fix its x value. Randomize all y values & remaining x values in all k local permutations.
  • conditional_inference='y' conditions on site i's y[i] value. For each observation, fix its y value. Randomize all x values and remaining y values in all k local permutations.
  • conditional_inference=True splits permutations//2 and runs half with x conditional and half with y conditionals for each site. Then, a post-hoc paired t-test is done for each site to check if the two distributions have the same mean correlation. If they don't, then a warning is raised suggesting to use conditional_inference = False.
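The 'x' and 'y' options above could be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `local_stat` and `conditional_reference` are hypothetical names, the inputs are centered before the statistic is computed, and the loop handles one site at a time for readability rather than speed.

```python
import numpy as np

def local_stat(x, y):
    """Local correlation statistic on mean-centered inputs (sketch)."""
    zx, zy = x - x.mean(), y - y.mean()
    return (zx * zy) / np.sqrt((zx**2).sum() * (zy**2).sum())

def conditional_reference(x, y, i, conditional_on, permutations=99, rng=None):
    """Reference values for site i, holding x[i] or y[i] fixed (hypothetical helper)."""
    if conditional_on not in ("x", "y"):
        raise ValueError("conditional_on must be 'x' or 'y'")
    rng = np.random.default_rng(rng)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    others = np.delete(np.arange(len(x)), i)
    fixed, free = (x, y) if conditional_on == "x" else (y, x)
    ref = np.empty(permutations)
    for k in range(permutations):
        fixed_k = fixed.copy()
        fixed_k[others] = rng.permutation(fixed[others])  # shuffle remaining values
        free_k = rng.permutation(free)                    # shuffle all values
        ref[k] = local_stat(fixed_k, free_k)[i]           # statistic is symmetric in its inputs
    return ref

rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = x + rng.normal(size=30)
observed = local_stat(x, y)
ref_x = conditional_reference(x, y, i=3, conditional_on="x", rng=1)
ref_y = conditional_reference(x, y, i=3, conditional_on="y", rng=2)
```

Comparing `observed[3]` against `ref_x` and against `ref_y` gives two separate pseudo p-values for the same site, which is the comparison the `conditional_inference=True` option would formalize.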

So,

  1. We might want to change this to read conditional_on or something?
  2. Is it OK to re-use permutations? For instance, the code currently fixes all sites' x and shuffles Y. Since there's no spatial configuration, this should be ok?
  3. We still need to force the site-specific conditioning when conditional_inference=False, too. Right now, it's permuting all observations every time. It should be sufficient to:
  • numpy.delete(permutation, i), and then use row_stack((iless_permutation, (xi, yi))) for the computation of the statistic.
  4. Whatever is implemented should also apply for Tau_Local, and I think the right move is conditional_inference=False. @sjsrey, @weikang9009, perspective?
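The delete-and-stack step in point 3 might look like the sketch below. It is one reading of the idea, with hypothetical helper names, and it uses `numpy.vstack` (of which `row_stack` is an alias). Note that with mean-centered inputs, site i's own statistic does not change under this scheme, because the means and sums of squares are permutation-invariant; the reference values would therefore have to come from the remaining rows.

```python
import numpy as np

def local_stat(x, y):
    """Local correlation statistic on mean-centered inputs (sketch)."""
    zx, zy = x - x.mean(), y - y.mean()
    return (zx * zy) / np.sqrt((zx**2).sum() * (zy**2).sum())

def site_fixed_permutation(data, i, rng):
    """Shuffle x and y independently over all sites except i.

    `data` is an (n, 2) array of (x, y) rows; site i's row is placed last.
    """
    iless = np.delete(data, i, axis=0)  # drop site i
    iless_permutation = np.column_stack(
        (rng.permutation(iless[:, 0]), rng.permutation(iless[:, 1]))
    )
    return np.vstack((iless_permutation, data[i]))  # (xi, yi) goes in the last row

rng = np.random.default_rng(0)
data = rng.normal(size=(20, 2))
i = 5

observed_i = local_stat(data[:, 0], data[:, 1])[i]
shuffled = site_fixed_permutation(data, i, rng)
stats = local_stat(shuffled[:, 0], shuffled[:, 1])

# stats[-1] is site i's (unchanged) value; stats[:-1] are candidate reference values.
print(np.isclose(stats[-1], observed_i))
```

Repeating `site_fixed_permutation` k times and pooling `stats[:-1]` would give one possible reference distribution for `observed_i`, though whether that matches the intended null is exactly what still needs to be clarified in the docstring.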

ljwolf added a commit to ljwolf/esda that referenced this pull request May 24, 2023