
Add alternative argument for determining one-tailed or two-tailed permutation tests #199

Open
JosiahParry opened this issue Feb 13, 2022 · 20 comments

@JosiahParry

JosiahParry commented Feb 13, 2022

It appears that when a p-value exceeds 0.5, a different method of p-value calculation is used rather than the simulation p-value formula (M + 1) / (R + 1).

The example below manually calculates the p-value for each observation.

Can the p-value calculation method be documented or corrected?

import libpysal
import numpy as np
import geopandas as gpd
from esda.join_counts_local import Join_Counts_Local

fp = libpysal.examples.root + "/guerry/" + "Guerry.shp"

guerry_ds = gpd.read_file(fp)
guerry_ds['SELECTED'] = 0
guerry_ds.loc[(guerry_ds['Donatns'] > 10997), 'SELECTED'] = 1

w = libpysal.weights.Queen.from_dataframe(guerry_ds)

LJC_uni = Join_Counts_Local(connectivity=w, keep_simulations=True).fit(guerry_ds['SELECTED'])

# Calculate p-values manually, for the observations that have a reported p-value
index = np.where(~np.isnan(LJC_uni.p_sim))[0].tolist()

sims = LJC_uni.rjoins          # I think these are the simulations
sims_p = sims[index]
obs_p = LJC_uni.LJC[index]     # observed local join counts
ps = LJC_uni.p_sim[index]      # p-values reported by esda
nsim = LJC_uni.permutations
sims_p[0]

def p_sim_calc(obs, sims, nsim=999):
    # simulation p-value: (M + 1) / (R + 1)
    return (sum(obs <= sims) + 1) / (nsim + 1)

p_sim_calc(obs_p[0], sims_p[0])

p = list()
for i in range(len(ps)):
    x = p_sim_calc(obs_p[i], sims_p[i])
    p.append(x)

print(p, "\n", ps)

Note that when the manually calculated p-value exceeds 0.5, the reported simulated p-value is less than 0.5.

#> [0.409, 0.026, 0.026, 0.662, 0.652, 0.286, 0.662, 0.513,  0.03, 0.131, 0.051] 
#> [0.409, 0.026, 0.026, 0.339, 0.349, 0.286, 0.339, 0.488, 0.03, 0.131, 0.051]
@JosiahParry
Author

After giving this a more thorough look, it appears that the local join count is choosing the lesser of the two p-values from the "greater" or "lesser" alternatives. Is there a way to select either the lesser or greater alternative when calculating the p-values? Choosing the lesser of the two feels somewhat off to me.

@ljwolf
Member

ljwolf commented Feb 14, 2022

Thanks for the report! This is the intended behavior of our two-sided permutation test. The LJC does this, as do all of our simulation p-values (any .p_sim) across esda. Your request to

select either lesser or greater alternatives

amounts to a one-tailed permutation test. We haven't implemented a one-tailed version, but we're happy to implement this with an alternative keyword argument.

To try to explain why we're doing this, maybe the following will be helpful.

For most autocorrelation statistics (especially local stats) we don't have a prior hypothesis about the direction of the test. So, it's reasonable to default to a "two-sided test:" we want to know how extreme a given test statistic is under a specific null hypothesis.

Now, we need a way to assess extremity. We don't have anything "analytical" (like, equations & the like), so we synthesize replicates. We simulate from the "conditional randomization null hypothesis" using permutation to build up a set of "fake" local test statistics. We just wrote Sauer et al. (2021) (preprint) with plenty of discussion on the topic, but the original outlines come from Anselin (1995).

Computationally, this means we have a target set of replicates and a test statistic. We want to figure out how "extreme" that test statistic is, given the distribution of replicates. So, we want the fraction of values that are on the "other side" of the test statistic from the rest of the distribution... but how do we define "other side?" There are three ways.

  1. Make assumptions and derive an expected value. This lets us know what "side" of the distribution the test statistic is on. Indeed, this generally even lets us test against a standard normal distribution, so we don't even need the replicates!
  2. "Fold" the distribution; center the distribution of replicates, take the absolute value of the test statistic and the replicates, then count the fraction of larger replicates as the p value. This is justifiable; it gives you p-values between 0 and 1. But, this results in extreme values on the left side of the distribution "counting" as extreme values on the right side (and vice versa), and tests a slightly different null (about |s|, not s). Further, this is sensitive to the folding point (mean vs. median).
  3. Our nonparametric counting strategy. If the test statistic is on the "left" side of the distribution, there are fewer observations to its left. So, count the fraction of replicates to the left as the p-value. Alternatively, if the test statistic is on the "right" side of the distribution, count the fraction to the right. This is the flipping you see: by picking the smaller p-value, we're counting the relevant fraction of observations that fall outside of the test statistic.

Where we can, we implement 1 and 3. In Moran_Local, this is p_z_norm and p_sim. We don't implement the folded variant.

from esda.moran import Moran_Local
import numpy
numpy.random.seed(112112)

# guerry_ds and w are as constructed in the example above
lmo = Moran_Local(guerry_ds.Donatns, w)

# here, the local replicate matrix is the .rlisas attribute
folded_replicates = numpy.abs(lmo.rlisas - numpy.median(lmo.rlisas, axis=1, keepdims=True))
folded_p_sim = (folded_replicates >= numpy.abs(lmo.Is[:,None])).mean(axis=1)

print((folded_p_sim < .01).sum()) # 0 are clusters using folding
print((lmo.p_sim < .01).sum()) # 9 are clusters using "outside" counting
print((lmo.p_z_sim*2 < .01).sum()) # 2 are clusters assuming a normal approximation
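For reference, here is a hand-rolled version of the "outside" counting in option 3, using the lmo object from the snippet above; up to how ties are counted, this should reproduce lmo.p_sim.

# count replicates in each tail relative to the observed local statistic
n_right = (lmo.rlisas >= lmo.Is[:, None]).sum(axis=1)
n_left = (lmo.rlisas <= lmo.Is[:, None]).sum(axis=1)
# flip into whichever tail the statistic falls, then apply (M + 1) / (R + 1)
outside_p = (numpy.minimum(n_left, n_right) + 1) / (lmo.permutations + 1)
print(numpy.abs(outside_p - lmo.p_sim).max())  # should be at (or very near) zero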

The nonparametric counting strategy was implemented by @sjsrey back in the early days of the library. I think this is a faithful interpretation of why he did it this way, but I don't think we've actually spoken about it explicitly 😮

Note: edits to code example for repro & correctness on the folding

@JosiahParry
Author

Thank you so much for the added color! Much appreciated.
Changing the issue title. Close at your discretion :)

@JosiahParry JosiahParry changed the title Local Join Count p-values possibly incorrect Add alternative argument for determining one-tailed or two-tailed permutation tests Feb 14, 2022
@jGaboardi
Member

We'll go ahead and close this then, and we can reopen if needed in the future.

@ljwolf
Member

ljwolf commented May 11, 2023

I've had the time to investigate the differences here after ruminating on some email exchanges with @rsbivand, and I think that we may want to report 2*p_sim, not p_sim, if we keep this approach.

For justification, we can think about the folding strategy (p_f) and the "outside" counting (p_o) that is currently implemented. The simplest "folding" strategy takes the absolute value of the random LISAs and counts how many are at least as large as the absolute value of the test statistic.

def folded_p(replicates, observed):
    # count replicates whose absolute value is at least as large as |observed|
    n_extreme = (numpy.abs(replicates) >= numpy.abs(observed.reshape(-1, 1))).sum(axis=1) + 1
    return n_extreme / (replicates.shape[1] + 1)
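For instance, with a Moran_Local result lmo as in the snippets in this thread, the folded p-values for every site would be:

p_f = folded_p(lmo.rlisas, lmo.Is)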

Using this very simple method, we get a correlation between the p-values of 99%, but the "outside" p_o values are half as big:

[Figure: pvals_and_agreement_scatter]

I think this may lead to an overstatement of the number of "significant" local statistics at any given significance threshold. This overstatement seems to be proportional to the number of significant observations, but this largely goes away when we use 2*p_o:

[Figure: pval_ecdf_and_rands]

I think the agreement between p_f and 2*p_o (yellow line in the right facet) is about what we would expect between two different simulations using the same p-value method, but I am testing this currently. Regardless, you can see in the left pane that folding and outside counting are in much stronger agreement when we double the p-values from outside counting, 2*p_o.

from libpysal import weights, examples
from esda.moran import Moran_Local
import geopandas, numpy

data = geopandas.read_file(examples.get_path("election.shp"))

# build a Rook contiguity graph and attach islands via their nearest neighbour
rook = weights.Rook.from_dataframe(data)
knn1 = weights.KNN.from_dataframe(data, k=1)
w = weights.attach_islands(rook, knn1)

# swing toward/away from the Democratic candidate between 2012 and 2016
swing = data.pct_dem_16 - data.pct_dem_12
lmo = Moran_Local(swing.values, w, permutations=9999)

# folded ("absolute value") p-values, compared with the reported p_sim
p_abs = (numpy.abs(lmo.rlisas) > numpy.abs(lmo.Is[:, None])).sum(axis=1) + 1.0
p_abs /= lmo.permutations + 1
p_sim = lmo.p_sim
numpy.corrcoef(p_abs, p_sim)[0, 1]
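As a rough check on the overstatement claim (using the objects above; the 0.05 threshold is just illustrative):

alpha = 0.05
print("folded:            ", (p_abs < alpha).sum())
print("outside counting:  ", (p_sim < alpha).sum())
print("2 * outside count.:", (2 * p_sim < alpha).sum())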

@JosiahParry
Author

@ljwolf, thank you very much for sharing! If you and Roger do end up doing a thorough write up, please let me know here. I'm following quite intently!

@sjsrey
Member

sjsrey commented May 11, 2023

@JosiahParry the determination of the pseudo p-values is done with the following:

esda/esda/crand.py

Lines 222 to 224 in 7f83108

low_extreme = (permutations - larger) < larger
larger[low_extreme] = permutations - larger[low_extreme]
p_sim = (larger + 1.0) / (permutations + 1.0)

which is for the local statistics. This follows the logic we use for a global statistic (which is arguably easier to grok):

esda/esda/moran.py

Lines 181 to 190 in 7f83108

if permutations:
    sim = [
        self.__calc(np.random.permutation(self.z)) for i in range(permutations)
    ]
    self.sim = sim = np.array(sim)
    above = sim >= self.I
    larger = above.sum()
    if (self.permutations - larger) < larger:
        larger = self.permutations - larger
    self.p_sim = (larger + 1.0) / (permutations + 1.0)

Your issue makes us realize we need to surface this logic in the documentation. Thanks!

@ljwolf
Member

ljwolf commented May 25, 2023

@sjsrey any perspective on the 2*p_sim issue noted above? I think this would imply the outside counting method we currently implement is a one-tailed procedure, but we're picking and choosing which tail we're looking in on the fly.

@sjsrey
Member

sjsrey commented May 26, 2023

@ljwolf the outside counting method we have is a one-tailed procedure that follows from logic in geoda. It does the selection of the tail for the user. I think we can surface this better in the docs.

For the lisas, I'm not clear on the use of the absolute values? That would seem to suggest you want a two-tailed test, not a directional test?

@ljwolf
Member

ljwolf commented May 27, 2023

Right, so this is my confusion:

My understanding of a "classical" one-tailed test is that it requires you to specify a direction a priori (e.g. alternative="greater"). Outside counting specifies the direction of our one-tailed test on the fly, based on the most likely direction for a significant result. This means our testing procedure is equivalent to a two-sided test (counting the replicates that are more extreme than the observed value) run at the 1 - 2*\alpha confidence level (i.e. at significance level 2*\alpha), which is what the simulations above show.

Now, this is normally the case for directed and undirected tests: alternative="greater" will have more power than alternative="two-sided" in the positive direction, but has no power in the negative direction. In our case, the gain in power is not motivated by the prior about the direction of the effect size, though: it comes from running two directed tests at each site and treating the p-value as if it comes from one directed test. Doing two directed tests and reporting the smaller p-value will increase the power of our test at a constant \alpha. Put another way, outside counting is a test for extremal values, since it flips depending on the direction of the replicate distribution. When you test for extremal values, you are doing a two-sided test, whose proper significance level is \alpha/2.
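To make the power point concrete, here is a toy null simulation (purely synthetic replicates, nothing to do with the esda internals): under a symmetric null, rejecting whenever the flipped p-value falls below \alpha rejects at roughly 2*\alpha.

import numpy
rng = numpy.random.default_rng(112112)
alpha, n_tests, n_perm = 0.05, 2000, 999
rejections = 0
for _ in range(n_tests):
    stat = rng.normal()                     # "observed" statistic drawn from the null
    reps = rng.normal(size=n_perm)          # replicates drawn from the same null
    larger = (reps >= stat).sum()
    flipped = min(larger, n_perm - larger)  # the flip into the smaller tail
    if (flipped + 1) / (n_perm + 1) < alpha:
        rejections += 1
print(rejections / n_tests)                 # roughly 2 * alpha, i.e. about 0.10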

Does that make sense? Maybe I misunderstand.

@sjsrey
Member

sjsrey commented May 27, 2023

I see what you are getting at, and I agree with the points you make.

My interpretation is that the logic for the permutation-based inference was born in the "pre-local" era (and before geoda), where the focus was on a single test statistic:

[Quoted excerpt from Hope (1968) on the Monte Carlo significance test procedure] [1]

Things are more complicated for the local tests, since we now have n such tests and need to allow that for some locations the relevant tail may change.

At first glance, I think a fix might be to add a keyword argument to the global test, alternative="greater|lower|two-sided", and carry on as we currently do, but return 2*p_sim when alternative="two-sided".

At the local level, we interpret all tests as one-tailed with the current logic. We could say the default is directional with an option for two-sided and if the user passes two-sided we return 2*p_sim.
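As a concrete sketch of that proposal for the global statistic (the names are illustrative, not the esda API, and tie handling is simplified):

def global_pseudo_p(observed, sim, alternative="two-sided"):
    # sim: 1-d array of simulated statistics under the null; observed: scalar
    permutations = len(sim)
    larger = (sim >= observed).sum()
    if alternative == "greater":
        extreme = larger
    elif alternative == "lower":
        extreme = (sim <= observed).sum()
    else:  # "two-sided": flip into the smaller tail as we do now, then double
        extreme = min(larger, permutations - larger)
        return min(1.0, 2 * (extreme + 1.0) / (permutations + 1.0))
    return (extreme + 1.0) / (permutations + 1.0)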

Does this make sense to you?

[1] Hope, A. C. A. (1968). A Simplified Monte Carlo Significance Test Procedure. Journal of the Royal Statistical Society. Series B (Methodological), 30(3), 582–598.

@ljwolf
Member

ljwolf commented May 27, 2023

It makes sense, but our "flip" is not documented in Hope (1968). With Hope's algorithm, you can definitely get one-tailed p-values above .5 for a one-tailed test. Our p-value can never be greater than .5. In the quote clipped above, I'm pretty sure the parenthetical "greater (less)" statement is with reference to "the form of the alternative hypothesis," not the "corresponding values." Once we look in both directions, we have to adjust for the fact that our reference set is now 2*permutations, not permutations. Hope notes this:

[Screenshot: the relevant passage from Hope (1968)]

Given the simulations above, I don't think we should continue to offer our current procedure. It does the one-tailed procedure described twice (once with alternative="greater" and again with alternative="lesser") and then picks the smaller p value. Presenting the result from two one-tailed tests as if it were a two-tailed test will double your power.

Another point of evidence: if we were to Bonferroni correct for this repeated one-tailed testing procedure (for example), the significance threshold would become alpha/2... meaning that p_sim*2 gives you the "right" result at alpha.

@ljwolf
Member

ljwolf commented May 27, 2023

I should say: I recognize the issue that specifying the direction of local tests one by one would be impossible... but we could provide similar "auto" behavior by picking the direction of the test using EIc? If the test statistic is above its EIc, we run "greater", otherwise we run "lesser". Then, we have four alternative options:

  • "greater" using our current routine with no flip
  • "lesser" using our current routine with no flip
  • "two-sided" as the percent of replications outside of the test statistic's percentile and 1-percentile (sketched just after this list). This should be the default.
  • "directed" as 2* our current routine, or picking the direction for each site depending on its EIc
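A rough sketch of that percentile-based two-sided rule, assuming test_stat is an (n,) array of observed local statistics and reference_distribution an (n, permutations) array of replicates (none of these names are existing esda API):

import numpy

def percentile_two_sided_p(test_stat, reference_distribution):
    n, permutations = reference_distribution.shape
    p_vals = numpy.empty(n)
    for i in range(n):
        reps = reference_distribution[i]
        percentile = (reps <= test_stat[i]).mean()
        mirror_cut = numpy.quantile(reps, 1 - percentile)
        if percentile <= 0.5:
            # statistic sits in the left tail: count its own tail directly,
            # plus the replicates beyond the mirrored (1 - percentile) quantile
            outside = (reps <= test_stat[i]).sum() + (reps >= mirror_cut).sum()
        else:
            # statistic sits in the right tail: mirror image of the above
            outside = (reps >= test_stat[i]).sum() + (reps <= mirror_cut).sum()
        p_vals[i] = min(1.0, (outside + 1) / (permutations + 1))
    return p_vals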

@sjsrey
Member

sjsrey commented May 28, 2023

I agree on the new logic. I'm a bit hesitant to deprecate the current behavior as it is based on what geoda does. Might we consider a policy="hope|geoda" parameter with the default hope that implements the new logic? If the user specifies policy=geoda it returns the current implementation.

@ljwolf
Member

ljwolf commented May 28, 2023

Sounds like a great idea to me!

@rsbivand

Very interesting discussion! I'll align spdep with your choices, probably with "hope" as default, once esda has been updated.

@JosiahParry
Author

Hey folks! This issue has been brought up again with regard to spdep in a private email to me. Has esda modified the approach to address the alternative hypotheses such that the count of extremes is done in one tail in those cases? I think it would be a worthwhile effort for both esda and spdep to complete this. It almost feels as if an incorrect significance is being reported. For the sake of science, I might even be willing to....write Python! If it would be helpful! ;)

@ljwolf
Member

ljwolf commented Feb 14, 2024

It's not implemented yet, but we agreed above to implement the following:

  • greater using our current routine, checking only the right tail with no flip
  • lesser using our current routine, checking only the left tail with no flip
  • two-sided as our current routine, but doubling the p-value. This should be the default.
  • directed as our current routine.

I also want to test whether a percentile rule will work (find the percentile p of the test statistic in the permutation distribution, then calculate the % of permutations outside of p and 1-p).

I think that doing this should be easy, and should be done in a general function like:

def calculate_significance(test_stat, reference_distribution, method='two-sided'):
   ...

which we then just call within every statistic.
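A minimal sketch of what that shared helper could look like, assuming test_stat is an (n,) array of observed local statistics and reference_distribution an (n, permutations) array of replicates; tie handling is simplified relative to esda.crand, and the method names just follow the list above:

import numpy

def calculate_significance(test_stat, reference_distribution, method="two-sided"):
    test_stat = numpy.asarray(test_stat).reshape(-1, 1)
    reference_distribution = numpy.asarray(reference_distribution)
    permutations = reference_distribution.shape[1]
    larger = (reference_distribution >= test_stat).sum(axis=1)   # right-tail counts
    smaller = (reference_distribution <= test_stat).sum(axis=1)  # left-tail counts
    if method == "greater":
        extreme = larger
    elif method == "lesser":
        extreme = smaller
    elif method == "directed":
        # the current routine: count in whichever tail the statistic falls
        extreme = numpy.minimum(larger, smaller)
    elif method == "two-sided":
        # the current routine, doubled to account for looking in both tails
        directed = (numpy.minimum(larger, smaller) + 1.0) / (permutations + 1.0)
        return numpy.minimum(1.0, 2.0 * directed)
    else:
        raise ValueError(f"method must be 'greater', 'lesser', 'two-sided', or 'directed', not {method!r}")
    return (extreme + 1.0) / (permutations + 1.0)

With a fitted Moran_Local like the lmo above, calculate_significance(lmo.Is, lmo.rlisas, method="directed") should then roughly reproduce lmo.p_sim.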

@JosiahParry
Author

Sounds good! Should this issue be reopened or another made with less baggage?

@ljwolf
Member

ljwolf commented Feb 14, 2024

I should say, @JosiahParry, if you have time to start defining the calculate_significance() function in a separate file like significance.py, I can start pushing on that with you later this week 😄 I would really like this done by the next release.

Keep the discussion here, and I will reopen the issue.
