WIP: Weighted correlations #508

Open · calum-chamberlain wants to merge 10 commits into develop

Conversation

calum-chamberlain
Member

What does this PR do?

Adds the ability to weight cross-correlations. Weights are currently expected to be in tr.stats.extra.weight for all stream correlation functions. This PR should be considered in tandem with #506.
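For example, a per-trace weight could be attached roughly as follows (a minimal sketch assuming ObsPy's AttribDict; the exact key layout may change as the PR evolves):

    from obspy import read
    from obspy.core.util import AttribDict

    st = read()  # ObsPy's bundled example stream
    for tr in st:
        # Attach a per-trace weight where this PR looks for it;
        # 1.0 means "full weight" for that channel.
        tr.stats.extra = AttribDict({"weight": 0.8})
    print(st[0].stats.extra.weight)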

Why was it initiated? Any relevant Issues?

#445

PR Checklist

  • develop base branch selected?
  • This PR is not directly related to an existing issue (which has no PR yet).
  • All tests still pass.
  • Any new features or fixed regressions are covered by new tests.
  • Any new or changed features are fully documented.
  • Significant changes have been added to CHANGES.md.
  • First time contributors have added their name to CONTRIBUTORS.md.

@flixha
Collaborator

flixha commented Aug 8, 2022

@calum-chamberlain From the current commits, it looks to me like you currently intend to give one weight to each template, rather than weighting each trace in the template during the correlation - is that right? Or were you also intending to let individual traces in each template be weighted, e.g. as fast_matched_filter can do?
Maybe it's too early to ask because it's still work in progress; I did a few tests with trace weights in FMF and I think it would be promising to define trace weights based on the SNR of each trace in the template.
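As a rough illustration of the SNR-based trace weights mentioned above (the function and the SNR-to-weight mapping are hypothetical, not something in this PR or in fast_matched_filter):

    import numpy as np
    from obspy import Trace

    def snr_weight(template_tr: Trace, noise_tr: Trace, max_weight: float = 1.0) -> float:
        """Hypothetical weight from the RMS signal-to-noise ratio of a template trace."""
        signal_rms = np.sqrt(np.mean(template_tr.data.astype(float) ** 2))
        noise_rms = np.sqrt(np.mean(noise_tr.data.astype(float) ** 2))
        if noise_rms == 0:
            return max_weight
        snr = signal_rms / noise_rms
        # Squash the SNR into (0, max_weight]; the mapping is arbitrary.
        return max_weight * snr / (1.0 + snr)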

@calum-chamberlain
Member Author

I definitely intend to weight individual channels of individual templates in the same way that fast_matched_filter can. This is supported by these commits (I think!).

The array correlation functions that take multiple templates and one "stream" work on single-channel data, so only one weight per template is passed. The weights are extracted from the templates in the _get_array_dicts function here and are in the same shape as the pads (one for each trace of each template).

There are a lot of small changes to the correlation code to support this, and I might keep this PR separate from another PR that would actually open up the interface to make templates with weights.
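To make the layout described above concrete, the weight information is presumably held per seed id with one value per template, something like the sketch below (names and shapes are assumptions from this thread, not the actual _get_array_dicts output):

    import numpy as np

    # Assumed layout: one weight per template for each seed id, mirroring the pads.
    weight_dict = {
        "NZ.WEL.10.HHZ": np.array([1.0, 0.5, 0.8]),  # one entry per template
        "NZ.WEL.10.HHN": np.array([0.9, 0.0, 1.0]),
    }
    # Stacking gives an (n_seed_ids, n_templates) array; operations along axis=0
    # then act per template across channels.
    weights = np.array(list(weight_dict.values()))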

@flixha
Collaborator

flixha commented Jan 5, 2023

Right now it's possible to set trace weights greater than 1, and then one can run into an AssertionError when the detect_val is greater than the number of channels in

assert abs(self.detect_val) <= self.no_chans

To solve that, here are some options:

  1. do not allow weights to be larger than 1 / smaller than -1
  2. allow detect_val to become greater than number of channels.
  3. for each template, normalize weights according to the number of nonzero weights (i.e., weights for real channels, excluding nan-channels etc.).

All options have an effect on the detection results with threshold_type='absolute' and 'av_chan_corr' that may be a bit hard to follow / interpret. With a MAD threshold the results should be independent of the normalization, except for very small weights below the stability threshold that matters in some correlation backends. I prefer option 3 because I think the absolute detection values are still somewhat interpretable compared to the other options.

Code to normalize weights according to the number of nonzero weights in eqcorrscan.utils.correlate._get_array_dicts:

    # Normalize weights for each template so that CC sum stays smaller than
    # number of channels (else: AssertionError in Detection).
    weights = np.array([template_weights
                        for seed_id, template_weights in weight_dict.items()])
    weights = weights * (np.count_nonzero(weights, axis=0, keepdims=True) /
                         weights.sum(axis=0, keepdims=True))
    # weights = np.float32(weights)  # Required for FFTW backend
    Logger.info('Setting weights from trace-stats, minimum weight: %s, '
                'maximum weight %s, total sum of weights: %s',
                np.min(weights), np.max(weights), np.sum(weights))
    # Update weights in weight-dict with normalized values:
    for seed_id, normalized_template_weights in zip(seed_ids, weights):
        weight_dict[seed_id] = normalized_template_weights

@calum-chamberlain
Member Author

Good point @flixha - I agree that interpreting the weighted absolute and av_chan_corr results will be less intuitive. For that reason I would prefer to enforce that the user sets weights between +/-1 rather than normalizing them ourselves. I can imagine the case where someone thinks they know what they are doing and carefully sets a threshold and weights, which we then normalize, resulting in unexpected results.

If we do enforce that, then we should check the weights as early as possible to avoid wasting people's time (e.g. raise an error immediately, and maybe suggest some normalized weights that they can paste into their code).
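A minimal sketch of that kind of early check (the function name and the normalize-by-maximum suggestion are illustrative, not part of this PR):

    import numpy as np

    def check_template_weights(weights, name="template"):
        """Hypothetical early validation that all trace weights lie in [-1, 1]."""
        weights = np.asarray(weights, dtype=float)
        max_abs = np.max(np.abs(weights))
        if max_abs > 1.0:
            suggested = np.round(weights / max_abs, 3).tolist()
            raise ValueError(
                f"Weights for {name} must be between -1 and 1 (max abs value "
                f"was {max_abs:.3f}); e.g. normalized weights: {suggested}")
        return weights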
