
add estimate_sigma function to threshold features #395

Open

micha2718l wants to merge 22 commits into main

Conversation

@micha2718l micha2718l commented Aug 3, 2018

This PR adds a feature to estimate the noise standard deviation (sigma) of a signal, which is needed by many thresholding algorithms such as 'VisuShrink' and 'RiskShrink'. Below is an example that uses the function to pick a good threshold value for a contrived data set with added noise, and shows how a threshold that is too high or too low performs worse.
Inspired by issue #394

There are still details to work out. I don't know what the most correct thing to do is on line 319 of _thresholding.py, since we do not have scipy as a dependency.

I am sure there are other small things to address, as well as a better test set and docs, and a final decision on the function's name and where it will live.

[Figure: 3×3 grid of plots (original, noisy, and denoised doppler signal per row). The middle-right panel shows the correct threshold applied; the top row uses a threshold that is too high and the bottom row one that is too low.]

```python
import matplotlib.pyplot as plt
import numpy as np
import pywt

N = 10000
sigmaIn = 0.25
data = pywt.data.demo_signal('doppler', n=N)
data_noised = data + np.random.normal(0, sigmaIn, N)

wavelet = 'db4'
coeffs = pywt.wavedecn(data_noised, wavelet=wavelet)
# Detail coefficients at each decomposition level
dcoeffs = coeffs[1:]
# Finest-level detail coefficients are used for the noise estimate
detail_coeffs = dcoeffs[-1]['d']
sigma = pywt.estimate_sigma(detail_coeffs)
print(f'sigma(estimated) = {sigma}\n'
      f'sigma(real)      = {sigmaIn}\n'
      f'%error           = {(np.abs(sigma - sigmaIn) / sigmaIn) * 100:0.2f}%')
# Method for finding the threshold, 'VisuShrink'
threshold = sigma * np.sqrt(2 * np.log(data_noised.size))

denoised_datas = []
# Denoise the data using the 'correct' threshold as well as one too high
# and one too low; too high a threshold distorts the signal, while too low
# a threshold leaves in too much of the noise.
for thresh in [threshold * 2, threshold, threshold / 2]:
    denoised_detail = [{key: pywt.threshold(level[key], value=thresh)
                        for key in level}
                       for level in dcoeffs]
    denoised_coeffs = [coeffs[0]] + denoised_detail
    denoised_datas.append(pywt.waverecn(denoised_coeffs, wavelet))

for i, denoised_data in enumerate(denoised_datas):
    plt.subplot(3, 3, 1 + 3 * i)
    plt.plot(data)
    plt.subplot(3, 3, 2 + 3 * i)
    plt.plot(data_noised)
    plt.subplot(3, 3, 3 + 3 * i)
    plt.plot(denoised_data)
plt.tight_layout()
plt.show()
```

@codecov-io

codecov-io commented Aug 4, 2018

Codecov Report

Merging #395 into master will decrease coverage by 0.09%.
The diff coverage is 64.7%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #395     +/-   ##
=========================================
- Coverage   84.52%   84.43%   -0.1%     
=========================================
  Files          22       22             
  Lines        3600     3616     +16     
  Branches      627      630      +3     
=========================================
+ Hits         3043     3053     +10     
- Misses        489      493      +4     
- Partials       68       70      +2
Impacted Files | Coverage Δ
pywt/_thresholding.py | 82.27% <64.7%> (-5.03%) ⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d4be78d...407985b. Read the comment docs.


if distribution.lower() == 'gaussian':
    # 75th quantile of the underlying, symmetric noise distribution
    # denom = scipy.stats.norm.ppf(0.75)
Member

In addition to a string name, you could perhaps accept any object with a ppf method here (assuming that works; didn't read the paper)? Just check with `if hasattr(distribution, 'ppf')`.

The bigger issue here seems to be the hardcoding of the 75th quantile. Why is that not a keyword?
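
For illustration, a minimal sketch of that dispatch idea; the helper name `_noise_denominator` and the `q` keyword are illustrative assumptions, not the PR's actual API:

```python
# Hypothetical sketch only, not the PR's implementation.
def _noise_denominator(distribution, q=0.75):
    """Return the quantile used to scale the MAD-based noise estimate."""
    if isinstance(distribution, str):
        if distribution.lower() == 'gaussian':
            # Phi^{-1}(0.75) of the standard normal, hard-coded to avoid scipy
            return 0.6744897501960817
        raise ValueError("unknown distribution name: %s" % distribution)
    if hasattr(distribution, 'ppf'):
        # any scipy.stats-style distribution object works here
        return distribution.ppf(q)
    raise TypeError("distribution must be a name or expose a ppf method")
```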

Author

It seems like both choices originate from specific suggestions by the author. I agree that the best solution would be to accept any distribution or quantile as valid input, as that could prove useful. The issue I saw is the use of scipy.stats in the original code: because PyWavelets doesn't rely on scipy (right?), I'm not sure what the best thing to do there is. Including scipy for just this is overkill; perhaps a couple of hand-coded options for now? Or just Gaussian with a variable percentile input, which shouldn't be too hard to get going without scipy.
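
As an aside, one scipy-free option for the Gaussian case (an editor sketch, not what the PR does) is the standard-library `statistics.NormalDist`, available on Python 3.8+:

```python
from statistics import NormalDist  # standard library, Python >= 3.8

# Same value as scipy.stats.norm.ppf(0.75), without importing scipy
denom = NormalDist().inv_cdf(0.75)  # ~0.6744897501960817
```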

Author

By the way, nice suggestion for checking whether a distribution supports a method; having seen the example, it seems obvious now.

Member

Indeed, we don't want to rely on scipy. That's why I suggested checking for a ppf method: users can still pass in scipy.stats distributions, but we never need to import from scipy ourselves.
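
Assuming the proposed function keeps a `distribution` keyword (as the quoted code above suggests), usage under that design might look like the following; the exact call signature is an assumption for illustration only:

```python
import pywt
from scipy import stats  # imported by the user, never by pywt itself

# detail_coeffs as computed in the PR description's example above.
# Illustrative only; the final estimate_sigma signature may differ.
sigma = pywt.estimate_sigma(detail_coeffs, distribution=stats.norm)
```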

Contributor

A concise explanation of the rationale for using the 75th quantile in MAD is given on the following Wikipedia page:
https://en.wikipedia.org/wiki/Median_absolute_deviation
Perhaps it is worth adding that to the References? I don't remember whether it was spelled out explicitly in Donoho's paper.

I also agree about not relying on scipy here. scipy was already a dependency of scikit-image, and a reviewer of my PR there preferred to call the ppf method explicitly instead of hard-coding the value, but that doesn't mean we have to do the same here.

The idea to check for a ppf method is nice.

Another approach to dealing with other noise distributions, such as Poisson noise, is to first apply a variance-stabilizing transform prior to calling this Gaussian-based MAD method. An example implementation of this type of approach for Rician noise is available here, but it is not under a compatible license.
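
For reference, the standard MAD-based estimator under discussion (Donoho and Johnstone's robust sigma estimate) reduces to the following; the function name here is illustrative, not the PR's:

```python
import numpy as np

def mad_sigma(detail_coeffs):
    """Robust noise estimate: median absolute value of the finest-level
    detail coefficients, scaled by the 75th quantile of the standard normal."""
    return np.median(np.abs(detail_coeffs)) / 0.6744897501960817
```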

@rgommers rgommers added this to the v1.1 milestone Aug 11, 2018
if distribution.lower() == 'gaussian':
if hasattr(distribution, 'ppf'):
    if not kwargs:
        kwargs = {'q': 0.75}
Author

I originally thought it would be good to auto-fill the arguments with the 75th quantile when no arguments are passed in, to keep the defaults, but this may be a bad idea if someone wants to use a ppf method that takes no arguments (as it would result in passing an unwanted keyword argument).
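
One illustrative way to sidestep that concern (an editor sketch, not the PR's code) is to fall back to the 75th-quantile default only when the ppf actually accepts it:

```python
# Hypothetical helper, not part of the PR.
def _call_ppf(distribution, **kwargs):
    if kwargs:
        return distribution.ppf(**kwargs)
    try:
        # default to the 75th quantile when nothing was specified
        return distribution.ppf(q=0.75)
    except TypeError:
        # ppf implementations that take no arguments still work
        return distribution.ppf()
```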

@grlee77 grlee77 mentioned this pull request Oct 4, 2019
@grlee77 grlee77 removed this from the v1.1 milestone Oct 4, 2019
@grlee77
Contributor

grlee77 commented Oct 15, 2019

Hi @micha2718l. Over in #394, @rgommers suggested renaming to estimate_noise or estimate_noiselevel. I think estimate_noise sounds reasonable and may be clearer to users than estimate_sigma. What do you think?

@micha2718l
Author

I agree that estimate_noise is probably best. I'll go ahead and change it.


@micha2718l
Author

@rgommers @grlee77 Hi all! Sorry for the multi-year delay on this; I had some time and was reminded of this work. Would someone with access please enable the CI workflow here to help with tests? (I don't think it existed in this state when I first started on this.) If it doesn't need to be enabled, let me know what the next steps should be.

@rgommers
Member

rgommers commented Mar 8, 2024

Hi @micha2718l, thanks for getting back to this. I have just hit the button to have CI run again; note that this is needed on each new push. To avoid the problem, please feel free to submit a separate PR with, for example, a typo fix or trivial rewording. I'd be happy to merge such a PR straight away, and once you have a commit in the default branch, GitHub will trigger CI on future pushes to this PR without needing approval.

Projects: none yet
Linked issues: none yet
Participants: 4