
add estimate_sigma function to threshold features #395

Open

micha2718l wants to merge 22 commits into main

Conversation

@micha2718l micha2718l commented Aug 3, 2018

This PR adds a feature to estimate the noise standard deviation (sigma) of a signal, which is needed by many thresholding algorithms such as 'VisuShrink' and 'RiskShrink'. Below is an example that uses the function to pick a good threshold value for a contrived data set with added noise, and shows how a threshold that is too high or too low performs worse.
Inspired by issue #394

There are still details to work out. I don't know what the most correct thing to do is on line 319 of _thresholding.py, since we do not have scipy as a dependency.

I am sure there are other small things to address, as well as a better test set and docs, and a final decision on the function's name and where it will live.

[Figure: 3×3 grid of plots (original, noisy, and denoised doppler signal per row). The middle-right panel shows the correct threshold applied; the top row uses a threshold that is too high and the bottom row one that is too low.]

```python
import matplotlib.pyplot as plt
import numpy as np
import pywt

N = 10000
sigmaIn = 0.25
data = pywt.data.demo_signal('doppler', n=N)
data_noised = data + np.random.normal(0, sigmaIn, N)

wavelet = 'db4'
coeffs = pywt.wavedecn(data_noised, wavelet=wavelet)
# Detail coefficients at each decomposition level
dcoeffs = coeffs[1:]
# Finest-level detail coefficients are used for the noise estimate
detail_coeffs = dcoeffs[-1]['d']
sigma = pywt.estimate_sigma(detail_coeffs)
print(f'sigma(estimated) = {sigma}\n'
      f'sigma(real)      = {sigmaIn}\n'
      f'%error           = {(np.abs(sigma - sigmaIn) / sigmaIn) * 100:0.2f}%')
# Method for finding the threshold, 'VisuShrink'
threshold = sigma * np.sqrt(2 * np.log(data_noised.size))

denoised_datas = []
# Denoise the data using the 'correct' threshold as well as one too high
# and one too low; too high a threshold distorts the signal, while too low
# a threshold leaves in too much of the noise.
for thresh in [threshold * 2, threshold, threshold / 2]:
    denoised_detail = [{key: pywt.threshold(level[key], value=thresh)
                        for key in level}
                       for level in dcoeffs]
    denoised_coeffs = [coeffs[0]] + denoised_detail
    denoised_datas.append(pywt.waverecn(denoised_coeffs, wavelet))

for i, denoised_data in enumerate(denoised_datas):
    plt.subplot(3, 3, 1 + 3 * i)
    plt.plot(data)
    plt.subplot(3, 3, 2 + 3 * i)
    plt.plot(data_noised)
    plt.subplot(3, 3, 3 + 3 * i)
    plt.plot(denoised_data)
plt.tight_layout()
plt.show()
```

@codecov-io

codecov-io commented Aug 4, 2018

Codecov Report

Merging #395 into master will decrease coverage by 0.09%.
The diff coverage is 64.7%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #395     +/-   ##
=========================================
- Coverage   84.52%   84.43%   -0.1%     
=========================================
  Files          22       22             
  Lines        3600     3616     +16     
  Branches      627      630      +3     
=========================================
+ Hits         3043     3053     +10     
- Misses        489      493      +4     
- Partials       68       70      +2
Impacted Files | Coverage Δ
pywt/_thresholding.py | 82.27% <64.7%> (-5.03%) ⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d4be78d...407985b. Read the comment docs.


if distribution.lower() == 'gaussian':
    # 75th quantile of the underlying, symmetric noise distribution
    # denom = scipy.stats.norm.ppf(0.75)
Member

In addition to a string name, you could perhaps accept any object with a ppf method here (assuming that works; didn't read the paper)? Just check with `if hasattr(distribution, 'ppf')`.

The bigger issue here seems to be the hardcoding of the 75th quantile. Why is that not a keyword?
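
For illustration, a minimal sketch of that dispatch idea; the helper name `_noise_denominator` and the `q` keyword are illustrative assumptions, not the PR's actual API:

```python
# Hypothetical sketch only, not the PR's implementation.
def _noise_denominator(distribution, q=0.75):
    """Return the quantile used to scale the MAD-based noise estimate."""
    if isinstance(distribution, str):
        if distribution.lower() == 'gaussian':
            # Phi^{-1}(0.75) of the standard normal, hard-coded to avoid scipy
            return 0.6744897501960817
        raise ValueError("unknown distribution name: %s" % distribution)
    if hasattr(distribution, 'ppf'):
        # any scipy.stats-style distribution object works here
        return distribution.ppf(q)
    raise TypeError("distribution must be a name or expose a ppf method")
```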

Author

It seems like both choices originate from specific suggestions by the author. I agree that the best solution would be to accept any distribution or quantile as valid input, as that could prove useful. The issue I saw is the use of scipy.stats in the original code: because PyWavelets doesn't rely on scipy (right?), I'm not sure what the best thing to do there is. Including scipy for just this is overkill; perhaps a couple of hand-coded options for now? Or just Gaussian with a variable percentile input, which shouldn't be too hard to get going without scipy.
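
As an aside, one scipy-free option for the Gaussian case (an editor sketch, not what the PR does) is the standard-library `statistics.NormalDist`, available on Python 3.8+:

```python
from statistics import NormalDist  # standard library, Python >= 3.8

# Same value as scipy.stats.norm.ppf(0.75), without importing scipy
denom = NormalDist().inv_cdf(0.75)  # ~0.6744897501960817
```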

Author

By the way, nice suggestion for checking whether a distribution supports a method; having seen the example, it seems obvious now.

Member

Indeed, we don't want to rely on scipy. That's why I suggested checking for a ppf method: users can still pass in scipy.stats distributions, but we never need to import from scipy ourselves.
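
Assuming the proposed function keeps a `distribution` keyword (as the quoted code above suggests), usage under that design might look like the following; the exact call signature is an assumption for illustration only:

```python
import pywt
from scipy import stats  # imported by the user, never by pywt itself

# detail_coeffs as computed in the PR description's example above.
# Illustrative only; the final estimate_sigma signature may differ.
sigma = pywt.estimate_sigma(detail_coeffs, distribution=stats.norm)
```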

Contributor

A concise explanation of the rationale for using the 75th quantile in MAD is given on the following Wikipedia page:
https://en.wikipedia.org/wiki/Median_absolute_deviation
Perhaps it is worth adding that to the References? I don't remember whether it was spelled out explicitly in Donoho's paper.

I also agree about not relying on scipy here. scipy was already a dependency of scikit-image, and a reviewer of my PR there preferred to call the ppf method explicitly instead of hard-coding the value, but that doesn't mean we have to do the same here.

The idea to check for a ppf method is nice.

Another approach to dealing with other noise distributions, such as Poisson noise, is to first apply a variance-stabilizing transform prior to calling this Gaussian-based MAD method. An example implementation of this type of approach for Rician noise is available here, but it is not under a compatible license.
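
For reference, the standard MAD-based estimator under discussion (Donoho and Johnstone's robust sigma estimate) reduces to the following; the function name here is illustrative, not the PR's:

```python
import numpy as np

def mad_sigma(detail_coeffs):
    """Robust noise estimate: median absolute value of the finest-level
    detail coefficients, scaled by the 75th quantile of the standard normal."""
    return np.median(np.abs(detail_coeffs)) / 0.6744897501960817
```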

@rgommers rgommers added this to the v1.1 milestone Aug 11, 2018
if distribution.lower() == 'gaussian':
if hasattr(distribution, 'ppf'):
    if not kwargs:
        kwargs = {'q': 0.75}
Author

I originally thought it would be good to auto-fill the arguments with the 75th quantile when no arguments are passed in, to keep the defaults, but this may be a bad idea if someone wants to use a ppf method that takes no arguments (as it would result in passing an unwanted keyword argument).
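
One illustrative way to sidestep that concern (an editor sketch, not the PR's code) is to fall back to the 75th-quantile default only when the ppf actually accepts it:

```python
# Hypothetical helper, not part of the PR.
def _call_ppf(distribution, **kwargs):
    if kwargs:
        return distribution.ppf(**kwargs)
    try:
        # default to the 75th quantile when nothing was specified
        return distribution.ppf(q=0.75)
    except TypeError:
        # ppf implementations that take no arguments still work
        return distribution.ppf()
```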

@grlee77 grlee77 mentioned this pull request Oct 4, 2019
@grlee77 grlee77 removed this from the v1.1 milestone Oct 4, 2019
@grlee77
Contributor

grlee77 commented Oct 15, 2019

Hi @micha2718l. Over in #394, @rgommers suggested renaming to estimate_noise or estimate_noiselevel. I think estimate_noise sounds reasonable and may be clearer to users than estimate_sigma. What do you think?

@micha2718l
Author

I agree that estimate_noise is probably best. I'll go ahead and change it.


@micha2718l
Author

@rgommers @grlee77 Hi all! Sorry for the multi-year delay on this; I had some time and was reminded of this work. Would someone with access please enable the CI workflow here to help with tests? (I don't think it existed in this state when I first started on this.) If it doesn't need to be enabled, let me know what the next steps should be.

@rgommers
Member

rgommers commented Mar 8, 2024

Hi @micha2718l, thanks for getting back to this. I have just hit the button to have CI run again; note that this is needed on each new push. To avoid the problem, please feel free to submit a separate PR with, for example, a typo fix or trivial rewording. I'd be happy to merge such a PR straight away, and once you have a commit in the default branch, GitHub will trigger CI on future pushes to this PR without needing approval.

Projects: none yet
Linked issues: none yet
Participants: 4