
filters.rank.mean_percentile gives unexpected zeros #7096

Open
ryandeon opened this issue Aug 15, 2023 · 10 comments · May be fixed by #7111
Labels
🐛 Bug 😴 Dormant no recent activity

Comments

@ryandeon

Description:

When running mean_percentile, I get regions where the input image is made up entirely of pixel values > 0, but the output image is set to 0.
I'm expecting this to take the pixels within the footprint, throw away the entries below the P0 percentile and above the P1 percentile, and return the mean of the rest. In this case, that mean should be > 0.
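For concreteness, here is a rough NumPy sketch of the behaviour I'm expecting for a single neighbourhood (`expected_mean_percentile` is my own illustrative helper, not skimage code):

```python
import numpy as np

def expected_mean_percentile(values, p0, p1):
    """Trimmed mean of one neighbourhood: drop entries outside the [p0, p1] band."""
    v = np.sort(np.asarray(values, dtype=float).ravel())
    n = v.size
    lo = int(np.floor(n * p0))   # entries discarded at the low end
    hi = int(np.ceil(n * p1))    # last entry kept at the high end
    kept = v[lo:hi]
    return kept.mean() if kept.size else 0.0

# A flat, all-positive neighbourhood: the trimmed mean stays > 0.
print(expected_mean_percentile([5, 5, 5, 6, 6, 6, 7, 7, 7], p0=0.25, p1=0.75))  # → 6.0
```

With all-positive inputs this can never return 0, which is why the black regions in my output look wrong to me.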
[attachment: output img]
[attachment: Input img]

Is it possible that the local population could be completely thrown away, leaving nothing? If the region is very flat, it could have very few distinct pixel values. If all copies of values determined to be > P1 are removed, instead of just the top x% of copies, that might result in the behaviour I see? Just guessing.

The test case included seems consistent with that, but I'm not confident it's the same in my real image case.

Way to reproduce:

import numpy as np
import skimage.filters
import skimage.io
import skimage.morphology
import matplotlib.pyplot as plt

# actual case
input_img = skimage.io.imread("Input img.png")
output_img = skimage.filters.rank.mean_percentile(
    input_img,
    footprint=skimage.morphology.disk(min(input_img.shape) // 4),
    p0=0.25,
    p1=0.75,
)
fig, ax = plt.subplots(1, 2, sharex='all', sharey='all')
ax[0].imshow(input_img)
ax[1].imshow(output_img)

# Minimal test case
test = np.hstack([np.ones((5, 5)), np.zeros((5, 5))])
test_out = skimage.filters.rank.mean_percentile(
    test,
    footprint=skimage.morphology.disk(2),
    p0=0.25,
    p1=0.75,
)
print(f"output max value is {np.max(test_out)}, but it should contain some higher values!")

Version information:

Python: 3.10.10 (tags/v3.10.10:aad5f6a, Feb  7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)]
Windows-10-10.0.19042-SP0
scikit-image version: 0.21.0
numpy version: 1.25.1
@lagru
Member

lagru commented Aug 15, 2023

Hey @ryandeon, thanks for the report and effort with the minimal example!

If all copies of values determined to be > P1 are removed, instead of just the top x% of copies, that might result in the behaviour I see?

I don't quite follow this statement. Could you maybe elaborate?

I think the kernel function that selects the pixels from the local neighborhood is this one

cdef inline void _kernel_mean(dtype_t_out* out, Py_ssize_t odepth,
                              Py_ssize_t[::1] histo,
                              cnp.float64_t pop, dtype_t g,
                              Py_ssize_t n_bins, Py_ssize_t mid_bin,
                              cnp.float64_t p0, cnp.float64_t p1,
                              Py_ssize_t s0, Py_ssize_t s1) noexcept nogil:
    cdef Py_ssize_t i, sum, mean, n
    if pop:
        sum = 0
        mean = 0
        n = 0
        for i in range(n_bins):
            sum += histo[i]
            if (sum >= p0 * pop) and (sum <= p1 * pop):
                n += histo[i]
                mean += histo[i] * i
        if n > 0:
            out[0] = <dtype_t_out>(mean / n)
        else:
            out[0] = <dtype_t_out>0
    else:
        out[0] = <dtype_t_out>0

When I put a print statement in there to debug, it looks like histo contains values from the local neighborhood. In the case of your minimal example it's an "array" (actually a memoryview) of 256 bins, with only the first or last bin ever being non-zero. For i = 0, mean += histo[i] * i doesn't add anything, and for i = 255 (the last value), sum <= p1 * pop never evaluates to True. So mean is only ever 0. Does that help?
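To make that concrete, the accumulation loop can be mirrored in plain Python (a rough re-implementation for illustration, not the actual Cython kernel):

```python
def kernel_mean(histo, p0, p1):
    """Plain-Python mirror of _kernel_mean's accumulation loop."""
    pop = sum(histo)
    running = mean = n = 0
    for i, count in enumerate(histo):
        running += count
        # A bin only contributes if the cumulative count lands inside [p0*pop, p1*pop].
        if p0 * pop <= running <= p1 * pop:
            n += count
            mean += count * i
    return mean // n if n else 0

# Histogram with only bins 0 and 255 occupied, as in the minimal example:
histo = [0] * 256
histo[0], histo[255] = 4, 9
# Bin 0 contributes nothing to `mean`; bin 255 overshoots p1 * pop and is skipped.
print(kernel_mean(histo, 0.25, 0.75))  # → 0
```

Either the zeros bin passes the cumulative test and contributes 0, or the top bin pushes the cumulative sum past p1 * pop and is discarded wholesale, so the output is always 0 here.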

If you provide me with the original image I can have a look at how the kernel behaves there as well. Or you can try out this patch (or something similar) yourself:

Subject: [PATCH] Debug _kernel_mean with print
---
Index: skimage/filters/rank/percentile_cy.pyx
===================================================================
diff --git a/skimage/filters/rank/percentile_cy.pyx b/skimage/filters/rank/percentile_cy.pyx
--- a/skimage/filters/rank/percentile_cy.pyx	(revision 06b7986cceb4c678f63fe3b0b0ab033e21c00f6f)
+++ b/skimage/filters/rank/percentile_cy.pyx	(date 1692140252761)
@@ -3,6 +3,7 @@
 #cython: nonecheck=False
 #cython: wraparound=False
 
+import numpy as np
 cimport numpy as cnp
 from .core_cy cimport dtype_t, dtype_t_out, _core, _min, _max
 cnp.import_array()
@@ -79,6 +80,9 @@
 
     cdef Py_ssize_t i, sum, mean, n
 
+    with gil:
+        print(np.array(histo))
+
     if pop:
         sum = 0
         mean = 0
@@ -86,10 +90,18 @@
         for i in range(n_bins):
             sum += histo[i]
             if (sum >= p0 * pop) and (sum <= p1 * pop):
+                with gil:
+                    print("  pop, sum, p0, p1", pop, sum, p0, p1, end=" | ")
+                    print("adding bin to mean", i, histo[i])
                 n += histo[i]
                 mean += histo[i] * i
+                with gil:
+                    if histo[i] * i != 0:
+                        print(i, histo[i] * i)
 
         if n > 0:
+            with gil:
+                print("assigning mean", mean, n)
             out[0] = <dtype_t_out>(mean / n)
         else:
             out[0] = <dtype_t_out>0

@ryandeon
Author

Hi @lagru
Thanks for that explanation... If I've figured out this algorithm correctly, it's doing something different than I expect. However, I'm not used to looking at Cython code, and don't have this all set up on my system, so I haven't tested it out.
I'll try to describe both, to see if I've got it correct. For example, in the case where the local neighbourhood is a 3x3 square of
[[0,0,0], [ 1,1,1], [1,1,1]]
The variable histo will contain a histogram, which seems like a dictionary with keys from 0 to 255 and values equal to the number of entries in the neighbourhood: {0: 3, 1: 6} in my example.
You assign your thresholds of p0 and p1 to pixel values by multiplying p0 * pop and p1 * pop (where pop is the total size of the neighbourhood).
Then you iterate through the histogram, over all unique pixel values, counting up the entries, but only adding to the running total if you're between the two p0,p1 thresholds.
So this point is where it breaks from my expectation. Let's say your p0 = 2/9 and p1 = 1 (for simplicity). I'd expect, in my example above, for us to throw away the first 2 zeros and take the mean over the rest, giving a result of (0 + 6*1)/7. But the algorithm as I understand it above would instead add 3 to sum, then, since sum >= p0*pop (3 > 2), it would add all the zeros into the mean, resulting in (3*0 + 6*1)/9.
In my original minimal test case, I have p0=0.25 and p1=0.75, so in the 3x3 neighbourhood example we would end up with (3*0)/3, because all 6 of the pixels == 1 were determined to be above the high limit.

What I expect, and maybe you can determine if it's reasonable, is instead for it to add fractional amounts of the histogram when needed. Maybe something like

for i in range(n_bins):
    sum += histo[i]
    if sum - histo[i] < p0*pop and sum > p0*pop:  # low fractional case
        # keep only the part of this bin above the lower cutoff
        # (not sure how to treat the edges: round up, round down...)
        kept = sum - int(p0*pop)
        n += kept
        mean += kept * i
    elif (sum >= p0 * pop) and (sum <= p1 * pop):  # same as existing code
        n += histo[i]
        mean += histo[i] * i
    elif sum - histo[i] < p1*pop and sum > p1*pop:  # high fractional case
        # keep only the part of this bin below the upper cutoff
        kept = int(p1*pop) - (sum - histo[i])
        n += kept
        mean += kept * i

but again, note that I'm not used to Cython, and haven't tested this (or optimized it!).
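For what it's worth, here is a runnable plain-Python version of that idea that keeps the fractional parts of the boundary bins (`fractional_mean_percentile` is my own sketch, untested against the real kernel, and the edge rounding is just one possible choice):

```python
def fractional_mean_percentile(histo, p0, p1):
    """Trimmed mean over a histogram, keeping fractions of the boundary bins."""
    pop = sum(histo)
    lower = int(pop * p0)   # number of entries to drop from the bottom
    upper = int(pop * p1)   # highest cumulative rank to keep
    running = mean = n = 0
    for i, count in enumerate(histo):
        prev = running
        running += count
        # Overlap of this bin's rank range (prev, running] with (lower, upper]:
        kept = min(running, upper) - max(prev, lower)
        if kept > 0:
            n += kept
            mean += kept * i
    return mean / n if n else 0

# 3x3 neighbourhood [0,0,0,1,1,1,1,1,1] as the histogram {0: 3, 1: 6}:
print(fractional_mean_percentile([3, 6], 0.25, 0.75))           # → 0.75
# A flat neighbourhood occupying only bin 255 no longer collapses to 0:
print(fractional_mean_percentile([0] * 255 + [9], 0.25, 0.75))  # → 255.0
```

Computing the overlap of each bin's cumulative rank range with the kept range handles all three cases (fully below, straddling, fully inside) in one expression.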

One last thing: my original images are included in my first post; for some reason the 'input image' is the second one and the 'output image' is the first (though I intended to upload them in the correct order). There are two black regions in the output that don't seem to belong.
Are you not able to access them as files? I'm not sure how to get them to you otherwise.

@ryandeon
Author

Update: I tried to build skimage from source so I could play around with this, but haven't gotten it working yet. It'll be slow, but I'll keep trying when I get a chance.
For now I'm working around it by adding this code, though I know it's not logically equivalent to what I'm looking for. But in my real-image cases, it's doing a good enough job.

from scipy import ndimage as ndi

# found here: https://stackoverflow.com/a/27745627
if (invalid := (output_img == 0)).any():
    ind = ndi.distance_transform_edt(invalid, return_distances=False, return_indices=True)
    output_img = output_img[tuple(ind)]
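To show what the workaround does, here is a small self-contained demo on a toy array (assuming SciPy is installed; the array values are made up):

```python
import numpy as np
from scipy import ndimage as ndi

output_img = np.array([[5, 0, 7],
                       [0, 0, 8],
                       [1, 2, 0]])
if (invalid := (output_img == 0)).any():
    # For every pixel, indices of the nearest valid (non-zero) pixel.
    ind = ndi.distance_transform_edt(invalid, return_distances=False,
                                     return_indices=True)
    output_img = output_img[tuple(ind)]
print(output_img)  # every zero replaced by its nearest non-zero neighbour
```

distance_transform_edt with return_indices=True returns, for each pixel, the coordinates of the nearest False element of `invalid`, so fancy-indexing with those coordinates fills each hole with its nearest valid value.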

@lagru
Member

lagru commented Aug 18, 2023

Nice, let us know if we can help you with setting up the dev environment. :)

@ryandeon
Author

I think I could use some help getting the project built. First time using Cython, spin, meson... I'm finding the error messages unhelpful. This thread didn't seem like the right place to start troubleshooting build issues, so I found the Zulip and tried there.

@lagru
Member

lagru commented Aug 27, 2023

I am still not clear on what behavior to expect from the algorithm. We should get that clear before discussing the implementation. Intuitively I would expect the following:

  • Sort all N pixels in the neighborhood by their intensity.
  • Throw away the first floor(N * p0) pixels.
  • Throw away the last floor(N * (1 - p1)) pixels.
  • Return arithmetic mean of remaining pixels.

So if the neighborhood is [0,0,0,1,1,1,1,1,1] and p0, p1 = (0.25, 0.75), we throw away the first and last 2 pixels and calculate the mean of [0,1,1,1,1], which is 0.8 and rounds to 1. Do you agree?
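Those four steps, written out in plain Python (`spec_mean_percentile` is just an illustration of the spec above, not the proposed kernel code):

```python
import numpy as np

def spec_mean_percentile(neighborhood, p0, p1):
    """Sort, drop floor(N*p0) lowest and floor(N*(1-p1)) highest, average the rest."""
    v = np.sort(np.asarray(neighborhood, dtype=float).ravel())
    n = v.size
    lo = int(np.floor(n * p0))             # pixels thrown away at the bottom
    hi = n - int(np.floor(n * (1 - p1)))   # pixels thrown away at the top
    kept = v[lo:hi]
    return kept.mean() if kept.size else 0.0

# Neighborhood from the example: drop 2 lowest and 2 highest, average the rest.
print(spec_mean_percentile([0, 0, 0, 1, 1, 1, 1, 1, 1], 0.25, 0.75))  # → 0.8
```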

In that case I'd say there is a bug in the current implementation. However,

  • the function's docstring isn't very precise on what behavior to expect,
  • the test coverage is pretty poor and also doesn't communicate a lot of intended behavior,
  • the original PR with the code (Rank filters #348) also doesn't seem to answer the question,
  • to my surprise, I don't find any similar functions in other image processing libraries that would hint at a convention.

So I am not sure at all whether the current behavior aligns with the original intention of the authors. I am invoking @scikit-image/core in the hope for some institutional knowledge. 🙏

@lagru
Member

lagru commented Aug 27, 2023

Ah, I think I get your suggested implementation in #7096 (comment) now. The problem arises when a portion of a bin should be thrown away and a portion should be factored into the mean. The current implementation just throws away the entire bin. Your suggestion accounts for these cases. 👍

Diff for later

Subject: [PATCH] Fix and optimize _kernel_mean
---
Index: skimage/filters/rank/percentile_cy.pyx
===================================================================
diff --git a/skimage/filters/rank/percentile_cy.pyx b/skimage/filters/rank/percentile_cy.pyx
--- a/skimage/filters/rank/percentile_cy.pyx	(revision 5408455469ddf3bc9456a6787f903b69a3a52316)
+++ b/skimage/filters/rank/percentile_cy.pyx	(date 1693157754527)
@@ -5,6 +5,7 @@
 
 cimport numpy as cnp
 from .core_cy cimport dtype_t, dtype_t_out, _core, _min, _max
+from libc.math cimport round, floor, ceil
 cnp.import_array()
 
 cdef inline void _kernel_autolevel(dtype_t_out* out, Py_ssize_t odepth,
@@ -77,22 +78,20 @@
                               cnp.float64_t p0, cnp.float64_t p1,
                               Py_ssize_t s0, Py_ssize_t s1) noexcept nogil:
 
-    cdef Py_ssize_t i, sum, mean, n
+    cdef Py_ssize_t i, _
+    cdef Py_ssize_t mean = 0
+    cdef Py_ssize_t percentile = 0
+    cdef Py_ssize_t lower = <Py_ssize_t>floor(pop * p0)
+    cdef Py_ssize_t upper = <Py_ssize_t>ceil(pop * p1)
+    cdef Py_ssize_t n = upper - lower
 
-    if pop:
-        sum = 0
-        mean = 0
-        n = 0
+    if pop and n:
         for i in range(n_bins):
-            sum += histo[i]
-            if (sum >= p0 * pop) and (sum <= p1 * pop):
-                n += histo[i]
-                mean += histo[i] * i
-
-        if n > 0:
-            out[0] = <dtype_t_out>(mean / n)
-        else:
-            out[0] = <dtype_t_out>0
+            for _ in range(histo[i]):
+                percentile += 1
+                if lower < percentile < upper:
+                    mean += i
+        out[0] = <dtype_t_out>round(mean / n)
     else:
         out[0] = <dtype_t_out>0

@lagru linked a pull request Aug 28, 2023 that will close this issue
@lagru
Member

lagru commented Aug 28, 2023

See #7111 for an attempted fix.

This is the output I get for your provided example image with e5e62bb:

import numpy as np
import skimage as ski
import matplotlib.pyplot as plt

#actual case
input_img = ski.io.imread("~/Temp/input.png")
output_img = ski.filters.rank.mean_percentile(
    input_img,
    footprint=ski.morphology.disk(min(input_img.shape)//4),
    p0=0.25,
    p1=0.75
)
fig,ax = plt.subplots(1,2,sharex='all',sharey='all')
ax[0].imshow(input_img)
ax[1].imshow(output_img)
#Minimal test case
test = np.hstack([np.ones((5,5)),np.zeros((5,5))])
test_out = ski.filters.rank.mean_percentile(
    test,
    footprint=ski.morphology.disk(2),
    p0=0.25,
    p1=0.75
)
print(f"output max value is {np.max(test_out)}, but it should contain some higher values!")
fig.show()
# output max value is 255, but it should contain some higher values!

[attachment: output figure]

How does this look to you?

@ryandeon
Author

Hi, thanks for pressing on with this! That is looking very good to me. I haven't spent any more time looking at the back end of this, but I do appreciate seeing the steps and fixes you're detailing; it'll help my understanding.

@github-actions (bot)

Hello scikit-image core devs! There hasn't been any activity on this issue for more than 180 days. I have marked it as "dormant" to make it easy to find.
To our contributors, thank you for your contribution, and apologies if this issue fell through the cracks! Hopefully this ping will help bring some fresh attention to the issue. If you need help, you can always reach out on our forum.
If you think that this issue is no longer relevant, you may close it, or we may do it at some point (either way, it will be done manually).

@github-actions github-actions bot added the 😴 Dormant no recent activity label Mar 15, 2024