Tiler failing with use_largeimage=True #577

Open
k-rakovic opened this issue May 31, 2023 · 8 comments
Labels: bug (Something isn't working)

Comments

@k-rakovic

k-rakovic commented May 31, 2023

Hi everyone,

I have a dataset of large WSIs in Hamamatsu .ndpi format, many of which are >5 GB in size.
I am able to initialise the slide using something like this:

from histolab.slide import Slide

test_img = Slide('/path/to/image.ndpi', processed_path='/path/to/output', use_largeimage=True)

And do basic tasks, such as:

from histolab.masks import TissueMask

all_tissue_mask = TissueMask()
test_img.locate_mask(all_tissue_mask)

This reads the file, generates a mask, and outputs the result. I then try to extract tiles:

from histolab.tiler import GridTiler

gtiler = GridTiler(
    tile_size=(224,224),
    check_tissue=True,
    tissue_percent=60,
    pixel_overlap=0,
    mpp=1.8
)

gtiler.extract(test_img, extraction_mask=all_tissue_mask, log_level='INFO')

This fails, with the error:

histolab.exceptions.HistolabException: OpenSlideError("Can't validate JPEG for directory 0: Expected marker at 4294972598, found none"). This slide may be corrupted or have a non-standard format not handled by the openslide and PIL libraries. Consider setting use_largeimage to True when instantiating this Slide.

This is the same error you get if you try to load a large image without setting use_largeimage=True.
I would expect the use_largeimage flag to be passed through to the tiler, but this does not appear to be happening.

histolab v0.6
python 3.8

EDIT: typographical error
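
A side note on the number in the error message: the offset 4294972598 sits just above 2**32 (4294967296), which is consistent with the file being larger than 4 GiB. Whether that is actually why openslide rejects the file is an assumption, not something the message confirms. A minimal Python sketch of the arithmetic (the helper name `marker_offset` is made up for illustration):

```python
import re

# Error text copied from the report above.
ERROR = ("Can't validate JPEG for directory 0: "
         "Expected marker at 4294972598, found none")


def marker_offset(message):
    """Return the byte offset quoted in the message, or None if absent."""
    match = re.search(r"Expected marker at (\d+)", message)
    return int(match.group(1)) if match else None


offset = marker_offset(ERROR)
limit_32bit = 2**32  # 4294967296, the first offset that does not fit in 32 bits

print(offset >= limit_32bit)  # True
print(offset - limit_32bit)   # 5302
```

The offset is only 5302 bytes past the 32-bit boundary, so it may be worth checking whether smaller .ndpi files from the same scanner open without the flag.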

@k-rakovic added the "help wanted" (Extra attention is needed) and "question" (Further information is requested) labels May 31, 2023
@alessiamarcolini
Collaborator

Hi @k-rakovic thank you for opening this issue!

The use_largeimage flag is meant to be passed to the Slide, which internally selects the backend needed to read such a slide.

Anyway, I see that you're passing tile_img to gtiler.extract, and not test_img. Is that intended?

@k-rakovic
Author

Thanks for replying so soon. Sorry, that was a typo in the post. I am passing test_img, not tile_img, to gtiler.extract, but the use_largeimage flag does not appear to follow it.

The code should read:

from histolab.slide import Slide
from histolab.tiler import GridTiler
from histolab.masks import TissueMask

test_img = Slide('/path/to/image.ndpi', processed_path='/path/to/output', use_largeimage=True)

all_tissue_mask = TissueMask()
test_img.locate_mask(all_tissue_mask)

gtiler = GridTiler(
    tile_size=(224,224),
    check_tissue=True,
    tissue_percent=60,
    pixel_overlap=0,
    mpp=1.8
)

gtiler.extract(test_img, extraction_mask=all_tissue_mask, log_level='INFO')

The full error log is:

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
OpenSlideError                            Traceback (most recent call last)
File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/slide.py:736, in Slide._wsi(self)
    735 try:
--> 736     slide = openslide.open_slide(self._path)
    737 except PIL.UnidentifiedImageError:

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/__init__.py:430, in open_slide(filename)
    429 try:
--> 430     return OpenSlide(filename)
    431 except OpenSlideUnsupportedFormatError:

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/__init__.py:166, in OpenSlide.__init__(self, filename)
    165 self._filename = filename
--> 166 self._osr = lowlevel.open(str(filename))

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/lowlevel.py:199, in _check_open(result, _func, _args)
    198 if err is not None:
--> 199     raise OpenSlideError(err)
    200 return slide

OpenSlideError: Can't validate JPEG for directory 0: Expected marker at 4294972598, found none

During handling of the above exception, another exception occurred:
...
    743 except Exception as other_error:
--> 744     raise HistolabException(other_error.__repr__() + f". {bad_format_error}")
    745 return slide

HistolabException: OpenSlideError("Can't validate JPEG for directory 0: Expected marker at 4294972598, found none"). This slide may be corrupted or have a non-standard format not handled by the openslide and PIL libraries. Consider setting use_largeimage to True when instantiating this Slide.

@alessiamarcolini
Collaborator

Thank you, the error log is useful, but I see it's only a partial stack trace (Output exceeds the size limit. Open the full output data in a text editor). Could you post it whole?

@k-rakovic
Author

Of course, this is the whole output:

---------------------------------------------------------------------------
OpenSlideError                            Traceback (most recent call last)
File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/slide.py:736, in Slide._wsi(self)
    735 try:
--> 736     slide = openslide.open_slide(self._path)
    737 except PIL.UnidentifiedImageError:

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/__init__.py:430, in open_slide(filename)
    429 try:
--> 430     return OpenSlide(filename)
    431 except OpenSlideUnsupportedFormatError:

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/__init__.py:166, in OpenSlide.__init__(self, filename)
    165 self._filename = filename
--> 166 self._osr = lowlevel.open(str(filename))

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/lowlevel.py:199, in _check_open(result, _func, _args)
    198 if err is not None:
--> 199     raise OpenSlideError(err)
    200 return slide

OpenSlideError: Can't validate JPEG for directory 0: Expected marker at 4294972598, found none

During handling of the above exception, another exception occurred:

HistolabException                         Traceback (most recent call last)
/raid/users/kr151p/histolab.ipynb Cell 3 in ()
      1 from histolab.tiler import GridTiler
      3 gtiler = GridTiler(
      4     tile_size=(224,224),
      5     check_tissue=True,
   (...)
      8     mpp=1.8
      9 )
---> 11 gtiler.extract(test_img, extraction_mask=all_tissue_mask, log_level='INFO')

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/tiler.py:384, in GridTiler.extract(self, slide, extraction_mask, log_level)
    382 level = logging.getLevelName(log_level)
    383 logger.setLevel(level)
--> 384 self._validate_level(slide)
    385 self.tile_size = self._tile_size(slide)
    386 self.pixel_overlap = int(self._scale_factor(slide) * self.pixel_overlap)

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/tiler.py:279, in Tiler._validate_level(self, slide)
    266 def _validate_level(self, slide: Slide) -> None:
    267     """Validate the Tiler's level according to the Slide.
    268 
    269     Parameters
   (...)
    277         If the level is not available for the slide
    278     """
--> 279     if len(slide.levels) - abs(self.level) < 0:
    280         raise LevelError(
    281             f"Level {self.level} not available. Number of available levels: "
    282             f"{len(slide.levels)}"
    283         )

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/slide.py:359, in Slide.levels(self)
    350 @lazyproperty
    351 def levels(self) -> List[int]:
    352     """Slide's available levels
    353 
    354     Returns
   (...)
    357         The levels available
    358     """
--> 359     return list(range(len(self._wsi.level_dimensions)))

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/slide.py:744, in Slide._wsi(self)
    740     raise FileNotFoundError(
    741         f"The wsi path resource doesn't exist: {self._path}"
    742     )
    743 except Exception as other_error:
--> 744     raise HistolabException(other_error.__repr__() + f". {bad_format_error}")
    745 return slide

HistolabException: OpenSlideError("Can't validate JPEG for directory 0: Expected marker at 4294972598, found none"). This slide may be corrupted or have a non-standard format not handled by the openslide and PIL libraries. Consider setting use_largeimage to True when instantiating this Slide.

@ernestoarbitrio
Member

Hi @k-rakovic, is the .ndpi you're using available somewhere on the internet, or is it a private/legacy WSI?

@k-rakovic
Author

> Hi @k-rakovic, is the .ndpi you're using available somewhere on the internet, or is it a private/legacy WSI?

It is unfortunately part of a private dataset so I can't share it. I can view the image in something like QuPath so I know the image file itself is not corrupt.
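
Since the slide cannot be shared, one way to gather information locally is to ask the large_image backend directly what it sees in the file. The sketch below is only a suggestion: it assumes the large_image package and a source that can read .ndpi are installed, the helper name `describe_slide` is made up, and which metadata keys are populated depends on the file.

```python
# Keys commonly present in large_image metadata; missing ones come back as None.
KEYS = ("levels", "sizeX", "sizeY", "mm_x", "mm_y")


def describe_slide(path, open_source=None):
    """Summarise a slide's basic metadata.

    open_source is any callable that returns an object with getMetadata();
    when omitted, large_image.getTileSource is used (imported lazily so the
    helper can be defined without large_image installed).
    """
    if open_source is None:
        import large_image
        open_source = large_image.getTileSource
    metadata = open_source(path).getMetadata()
    return {key: metadata.get(key) for key in KEYS}
```

Calling `describe_slide('/path/to/image.ndpi')` and posting the resulting dictionary would show the level count and pixel size without exposing any image data.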

@alessiamarcolini
Collaborator

Ok so actually @k-rakovic you found a bug 🥇

It turns out that Slide.levels, which is called by Tiler._validate_level(slide), ignores the use_largeimage flag: it always reads self._wsi.level_dimensions, and Slide._wsi opens the file with openslide.
We did not notice this because the tests with use_largeimage=True use CMU_1_SMALL_REGION, which openslide can read. To test and fix this, we should use another slide that is not compatible with openslide.
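
To make the failure mode concrete, here is a toy model of it in Python. It is not histolab's code: `ToySlide`, its attributes, and the fake backend values are all hypothetical, and the "flag-aware" variant only illustrates one possible direction for a fix, not the fix the maintainers chose.

```python
class ToySlide:
    """Hypothetical stand-in for a slide wrapper; not histolab's Slide."""

    def __init__(self, path, use_largeimage=False, openslide_can_read=True):
        self._path = path
        self._use_largeimage = use_largeimage
        self._openslide_can_read = openslide_can_read

    @property
    def _wsi(self):
        # Stand-in for the openslide handle; fails for unreadable files.
        if not self._openslide_can_read:
            raise RuntimeError(f"openslide-style backend cannot open {self._path}")
        return {"level_dimensions": [(4096, 4096), (1024, 1024)]}

    @property
    def _tile_source(self):
        # Stand-in for the alternative backend; the value is invented.
        return {"levels": 2}

    @property
    def levels_ignoring_flag(self):
        # Same shape as the line in the traceback: always goes through _wsi.
        return list(range(len(self._wsi["level_dimensions"])))

    @property
    def levels_respecting_flag(self):
        # Consult the alternative backend when the flag is set.
        if self._use_largeimage:
            return list(range(self._tile_source["levels"]))
        return self.levels_ignoring_flag


slide = ToySlide("big.ndpi", use_largeimage=True, openslide_can_read=False)
print(slide.levels_respecting_flag)  # [0, 1]
try:
    slide.levels_ignoring_flag
except RuntimeError as error:
    print("flag ignored:", error)
```

A regression test along these lines needs a fixture that the openslide path cannot open; with a readable slide, both variants succeed and the bug stays hidden.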

@ernestoarbitrio added the "bug" (Something isn't working) label and removed the "help wanted" (Extra attention is needed) and "question" (Further information is requested) labels May 31, 2023
@k-rakovic
Author

@alessiamarcolini
As I said above, I unfortunately can't share any images but I'm happy to help if I can (please note though I'm a pathologist rather than a developer...!). It seems existing image tiling methods which support large ndpi images are thin on the ground so it would be awesome if yours could work!

@nicolebussola self-assigned this Feb 24, 2024

4 participants