In-memory statistics calculation #209

MichalOleszak · 2023-07-10T09:58:26Z

Hello,

Do you support in-memory computation of statistics, or are you planning to add such a feature?

Details

I'd like to obtain statistics like the ones returned by imagelab.get_stats() for an image that is kept in memory rather than stored on the filesystem.

Let's say I have a vision model deployed that receives images for inference via a REST API. Each image is a numpy array or a PIL Image. I'd like to obtain its statistics before passing it to the model for inference. A working solution I came up with is saving the image to a tempdir and calling cleanvision on it, but, unsurprisingly, this is very slow.

In case you are not planning on developing such a feature, could you please advise on a faster workaround than using tempdir? Thanks!


sanjanag · 2023-07-10T12:36:08Z

Hi @MichalOleszak !
Thanks for your question. You can use cleanvision on in-memory images by wrapping them in a Hugging Face Dataset object.
Here's a code snippet that does that:

```python
from PIL import Image
import os
from datasets import Dataset
from cleanvision import Imagelab

if __name__ == "__main__":
    # loading images in-memory
    files = os.listdir("./tests/data")
    fpaths = [os.path.join("./tests/data", f) for f in files]
    image_list = [Image.open(f) for f in fpaths]

    # construct in-memory dataset
    mydict = {"image": image_list}
    dataset = Dataset.from_dict(mydict)

    # call cleanvision on this dataset
    imagelab = Imagelab(hf_dataset=dataset, image_key="image")
    imagelab.find_issues()
    imagelab.report()
    print(imagelab.get_stats())
```

MichalOleszak · 2023-07-10T13:05:11Z

Hey @sanjanag,

Thanks a lot for a quick reply!

The solution you suggest works well, but my quick-and-dirty experiments suggest that for a single image (the use case I'm most interested in) it's actually slower than dumping to a tempdir.

I assume you are not planning to expose APIs in the form of get_brightness(img: Image) -> float?
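For illustration only, a per-image function with the signature suggested above might look like the sketch below. It is a hypothetical helper, not part of cleanvision. It assumes 8-bit grayscale or RGB(A) input and uses the Rec. 601 luma weights (0.299, 0.587, 0.114) as its definition of brightness, which may well differ from how cleanvision computes its brightness statistic.

```python
import numpy as np


def get_brightness(img) -> float:
    """Hypothetical helper: mean luma of one in-memory image, scaled to [0, 1].

    Accepts a PIL Image or a NumPy array of shape (H, W) or (H, W, C>=3)
    holding 8-bit values. Palette ("P" mode) PIL images should be converted
    with img.convert("RGB") first, otherwise the palette indices are used.
    """
    arr = np.asarray(img, dtype=np.float64)
    if arr.ndim == 2:
        # Grayscale: pixel values already represent luma.
        luma = arr
    elif arr.ndim == 3 and arr.shape[-1] >= 3:
        # RGB(A): drop any alpha channel, then take a weighted sum (assumed Rec. 601 weights).
        luma = arr[..., :3] @ np.array([0.299, 0.587, 0.114])
    else:
        raise ValueError(f"unsupported image shape {arr.shape}")
    # Average over all pixels and rescale from [0, 255] to [0, 1].
    return float(luma.mean() / 255.0)
```

Because it reads the pixels straight from memory, an image received by the REST endpoint could be passed in directly, e.g. get_brightness(pil_img.convert("RGB")), with no tempdir or Dataset wrapper involved.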

sanjanag · 2023-07-12T13:45:38Z

Hi @MichalOleszak! That does look like a good use case. We already have code for computing these stats in bulk, but not per image; it shouldn't be difficult to expose them, though. You can find the related code in image_property.py. In the implemented ImageProperty classes, the calculate() method computes the raw value of a statistic, and the get_scores() method converts it into a score between 0 and 1.
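To make the two-stage structure described above concrete, here is a minimal sketch of a property with a calculate() step (raw value for one image) and a get_scores() step (map raw values into [0, 1]). The class name DarkProperty, the bright_enough parameter, the Rec. 601 luma, and the clip-based scoring are all assumptions chosen for illustration; this is not cleanvision's actual ImageProperty code from image_property.py. The point it shows is that calculate() already works on a single in-memory image, so per-image statistics only require calling it directly.

```python
from dataclasses import dataclass
from typing import Sequence

import numpy as np

# Assumed luma weights (Rec. 601); not taken from cleanvision.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


@dataclass
class DarkProperty:
    """Hypothetical property: a low score means a dark image."""

    # Hypothetical parameter: raw brightness (0-255 scale) at or above which
    # an image gets the maximum score of 1.
    bright_enough: float = 128.0

    def calculate(self, image) -> float:
        """Raw statistic for one image: mean luma on a 0-255 scale.

        Assumes (H, W) grayscale or (H, W, C>=3) RGB(A) input.
        """
        arr = np.asarray(image, dtype=np.float64)
        if arr.ndim == 3:
            # Drop alpha if present and collapse RGB to luma.
            arr = arr[..., :3] @ LUMA_WEIGHTS
        return float(arr.mean())

    def get_scores(self, raw_values: Sequence[float]) -> np.ndarray:
        """Map raw values into [0, 1]; values >= bright_enough score 1."""
        raw = np.asarray(raw_values, dtype=np.float64)
        return np.clip(raw / self.bright_enough, 0.0, 1.0)


if __name__ == "__main__":
    prop = DarkProperty()
    images = [
        np.zeros((8, 8, 3), dtype=np.uint8),     # black RGB image
        np.full((8, 8, 3), 64, dtype=np.uint8),  # dim RGB image
        np.full((8, 8), 200, dtype=np.uint8),    # bright grayscale image
    ]
    raw = [prop.calculate(im) for im in images]
    print(raw, prop.get_scores(raw))
```

Keeping the raw statistic and the scoring separate means the per-image call stays cheap, while any batch-level normalisation (if a real implementation uses one) can live entirely in get_scores().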
Would you be interested in working on exposing these statistics methods from the package for the use case mentioned above?

jwmueller · 2023-07-13T15:29:37Z
