In-memory statistics calculation #209

Open
MichalOleszak opened this issue Jul 10, 2023 · 4 comments
Labels
help wanted Extra attention is needed

Comments

@MichalOleszak

Hello,

Do you support in-memory computation of statistics, or are you planning to add such a feature?

Details

I'd like to obtain statistics like the ones returned by imagelab.get_stats() for an image that is kept in memory rather than stored on a filesystem.

Let's say I have a vision model deployed and it receives an image for inference via a REST API. The image is a numpy array or a PIL Image. I'd like to obtain the statistics for it before passing it to the model for inference. A working solution I came up with is saving the image to a tempdir and calling cleanvision on it, but this is, unsurprisingly, very slow.

In case you are not planning on developing such a feature, could you please advise on a faster workaround than using tempdir? Thanks!

@sanjanag
Member

Hi @MichalOleszak !
Thanks for your question. You can use cleanvision on in-memory images by wrapping them in a Hugging Face Dataset object.
Here's a code snippet that does that:

from PIL import Image
import os
from datasets import Dataset
from cleanvision import Imagelab

if __name__ == "__main__":
    # loading images in-memory
    files = os.listdir("./tests/data")
    fpaths = [os.path.join("./tests/data", f) for f in files]
    image_list = [Image.open(f) for f in fpaths]
    
    # construct in-memory dataset
    mydict = {"image": image_list}
    dataset = Dataset.from_dict(mydict)
    
    # call cleanvision on this dataset
    imagelab = Imagelab(hf_dataset=dataset, image_key="image")
    imagelab.find_issues()
    imagelab.report()
    print(imagelab.get_stats())

@MichalOleszak
Author

Hey @sanjanag,

Thanks a lot for the quick reply!

The solution you suggest works well, but my quick-and-dirty experiments suggest that for a single image (the use case I'm most interested in) it's actually slower than dumping to a tempdir.

I assume you are not planning to expose APIs in the form of get_brightness(img: Image) -> float?
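For illustration, a per-image function of that shape might look like the sketch below. It is an assumption-laden example, not cleanvision's API: the function body, the Rec. 601 luma weights, and the "mean luminance" definition of brightness are all chosen here for simplicity, and cleanvision's own brightness statistic may be computed differently. It assumes an 8-bit grayscale, RGB, or RGBA image.

```python
import numpy as np

# Rec. 601 luma weights for R, G, B. An assumption for this sketch,
# not necessarily what cleanvision uses internally.
_LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def get_brightness(img) -> float:
    """Return the mean brightness in [0, 1] of an 8-bit in-memory image.

    `img` may be a numpy array or a PIL Image; np.asarray converts both.
    """
    arr = np.asarray(img, dtype=np.float64)
    if arr.ndim == 2:
        # Grayscale: pixel values are already luminance.
        luma = arr
    elif arr.ndim == 3 and arr.shape[2] >= 3:
        # Color: weight the first three channels, ignoring alpha if present.
        luma = arr[..., :3] @ _LUMA_WEIGHTS
    else:
        raise ValueError(f"unsupported image shape: {arr.shape}")
    return float(luma.mean() / 255.0)
```

Because nothing touches the filesystem and no dataset object is built, a call like this costs only one pass over the pixel array, which is the property the single-image use case needs.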

@sanjanag
Member

sanjanag commented Jul 12, 2023

Hi @MichalOleszak ! That sure looks like a good use case. We already have the code for computing these stats in bulk but not per image. But it should not be difficult to get those. You can find related code in image_property.py. If you take a look at the implemented ImageProperty classes, the calculate() method computes the raw value of the statistic and the get_scores() method converts it into a score between 0 and 1.
Would you be interested in working on exposing these statistics methods from the package for the use case mentioned above?
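The two-step pattern described above (a raw value first, then a score between 0 and 1) can be sketched for a single in-memory image as follows. The class name `AspectRatioSketch`, its method signatures, and the scoring rule are hypothetical and written only to show the shape of the idea; the actual ImageProperty classes in image_property.py may use different signatures and formulas.

```python
import numpy as np


class AspectRatioSketch:
    """Hypothetical per-image property; not cleanvision's actual class."""

    def calculate(self, img) -> float:
        # Raw statistic: width divided by height.
        # np.asarray accepts both numpy arrays and PIL Images.
        arr = np.asarray(img)
        height, width = arr.shape[:2]
        return width / height

    def get_score(self, raw: float) -> float:
        # Map the raw ratio to (0, 1]: 1.0 for a square image,
        # approaching 0 as the image becomes more elongated.
        return min(raw, 1.0 / raw)


# Usage on an image that exists only in memory.
prop = AspectRatioSketch()
image = np.zeros((100, 200, 3), dtype=np.uint8)  # height 100, width 200
raw_value = prop.calculate(image)
score = prop.get_score(raw_value)
```

Keeping the raw computation separate from the scoring step means a caller can use either the raw statistic or the normalized score, depending on what the inference pipeline needs.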

@jwmueller
Member

See also: #210

@jwmueller added the help wanted and hacktoberfest labels on Oct 5, 2023
3 participants