YOLOWorld: use images as classes #12793

Open · wants to merge 34 commits into main

Conversation

@hoeflechner commented May 18, 2024

Use images as classes in YOLOWorld

An extension to the model.set_classes() method so that it optionally accepts images:

from ultralytics import YOLOWorld

model = YOLOWorld('yolov8x-world.pt')

# provide a generic image of a tire
model.set_classes(["bus"], images=["tire.jpg"])
results = model.predict('ultralytics/assets/bus.jpg', conf=0.7)

The CLIP model provides an encoder for text as well as for images. The image encoder is used here so that occurrences of one image can be searched for in another image.
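
For illustration, a minimal sketch of that idea using the OpenAI clip package; the model variant and file names are placeholders, not the PR's actual choices:

import clip
import torch
from PIL import Image

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, preprocess = clip.load('ViT-B/32', device=device)  # assumption: CLIP variant

with torch.no_grad():
    # both encoders project into the same embedding space
    text_feat = model.encode_text(clip.tokenize(['a tire']).to(device))
    image_feat = model.encode_image(preprocess(Image.open('tire.jpg')).unsqueeze(0).to(device))

# after L2-normalization, an image embedding can stand in for a text embedding
text_feat /= text_feat.norm(dim=-1, keepdim=True)
image_feat /= image_feat.norm(dim=-1, keepdim=True)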

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced the set_classes method in the YOLO model to accept images for class specification.

📊 Key Changes

  • Method signature changed to allow passing both class labels and images to specify what the model should recognize.
  • Integration of PIL (Python Imaging Library) for image processing.
  • Ability to use both text labels and images to define classes, enriching the model’s understanding and recognition capability.
  • Internal handling in set_classes for converting image paths to PIL images and generating image features using the CLIP model (a simplified sketch follows this list).
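
To make the key changes concrete, here is a heavily simplified sketch of what the extended method might do. The CLIP variant, the feature-mixing behavior, and the attribute name are assumptions for illustration, not the PR's actual code:

import clip
import torch
from PIL import Image

def set_classes(self, classes, images=None):
    """Sketch only: define classes from text labels and/or example images."""
    model, preprocess = clip.load('ViT-B/32')  # assumption: CLIP variant
    with torch.no_grad():
        feats = model.encode_text(clip.tokenize(classes))
        if images:
            pil = [Image.open(im) if isinstance(im, str) else im for im in images]
            # assumption: image features stand in for the matching text features
            feats = model.encode_image(torch.stack([preprocess(im) for im in pil]))
    feats = feats / feats.norm(dim=-1, keepdim=True)
    self.model.txt_feats = feats  # hypothetical attribute name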

🎯 Purpose & Impact

  • Flexibility: Users can now specify classes not only through text but also by providing example images. This enhances the model's adaptability to various contexts and increases ease of use for non-expert users. 🔄
  • Enhanced Recognition: By using images as part of class specification, the model can potentially improve its accuracy for those specific classes. This can be particularly beneficial in scenarios where certain objects or subjects are best defined visually. 🎯
  • Inclusivity in Input: This update makes the model more versatile by accepting input in various formats (text and image), catering to a broader range of user needs and use cases. 🌐

This update is a step toward making AI models more interactive and user-friendly while potentially improving performance through richer input methods.

github-actions bot commented May 18, 2024

All Contributors have signed the CLA. ✅
Posted by the CLA Assistant Lite bot.

@github-actions bot left a comment

👋 Hello @hoeflechner, thank you for submitting an Ultralytics YOLOv8 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • ✅ Verify your PR is up-to-date with ultralytics/ultralytics main branch. If your PR is behind you can update your code by clicking the 'Update branch' button or by running git pull and git merge main locally.
  • ✅ Verify all YOLOv8 Continuous Integration (CI) checks are passing.
  • ✅ Update YOLOv8 Docs for any new or updated features.
  • ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." — Bruce Lee

See our Contributing Guide for details and let us know if you have any questions!

codecov bot commented May 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.03%. Comparing base (11a2ed1) to head (e607968).

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12793      +/-   ##
==========================================
- Coverage   70.03%   68.03%   -2.01%     
==========================================
  Files         124      124              
  Lines       15723    15721       -2     
==========================================
- Hits        11012    10695     -317     
- Misses       4711     5026     +315     
Flag         Coverage Δ
Benchmarks   35.20% <8.33%> (-0.05%) ⬇️
GPU          ?
Tests        66.25% <100.00%> (+0.04%) ⬆️

Flags with carried-forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@hoeflechner (Author)

I have read the CLA Document and I sign the CLA

@Burhan-Q Burhan-Q added the enhancement New feature or request label May 21, 2024
@Burhan-Q Burhan-Q requested a review from Laughing-q May 21, 2024 22:26
@Burhan-Q (Member) left a comment

This is a very cool idea! I personally can't give you a full review of the proposed changes, but I've added a few notes to consider. Hopefully @Laughing-q will have a chance to take a look at these changes soon 🚀

Two review threads on ultralytics/models/yolo/model.py (outdated, resolved).
@Laughing-q (Member)

@hoeflechner Thanks for the PR! This feature seems really awesome!
For loading the images, can we just use our internal source loader?

from ultralytics.data import load_inference_source
self.dataset = load_inference_source(
    source=source,
    batch=self.args.batch,
    vid_stride=self.args.vid_stride,
    buffer=self.args.stream_buffer,
)

Then we're able to support all the formats just like our predictor, no matter whether it's a file, a directory, or ndarrays.
The problem with using our internal loader, I think, is that the output image would be an ndarray with BGR channel order from OpenCV, while the CLIP model expects PIL format with RGB order, so I guess we'll need to add a new preprocess here for CLIP that handles the OpenCV format, with resize and normalization.

For reference, here's the preprocess from the CLIP repo. Note that the CenterCrop is effectively a no-op, since it receives the same n_px arg as the Resize operation.

from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize, InterpolationMode

BICUBIC = InterpolationMode.BICUBIC

def _convert_image_to_rgb(image):
    return image.convert("RGB")

def _transform(n_px):
    return Compose([
        Resize(n_px, interpolation=BICUBIC),
        CenterCrop(n_px),
        _convert_image_to_rgb,
        ToTensor(),
        Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
    ])
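
A minimal sketch of such an OpenCV-friendly preprocess, reusing the _transform above; preprocess_bgr_for_clip is a hypothetical helper name, not an existing API:

import cv2
from PIL import Image

def preprocess_bgr_for_clip(im_bgr, n_px=224):
    """Hypothetical helper: OpenCV BGR ndarray -> 1x3xn_pxxn_px float tensor for CLIP."""
    im_rgb = cv2.cvtColor(im_bgr, cv2.COLOR_BGR2RGB)  # BGR -> RGB
    pil_im = Image.fromarray(im_rgb)                  # ndarray -> PIL image
    return _transform(n_px)(pil_im).unsqueeze(0)      # resize, crop, normalize, add batch dim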

@hoeflechner (Author)

@Laughing-q thanks for your feedback. I will try to implement it as proposed.

@hoeflechner (Author)

I switched to load_inference_source(), but for the conversion to PIL I simply used Image.fromarray(); not sure what the advantages and disadvantages are.

@Burhan-Q (Member)

> I switched to load_inference_source(), but for the conversion to PIL I simply used Image.fromarray(); not sure what the advantages and disadvantages are.

The load_inference_source() function can load numerous data types:

# Dataloader
if tensor:
    dataset = LoadTensor(source)
elif in_memory:
    dataset = source
elif stream:
    dataset = LoadStreams(source, vid_stride=vid_stride, buffer=buffer)
elif screenshot:
    dataset = LoadScreenshots(source)
elif from_img:
    dataset = LoadPilAndNumpy(source)
else:
    dataset = LoadImagesAndVideos(source, batch=batch, vid_stride=vid_stride)
So you can remove `from PIL import Image` and all related code using `Image` (conversion to RGB is carried out by the LoadPilAndNumpy class).
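
For illustration, a minimal usage sketch of that loader; the layout of the yielded batch tuple is an assumption and may differ between versions:

import cv2
from PIL import Image
from ultralytics.data import load_inference_source

dataset = load_inference_source('tire.jpg')  # also handles dirs, URLs, PIL images, ndarrays
for batch in dataset:
    paths, im0s, s = batch  # assumption: (paths, images, info string); check your version
    pil_im = Image.fromarray(cv2.cvtColor(im0s[0], cv2.COLOR_BGR2RGB))  # BGR ndarray -> RGB PIL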

@hoeflechner (Author)

@Burhan-Q I used this approach to load the images, but as @Laughing-q pointed out, it returns an ndarray. CLIP requires a PIL image, so I used PIL's function to convert it.

@Burhan-Q (Member)

@hoeflechner I'm not familiar with the inputs for CLIP, but I now see the comment from @Laughing-q above with regard to needing some sort of preprocessing. I'll let him weigh in from here then. Thank you!

@hoeflechner (Author)

@Burhan-Q CLIP returns a function that transforms a PIL image into its own torch tensor. @Laughing-q was suggesting writing a function that does the same with an ndarray. For me this is probably more work than the whole patch I proposed, as I have very little understanding of the internal mechanisms of CLIP and Ultralytics... I also think it could lead to problems if CLIP changes its internal tensor format.

@Laughing-q Laughing-q added the TODO Items that needs completing label May 27, 2024
@Laughing-q Laughing-q self-assigned this May 27, 2024
@glenn-jocher glenn-jocher removed the TODO Items that needs completing label Jun 1, 2024
@Laughing-q (Member) commented Jun 6, 2024

@hoeflechner @Burhan-Q Guys, I polished this PR a little bit, and currently it supports all the source formats that we support for the predictor (except the torch.Tensor type), and I found we actually have an internal classify_transforms we can reuse for CLIP preprocessing. Everything looks good to me now!

@glenn-jocher This PR adds support for images as input to YOLOWorld.set_classes, so we can now use images to set categories for our YOLOWorld model, which I feel is a cool feature.
The PR is ready from my side; if you also find it interesting, please take a look when you have time. :)
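
For illustration, a rough sketch of reusing classify_transforms for CLIP preprocessing; the import path, signature, and normalization defaults are assumptions and may differ between versions:

import clip
import torch
from PIL import Image
from ultralytics.data.augment import classify_transforms  # assumption: import path

model, _ = clip.load('ViT-B/32')
# assumption: size kwarg; CLIP's own mean/std may need to be passed explicitly
transform = classify_transforms(size=224)
with torch.no_grad():
    feats = model.encode_image(transform(Image.open('tire.jpg')).unsqueeze(0))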

Labels: enhancement (New feature or request) · 5 participants