Yoloworld use images as classes #12793
All Contributors have signed the CLA. ✅ |
👋 Hello @hoeflechner, thank you for submitting an Ultralytics YOLOv8 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:
- ✅ Verify your PR is up-to-date with ultralytics/ultralytics main branch. If your PR is behind you can update your code by clicking the 'Update branch' button or by running git pull and git merge main locally.
- ✅ Verify all YOLOv8 Continuous Integration (CI) checks are passing.
- ✅ Update YOLOv8 Docs for any new or updated features.
- ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." — Bruce Lee
See our Contributing Guide for details and let us know if you have any questions!
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12793      +/-   ##
==========================================
- Coverage   70.03%   68.03%   -2.01%
==========================================
  Files         124      124
  Lines       15723    15721       -2
==========================================
- Hits        11012    10695     -317
- Misses       4711     5026     +315
Flags with carried forward coverage won't be shown. |
I have read the CLA Document and I sign the CLA |
This is a very cool idea! I personally can't give you a full review of the proposed changes, but I've added a few notes to consider. Hopefully @Laughing-q will have a chance to take a look at these changes soon 🚀
…ner/ultralytics into yoloworld_image_classes
@hoeflechner Thanks for the PR! This feature seems really awesome!
from ultralytics.data import load_inference_source
self.dataset = load_inference_source(
    source=source,
    batch=self.args.batch,
    vid_stride=self.args.vid_stride,
    buffer=self.args.stream_buffer,
)
Then we're able to support all the formats just like our predictor, whether it's a file, a directory, or ndarrays. As reference, here's the preprocess from the CLIP repo. Noted the
def _transform(n_px):
    return Compose([
        Resize(n_px, interpolation=BICUBIC),
        CenterCrop(n_px),
        _convert_image_to_rgb,
        ToTensor(),
        Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
    ]) |
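For readers following the thread, here is a rough sketch of what applying the crop and normalization steps of the CLIP transform quoted above directly to an ndarray might look like. It is not the code from this PR. The helper name `clip_normalize_ndarray` is hypothetical, the input is assumed to be a BGR uint8 array in HWC layout (an assumption about what the loader returns), and the bicubic Resize step is deliberately left out, so the image must already be at least n_px on each side.

```python
import numpy as np

# Mean and std copied from the CLIP transform quoted above (RGB order).
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)


def clip_normalize_ndarray(im_bgr: np.ndarray, n_px: int) -> np.ndarray:
    """Hypothetical helper: center-crop and normalize a BGR uint8 HWC image.

    Assumption: the image is already at least n_px on each side. The bicubic
    Resize step of the original transform is intentionally omitted here.
    """
    h, w = im_bgr.shape[:2]
    if h < n_px or w < n_px:
        raise ValueError("image smaller than crop size; resize it first")
    # Center crop (same rounding rule as torchvision's CenterCrop).
    top = int(round((h - n_px) / 2.0))
    left = int(round((w - n_px) / 2.0))
    crop = im_bgr[top:top + n_px, left:left + n_px]
    # BGR -> RGB, then scale to [0, 1] as ToTensor would.
    rgb = crop[:, :, ::-1].astype(np.float32) / 255.0
    # Per-channel normalization, broadcast over the last (channel) axis.
    norm = (rgb - CLIP_MEAN) / CLIP_STD
    # HWC -> CHW, the layout a torch model would expect.
    return np.ascontiguousarray(norm.transpose(2, 0, 1))


if __name__ == "__main__":
    dummy = np.zeros((300, 400, 3), dtype=np.uint8)  # placeholder image
    out = clip_normalize_ndarray(dummy, 224)
    print(out.shape, out.dtype)
```

Going through the preprocess function that CLIP returns, after converting the ndarray to a PIL image, avoids duplicating these constants, which is the trade-off discussed in the comments that follow.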
@Laughing-q thanks for your feedback. I will try to implement as proposed |
I switched to |
The ultralytics/ultralytics/data/build.py Lines 190 to 202 in d7bbfa4
from PIL import Image and all related code using Image (conversion to RGB is carried out by the LoadPilAndNumpy class).
|
@Burhan-Q I used this approach to load the images, but as @Laughing-q pointed out, it returns an ndarray. CLIP requires a PIL image, so I used PIL's function to convert it. |
@hoeflechner I'm not familiar with the inputs for CLIP, but I see now the comment from @Laughing-q above with regard to needing some sort of preprocessing. I'll let him weigh in from here then. Thank you! |
@Burhan-Q CLIP returns a function that transforms a PIL image into its own torch tensor. @Laughing-q was suggesting to write a function that does the same with an ndarray. For me this is probably more work than the whole patch I proposed, as I have very little understanding of the internal mechanisms of CLIP and ultralytics... I also think it could lead to problems if CLIP changes its internal tensor format. |
@hoeflechner @Burhan-Q Guys, I polished this PR a little bit and currently it supports all the source formats that we support for predictor (except the @glenn-jocher This PR added support of images as input for |
Use Images as classes in YoloWorld
Extension to the model.set_classes() method so it optionally accepts images.
The CLIP model provides an encoder for text as well as for images. This encoder is used to search for occurrences of one image in another image.
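As a simplified illustration of why image embeddings can stand in for text embeddings, the sketch below matches region features against class embeddings by cosine similarity. It does not reproduce the PR's code or the model's internals. The function names are hypothetical, and the only assumption is that both sets of vectors live in the same embedding space, as CLIP's text and image encoders are designed to provide.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def match_regions_to_classes(region_feats: np.ndarray, class_embeds: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return, per region, the index of the closest class.

    class_embeds may come from a text encoder or an image encoder. The
    matching step is the same either way, as long as both share one space.
    """
    sims = l2_normalize(region_feats) @ l2_normalize(class_embeds).T
    return sims.argmax(axis=1)


if __name__ == "__main__":
    # Placeholder vectors standing in for real encoder outputs.
    class_embeds = np.array([[1.0, 0.0], [0.0, 1.0]])
    region_feats = np.array([[0.9, 0.1], [0.2, 3.0]])
    print(match_regions_to_classes(region_feats, class_embeds))
```

Because the similarity is computed on unit-length vectors, the magnitude of a feature does not affect which class it is matched to, only its direction.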
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Enhanced the set_classes method in the YOLO model to accept images for class specification.
📊 Key Changes
set_classes for converting image paths to PIL images and generating image features using the CLIP model.
🎯 Purpose & Impact
This update is a step toward making AI models more interactive and user-friendly while potentially improving performance through richer input methods.