Add image-text-to-text and edit image-to-text task pages #553
base: main
Looking nice!
@@ -0,0 +1,32 @@
## Use Cases

### Visual Question Answering
I would avoid including examples of things that are covered in other task pages, to avoid confusion. The three current examples are already covered elsewhere. E.g.
I think I disagree here. One could segment humans with both image segmentation and mask detection, for instance (except for the zero-shot part). Some Hugging Face tasks are similar. Some models are so general that they can handle these tasks altogether, and the paradigm is shifting in that direction anyway.
These models are similar either way. If I were an MLE who had to do captioning, I'd like to try both VLMs (which, by the way, perform better IMO) and direct captioning models (image-to-text), whose sole purpose is captioning. I wouldn't want to keep this information from the user.
Maybe we can briefly explain this just after the Use Cases heading. Also, I think we haven't defined VLMs yet. For example:
These models are commonly called vision-language models, or VLMs. They can typically generalize to various types of tasks for which specialist models may also exist. For example, you can use a VLM to caption an image, or you can use specific captioning models as described in the image to text task page.
Edit: VLMs are indeed defined in data.ts, but I think it doesn't hurt to also mention it here.
My main concern is that now we have two tasks that cover the same thing, so it could end up confusing users. As an example, imagine if https://huggingface.co/tasks/text-generation use cases were Translation and Summarization, which are also their own separate tasks.
I wonder if we can use specific applications here rather than sub-tasks (thinking of https://huggingface.co/tasks/text-generation#use-cases as a nice example).
@osanseviero I think there aren't many specific, solid use cases (they'd be way too specific), or they'd eventually fall under visual question answering/retrieval.
An organic way for me to infer use cases for tasks was checking out the example inputs of many Spaces based on a specific task; for VLMs, it's mostly visual question answering, for instance.
I added a bunch of other things.
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Fixed a couple of formatting issues. Agree with Omar's comments :)
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
const taskData: TaskDataCustom = {
	datasets: [
		{
			// TODO write proper description
TODO
FYI @pcuenca, this PR doesn't have to be merged until Niels' pipeline PR is merged, IMO.
No description provided.