
Add image-text-to-text and edit image-to-text task pages #553

Open: wants to merge 22 commits into main


merveenoyan (Contributor)

No description provided.

merveenoyan changed the title from "Add image-text-to-text and edit image-to-text" to "Add image-text-to-text and edit image-to-text task pages" on Mar 14, 2024
osanseviero requested a review from pcuenca and removed the review requests for julien-c, gary149, Wauplin and SBrandeis on March 16, 2024 09:57
osanseviero (Member) left a comment:

Looking nice!

packages/tasks/src/tasks/image-text-to-text/data.ts: 3 outdated, resolved comments
@@ -0,0 +1,32 @@
## Use Cases

### Visual Question Answering
Member:

I would avoid including examples of things that are covered in other task pages, to avoid confusion. The 3 examples listed now are already covered. E.g.

Contributor (PR author):

I think I disagree here. One could segment humans with both image segmentation and mask detection, for instance (except for the zero-shot part). Some Hugging Face tasks are similar. Some models are so generalist that they can handle these tasks all together, and the paradigm is shifting in that direction anyway.

These models are similar either way. If I were an MLE who had to do captioning, I'd want to try both VLMs (which, BTW, perform better IMO) and dedicated captioning models (image-to-text), whose sole purpose is captioning. I wouldn't want to keep this information from the user.

pcuenca (Member), Mar 22, 2024:

Maybe we can briefly explain this just after the Use Cases heading. Also, I think we haven't defined VLMs yet. For example:

These models are commonly called vision-language models, or VLMs. They can typically generalize to various types of tasks for which specialist models may also exist. For example, you can use a VLM to caption an image, or you can use specific captioning models as described in the image-to-text task page.

Edit: VLMs are indeed defined in data.ts, but I think it doesn't hurt to also mention it here.

Contributor (PR author):

I think it would be too repetitive. See here, for instance: the description from data.ts will be at the top, and a definition right after the Use Cases heading would repeat it. I wouldn't want to add it there.
[Screenshot, 2024-03-25 at 20:49:41]

Member:

My main concern is that we now have two tasks that cover the same thing, which could end up confusing users. As an example, imagine if the use cases on https://huggingface.co/tasks/text-generation were Translation and Summarization, which are also separate tasks of their own.

I wonder if we could list specific applications here rather than sub-tasks (I'm thinking of https://huggingface.co/tasks/text-generation#use-cases as a nice example).

Contributor (PR author):

@osanseviero I don't think there are many specific/solid use cases: either they'd be way too specific, or they'd eventually fall under visual question answering/retrieval.

Contributor (PR author):

An organic way for me to infer use cases for a task was to check the example inputs of many Spaces built on that task, and for VLMs, for instance, it's mostly visual question answering.

Contributor (PR author):

I added a bunch of other things.

packages/tasks/src/tasks/image-text-to-text/about.md (outdated, resolved)
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
pcuenca (Member) left a comment:

Fixed a couple of formatting issues. Agree with Omar's comments :)

packages/tasks/src/tasks/image-text-to-text/about.md (outdated, resolved)
packages/tasks/src/tasks/image-text-to-text/data.ts (outdated, resolved)
packages/tasks/src/tasks/index.ts (outdated, resolved)
packages/tasks/src/tasks/image-text-to-text/about.md: 4 outdated, resolved comments
const taskData: TaskDataCustom = {
datasets: [
{
// TODO write proper description
Member:

TODO

packages/tasks/src/tasks/image-text-to-text/data.ts: 2 outdated, resolved comments
merveenoyan and others added 4 commits March 22, 2024 17:11
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
merveenoyan (Contributor, PR author):

FYI @pcuenca, IMO this PR doesn't need to be merged until Niels' pipeline PR is merged.


3 participants