
fo.core.video.make_frames_dataset is sneakily considered a frame view #4397

Closed
1 of 3 tasks
evatt-harvey-salinger opened this issue May 18, 2024 · 9 comments · Fixed by #4416
Labels
bug Bug fixes

Comments

@evatt-harvey-salinger

evatt-harvey-salinger commented May 18, 2024

Describe the problem

fo.core.video.make_frames_dataset seems like it should create a basic Dataset, as opposed to Dataset.to_frames(), which should make a FramesView. However, this line:

dataset = fod.Dataset(name=name, _frames=True)

...results in Dataset._is_frames being True.

So even though the object is actually of type Dataset, functions like annotate() that operate on its view() treat it as a FramesView.

Is this expected behavior? I would assume that make_frames_dataset was specifically intended to create something distinct from a frames view, and that it would be treated as a normal dataset. (I'll note that even dataset.clone() returns a dataset where _is_frames is False.)

Code to reproduce issue

import fiftyone as fo

vid_dataset = ...  # contains video samples
new_dataset = fo.core.video.make_frames_dataset(vid_dataset)
print(new_dataset._is_frames)  # True
new_dataset.annotate(...)
# ValueError: Annotating frames views is not supported
clone = new_dataset.clone()
print(clone._is_frames)  # False

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 22.04): Ubuntu
  • Python version (python --version): 3.12.2
  • FiftyOne version (fiftyone --version): 0.23.8
  • FiftyOne installed from (pip or source): pip

Other info/logs

Willingness to contribute

  • Yes. I can contribute a fix for this bug independently
  • Yes. I would be willing to contribute a fix for this bug with guidance
    from the FiftyOne community
  • No. I cannot contribute a bug fix at this time
@evatt-harvey-salinger evatt-harvey-salinger added the bug Bug fixes label May 18, 2024
@benjaminpkane
Contributor

benjaminpkane commented May 22, 2024

Hi @evatt-harvey-salinger. Under the hood, fo.core.video.make_frames_dataset is the function used to create a Dataset.to_frames() view. The term "dataset" is likely overloaded in this context, but the other to_* stages use the same nomenclature, e.g., fo.core.patches.make_patches_dataset and Dataset.to_patches().

Zooming out a bit, perhaps adding support (or a best practice) for annotating frame collections is the main goal?

@brimoor
Contributor

brimoor commented May 23, 2024

@evatt-harvey-salinger thanks for calling this out!

I think it is a valid use case to directly call methods like make_frames_dataset() and that, indeed, you should get a "regular" dataset when you do that. This will be supported as of #4416.

In the meantime, it is slightly less efficient, but you can achieve the same end result via clone() like this:

patches_dataset = sample_collection.to_patches(...).clone()
frames_dataset = sample_collection.to_frames(...).clone()
clips_dataset = sample_collection.to_clips(...).clone()
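For example, the clone behaves like a regular image dataset (a minimal sketch; sample_frames=True is assumed here as the frame-extraction setting):

frames_dataset = vid_dataset.to_frames(sample_frames=True).clone()
print(frames_dataset._is_frames)  # False: a regular dataset, so annotate() works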

@evatt-harvey-salinger
Author

evatt-harvey-salinger commented May 23, 2024

Thanks @brimoor and @benjaminpkane!

Great, looks like #4416 will address the suggestion that make_frames_dataset() should return a "regular" dataset.

In general, I agree that adding support for annotating a FrameView of a video dataset would be an amazing feature. I can envision a few good use cases...

  • It would allow partial annotation runs on videos while maintaining the integrity of the source video (e.g., labeling a 30 fps video at 1 fps, then later adding annotations at 3 fps).
  • My initial idea was to downsample and annotate frames at ~4 fps, then train a model to auto-label the rest of the video.

Currently, it seems that the workflow would be to use to_frames(...).clone() to sample and annotate a subset of the video, and then maintain the video dataset alongside the "to_frames.clone" dataset. I could either (1) store the annotations in the "to_frames.clone" dataset, progressively sampling more frames of the video and merging and labeling them into the "to_frames.clone" dataset in batches (sketched below), or (2) store the annotations in the video dataset, by annotating the "to_frames.clone" dataset and then merging the annotations into the video frames by associating the frame_numbers.

This is certainly doable. But if FrameViews could be annotated directly, and the annotations could be imported straight into the video dataset, it would remove the need to flow back and forth between two datasets (and mitigate the risk of accidentally tweaking one dataset out of alignment with the other).
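For illustration, here's a minimal sketch of option (1). The fps values and dataset name are hypothetical, and it assumes to_frames(sample_frames=True, fps=...) extracts frames to deterministic filepaths so that merge_samples() can key on filepath:

# first pass: sample the videos at 1 fps and clone into a regular dataset
frames_clone = video_dataset.to_frames(sample_frames=True, fps=1).clone("frames-pass-1")

# later pass: sample at 3 fps and merge the new frames in by filepath
more_frames = video_dataset.to_frames(sample_frames=True, fps=3)
frames_clone.merge_samples(more_frames, key_field="filepath")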

@evatt-harvey-salinger
Author

I'll close the issue, since #4416 addresses the original request. But I'd love to hear what you think about the more general idea of annotating FrameViews directly, so I'll stay tuned on this thread!

@brimoor
Contributor

brimoor commented May 24, 2024

Out of curiosity, is there a reason you specifically want to annotate your videos as individual frames rather than directly calling annotate() on your media_type == "video" dataset?

@evatt-harvey-salinger
Author

Hi Brian,

I've tried to answer this a few different times, but then I get new ideas and try to hack together a solution. I haven't really found one yet.

Basically, I have many hours' worth of 15 fps videos to label. Each video sample has way too many frames to label all at once. I'd like to be able to downsample and iteratively label portions of video datasets, while retaining the integrity of the video samples as videos (rather than just converting them into image datasets). That would enable me to annotate the videos at 1 fps, then come back and annotate at 4 fps, or use the 1 fps frames to train a model that can help me auto-label a portion of the unlabeled frames.

For example, I have a workflow with image datasets that looks like this:

  1. request annotation for a view first_pass
  2. retrieve annotations, and use the anno_results.frame_id_map to select the frame_ids to reconstruct the first_pass view (a capability we should add btw :) )
  3. programmatically exchange label_requested tags for labeled tags (see the sketch after this list)
  4. train a model on the labeled samples
  5. run inference on the unlabeled samples and label them as auto-labeled
  6. form a new view second_pass with a portion of the auto-labeled samples, where I correct the auto-labeled predictions
  7. retrieve those annotations, and iterate
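A minimal sketch of step 3, assuming the samples carry a label_requested tag (the tag names are from my workflow, not FiftyOne built-ins):

# swap "label_requested" tags for "labeled" on the annotated samples
labeled_view = dataset.match_tags("label_requested")
labeled_view.untag_samples("label_requested")
labeled_view.tag_samples("labeled")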

I would like to develop an analogous workflow for video datasets. Sending FrameViews for annotation, then retrieving the annotations and pulling them directly into my video dataset, would be the cleanest way to enable this kind of workflow.

As I said, I've been trying to find a workaround, but I haven't yet found a solution that isn't terribly convoluted. I know that I can abandon the video datasets altogether and convert everything to image datasets, but it would be a shame not to make use of the other video dataset capabilities. I would also like to keep the source files as videos, which are cleaner to store, version, view, etc.

I've gotten close to a solution where I maintain a video dataset and a corresponding image dataset as a pair. I can use the workflow above to add annotations to the images, then use the frame_numbers (a field automatically populated by make_frames_dataset()) to merge the annotations back into the video dataset, roughly as sketched below. However, this has proven to be quite tricky.
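The merge-back step looks roughly like this (a sketch only; it assumes the image dataset retains the sample_id and frame_number fields that make_frames_dataset() populates, and that the labels live in a ground_truth field):

# copy labels from the image dataset back into the source video dataset,
# matching frames by (sample_id, frame_number)
for image_sample in frames_dataset.exists("ground_truth"):
    video_sample = video_dataset[image_sample.sample_id]
    video_sample.frames[image_sample.frame_number]["ground_truth"] = image_sample.ground_truth
    video_sample.save()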

@evatt-harvey-salinger
Author

I know that I can use the frame_step parameter in annotate with the CVAT backend. But if I use the tracks feature in CVAT, then the detections actually get interpolated once they are imported into the FO dataset anyway. For example, if I use frame_step=8 for a 32-frame video, I would only label ~4 frames in CVAT, but after importing back into FO, all 32 frames are labeled.

frame_step also can't be used for datasets that already have tracks.

Because of these two things, I'm going to just live with labeling full-fps videos (with whatever downsampling I want on the front end) and achieve "partial" annotation by sending different clips within the video at a time, roughly like this:
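(A sketch only; the frame range and anno_key are illustrative, and I'm selecting a single video at a time to sidestep overlapping frame numbers across videos.)

# send the first 30 seconds of one 15 fps video to CVAT
single_video = video_dataset.select(video_sample_id)
single_video.annotate(
    "clip_pass_1",
    label_field="frames.detections",
    frame_start=0,
    frame_stop=450,  # 30 s * 15 fps
)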

@evatt-harvey-salinger
Author

Anyway, I hope this description gives you an idea of the workflow I was trying to achieve by annotating FrameViews directly!

@evatt-harvey-salinger
Author

evatt-harvey-salinger commented May 31, 2024

One additional wrinkle is that dataset.annotate(frame_start=..., frame_stop=...) doesn't actually work for datasets with multiple videos. Since video samples can have overlapping frame numbers, only the last video's frames get sent. I'll add a separate issue about that: #4447
