Dataset Processing - Add support for groups, articulated scenes, episodes. #1886

Open
wants to merge 9 commits into
base: main
from

0mdc
Contributor

@0mdc 0mdc commented Apr 1, 2024

Motivation and Context

Motivations:

  • We need to support articulated scenes in Unity.
  • HSSD is massive. In WebGL, we want to download only the assets we need rather than bundle the entire dataset with the app.
  • The unit of work is an episode set. It captures all dependencies for a "project", along with all paths.

Context:

  • The unity_dataset_processing.py script is used to convert/decimate Habitat datasets into a format usable by Unity.
  • The premise is that Unity has a clone of the data folder. A relative path from the data folder resolves both in Habitat and Unity (or any other external engine, like Blender).
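A minimal sketch of the shared-path premise, assuming both sides hold a clone of the data folder under different roots. The function name `resolve_asset`, the two root directories and the asset path are hypothetical and only illustrate the idea; they are not taken from the script.

```python
from pathlib import PurePosixPath


def resolve_asset(data_root: str, relative_path: str) -> str:
    """Join a data-folder-relative asset path onto an engine-specific root."""
    rel = PurePosixPath(relative_path)
    # Reject paths that would escape the data folder; such a path could not
    # resolve identically in every clone of the folder.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"not a data-relative path: {relative_path}")
    return str(PurePosixPath(data_root) / rel)


# Hypothetical roots: one for the Habitat checkout, one for the Unity project.
asset = "hssd-hab/objects/chair.glb"
habitat_path = resolve_asset("/home/user/habitat-lab/data", asset)
unity_path = resolve_asset("Assets/Resources/data", asset)
```

Only the relative path needs to be stored in episode data; each engine prepends its own root.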

This changeset refactors the entire Habitat -> Unity dataset processing pipeline so that:

  • An episode set is supplied as input instead of a list of scenes.
  • Datasets are automatically gathered (not hard-coded).
  • Articulated scenes are supported.
  • A metadata.json file is produced, containing groups.
    • Each group lists the assets it depends on. Currently, groups map one-to-one to scenes.
    • This file is consumed by external asset pipelines (e.g. Unity) to determine how assets should be packaged.
    • The default "local" group hints that the asset should be packaged locally (along with the build).
    • Grouped assets hint that they will be used together and should be batched when downloading from a remote location.
  • Refactoring. The new structure prepares the dataset pipeline to be driven by Hydra config.
    • That is a future refactoring that will allow each "project" to define how its data is processed.
    • Decimation level, included datasets and other parameters will be configurable per-dataset via config.
    • More target engines may be included (e.g. Blender).
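A minimal sketch of how the groups in metadata.json could be assembled, assuming one group per scene plus the default "local" group. The JSON layout (a top-level `groups` mapping from group name to a list of asset paths), the function name `build_metadata` and the sample paths are assumptions for illustration; the actual file produced by the script may be structured differently.

```python
import json
from typing import Dict, List

LOCAL_GROUP = "local"  # default group: assets packaged with the build


def build_metadata(
    scene_assets: Dict[str, List[str]], local_assets: List[str]
) -> Dict[str, Dict[str, List[str]]]:
    """Build a group -> asset-list mapping (hypothetical layout)."""
    groups: Dict[str, List[str]] = {LOCAL_GROUP: sorted(set(local_assets))}
    for scene, assets in scene_assets.items():
        if scene == LOCAL_GROUP:
            raise ValueError("scene name collides with the default group")
        # Deduplicate and sort so the output is stable across runs.
        groups[scene] = sorted(set(assets))
    return {"groups": groups}


# Hypothetical input: assets gathered per scene from an episode set.
metadata = build_metadata(
    scene_assets={
        "scene_a": ["objects/chair.glb", "objects/table.glb", "objects/chair.glb"],
        "scene_b": ["objects/chair.glb", "urdf/fridge/fridge.urdf"],
    },
    local_assets=["humanoids/avatar.glb"],
)
metadata_json = json.dumps(metadata, indent=2)
```

An asset pipeline reading this file could bundle everything in "local" with the build and batch each remaining group for remote download.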

How Has This Been Tested

Tested with existing episode sets and new articulated scenes.

Types of changes

  • [Refactoring]
  • [Development]

Checklist

  • My code follows the code style of this project.
  • I have updated the documentation if required.
  • I have read the CONTRIBUTING document.
  • I have completed my CLA (see CONTRIBUTING)
  • I have added tests to cover my changes if required.

@0mdc 0mdc requested review from jturner65 and aclegg3 April 1, 2024 16:41
@facebook-github-bot facebook-github-bot added the "CLA Signed" label (Do not delete this pull request or issue due to inactivity.) Apr 1, 2024
Contributor

@aclegg3 aclegg3 left a comment

This seems functional for now and is an improvement over the existing process, so approving.

However, I do wonder if we could make this more generic by using MetadataMediator to load the SceneDataset and then enumerate its contents instead of querying the configs directly. I know that scene_instance access would be troublesome, but the resulting config values would be more accurate since there is a built-in hierarchy of overrides and defaults.

Interested to get @jturner65's opinion on this.

@0mdc
Contributor Author

0mdc commented Apr 1, 2024

Great point @aclegg3 !
Beware that the magnum dependency used in this library requires all batteries to be included, so we direct users to create an environment separate from habitat. The magnum included with habitat_sim does not include things like basis compression.

It is possible to arrange the script differently, but I'm concerned about complicating its usage.

@0mdc 0mdc force-pushed the 0mdc/large_scale_data_processing branch from 6c37569 to aeceb3d Compare April 4, 2024 00:32
@0mdc 0mdc marked this pull request as ready for review April 4, 2024 03:02

3 participants