Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bounding box converter #30852

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from
Draft

Conversation

qubvel
Copy link
Member

@qubvel qubvel commented May 16, 2024

What does this PR do?

Intorduces output_bbox_format parameter for postprocess_for_object_detection.
Adds the convert_boxes function to convert boxes from one format to another, e.g. from YOLO to Pascal VOC.

boxes = convert_boxes(
    boxes, input_format="relative_xcycwh", output_format="absolute_xyxy", image_size=target_sizes
)
  • works with np.array, torch.tensor, list, tuple (could be easily extended to jax and tf)
  • works with single box shape=(4,) , image boxes shape=(N, 4), multiple images boxes List[(N, 4)]
  • supports boxes with extra metadata, e.g. box=[x_min, y_min, x_max, y_max, color, class, ...]
  • supports format aliases. e.g. yolo, coco

Supported input/output bounding box formats:
- absolute_xyxy (aliases: pascal_voc, xyxy): [x_min, y_min, x_max, y_max]
- absolute_xywh (aliases: coco, xywh): [x_min, y_min, width, height]
- absolute_xcycwh: [center_x, center_y, width, height]
- relative_xyxy (aliases: albumentations): [x_min, y_min, x_max, y_max] normalized to [0, 1] by image size
- relative_xywh: [x_min, y_min, width, height] normalized to [0, 1] by image size
- relative_xcycwh (aliases: yolo, xcycwh): [center_x, center_y, width, height] normalized to [0, 1] by image size

Before submitting

Who can review?

@amyeroberts could you please look at this draft, looks a bit overсomplicated, probably we should drop some functionality in favor to simplification.

@qubvel qubvel marked this pull request as draft May 16, 2024 10:58
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

I haven't done a full review e.g. checking boring things like docstring formats, just looking at the overall structure.

This is really nicely structured and handled. I don't have any major comments on the overall logic - it's incredibly clean. Thanks for taking the time to write out comprehnsive tests too. Only major comment is to not have alias for the different formats, in particular for datasets like coco, it adds unnecessary complexity; I'm less sure about e.g. xyxy -> absolute_xyxy.

@@ -0,0 +1,523 @@
import logging
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing copyright header

return norm_annotation
"""Convert annotation boxes from absolute xyxy (pascal voc) to relative xcycwh (yolo) format."""
if "boxes" in annotation:
annotation = annotation.copy() # shallow copy
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of the copy here? Only annotation["boxes"] is being passed to convert_boxes and is being set directly to annotation["boxes"] on the call

ArrayType = Union["torch.Tensor", np.ndarray]


SUPPORTED_BOX_FORMATS = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have this as an enum

or most likely as a list of 2D arrays/tensors.

Supported input/output bounding box formats:
- `absolute_xyxy` (aliases: `pascal_voc`, `xyxy`): [x_min, y_min, x_max, y_max]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather we didn't try to handle different aliases and just keep it simple to e.g. xyxy

input_format: str,
output_format: str,
image_size: Optional[Union[ArrayType, List, Tuple]] = None,
check: Optional[str] = "warn",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see "raise" never being used. I like the logic, but think we might be engineering something which we'll never need

return _convert_boxes_arrays_2d(boxes, input_format, output_format, image_size, check)

# Recursive approach.
elif depth == 1 and isinstance(boxes, (list, tuple) or is_array_type(boxes)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the isinstance(boxes, (list, tuple) or is_array_type(boxes)) check - what are the other types boxes could be here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants