
Run Model Maker Object Detection TFLite model inference directly #5370

Open
DoctorDinosaur opened this issue May 2, 2024 · 1 comment
Assignees
Labels
platform:python MediaPipe Python issues
stat:awaiting googler Waiting for Google Engineer's Response
task:object detection Issues related to Object detection: Track and label objects in images and video.
type:modelmaker Issues related to creation of custom on-device ML solutions
type:support General questions

Comments

@DoctorDinosaur

DoctorDinosaur commented May 2, 2024

I've trained a model in mediapipe model maker.

I want to run inference directly through TensorFlow in Python, so I can use a Coral Edge TPU. Since it's a TFLite model, this should be possible.

But I'm struggling to get proper outputs.

For input, I resize to 256x256. I've tried normalisation in [0,255], [0,1] and [-1,1].
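For concreteness, here is a sketch of the [-1, 1] variant of that preprocessing (illustrative only; which normalisation the exported model actually expects is part of the question):

```python
import numpy as np

def preprocess(image: np.ndarray) -> np.ndarray:
    """Map an HxWx3 uint8 image (already resized to 256x256) into a
    (1, 256, 256, 3) float32 batch normalised to [-1, 1]."""
    x = image.astype(np.float32)
    x = (x - 127.5) / 127.5  # 0 -> -1.0, 255 -> 1.0
    return x[np.newaxis, ...]  # add the batch dimension
```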

Running the signature function returns a dictionary of {detection_boxes, detection_scores},
where shape(detection_boxes) = (1, num_boxes, 4)
and shape(detection_scores) = (1, num_boxes, num_classes).

However, the values I'm getting for detection_boxes are unnormalised and frequently negative.
I've tried searching the repo for how decoding is done and what pre-processing is expected on input, but it's hard to navigate this repo.

Is there a minimal example of how to perform inference directly, and decode model output?
Failing that, what model input is expected, and what format are the output detection_boxes and detection_scores in?

(Code: https://gist.github.com/DoctorDinosaur/be495b6065fff29f79ec11306dd89c3b)

@DoctorDinosaur DoctorDinosaur added the type:others issues not falling in bug, performance, support, build and install or feature label May 2, 2024
@DoctorDinosaur

DoctorDinosaur commented May 3, 2024

Seems like I need to transform the boxes with anchor values.

It's unclear from tensors_to_detections_calculator.cc how anchor values are calculated for Model Maker models, so for now I'm taking them from the metadata.json generated by Model Maker.

import json

import numpy as np
import tensorflow as tf

with open("mediapipe/exported_model/metadata.json") as f:
    metadata = json.load(f)

anchors = metadata["subgraph_metadata"][0]["custom_metadata"][0]["data"][
    "ssd_anchors_options"
]["fixed_anchors_schema"]["anchors"]
# Convert the list of anchor dicts to a (num_boxes, 4) array of
# [x_center, y_center, width, height]
anchors = np.array(
    [
        [anchor["x_center"], anchor["y_center"], anchor["width"], anchor["height"]]
        for anchor in anchors
    ]
)

# output is the dict returned by the model's signature function:
# detection_boxes is (1, num_boxes, 4), detection_scores is (1, num_boxes, num_classes)
raw_boxes = output["detection_boxes"]
scores = output["detection_scores"]

x_scale = 1
y_scale = 1
w_scale = 1
h_scale = 1

# Broadcast (1, num_boxes) raw offsets against (num_boxes,) anchor columns
x_center = raw_boxes[:, :, 0] / x_scale * anchors[np.newaxis, :, 2] + anchors[np.newaxis, :, 0]
y_center = raw_boxes[:, :, 1] / y_scale * anchors[np.newaxis, :, 3] + anchors[np.newaxis, :, 1]

width = np.exp(raw_boxes[:, :, 2] / w_scale) * anchors[np.newaxis, :, 2]
height = np.exp(raw_boxes[:, :, 3] / h_scale) * anchors[np.newaxis, :, 3]

# [y_min, x_min, y_max, x_max], the order tf.image NMS ops expect
boxes = np.stack(
    [
        y_center - height / 2,
        x_center - width / 2,
        y_center + height / 2,
        x_center + width / 2,
    ],
    axis=-1,
)

nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections = (
    tf.image.combined_non_max_suppression(
        # combined NMS wants (batch, num_boxes, q, 4); q=1 shares boxes across classes
        boxes[:, :, np.newaxis, :],
        scores,
        max_output_size_per_class=5,
        max_total_size=25,
        iou_threshold=0.2,
        score_threshold=0.5,
        clip_boxes=True,
    )
)

This seems to produce values much closer to the correct ones, but they're still wrong; some are negative. Plotting the NMS-ed result, though, I do get some boxes in the right place.
I assume I'm doing something wrong here, perhaps a shape error, as I've made quite a mess of the arrays.

Again, is there a working example for decoding?
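As a sanity check on the decoding above (a sketch assuming the standard SSD box encoding with unit scales, which is what the code attempts): raw predictions of all zeros should decode back to the anchor boxes themselves.

```python
import numpy as np

def decode_boxes(raw_boxes: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """Decode SSD-style regressions against anchors, assuming unit scales.

    raw_boxes: (num_boxes, 4) raw [x, y, w, h] offsets from the model.
    anchors:   (num_boxes, 4) as [x_center, y_center, width, height].
    Returns (num_boxes, 4) boxes as [y_min, x_min, y_max, x_max].
    """
    x_center = raw_boxes[:, 0] * anchors[:, 2] + anchors[:, 0]
    y_center = raw_boxes[:, 1] * anchors[:, 3] + anchors[:, 1]
    width = np.exp(raw_boxes[:, 2]) * anchors[:, 2]
    height = np.exp(raw_boxes[:, 3]) * anchors[:, 3]
    return np.stack(
        [
            y_center - height / 2,
            x_center - width / 2,
            y_center + height / 2,
            x_center + width / 2,
        ],
        axis=-1,
    )

# A 0.2x0.2 anchor centred at (0.5, 0.5) with zero offsets should decode
# to the box [0.4, 0.4, 0.6, 0.6].
```

If this identity doesn't hold against real anchors and near-zero raw outputs, the scales or the box encoding differ from the assumption above.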

@kuaashish kuaashish assigned kuaashish and unassigned ayushgdev May 6, 2024
@kuaashish kuaashish added type:support General questions type:modelmaker Issues related to creation of custom on-device ML solutions task:object detection Issues related to Object detection: Track and label objects in images and video. platform:python MediaPipe Python issues and removed type:others issues not falling in bug, performance, support, build and install or feature labels May 6, 2024
@kuaashish kuaashish assigned joezoug and unassigned kuaashish May 7, 2024
@kuaashish kuaashish added the stat:awaiting googler Waiting for Google Engineer's Response label May 7, 2024