Why webgpu produces that many command buffers in metal? #5721

lukaschod · 2024-05-20T10:22:56Z

Is there a reason why webgpu produces so many command buffers on Metal? It appears that each pass produces not only its own command buffer, but also an additional empty one for pass creation.

Personally, I do not know the full overhead of a command buffer, but Apple at least recommends keeping their number to a minimum. For a single-threaded app, that usually means one command buffer per frame.

Here is example code for outline rendering, composed of three passes.

            let mut cmd = rd.create_command_encoder(&CommandEncoderDescriptor { label: None });
            let workgroup_size = (physical_target_size + 15) / 16;

            cmd.push_debug_group("outline group");

            // Gaussian Blur Horizontal
            {
                let mut pass = cmd.begin_compute_pass(&ComputePassDescriptor {
                    label: None,
                    timestamp_writes: None,
                });

                pass.set_pipeline(blur_horizontal);
                pass.set_bind_group(
                    0,
                    &blur_horizontal_bind_group,
                    &[view_uniform_offset.offset],
                );
                pass.dispatch_workgroups(workgroup_size.x, workgroup_size.y, 1);
            }

            // Gaussian Blur Vertical
            {
                let mut pass = cmd.begin_compute_pass(&ComputePassDescriptor {
                    label: None,
                    timestamp_writes: None,
                });

                pass.set_pipeline(blur_vertical);
                pass.set_bind_group(0, &blur_vertical_bind_group, &[view_uniform_offset.offset]);
                pass.dispatch_workgroups(workgroup_size.x, workgroup_size.y, 1);
            }

            // Apply Outline
            {
                let mut pass = cmd.begin_compute_pass(&ComputePassDescriptor {
                    label: None,
                    timestamp_writes: None,
                });

                pass.set_pipeline(apply_outline);
                pass.set_bind_group(0, &apply_outline_bind_group, &[view_uniform_offset.offset]);
                pass.dispatch_workgroups(workgroup_size.x, workgroup_size.y, 1);
            }

            cmd.pop_debug_group();

            cmd.finish()
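
A side note on the `workgroup_size` line in the snippet above: `(physical_target_size + 15) / 16` is a ceiling division, so that partially covered 16-pixel tiles at the right and bottom edges still get a workgroup. Below is a minimal standalone sketch of the same arithmetic. The helper name `workgroup_count` and the 1920x1080 target are illustrative assumptions, and the group size of 16 is only inferred from the snippet, since the shaders are not shown.

```rust
/// Number of workgroups needed to cover `size` items when each workgroup
/// handles `group` items (integer ceiling division).
/// `workgroup_count` is a hypothetical helper, not part of wgpu.
fn workgroup_count(size: u32, group: u32) -> u32 {
    (size + group - 1) / group
}

fn main() {
    // Illustrative target size; 16 is the assumed workgroup edge length.
    let (width, height) = (1920u32, 1080u32);
    let x = workgroup_count(width, 16);
    let y = workgroup_count(height, 16);
    // 1080 is not a multiple of 16, so the last row of groups is only partly used.
    println!("dispatch_workgroups({x}, {y}, 1)");
}
```

With this rounding, the shader presumably needs a bounds check so that threads past the edge of the target do not write out of range.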

An Xcode capture shows that each pass ends up in a separate command buffer, along with an additional empty one. It seems that even the push/pop debug group commands end up in a separate command buffer.

Wumpf · 2024-05-24T08:28:40Z

A lot of it happens because of the way wgpu has to stitch command buffers together for barriers. But I believe some of it is also wgpu-hal too eagerly creating a new command buffer of a certain type and then needing to close it again because a different type is needed. Meaning, I would expect that we could do a lot better here. In particular, the command buffers that contain only debug groups seem a bit odd, but they are likely a side effect of something higher up in wgpu-core.

@cwfitzgerald might have more details on it
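
To make the effect of such splitting concrete, here is a toy model in plain Rust. It is not wgpu's actual implementation: the `Cmd` enum and both counting functions are hypothetical, and the "one barrier buffer plus one body buffer per pass" policy is an assumption chosen only to illustrate how per-pass splitting inflates the count compared with one buffer per frame.

```rust
/// Commands recorded on one encoder, in order (toy model only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Cmd {
    /// A push or pop of a debug group.
    DebugMarker,
    /// A compute or render pass.
    Pass,
}

/// Counts native command buffers under an assumed "split at every pass" policy:
/// commands outside a pass accumulate in one open buffer; each pass first
/// closes that buffer, then emits one buffer for barriers and one for its body.
fn count_split_per_pass(cmds: &[Cmd]) -> usize {
    let mut buffers = 0;
    let mut open = false; // true while a buffer with non-pass commands is open
    for cmd in cmds {
        match cmd {
            Cmd::DebugMarker => open = true,
            Cmd::Pass => {
                if open {
                    buffers += 1; // close the buffer holding the loose commands
                    open = false;
                }
                buffers += 2; // assumed: barrier buffer + pass body buffer
            }
        }
    }
    if open {
        buffers += 1; // trailing loose commands still need a buffer
    }
    buffers
}

/// Counts native command buffers if everything is recorded into a single one.
fn count_single_buffer(cmds: &[Cmd]) -> usize {
    if cmds.is_empty() { 0 } else { 1 }
}

fn main() {
    // Same shape as the outline example: push, three passes, pop.
    let frame = [
        Cmd::DebugMarker,
        Cmd::Pass,
        Cmd::Pass,
        Cmd::Pass,
        Cmd::DebugMarker,
    ];
    println!("split per pass: {}", count_split_per_pass(&frame));
    println!("single buffer:  {}", count_single_buffer(&frame));
}
```

The numbers this model prints are not meant to match an Xcode capture; they only show that under a split-per-pass policy the count grows with the number of passes, while the single-buffer approach stays constant.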

lukaschod · 2024-05-24T08:53:18Z

I reported a more accurate version of this question in #5738.

lukaschod · 2024-05-24T08:54:27Z

I also added a more detailed issue, #5738, that contains more investigation.

Wumpf · 2024-05-25T10:36:04Z

Thank you for digging deeper and putting up the more accurate problem description & actionable performance issue!

I think with that you answered the original question yourself much better than I tried earlier here. Closing in favor of #5738

Wumpf added the labels "type: question" (Further information is requested) and "api: metal" (Issues with Metal) on May 24, 2024

lukaschod mentioned this issue May 24, 2024

[Metal] Render/Compute pass forces command buffer finish at the end #5738

Open

lukaschod closed this as completed May 24, 2024

lukaschod reopened this May 24, 2024

Wumpf closed this as completed May 25, 2024
