Why webgpu produces that many command buffers in metal? #5721

Closed
lukaschod opened this issue May 20, 2024 · 4 comments
Labels
api: metal (Issues with Metal), type: question (Further information is requested)

Comments

lukaschod commented May 20, 2024

Is there a reason why webgpu produces so many command buffers in Metal? It appears that each pass not only produces a separate command buffer, but also another empty one for pass creation.

Personally, I do not know the full overhead of a command buffer, but Apple at least recommends keeping them to a minimum. For a single-threaded app, that usually means one command buffer per frame.

Here is example code for outline rendering that is composed of three passes.

            let mut cmd = rd.create_command_encoder(&CommandEncoderDescriptor { label: None });
            let workgroup_size = (physical_target_size + 15) / 16;

            cmd.push_debug_group("outline group");

            // Gaussian Blur Horizontal
            {
                let mut pass = cmd.begin_compute_pass(&ComputePassDescriptor {
                    label: None,
                    timestamp_writes: None,
                });

                pass.set_pipeline(blur_horizontal);
                pass.set_bind_group(
                    0,
                    &blur_horizontal_bind_group,
                    &[view_uniform_offset.offset],
                );
                pass.dispatch_workgroups(workgroup_size.x, workgroup_size.y, 1);
            }

            // Gaussian Blur Vertical
            {
                let mut pass = cmd.begin_compute_pass(&ComputePassDescriptor {
                    label: None,
                    timestamp_writes: None,
                });

                pass.set_pipeline(blur_vertical);
                pass.set_bind_group(0, &blur_vertical_bind_group, &[view_uniform_offset.offset]);
                pass.dispatch_workgroups(workgroup_size.x, workgroup_size.y, 1);
            }

            // Apply Outline
            {
                let mut pass = cmd.begin_compute_pass(&ComputePassDescriptor {
                    label: None,
                    timestamp_writes: None,
                });

                pass.set_pipeline(apply_outline);
                pass.set_bind_group(0, &apply_outline_bind_group, &[view_uniform_offset.offset]);
                pass.dispatch_workgroups(workgroup_size.x, workgroup_size.y, 1);
            }

            cmd.pop_debug_group();

            cmd.finish()

An Xcode capture shows that each pass ends up in a different command buffer, along with an additional empty one. It seems even the push/pop debug commands end up in separate command buffers.
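The `workgroup_size` line in the snippet above rounds up, so that a grid of workgroups covers the whole target even when its size is not a multiple of 16. A minimal sketch of that arithmetic follows, assuming the compute shaders use a 16x16 workgroup (the thread does not show them). The function names `workgroups_for` and `dispatch_size` are hypothetical and not part of wgpu.

```rust
/// Number of workgroups needed to cover `size` items along one axis when each
/// workgroup handles `tile` items (ceiling division).
/// Hypothetical helper for illustration; not a wgpu API.
fn workgroups_for(size: u32, tile: u32) -> u32 {
    assert!(tile > 0, "tile size must be non-zero");
    // With tile == 16 this is the same as `(size + 15) / 16` in the snippet above.
    // Note: this sketch does not guard against overflow near u32::MAX.
    (size + tile - 1) / tile
}

/// Workgroup counts for a 2D target, assuming a square `tile` x `tile` workgroup.
fn dispatch_size(width: u32, height: u32, tile: u32) -> (u32, u32) {
    (workgroups_for(width, tile), workgroups_for(height, tile))
}
```

For example, a 1920x1080 target with 16x16 workgroups needs 120 x 68 workgroups, with the last row only partially covering the target, so the shader would need its own bounds check.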

Member

Wumpf commented May 24, 2024

A lot of it happens because of the way wgpu has to stitch command buffers together for barriers. But I believe some of it is also wgpu-hal too eagerly creating new command buffers of a certain type and then needing to close them again because a different type is needed. Meaning, I would expect that we could do a lot better here. In particular, the debug-group-only command buffers seem a bit odd, but they are likely a side effect of something higher up in wgpu-core.

@cwfitzgerald might have more details on it
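To make the scale of the effect concrete, here is a toy model of the pattern described in this thread: each pass gets its own command buffer plus one extra buffer for barriers, and commands recorded outside a pass land in a separate buffer. This is only a sketch under those assumptions, not wgpu's actual implementation. The `Cmd` enum and both functions are hypothetical names introduced for illustration.

```rust
/// Commands recorded on an encoder, mirroring the snippet in the report.
/// Hypothetical type for illustration; not a wgpu type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Cmd {
    PushDebugGroup,
    PopDebugGroup,
    ComputePass,
}

/// Toy model (an assumption, not wgpu's real logic): counts command buffers
/// when every pass closes the currently open buffer and then contributes two
/// buffers of its own, one for barriers and one for the pass itself.
fn split_buffer_count(cmds: &[Cmd]) -> usize {
    let mut count = 0;
    // Whether a buffer for non-pass commands is currently open.
    let mut outside_open = false;
    for cmd in cmds {
        match cmd {
            Cmd::ComputePass => {
                // Starting a pass closes whatever was open before it.
                outside_open = false;
                // One barrier buffer (possibly empty) plus one pass buffer.
                count += 2;
            }
            Cmd::PushDebugGroup | Cmd::PopDebugGroup => {
                // Non-pass commands share a buffer until the next pass begins.
                if !outside_open {
                    outside_open = true;
                    count += 1;
                }
            }
        }
    }
    count
}

/// Baseline for comparison: everything recorded into a single command buffer.
fn merged_buffer_count(cmds: &[Cmd]) -> usize {
    if cmds.is_empty() { 0 } else { 1 }
}
```

Under this model, the outline example (a debug group wrapping three compute passes) yields 8 command buffers, against 1 if everything were recorded into a single buffer. The exact count in a real capture may differ.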

Author

Reported more accurate question in #5738

lukaschod reopened this May 24, 2024
Author

lukaschod commented May 24, 2024

> A lot of it happens because of the way wgpu has to stitch command buffers together for barriers. But I believe some of it is also wgpu-hal too eagerly creating new command buffers of a certain type and then needing to close them again because a different type is needed. Meaning, I would expect that we could do a lot better here. In particular, the debug-group-only command buffers seem a bit odd, but they are likely a side effect of something higher up in wgpu-core.
>
> @cwfitzgerald might have more details on it

I also opened a more detailed issue, #5738, which contains further investigation.

Member

Wumpf commented May 25, 2024

Thank you for digging deeper and putting up the more accurate problem description & actionable performance issue!

I think with that you answered the original question yourself much better than I did earlier. Closing in favor of #5738.

Wumpf closed this as completed May 25, 2024