Ensure safety of indirect dispatch #5714
Hmm, I'm not sure how we should deal with the |
Let's just require it in core, honestly. We could build a naga module in a build script, serialize it and store that in the binary, but that's wayyyy too much work for no real gain.
That's what I thought as well. Requiring it in That PR was prefaced with:
This was also mentioned in #2549 (comment):
@daxpedda were the benefits of #2890 substantial in terms of compile time and binary size?
In our office hours meeting this morning, we came up with a better way to test, by requesting a device with a lower limit than the adapter offers. Testing this way would give a more positive indication that the limit is being imposed.
I just tried a minimal example when using GLSL shaders on Wasm with WebGL.
Compile times are similar. Total compile time with O3 goes from 12.21s to 13.48s, so again around a 10% increase. Keep in mind that this is a minimal example and might not be representative of a full-blown application.
I would say 10% is significant especially for web apps but I'd be curious how much it decreases with larger applications. It's unfortunate though that since WebGL doesn't support indirect calls, this increase doesn't bring anything of value for this use case. Maybe we can feature gate the validation on the The downside being that |
018b23b to a5bebb0
@jimblandy I lowered the required limit to 10, since the testing infra is not set up to require limits based on what the adapter supports. I think 10 should be safe; the default value is 65535.
ac3f089 to 36281af
I added another test for the |
I wanted to get CI working, so I just implemented #5714 (comment) for now.
Filed #5739 for the Metal issue, which I've worked around in this PR.
The And is because the internal call to I was hoping to get away with calling |
b57350e to e0bd41a
Actually, I don't think the test is correct. Shouldn't our The The test was recently added in #5570.
@Wumpf could you chime in on this? Did we use to crash or raise a validation error?
No, drop just removes the user's handle to it; it doesn't make access to the resource invalid.
Some thoughts so far:
I see, I will try to convert the code doing bindgroup creation to use |
Performance/code simplification thoughts.
Dispatches are unfortunately not trivial to validate all at once :(
d25ed4b to ab03bc6
by injecting a compute shader that validates the content of the indirect buffer; also adds missing indirect buffer offset validation
ab03bc6 to abeb863
I'm not happy about the new |
pub(crate) fn calculate_src_buffer_binding_size(device: &Device<A>, buffer: &Buffer<A>) -> u64 {
    let alignment = device.limits.min_storage_buffer_offset_alignment as u64;

    // We need to choose a binding size that can address all possible sets of 12 contiguous bytes in the buffer taking
    // into account that the dynamic offset needs to be a multiple of `min_storage_buffer_offset_alignment`.

    // Given the known variables: `offset`, `buffer_size`, `alignment` and the rule `offset + 12 <= buffer_size`.

    // Let `chunks = floor(buffer_size / alignment)`.
    // Let `chunk` be the interval `[0, chunks]`.
    // Let `offset = alignment * chunk + r` where `r` is the interval `[0, alignment - 4]`.
    // Let `binding` be the interval `[offset, offset + 12]`.
    // Let `aligned_offset = alignment * chunk`.
    // Let `aligned_binding` be the interval `[aligned_offset, aligned_offset + r + 12]`.
    // Let `aligned_binding_size = r + 12 = [12, alignment + 8]`.
    // Let `min_aligned_binding_size = alignment + 8`.

    // `min_aligned_binding_size` is the minimum binding size required to address all 12 contiguous bytes in the buffer
    // but the last `aligned_offset + min_aligned_binding_size` might overflow the buffer. In order to avoid this we must
    // pick a larger `binding_size` that satisfies: `last_aligned_offset + binding_size = buffer_size` and
    // `binding_size >= min_aligned_binding_size`.

    // Let `buffer_size = alignment * chunks + sr` where `sr` is the interval `[0, alignment - 4]`.
    // Let `last_aligned_offset = alignment * (chunks - u)` where `u` is the interval `[0, chunks]`.
    // => `binding_size = buffer_size - last_aligned_offset`
    // => `binding_size = alignment * chunks + sr - alignment * (chunks - u)`
    // => `binding_size = alignment * chunks + sr - alignment * chunks + alignment * u`
    // => `binding_size = sr + alignment * u`
    // => `min_aligned_binding_size <= sr + alignment * u`
    // => `alignment + 8 <= sr + alignment * u`
    // => `u` must be at least 2 (worst case: `sr < 8`)
    // => `binding_size = sr + alignment * 2`

    // If the buffer is smaller than that, bind the whole buffer instead.
    let binding_size = 2 * alignment + (buffer.size % alignment);
    binding_size.min(buffer.size)
}
I hope this makes sense. It took me a while to figure out and write down what the constraints were and how we come up with a binding size that satisfies them.
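For anyone who wants to convince themselves of the derivation, here is a minimal standalone sketch that brute-forces the constraints. It is not the code in this PR: plain integers stand in for `Device` and `Buffer`, the names `binding_size`, `split_offset` and `covers_all_offsets` are made up for illustration, and the way `split_offset` clamps to the last aligned offset is my assumption about how the binding would be positioned. It also assumes `alignment` is a power of two of at least 32.

```rust
/// Same formula as above, with plain integers instead of `Device`/`Buffer`
/// (an assumption made so the sketch compiles on its own).
fn binding_size(buffer_size: u64, alignment: u64) -> u64 {
    (2 * alignment + buffer_size % alignment).min(buffer_size)
}

/// Hypothetical helper: splits `offset` into an aligned dynamic offset and
/// the remaining offset inside the binding. The dynamic offset is clamped so
/// that the binding never extends past the end of the buffer.
fn split_offset(offset: u64, buffer_size: u64, alignment: u64) -> (u64, u64) {
    let size = binding_size(buffer_size, alignment);
    let last_aligned_offset = buffer_size - size;
    let aligned_offset = (offset / alignment * alignment).min(last_aligned_offset);
    (aligned_offset, offset - aligned_offset)
}

/// Checks every 4-byte-aligned `offset` with `offset + 12 <= buffer_size`:
/// the dynamic offset must be aligned, the binding must fit in the buffer,
/// and the 12 bytes of dispatch arguments must fit in the binding.
fn covers_all_offsets(buffer_size: u64, alignment: u64) -> bool {
    let size = binding_size(buffer_size, alignment);
    (0..buffer_size)
        .step_by(4)
        .filter(|&offset| offset + 12 <= buffer_size)
        .all(|offset| {
            let (aligned, inner) = split_offset(offset, buffer_size, alignment);
            aligned % alignment == 0
                && aligned + size <= buffer_size
                && inner + 12 <= size
        })
}
```

With a 1024-byte buffer and an alignment of 256, the binding size comes out to 512, and the last valid offset (1012) maps to dynamic offset 512 with 500 bytes of inner offset, which leaves exactly 12 bytes for the arguments.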
It seems this PR causes rustdoc on Windows to time out (related: #4905).
Ensure safety of indirect dispatch by injecting a compute shader that validates the content of the indirect buffer.
Part of #2431.
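As a rough illustration of what the validation amounts to, here is a CPU-side sketch in Rust. The real check runs in the injected compute shader; the names `DispatchArgs`, `validate_dispatch` and `validate_indirect_offset` are hypothetical, and replacing an out-of-range dispatch with a zero-sized one is an assumption about the policy rather than something stated above. The offset rules (multiple of 4, and 12 bytes of arguments must fit in the buffer) follow the `offset + 12 <= buffer_size` rule used in the binding size derivation.

```rust
/// Indirect dispatch arguments as stored in the indirect buffer:
/// three `u32` workgroup counts, 12 bytes in total.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DispatchArgs {
    x: u32,
    y: u32,
    z: u32,
}

/// Hypothetical CPU mirror of the shader-side check: if any dimension exceeds
/// the device limit, the dispatch is replaced by a zero-sized one
/// (assumed policy for this sketch).
fn validate_dispatch(args: DispatchArgs, max_per_dimension: u32) -> DispatchArgs {
    if args.x > max_per_dimension || args.y > max_per_dimension || args.z > max_per_dimension {
        DispatchArgs { x: 0, y: 0, z: 0 }
    } else {
        args
    }
}

/// Hypothetical offset check done at encoding time: the offset must be a
/// multiple of 4 and the 12 bytes of arguments must lie inside the buffer.
fn validate_indirect_offset(offset: u64, buffer_size: u64) -> Result<(), &'static str> {
    if offset % 4 != 0 {
        return Err("indirect offset is not a multiple of 4");
    }
    match offset.checked_add(12) {
        Some(end) if end <= buffer_size => Ok(()),
        _ => Err("indirect arguments extend past the end of the buffer"),
    }
}
```

With the limit lowered to 10 as in the test discussed above, a dispatch of (11, 1, 1) would be turned into a no-op while (10, 10, 10) passes through unchanged.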