
WIP linux-drm-syncobj-v1 #411 (Draft)

ids1024 wants to merge 1 commit into master
Conversation

ids1024 (Member) commented Apr 9, 2024

Based on Smithay/smithay#1356.

I think the blocker logic should be correct for handling acquire points (if I properly understand the transaction system in Smithay). Though I don't see rendering issues with Mesa git when the blocker is removed... maybe it needs to be tested with something heavier than vkcube. (Or is there something still forcing implicit sync?).

The logic I added in Smithay for signaling releases may be a little less correct. Though maybe it's no more incorrect than how buffer releases are currently handled? (If I understand DRM correctly, with direct scanout we should make sure not to release until we've committed a new buffer and are sure the display controller won't want to read the old one.)

We'll be able to test more when the next Nvidia driver is released. This at least gives us a way to test the explicit sync support they're adding.

Presumably we should test if drmSyncobjEventfd is supported... maybe just creating a syncobj and calling that to see if it works? I'm also still a little unsure how this ends up working with multiple GPUs... particularly if one is Nvidia.

Drakulix (Member):
> I think the blocker logic should be correct for handling acquire points (if I properly understand the transaction system in Smithay). Though I don't see rendering issues with Mesa git when the blocker is removed... maybe it needs to be tested with something heavier than vkcube. (Or is there something still forcing implicit sync?)

For that you probably have to remove the old dmabuf.generate_blocker logic, which pulls a fence out of the dmabuf to do essentially the same thing. We should basically check if the client uses explicit sync, use the acquire fence if so, and otherwise fall back to polling the dmabuf directly.
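Roughly, the selection could look like this (a sketch with illustrative signatures: `DrmSyncPointBlocker` comes from this PR and `generate_blocker` is the existing Smithay helper, but the surrounding types are placeholders):

```rust
// Sketch of the intended blocker selection; `SyncPoint`, `Dmabuf` and
// `Blocker` stand in for the real Smithay types.
fn blocker_for_commit(
    acquire_point: Option<SyncPoint>,
    dmabuf: Option<Dmabuf>,
) -> Option<Box<dyn Blocker>> {
    if let Some(acquire) = acquire_point {
        // Explicit sync: block the transaction on the acquire timeline point.
        Some(Box::new(DrmSyncPointBlocker::new(acquire)))
    } else if let Some(buf) = dmabuf {
        // Implicit sync fallback: poll the dmabuf's implicit fence directly.
        buf.generate_blocker().ok().map(|b| Box::new(b) as Box<dyn Blocker>)
    } else {
        // No buffer attached; nothing to wait on.
        None
    }
}
```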

> The logic I added in Smithay for signaling releases may be a little less correct. Though maybe it's no more incorrect than how buffer releases are currently handled? (If I understand DRM correctly, with direct scanout we should make sure not to release until we've committed a new buffer and are sure the display controller won't want to read the old one.)

Yeah, I am pretty sure Smithay's code isn't correct in what it does today, but given that direct scanout mostly works, it is probably relying on implicit sync in the background.

What needs to happen is to store the fence generated by compositing, or, in the case of direct scanout, to get the OUT_FENCE_PTR property of the respective plane. Once the buffer is replaced and won't be used for new rendering/scanout operations, we can wait for that fence and signal the release point once it is done. (Maybe there is even a kernel API to "signal once this other fence is signaled"?)
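For the direct-scanout side, fetching the out-fence could look roughly like this (an FFI sketch against libdrm; `req`, `crtc_id`, and the "OUT_FENCE_PTR" property id are assumed to come from the surrounding atomic-commit code):

```rust
use std::os::raw::{c_int, c_void};

// FFI declarations matching libdrm's xf86drmMode.h; the atomic request is
// treated as an opaque pointer here.
extern "C" {
    fn drmModeAtomicAddProperty(req: *mut c_void, object_id: u32, property_id: u32, value: u64) -> c_int;
    fn drmModeAtomicCommit(fd: c_int, req: *mut c_void, flags: u32, user_data: *mut c_void) -> c_int;
}

const DRM_MODE_ATOMIC_NONBLOCK: u32 = 0x0200;

unsafe fn commit_with_out_fence(
    fd: c_int,
    req: *mut c_void,
    crtc_id: u32,
    out_fence_ptr_prop: u32,
) -> (c_int, i32) {
    // The kernel writes a sync_file fd into this i32 during the commit.
    let mut out_fence: i32 = -1;
    drmModeAtomicAddProperty(req, crtc_id, out_fence_ptr_prop, &mut out_fence as *mut i32 as u64);
    let ret = drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_NONBLOCK, std::ptr::null_mut());
    // `out_fence` signals once this frame is on screen, i.e. once the
    // previously scanned-out buffer is no longer read by the display.
    (ret, out_fence)
}
```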

> Presumably we should test if drmSyncobjEventfd is supported... maybe just creating a syncobj and calling that to see if it works? I'm also still a little unsure how this ends up working with multiple GPUs... particularly if one is Nvidia.

Yeah, IMO this needs a compile test similar to what we do for gbm in Smithay: https://github.com/Smithay/smithay/blob/master/build.rs#L99-L125

So if the system's local kernel/DRM headers support it, we enable the feature and assume the kernel does as well. I don't know how well runtime detection would work, but we have to make sure not to advertise the global if this function isn't supported.
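A minimal build.rs probe in that spirit might look like this (a sketch; assumes a `cc` build-dependency, and the libdrm include path may need pkg-config instead of a hardcoded directory):

```rust
// build.rs sketch: try to compile a snippet that references
// drmSyncobjEventfd and enable a cfg flag if the headers declare it.
fn main() {
    let probe = r#"
        #include <xf86drm.h>
        int main(void) { return (int)(long)drmSyncobjEventfd; }
    "#;
    let out = std::path::PathBuf::from(std::env::var("OUT_DIR").unwrap());
    let src = out.join("syncobj_eventfd_probe.c");
    std::fs::write(&src, probe).unwrap();

    let supported = cc::Build::new()
        .file(&src)
        .include("/usr/include/libdrm") // assumption: headers live here
        .warnings(false)
        .try_compile("syncobj_eventfd_probe")
        .is_ok();

    if supported {
        // Gates the syncobj code path at compile time.
        println!("cargo:rustc-cfg=syncobj_eventfd");
    }
}
```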

ids1024 (Member, Author) commented Apr 10, 2024

> For that you probably have to remove the old dmabuf.generate_blocker logic, which pulls a fence out of the dmabuf to do essentially the same thing. We should basically check if the client uses explicit sync, use the acquire fence if so, and otherwise fall back to polling the dmabuf directly.

This should already be doing that. If there's an acquire point, it adds a DrmSyncPointBlocker and skips adding the DmabufBlocker.

> Yeah, I am pretty sure Smithay's code isn't correct in what it does today, but given that direct scanout mostly works, it is probably relying on implicit sync in the background.

I'm not sure if implicit sync does something to help with releases (blocking client writes to the buffer until the display controller is done with it)... but yeah, it does seem to mostly work. If implicit sync isn't involved there, explicit sync won't be any more problematic; it just has the same limitation.

> (Maybe there is even a kernel API to "signal once this other fence is signaled"?)

It should be possible to do something with drmSyncobjTransfer and such. Probably the other implementations of the protocol do something like that, so we can look at those.
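For illustration, the transfer could look roughly like this (an FFI sketch against libdrm's known syncobj calls; error handling omitted, and the handles/points are assumed to come from the protocol implementation):

```rust
use std::os::unix::io::RawFd;

// FFI declarations matching libdrm's xf86drm.h (assumes linking against libdrm).
extern "C" {
    fn drmSyncobjCreate(fd: RawFd, flags: u32, handle: *mut u32) -> i32;
    fn drmSyncobjImportSyncFile(fd: RawFd, handle: u32, sync_file_fd: RawFd) -> i32;
    fn drmSyncobjDestroy(fd: RawFd, handle: u32) -> i32;
    fn drmSyncobjTransfer(
        fd: RawFd,
        dst_handle: u32,
        dst_point: u64,
        src_handle: u32,
        src_point: u64,
        flags: u32,
    ) -> i32;
}

// Sketch: materialize a sync_file fd (e.g. the KMS out-fence) as a temporary
// binary syncobj, then copy it onto `point` of the client's release timeline,
// so the release point signals when the fence does.
unsafe fn fence_to_release_point(drm: RawFd, fence_fd: RawFd, release_handle: u32, point: u64) -> i32 {
    let mut tmp: u32 = 0;
    drmSyncobjCreate(drm, 0, &mut tmp);
    drmSyncobjImportSyncFile(drm, tmp, fence_fd); // binary syncobj <- sync_file
    let ret = drmSyncobjTransfer(drm, release_handle, point, tmp, 0, 0);
    drmSyncobjDestroy(drm, tmp);
    ret
}
```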

We just have to make sure to only do that once the buffer is no longer used elsewhere in the compositor.

Drakulix (Member):

> I'm not sure if implicit sync does something to help with releases (blocking client writes to the buffer until the display controller is done with it)... but yeah, it does seem to mostly work. If implicit sync isn't involved there, explicit sync won't be any more problematic; it just has the same limitation.

It definitely does, but the NVIDIA driver at least isn't doing that correctly when you send dmabufs directly to KMS without going through egl-gbm and allocating an EGLSurface, which is why we have the needs_sync workaround in Smithay. But this should work for Mesa.

>> (Maybe there is even a kernel API to "signal once this other fence is signaled"?)

> It should be possible to do something with drmSyncobjTransfer and such. Probably the other implementations of the protocol do something like that, so we can look at those.

👍

> We just have to make sure to only do that once the buffer is no longer used elsewhere in the compositor.

I am pretty sure that is what our current release logic does, with the exception of handling direct scanout, which would be covered by the fence anyway in this case, so that should be correct. We just need to adjust the DrmCompositor to extract the out-fence and expose it in the RenderResult.

ids1024 (Member, Author) commented Apr 11, 2024

> I am pretty sure that is what our current release logic does, with the exception of handling direct scanout, which would be covered by the fence anyway in this case, so that should be correct. We just need to adjust the DrmCompositor to extract the out-fence and expose it in the RenderResult.

I mean if we want to have OUT_FENCE_PTR directly signal the release point:

* We can't do that if we are using the same buffer for direct scanout or rendering elsewhere (on a different monitor; screencopy).
* We can't do that if we might use the buffer again, which could even be the case after another buffer has been committed, since the new buffer may still be blocked when we want to render the next frame.

So I'm not sure when we could actually do that? Maybe with commit-queue-v1, where we might know the next buffer is ready, but aren't using it until the next frame.

So at least for now I think we need to stick to signaling the release point from the CPU? But we should still track when OUT_FENCE_PTR has signaled that scanout is done with the buffer.
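Something like this calloop-based watcher could cover the CPU-side signaling (a sketch, assuming calloop as used by Smithay; `ReleasePoint::signal` is a hypothetical helper for the client's release timeline point):

```rust
use calloop::generic::Generic;
use calloop::{Interest, LoopHandle, Mode, PostAction};
use std::os::fd::OwnedFd;

// `fence_fd` is the sync_file fd obtained via OUT_FENCE_PTR. A sync_file fd
// becomes readable once its fence has signaled, so a readiness source suffices.
fn watch_scanout_fence<D>(handle: &LoopHandle<'static, D>, fence_fd: OwnedFd, release: ReleasePoint) {
    handle
        .insert_source(
            Generic::new(fence_fd, Interest::READ, Mode::Level),
            move |_, _, _state| {
                // The display controller is done with the old buffer.
                release.signal();
                Ok(PostAction::Remove)
            },
        )
        .expect("failed to insert fence source");
}
```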

Drakulix (Member):

> I mean if we want to have OUT_FENCE_PTR directly signal the release point:

> * We can't do that if we are using the same buffer for direct scanout or rendering elsewhere (on a different monitor; screencopy).

Right, so this rather needs to be a list of fences to wait for, meaning we probably have to wait and signal ourselves instead of relying on drmSyncobjTransfer.

> * We can't do that if we might use the buffer again, which could even be the case after another buffer has been committed, since the new buffer may still be blocked when we want to render the next frame.

But merging of the state should only happen once all blockers are resolved, and only then do we release, so I believe that issue is already handled correctly. Nothing would be able to use that buffer for rendering anymore at that point.

But we still need to track the buffer to be able to signal later, so we might as well unify the approach and handle the release event the same way. I feel like this could benefit from some infrastructure and refactoring in Smithay.

> So I'm not sure when we could actually do that? Maybe with commit-queue-v1, where we might know the next buffer is ready, but aren't using it until the next frame.

I think we can implement both fifo and commit-queue with blockers as well.

> So at least for now I think we need to stick to signaling the release point from the CPU? But we should still track when OUT_FENCE_PTR has signaled that scanout is done with the buffer.

Yeah, I am coming to the same conclusion, but that isn't too bad, as that is just another fd in the loop and a very small action.

```diff
@@ -450,6 +451,9 @@ pub fn init_backend(
     // Create relative pointer global
     RelativePointerManagerState::new::<State>(&dh);

+    // TODO check if main device supports syncobj eventfd?
```
Member (review comment on the diff):
I guess worst case we can still fail the import_timeline request, right?
Is Xwayland/Mesa able to handle this and fall back to implicit sync?

Member (Author):

I don't think there's any way to fall back once we expose the global. The protocol doesn't make import_timeline failable, except as a protocol error.

It looks like Mutter uses `drmSyncobjEventfd(drm_fd, 0, 0, -1, 0) != -1 || errno != ENOENT` to check if the call is supported. So we probably want to do the same, and only create the global if it is supported.
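Translated to our side, that probe could look like this (a sketch: the deliberately bogus arguments make the call fail either way, and only a kernel lacking the eventfd ioctl reports ENOENT):

```rust
use std::os::unix::io::RawFd;

// FFI declaration matching libdrm's xf86drm.h (assumes linking against libdrm).
extern "C" {
    fn drmSyncobjEventfd(fd: RawFd, handle: u32, point: u64, ev_fd: RawFd, flags: u32) -> i32;
}

const ENOENT: i32 = 2; // from errno.h

// Mutter-style runtime check, as quoted above.
fn supports_syncobj_eventfd(drm_fd: RawFd) -> bool {
    let ret = unsafe { drmSyncobjEventfd(drm_fd, 0, 0, -1, 0) };
    ret != -1 || std::io::Error::last_os_error().raw_os_error() != Some(ENOENT)
}
```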
