WIP linux-drm-syncobj-v1 #1356

Draft · wants to merge 1 commit into master from drm-syncobj
Conversation

@ids1024 (Member) commented Mar 8, 2024

So far this just exposes the protocol but doesn't do anything. #1327 does indeed make that part mostly painless, instead of having impl bounds and a delegate macro to deal with.

@ids1024 (Member Author) commented Apr 16, 2024

Reading a bit more about IN_FENCE_FD and OUT_FENCE_PTR, I think we need to use both if we want DRM scanout to not use implicit sync?

https://github.com/ValveSoftware/gamescope/blob/master/src/drm.cpp uses a g_nAlwaysSignalledSyncFile as the IN_FENCE_FD, which is probably what we want as well (for direct scanout) if we only commit the buffer when we have already waited for it to be ready.

Then we can poll the fence produced by OUT_FENCE_PTR, and not signal release points for buffers used in direct scanout until it signals.
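
A rough sketch of that flow, for illustration (IN_FENCE_FD and OUT_FENCE_PTR are the real KMS atomic property names, but the device/request types and all helper methods here are hypothetical, not Smithay API):

```rust
use std::os::fd::{AsRawFd, OwnedFd, RawFd};

// Hypothetical sketch of the direct-scanout commit flow described above.
// `DrmDevice`, `PlaneHandle`, `CrtcHandle` and the helper methods are
// stand-ins; only the property semantics are the real ones.
fn commit_direct_scanout(dev: &DrmDevice, plane: PlaneHandle, crtc: CrtcHandle) -> std::io::Result<()> {
    // 1. Hand the kernel an already-signalled sync file (like gamescope's
    //    g_nAlwaysSignalledSyncFile) so implicit sync is bypassed. Only
    //    safe if we already waited for the buffer to be ready.
    let signalled: OwnedFd = dev.create_signalled_sync_file()?;
    let mut req = dev.atomic_request();
    req.set_plane_property(plane, "IN_FENCE_FD", signalled.as_raw_fd() as u64);

    // 2. Request an out-fence: the kernel writes a sync file fd into this
    //    slot when the commit is queued.
    let mut out_fence: RawFd = -1;
    req.set_crtc_property(crtc, "OUT_FENCE_PTR", &mut out_fence as *mut RawFd as u64);
    req.commit()?;

    // 3. Poll `out_fence` and only signal the release points of buffers
    //    used for direct scanout once it signals, i.e. once scanout has
    //    moved off them.
    dev.poll_and_release_on(out_fence)?;
    Ok(())
}
```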

@Drakulix (Member):

> Reading a bit more about IN_FENCE_FD and OUT_FENCE_PTR, I think we need to use both if we want DRM scanout to not use implicit sync?

I am not sure if you can use OUT_FENCE_PTR without IN_FENCE_FD. It wouldn't really make sense for the driver to support one but not the other. We probably should just set them both all the time.

> https://github.com/ValveSoftware/gamescope/blob/master/src/drm.cpp uses a g_nAlwaysSignalledSyncFile as the IN_FENCE_FD, which is probably what we want as well (for direct scanout) if we only commit the buffer when we have already waited for it to be ready.

Yes, if we do CPU latching (that is polling the fence), then this seems to be the right approach for direct-scanout.

Though I would still just send the fence for composited frames, although we obviously could apply the principle there as well, as I tried here: pop-os/cosmic-comp#291

However, I don't see a benefit outside of cases where the driver supports polling fences but not IN_FENCE_FD. In those cases we could skip a CPU-blocking glFinish, which also becomes a non-issue once we have a thread per surface.

Perhaps there are some advantages for VRR use cases, but I don't think that is worth the trouble for a first iteration.

> Then we can poll the fence produced by OUT_FENCE_PTR, and not signal release points for buffers used in direct scanout until it signals.

👍

@codecov-commenter commented May 21, 2024

Codecov Report

Attention: Patch coverage is 92.30769% with 2 lines in your changes missing coverage. Please review.

Project coverage is 20.29%. Comparing base (c982040) to head (a17585c).
Report is 63 commits behind head on master.

Files                                          Patch %   Lines
src/backend/renderer/utils/wayland.rs          93.75%    1 Missing ⚠️
src/wayland/shell/xdg/handlers/positioner.rs   0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1356      +/-   ##
==========================================
+ Coverage   20.27%   20.29%   +0.01%     
==========================================
  Files         161      162       +1     
  Lines       26041    26351     +310     
==========================================
+ Hits         5281     5348      +67     
- Misses      20760    21003     +243     
Flag                 Coverage Δ
wlcs-buffer          17.72% <92.30%> (+0.04%) ⬆️
wlcs-core            17.39% <92.30%> (+0.03%) ⬆️
wlcs-output          7.08% <0.00%> (-0.15%) ⬇️
wlcs-pointer-input   19.25% <88.46%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@ids1024 force-pushed the drm-syncobj branch 2 times, most recently from 57ae573 to e5b0a4b on May 23, 2024
```rust
pub trait DrmSyncobjHandler {
    // TODO better way to deal with this?
    /// DRM device for importing syncobj file descriptors
    fn import_device(&self) -> &DrmDeviceFd;
```
Member:
Can't we at least pass the Client here? I am not sure, if any part of the stack expects a particular device, but I would like to make this the same device we advertise via wl_drm or as a main device in cosmic's implementation.

Member Author:

Good point. I think it shouldn't matter? In which case it would be simpler to take a DrmDeviceFd when creating the global.

But if it does matter, that's a reason to stick with a trait method like this. Although we probably would want to derive the node from the Dmabuf, which should handle that, but also a device specified with #1430. Right?

We could pass the WlBuffer. But if the implementation here assumes the buffer is created with zwp_linux_dmabuf_v1, we could pass the Option<DrmNode>.

Member:
> [...] we could pass the Option<DrmNode>

That would only be set for dmabuf <= 3 (and potentially >= 6), right? We have more information given the main device in cosmic. So imo the WlBuffer is the most flexible, as it also allows getting the Client.

Whether we make use of this in the end is a separate question, but I think we should offer this in the api.
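
For illustration, the shape being discussed would be something like this (a hypothetical sketch, not what the PR currently does; WlBuffer here is wayland_server::protocol::wl_buffer::WlBuffer):

```rust
use wayland_server::protocol::wl_buffer::WlBuffer;

pub trait DrmSyncobjHandler {
    /// DRM device used to import syncobj fds for this buffer. Taking the
    /// `WlBuffer` keeps the API flexible: the compositor can look up the
    /// client behind it, or the node a dmabuf-based buffer lives on.
    fn import_device(&self, buffer: &WlBuffer) -> &DrmDeviceFd;
}
```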

```rust
impl<B: Buffer> ScanoutBuffer<B> {
    fn acquire_point(&self) -> Option<SyncPoint> {
        if let Self::Wayland(buffer) = self {
            return buffer.acquire_point().cloned().map(SyncPoint::from);
```
Member:
So if a blocker is used this point will always be signalled, right? Then we don't need to create our own signalled fence.

Member Author:
That was my question yesterday.

If we assume every smithay-based compositor will use DrmSyncpointBlocker, we don't need to get a fence from the sync point. But we should create a fence that is already signaled to use as an IN_FENCE_FD, to make sure no implicit sync occurs.

If DmabufBlocker is used for any surface that isn't using explicit sync, we can also submit a signaled IN_FENCE_FD for implicit sync surfaces. (So we can just do that for every direct scanout plane.)

(If some Smithay compositors might not want to use blockers like that, they'll need a way to submit an IN_FENCE_FD from the acquire point.)
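
For reference, creating such an always-signalled fence only takes the DRM syncobj ioctls. A minimal sketch, with the Rust wrapper names assumed rather than taken from any particular crate:

```rust
use std::io;
use std::os::fd::{BorrowedFd, OwnedFd};

// Sketch: build a sync file that is already signalled, usable as
// IN_FENCE_FD to opt out of implicit sync. The underlying libdrm calls
// are drmSyncobjCreate (with DRM_SYNCOBJ_CREATE_SIGNALED),
// drmSyncobjExportSyncFile and drmSyncobjDestroy; the `syncobj_*`
// wrappers used here are hypothetical.
fn signalled_sync_file(drm: BorrowedFd<'_>) -> io::Result<OwnedFd> {
    let handle = syncobj_create(drm, /* signalled: */ true)?;
    let fence = syncobj_export_sync_file(drm, handle)?;
    syncobj_destroy(drm, handle)?; // the exported fence outlives the syncobj
    Ok(fence)
}
```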

Member:
> If we assume every smithay-based compositor will use DrmSyncpointBlocker, we don't need to get a fence from the sync point. But we should create a fence that is already signaled to use as an IN_FENCE_FD, to make sure no implicit sync occurs.
>
> If DmabufBlocker is used for any surface that isn't using explicit sync, we can also submit a signaled IN_FENCE_FD for implicit sync surfaces. (So we can just do that for every direct scanout plane.)

So maybe we can make these blockers attach a hint to the WlSurface that signals to the DrmCompositor that the buffer should be ready? And just fall back to implicit sync (no IN_FENCE_FD) otherwise?

I suppose no smithay compositor would want to use a different fence for a buffer. At least not when using the DrmCompositor, which just does direct scanout or submits an IN_FENCE_FD for compositing.

Anything else that might need custom fences (e.g. multi-gpu transfers directly scanned out or composing onto a plane) is not supported right now by the DrmCompositor anyway.

Member Author:
We could make the blocker take a reference to the Buffer instead of the DrmSyncPoint and put an AtomicBool in the Buffer indicating that it's already been blocked on. Though it might be simpler for DrmCompositor to take a bool of whether or not to assume buffers are already ready.
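
Roughly like this, for the first variant (field and type names assumed for illustration):

```rust
use std::sync::atomic::AtomicBool;

// Sketch of the idea above: the blocker flips `already_waited` once the
// acquire point has signalled, so the DrmCompositor knows the buffer is
// ready and can submit a signalled IN_FENCE_FD instead of the client's fence.
struct BufferInner {
    acquire_point: DrmSyncPoint,
    release_point: DrmSyncPoint,
    /// Set by the blocker after it has waited on `acquire_point`.
    already_waited: AtomicBool,
}
```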

Member:
> Though it might be simpler for DrmCompositor to take a bool of whether or not to assume buffers are already ready.

Yeah, though then we need good docs to explain when it is safe to pass that bool. (And use the fences from the acquire points otherwise?)

@ids1024 (Member Author) commented May 24, 2024

Perhaps not using a blocker could make sense for a cage-like compositor, or as an optimization when there is only a single fullscreen application. Though I'm not sure what issues there might be if the IN_FENCE_FD isn't signaled for a long time (say, a broken client that never signals it?). At some point we'll want to do testing and see what works and if it can improve things.

Without blockers, a renderer may also need to handle the acquire points. A Vulkan renderer should import the timeline as a VkSemaphore. I think we also would need to use EGL_ANDROID_native_fence_sync in the EGL renderer, and wait on the acquire fence? I don't think we can rely on implicit sync.

(Perhaps it would be best to document that the blocker (or something equivalent) is required for Smithay's syncobj-v1 implementation, and generalize things later?)

Testing this: I see vkgears on cosmic-comp is only using direct scanout to an overlay plane when full-screened. In that case, export_sync_file runs successfully. Though I can't really tell if implicit sync is being avoided, and that this is preventing sync errors, since at least with vkgears I don't see artifacts if I remove the blocker and use a signaled in-fence.

Wayland EGL on the Nvidia 555 driver, and Mesa Vulkan on git, both freeze if BufferInner::drop doesn't signal the release fence. So as long as we make sure to keep a reference to Buffer as long as we need the buffer, that seems to be working. Which seems correct for direct-scanout if we hold it until we get a page flip event.
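
That is, roughly this behavior; a sketch (with assumed field names) of what BufferInner::drop has to guarantee:

```rust
// Sketch: signalling the release point on drop matches the behavior
// tested above. Holding a clone of the Buffer until the page flip event
// keeps this from firing while scanout still reads the buffer.
impl Drop for BufferInner {
    fn drop(&mut self) {
        // Clients (Wayland EGL on Nvidia 555, Mesa Vulkan) block until this fires.
        if let Err(err) = self.release_point.signal() {
            tracing::warn!("Failed to signal release point: {}", err);
        }
    }
}
```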

@Drakulix (Member) commented May 24, 2024

> Perhaps not using a blocker could make sense for a cage-like compositor, or as an optimization when there is only a single fullscreen application. Though I'm not sure what issues there might be if the IN_FENCE_FD isn't signaled for a long time (say, a broken client that never signals it?). At some point we'll want to do testing and see what works and if it can improve things.

Given commits could be blocking, this might permanently lock up a display without a timeout. It might be reasonable to never send any user-provided fences and instead require adding a Blocker, if one chooses to implement linux-drm-syncobj.

> Without blockers, a renderer may also need to handle the acquire points. A Vulkan renderer should import the timeline as a VkSemaphore. I think we also would need to use EGL_ANDROID_native_fence_sync in the EGL renderer, and wait on the acquire fence? I don't think we can rely on implicit sync.

We kinda do that already? SyncPoints can be awaited via the GPU driver in smithay currently and we use that for synchronizing multi-gpu copies and block for compositing before submitting.

> (Perhaps it would be best to document that the blocker (or something equivalent) is required for Smithay's syncobj-v1 implementation, and generalize things later?)

Yes, agreed.

So to sum up:

  • Never pass user-provided fences to IN_FENCE_FD, only for compositing like we do now (and denylist the nvidia driver here for now).
  • Pass a signalled fence for direct-scanout, if downstream signals that it is using blockers (either automagically or with a flag on the DrmCompositor).
  • Otherwise just fall back to implicit sync for direct-scanout.

Sounds good?

> Testing this: I see vkgears on cosmic-comp is only using direct scanout to an overlay plane when full-screened. In that case, export_sync_file runs successfully. Though I can't really tell if implicit sync is being avoided, and that this is preventing sync errors, since at least with vkgears I don't see artifacts if I remove the blocker and use a signaled in-fence.

That is the issue with races, I guess.

> Wayland EGL on the Nvidia 555 driver, and Mesa Vulkan on git, both freeze if BufferInner::drop doesn't signal the release fence. So as long as we make sure to keep a reference to Buffer as long as we need the buffer, that seems to be working. Which seems correct for direct-scanout if we hold it until we get a page flip event.

Sounds good!

@ids1024 (Member Author) commented May 24, 2024

> Given commits could be blocking, this might permanently lock up a display without a timeout. It might be reasonable to never send any user-provided fences and instead require adding a Blocker, if one chooses to implement linux-drm-syncobj.

Apparently fences have timeouts? Since the kernel wants to prevent an indefinite lock like this: https://docs.kernel.org/driver-api/dma-buf.html#indefinite-dma-fences

Presumably the timeout behavior would be similar to implicit sync without a DmabufBlocker? In which case that's just consistent with the expected problems when not using a blocker.

So that does sound like something that could be a viable optimization for things like fullscreen applications, even if a general purpose compositor probably wants to use blockers under most circumstances (for implicit or explicit sync). But it's an optimization that can be handled later, with more testing.

> We kinda do that already? SyncPoints can be awaited via the GPU driver in smithay currently and we use that for synchronizing multi-gpu copies and block for compositing before submitting.

Ah right, we already have support for that in the renderer. So if we want to use the acquire_fence without a CPU wait, we'll want to pass the fence to EGL there. But that can be handled later.
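
For the record, importing the fence into EGL would go through EGL_ANDROID_native_fence_sync, roughly as below. The eglCreateSyncKHR/eglWaitSyncKHR entry points and the EGL_SYNC_NATIVE_FENCE_ANDROID attributes are the real extension API; the `EglFns` function table, the EGLint/EGLDisplay typedefs, and how this maps onto smithay's EGL wrappers are assumptions:

```rust
use std::os::fd::RawFd;

// Sketch: queue a GPU-side wait on an acquire fence fd before subsequent
// GL commands, instead of a CPU wait.
unsafe fn gpu_wait_fence(egl: &EglFns, dpy: EGLDisplay, fence_fd: RawFd) {
    let attribs: [EGLint; 3] = [
        EGL_SYNC_NATIVE_FENCE_FD_ANDROID,
        fence_fd as EGLint, // EGL takes ownership of the fd on success
        EGL_NONE,
    ];
    let sync = (egl.CreateSyncKHR)(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID, attribs.as_ptr());
    (egl.WaitSyncKHR)(dpy, sync, 0); // waits on the GPU, not the CPU
    (egl.DestroySyncKHR)(dpy, sync);
}
```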

> • Never pass user-provided fences to IN_FENCE_FD, only for compositing like we do now (and denylist the nvidia driver here for now).
> • Pass a signalled fence for direct-scanout, if downstream signals that it is using blockers (either automagically or with a flag on the DrmCompositor).
> • Otherwise just fall back to implicit sync for direct-scanout.

I'm not sure if we can automagically fallback to implicit sync. We could just not pass an IN_FENCE_FD, and ignore the acquire fence, but I don't think ignoring the explicit fence is guaranteed to work (with all drivers)?

I think we need to either:

  1. Document that, at least for now, use of DrmSyncPointBlocker is required with the Smithay syncobj-v1 protocol implementation.
    • If drmSyncobjEventfd isn't supported by the kernel, the protocol shouldn't be exposed (see the sketch after this list).
    • I think this matches what Mutter is doing.
  2. If a blocker isn't used, use IN_FENCE_FD, or import the fence into EGL
    • Since IN_FENCE_FD doesn't work on Nvidia, for now, DrmSyncPointBlocker is the only way to get proper synchronization with the Nvidia driver?
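
A sketch of the eventfd-based wait behind option 1 (drmSyncobjEventfd is the real libdrm call; the wrapper names and the calloop integration are assumed):

```rust
use std::io;
use std::os::fd::{AsRawFd, BorrowedFd, OwnedFd};

// Sketch: asynchronous wait on a timeline point, as a DrmSyncPointBlocker
// needs it. The kernel makes `ev` readable once the syncobj's timeline
// reaches `point`; the event loop polls the fd and only then lets the
// surface commit through.
fn register_acquire_wait(drm: BorrowedFd<'_>, syncobj: u32, point: u64) -> io::Result<OwnedFd> {
    let ev = create_eventfd()?; // eventfd(2); wrapper name assumed
    // Equivalent to drmSyncobjEventfd(drm_fd, syncobj, point, eventfd, 0).
    syncobj_eventfd(drm, syncobj, point, ev.as_raw_fd(), 0)?;
    Ok(ev) // register with the event loop; resolve the blocker when readable
}
```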

@Drakulix (Member) commented May 24, 2024

> So that does sound like something that could be a viable optimization for things like fullscreen applications

Right. For fullscreen surfaces, we probably want to just pass through the fence, not for normal desktop usage though. I agree this is a later optimization.

> Ah right, we already have support for that in the renderer. So if we want to use the acquire_fence without a CPU wait, we'll want to pass the fence to EGL there. But that can be handled later.

Yes, but we kinda don't want that (except for the fullscreen case again), because then we would block compositing for an undefined amount of time. The goal is to render as late as possible and use the latest ready buffer.

> We could just not pass an IN_FENCE_FD, and ignore the acquire fence, but I don't think ignoring the explicit fence is guaranteed to work (with all drivers)?

Yes, we can absolutely do that; that is what we are doing right now. We would only block drivers not supporting implicit sync (nvidia). Anyone serious about nvidia support just needs to implement the protocol, and I don't want to mandate that.

So strong preference for 1.

@DemiMarie (Contributor) commented May 29, 2024

My understanding is that DRM syncobjs are, at least in the future, not guaranteed to ever signal. This is because future drivers may support Userspace Memory Fences (UMFs), and userspace can deadlock itself with them. Therefore, I think Smithay needs to do a (maybe asynchronous) explicit wait (possibly with a timeout) so that the display cannot be locked up forever.
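
A bounded wait along those lines could be built on drmSyncobjTimelineWait, which takes an absolute timeout; a sketch with assumed wrapper names:

```rust
use std::io;
use std::os::fd::BorrowedFd;

// Sketch: wait at most `budget_ns` for an acquire point, so a client that
// never signals (or a future userspace memory fence) can't lock up the
// display. Returns Ok(false) on timeout so the compositor can fall back
// or disconnect the client.
fn wait_acquire_bounded(drm: BorrowedFd<'_>, syncobj: u32, point: u64, budget_ns: i64) -> io::Result<bool> {
    let deadline = monotonic_now_ns() + budget_ns; // absolute CLOCK_MONOTONIC deadline
    // Equivalent to drmSyncobjTimelineWait(fd, &syncobj, &point, 1, deadline,
    // DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, ...).
    match syncobj_timeline_wait(drm, &[syncobj], &[point], deadline, WAIT_FOR_SUBMIT) {
        Ok(_) => Ok(true),
        Err(err) if err.raw_os_error() == Some(libc::ETIME) => Ok(false), // timed out
        Err(err) => Err(err),
    }
}
```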
