WIP linux-drm-syncobj-v1 #1356
Conversation
Reading a bit more about this: https://github.com/ValveSoftware/gamescope/blob/master/src/drm.cpp uses an exported sync file for this. Then we can poll the fence produced by that export.
I am not sure if you can use that here.
Yes, if we do CPU latching (that is, polling the fence), then this seems to be the right approach for direct scanout. Though I would still just send the fence for composited frames, although we obviously could apply the principle there as well, as I tried here: pop-os/cosmic-comp#291. However, I don't see a benefit outside of cases where the driver supports polling fences but not implicit sync. Perhaps there are some advantages for VRR use cases, but I don't think that is worth the trouble for a first iteration.
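A minimal, self-contained sketch of the CPU-latching idea discussed above (poll each pending buffer's acquire fence and latch the newest one that is already ready). `MockFence` and `latch_ready_buffer` are hypothetical stand-ins, not Smithay API; a real compositor would poll an exported sync_file fd instead.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a dma-fence we can poll from the CPU.
#[derive(Clone)]
struct MockFence(Arc<AtomicBool>);

impl MockFence {
    fn new() -> Self {
        MockFence(Arc::new(AtomicBool::new(false)))
    }
    fn signal(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    /// Non-blocking poll, like checking an exported sync_file fd for readability.
    fn is_signaled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// CPU latching: out of the pending (buffer id, acquire fence) pairs,
/// pick the newest buffer whose fence has already signaled.
fn latch_ready_buffer(pending: &[(u64, MockFence)]) -> Option<u64> {
    pending
        .iter()
        .rev() // newest submission first
        .find(|(_, fence)| fence.is_signaled())
        .map(|(id, _)| *id)
}

fn main() {
    let f1 = MockFence::new();
    let f2 = MockFence::new();
    let f3 = MockFence::new();
    f1.signal();
    f2.signal();
    // Buffer 3's acquire point is not ready yet, so buffer 2 is latched.
    let pending = vec![(1, f1), (2, f2), (3, f3)];
    assert_eq!(latch_ready_buffer(&pending), Some(2));
    println!("latched buffer 2");
}
```

This mirrors the "render as late as possible with the latest ready buffer" goal mentioned later in the thread.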
👍
Codecov Report (coverage diff):

```
@@            Coverage Diff             @@
##           master    #1356      +/-   ##
==========================================
+ Coverage   20.27%   20.29%   +0.01%
==========================================
  Files         161      162       +1
  Lines       26041    26351     +310
==========================================
+ Hits         5281     5348      +67
- Misses      20760    21003     +243
```
```rust
pub trait DrmSyncobjHandler {
    // TODO better way to deal with this?
    /// DRM device for importing syncobj file descriptors
    fn import_device(&self) -> &DrmDeviceFd;
}
```
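For illustration, a self-contained mock of how a compositor might implement such a handler. `DrmDeviceFd` here is a stub standing in for Smithay's real type, and `State` is a hypothetical compositor state; the point is returning the same device the compositor advertises elsewhere, as discussed below.

```rust
/// Stub standing in for Smithay's real DrmDeviceFd (which wraps an open DRM node).
struct DrmDeviceFd {
    path: &'static str,
}

/// Sketch mirroring the handler trait proposed in this PR.
trait DrmSyncobjHandler {
    /// DRM device used for importing syncobj file descriptors.
    fn import_device(&self) -> &DrmDeviceFd;
}

/// Hypothetical compositor state: it hands back the same device it
/// advertises via wl_drm / as the dmabuf main device.
struct State {
    main_device: DrmDeviceFd,
}

impl DrmSyncobjHandler for State {
    fn import_device(&self) -> &DrmDeviceFd {
        &self.main_device
    }
}

fn main() {
    let state = State {
        main_device: DrmDeviceFd { path: "/dev/dri/renderD128" },
    };
    println!("importing syncobjs on {}", state.import_device().path);
}
```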
Can't we at least pass the `Client` here? I am not sure if any part of the stack expects a particular device, but I would like to make this the same device we advertise via wl_drm or as a main device in cosmic's implementation.
Good point. I think it shouldn't matter? In which case it would be simpler to take a `DrmDeviceFd` when creating the global.

But if it does matter, that's a reason to stick with a trait method like this. Although we probably would want to derive the node from the `Dmabuf`, which should handle that, but also a device specified with #1430. Right?
We could pass the `WlBuffer`. But if the implementation here assumes the buffer is created with `zwp_linux_dmabuf_v1`, we could pass the `Option<DrmNode>`.
> [...] we could pass the `Option<DrmNode>`

That would only be set for dmabuf <= 3 (and potentially >= 6), right? We have more information given the main device in cosmic. So imo the `WlBuffer` is the most flexible, as it also allows getting the `Client`.

Whether we make use of this in the end is a separate question, but I think we should offer this in the API.
```rust
impl<B: Buffer> ScanoutBuffer<B> {
    fn acquire_point(&self) -> Option<SyncPoint> {
        if let Self::Wayland(buffer) = self {
            return buffer.acquire_point().cloned().map(SyncPoint::from);
        }
        // ...
    }
}
```
So if a blocker is used this point will always be signalled, right? Then we don't need to create our own signalled fence.
That was my question yesterday.

If we assume every Smithay-based compositor will use `DrmSyncpointBlocker`, we don't need to get a fence from the sync point, but we should create a fence that is already signaled to use as an `IN_FENCE_FD`, to make sure no implicit sync occurs.

If `DmabufBlocker` is used for any surface that isn't using explicit sync, we can also submit a signaled `IN_FENCE_FD` for implicit-sync surfaces. (So we can just do that for every direct-scanout plane.)

(If some Smithay compositors might not want to use blockers like that, they'll need a way to submit an `IN_FENCE_FD` from the acquire point.)
> If we assume every smithay-based compositor will use DrmSyncpointBlocker, we don't need to get a fence from the sync point. But should create a fence that is already signaled to use as an IN_FENCE_FD to make sure no implicit sync occurs.
>
> If DmabufBlocker is used for any surface that isn't using explicit sync, we can also submit a signaled IN_FENCE_FD for implicit sync surfaces. (So we can just do that for every direct scanout plane.)

So maybe we can make these blockers attach a hint on the `WlSurface` that signals to the `DrmCompositor` that the buffer should be ready? And just fall back to implicit sync (no `IN_FENCE_FD`) otherwise?

I suppose no Smithay compositor would want to use a different fence for a buffer, at least not when using the `DrmCompositor`, which just does direct scanout or submits an `IN_FENCE_FD` for compositing. Anything else that might need custom fences (e.g. multi-GPU transfers directly scanned out or composited onto a plane) is not supported by the `DrmCompositor` right now anyway.
We could make the blocker take a reference to the `Buffer` instead of the `DrmSyncPoint`, and put an `AtomicBool` in the `Buffer` indicating that it has already been blocked on. Though it might be simpler for `DrmCompositor` to take a `bool` of whether or not to assume buffers are already ready.
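The `AtomicBool`-in-the-buffer idea can be sketched in isolation. `TrackedBuffer`, `mark_ready`, and `needs_acquire_fence` are hypothetical names for illustration, not actual Smithay API: the blocker flips the flag when the acquire point signals, and the compositor then knows a pre-signaled `IN_FENCE_FD` suffices.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical buffer wrapper: the blocker flips `ready` once the
/// acquire point signals, so the compositor can skip waiting again.
struct TrackedBuffer {
    ready: Arc<AtomicBool>,
}

impl TrackedBuffer {
    fn new() -> Self {
        TrackedBuffer { ready: Arc::new(AtomicBool::new(false)) }
    }

    /// Called by the (hypothetical) blocker when the sync point signals.
    fn mark_ready(&self) {
        self.ready.store(true, Ordering::Release);
    }
}

/// Decide whether the real acquire fence must still be submitted as
/// IN_FENCE_FD, or whether the blocker already waited and a pre-signaled
/// fence can be used instead.
fn needs_acquire_fence(buf: &TrackedBuffer) -> bool {
    !buf.ready.load(Ordering::Acquire)
}

fn main() {
    let buf = TrackedBuffer::new();
    assert!(needs_acquire_fence(&buf));
    buf.mark_ready(); // blocker has observed the acquire point signaling
    assert!(!needs_acquire_fence(&buf));
    println!("buffer already blocked on; a signaled IN_FENCE_FD suffices");
}
```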
> Though it might be simpler for DrmCompositor to take a bool of whether or not to assume buffers are already ready.

Yeah, though then we need good docs to explain when it is safe to pass that bool. (And use the fences from the acquire points otherwise?)
Perhaps not using a blocker could make sense for a fullscreen surface. Without blockers, a renderer may also need to handle the acquire points; a Vulkan renderer should import the timeline as a timeline semaphore.

(Perhaps it would be best to document that the blocker (or something equivalent) is required for Smithay's syncobj-v1 implementation, and generalize things later?)

Testing this: I see Wayland EGL on the Nvidia 555 driver, and Mesa Vulkan on git, both freeze if the acquire point is never waited on.
Given that commits could be blocking, this might permanently lock up a display without a timeout. It might be reasonable to never send any user-provided fences and instead require adding a Blocker, if one chooses to implement linux-drm-syncobj.
We kinda do that already? SyncPoints can currently be awaited via the GPU driver in Smithay, and we use that for synchronizing multi-GPU copies and block for compositing before submitting.
Yes, agreed. So to sum up:
Sounds good?
That is the issue with races, I guess.
Sounds good!
Apparently fences have timeouts? The kernel wants to prevent an indefinite lock like this: https://docs.kernel.org/driver-api/dma-buf.html#indefinite-dma-fences

Presumably the timeout behavior would be similar to implicit sync without a blocker. So that does sound like something that could be a viable optimization for things like fullscreen applications, even if a general-purpose compositor probably wants to use blockers under most circumstances (for implicit or explicit sync). But it's an optimization that can be handled later, with more testing.
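The "never wait indefinitely" rule can be illustrated with a bounded wait. `WaitableFence` is a mock for illustration only; real dma-fence timeouts are enforced in the kernel, and a compositor would instead poll an exported sync_file fd with a deadline.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Mock fence with a blocking-but-bounded wait, illustrating the
/// "no indefinite fences" rule from the dma-buf documentation.
struct WaitableFence {
    state: Mutex<bool>,
    cond: Condvar,
}

impl WaitableFence {
    fn new() -> Arc<Self> {
        Arc::new(WaitableFence { state: Mutex::new(false), cond: Condvar::new() })
    }

    fn signal(&self) {
        *self.state.lock().unwrap() = true;
        self.cond.notify_all();
    }

    /// Wait for the fence, but never longer than `timeout`; returns
    /// whether it signaled in time.
    fn wait_timeout(&self, timeout: Duration) -> bool {
        let guard = self.state.lock().unwrap();
        let (_guard, res) = self
            .cond
            .wait_timeout_while(guard, timeout, |signaled| !*signaled)
            .unwrap();
        !res.timed_out()
    }
}

fn main() {
    let fence = WaitableFence::new();
    let f = fence.clone();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(10));
        f.signal();
    });
    // Bounded wait: the display can stall for at most the timeout.
    assert!(fence.wait_timeout(Duration::from_secs(1)));
    assert!(!WaitableFence::new().wait_timeout(Duration::from_millis(5)));
    println!("fence signaled within timeout");
}
```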
Ah right, we already have support for that in the renderer. So if we want to use the
I'm not sure if we can automagically fall back to implicit sync. We could just not pass an `IN_FENCE_FD`. I think we need to either:
Right. For fullscreen surfaces, we probably want to just pass through the fence, not for normal desktop usage though. I agree this is a later optimization.
Yes, but we kinda don't want that (except for the fullscreen case again), because then we would block compositing for an undefined amount of time. The goal is to render as late as possible and use the latest ready buffer.
Yes, we can absolutely do that; that is what we are doing right now. We would only block drivers not supporting implicit sync (Nvidia). Anyone serious about Nvidia support just needs to implement the protocol, and I don't want to mandate that. So: strong preference for 1.
My understanding is that DRM syncobjs are, at least in the future, not guaranteed to ever signal. This is because future drivers may support userspace memory fences (UMFs), and userspace can deadlock itself with them. Therefore, I think Smithay needs to do a (maybe asynchronous) explicit wait (possibly with a timeout) so that the display cannot be locked up forever.
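One shape for such an asynchronous, bounded explicit wait: block in a helper thread and report back over a channel, so a never-signaling syncobj can never stall the display. `spawn_bounded_wait` and `WaitOutcome` are hypothetical names sketched for this discussion, not Smithay API.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Outcome delivered back to the compositor's event loop.
#[derive(Debug, PartialEq)]
enum WaitOutcome {
    Ready,
    TimedOut, // e.g. a buggy or malicious client's syncobj never signals
}

/// Hypothetical async wait: run the (possibly unbounded) wait in a helper
/// thread with a deadline, reporting the outcome over a channel instead of
/// stalling the compositor.
fn spawn_bounded_wait(
    wait: impl FnOnce() + Send + 'static,
    timeout: Duration,
) -> mpsc::Receiver<WaitOutcome> {
    let (tx, rx) = mpsc::channel();
    let (done_tx, done_rx) = mpsc::channel();
    // Thread 1: performs the actual wait, which may never finish.
    thread::spawn(move || {
        wait();
        let _ = done_tx.send(());
    });
    // Thread 2: enforces the deadline and reports back.
    thread::spawn(move || {
        let outcome = match done_rx.recv_timeout(timeout) {
            Ok(()) => WaitOutcome::Ready,
            Err(_) => WaitOutcome::TimedOut,
        };
        let _ = tx.send(outcome);
    });
    rx
}

fn main() {
    // A wait that never completes in time must not lock up the compositor.
    let rx = spawn_bounded_wait(
        || thread::sleep(Duration::from_secs(60)),
        Duration::from_millis(20),
    );
    assert_eq!(rx.recv().unwrap(), WaitOutcome::TimedOut);

    let rx = spawn_bounded_wait(|| (), Duration::from_secs(1));
    assert_eq!(rx.recv().unwrap(), WaitOutcome::Ready);
    println!("bounded waits completed");
}
```

On timeout the compositor could, for example, drop the commit or fall back to the previous buffer rather than wait further.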
So far this just exposes the protocol but doesn't do anything. #1327 does indeed make that part mostly painless, instead of having impl bounds to deal with and a delegate macro.