Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

video: Only prefer Wayland if fifo-v1 and commit-timing-v1 are available #9383

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Joshua-Ashton
Copy link
Contributor

Description

Wayland has a myriad of unresolved problems regarding surface suspension blocking forever in QueuePresent/SwapBuffers when occludedand the FIFO (vsync) implementation being fundamentally broken leading to reduced GPU-bound performance and 'barcoding' frametimes due to swapchain starvation.

There are two protocols used to solve these two problems together -- fifo-v1 and commit-timing-v1, which implement the commit queue on the compositor side, and a timestamp that frames are intended to be displayed for/discarded respectfully.

To avoid severe performance regressions for developers targeting SDL3, only pick Wayland as the default backend when these two protocols are supported -- otherwise fallback to X11/XWayland.

We do this by having two VideoBootStraps, one which is tests the preferred case, "wayland_preferred" (ie. if fifo-v1 + commit-timing-v1 are available init time), and the fallback, which is just "wayland", the same name as before, which does no such tests. Thus, forcing with SDL_VIDEO_DRIVER=wayland will go onto the fallback option, and pick Wayland always, as usual, so there is no behaviour change.

In the case that X11/XWayland is not available (ie. no DISPLAY), we will still fallback to using Wayland without these protocols available.

Existing Issue(s)

Previous MR: #9345

Sam's comment saying to do exactly this: #9345 (comment)

@Joshua-Ashton Joshua-Ashton force-pushed the prefer-wayland-with-fifo branch 2 times, most recently from 0e957ff to 1815484 Compare March 27, 2024 18:44
Wayland has a myriad of unresolved problems regarding surface suspension
blocking forever in QueuePresent/SwapBuffers when occludedand the FIFO
(vsync) implementation being fundamentally broken leading to reduced
GPU-bound performance and 'barcoding' frametimes due to swapchain
starvation.

There are two protocols used to solve these two problems together --
fifo-v1 and commit-timing-v1, which implement the commit queue on the
compositor side, and a timestamp that frames are intended to be
displayed for/discarded respectfully.

To avoid severe performance regressions for developers targeting SDL3,
only pick Wayland as the default backend when these two protocols are
supported -- otherwise fallback to X11/XWayland.

We do this by having two VideoBootStraps, one which is tests the
preferred case, "wayland_preferred" (ie. if fifo-v1 + commit-timing-v1
are available init time), and the fallback, which is just "wayland",
the same name as before, which does no such tests.
Thus, forcing with SDL_VIDEO_DRIVER=wayland will go onto the fallback
option, and pick Wayland always, as usual, so there is no behaviour
change.

In the case that X11/XWayland is not available (ie. no DISPLAY), we will
still fallback to using Wayland without these protocols available.

Signed-off-by: Joshua Ashton <joshua@froggi.es>
@slouken
Copy link
Collaborator

slouken commented Mar 27, 2024

Thanks!

We'll leave this open for now so people can continue to test and report Wayland issues and give the proposed protocols a chance to be approved.

@slouken slouken added this to the 3.2.0 milestone Mar 27, 2024
@Conan-Kudo
Copy link
Contributor

FYI: @Zamundaaa @davidedmundson @zzag @jadahl @jonas2515 @AlanGriffiths @wmww

@Conan-Kudo
Copy link
Contributor

Also FYI: @Drakulix

@Kontrabant
Copy link
Contributor

I'm OK with this as "Wayland is the default, if your distro is new enough" since it still signals to developers that Wayland is ultimately the preferred default and shouldn't be ignored in their Vulkan/GL loader code, and avoids any potential drama that would inevitably accompany changing the default after the final release.

Incidentally, I like this pattern, and I'm thinking this could be applied to a solution that was suggested for "Pipewire by default, but only if set to mix audio", as well.

@Conan-Kudo
Copy link
Contributor

Incidentally, I like this pattern, and I'm thinking this could be applied to a solution that was suggested for "Pipewire by default, but only if set to mix audio", as well.

These are not equivalent scenarios. On systems with PipeWire audio servers, it doesn't matter which API you use, everything will be going to PipeWire anyway. Using the PipeWire API means a richer interface which can make things easier for other parts of the user's stack.

Copy link
Collaborator

@flibitijibibo flibitijibibo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good solution, though I wonder if, between this and PipeWire, we should make it easier to have the pattern of "this thing is better if x/y/z, otherwise use a different thing". PW is a bit different in that our solution for that is looking to be "if we connect Pulse and it turns out it's PipeWire, bail so we can go straight to PW," if it's possible to do this for X (I think Flatpak's sandbox does this somehow?) that would simplify this further.

In any case, definitely better than a full revert ✔️

@Kontrabant
Copy link
Contributor

Since we are setting a minimum baseline with these protocols, we could probably get rid of our own internal EGL presentation workaround hack needed for when the window is occluded and the frame callback stops as well.

@Conan-Kudo
Copy link
Contributor

Since we are setting a minimum baseline with these protocols, we could probably get rid of our own internal EGL presentation workaround hack needed for when the window is occluded and the frame callback stops as well.

Is this actually fixed somewhere or by something?

@Kontrabant
Copy link
Contributor

Kontrabant commented Mar 27, 2024

Since we are setting a minimum baseline with these protocols, we could probably get rid of our own internal EGL presentation workaround hack needed for when the window is occluded and the frame callback stops as well.

Is this actually fixed somewhere or by something?

Currently, if you use SDL's internal GL presentation function, SDL always sets the EGL swap interval to 0 and busy-waits on the frame callback with a timeout when presenting in order to avoid the issue where presentation can block forever if the frame callback stops, which is messy and undesirable as it doesn't allow swap intervals greater than 1 and can have timing issues. These protocols fix the blocking-forever issue, and if they are basically required to use Wayland without forcing it on, we can probably just get rid of our own internal mess in the presentation path.

This workaround is only for the internal Wayland_GLES_SwapWindow function, so it doesn't help if using Vulkan, or if apps/games directly call the presentation functions themselves. Still, I think everyone who has had the displeasure of dealing with the workaround would love to see it gone in favor of things working properly :)

Copy link
Contributor

@Kontrabant Kontrabant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general problem that I see with this approach is that, if these protocols are present, the video driver name will be set to "wayland_preferred", which can break any code that does something like

if (SDL_strcmp(SDL_GetCurrentVideoDriver(), "wayland") == 0) {
  // Do Wayland stuff
}

This breaks testnative with

ERROR: Couldn't find native window code for wayland_preferred driver

and probably other applications, as it's a common pattern.

The name of the driver should be reported as "wayland" regardless of whether the preferred or fallback bootstrap struct was used.

I'm thinking that it would be easiest to modify the code in SDL_video.c that loops over the video drivers to check all the list items for a given name instead of bailing as soon as the first one is found, as then the list can have multiple entries with the same name for cases like this.

Line 510 in SDL_video.c

break;

becomes

if (video) {
    break;
}

@@ -398,6 +446,18 @@ static SDL_VideoDevice *Wayland_CreateDevice(void)
}
}

/*
* If we are checking for preferred Wayland, then let's query for
* fifo-v1 and commit-timing-v1's existance so we don't regress
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

Suggested change
* fifo-v1 and commit-timing-v1's existance so we don't regress
* fifo-v1 and commit-timing-v1's existence so we don't regress

@flibitijibibo
Copy link
Collaborator

Been thinking about this for FNA3D/GPU and one possible solution to the "preferred" issue is adding a field to the bootstrap system like SDL_bool strict;, where most are not strict but occasionally you might have a case where a driver may be preferred over another, but with more strict requirements.

You could maybe add the bool to both the bootstrap struct as well was the Create function, so Wayland's create would still be one function and the bootstraps for Wayland and WaylandStrict would be identical minus the bool value. It would then try WaylandStrict, X11, then Wayland in that order. (This would be more helpful with GPU where Vulkan support might be preferred over D3D12, but only if it has a bunch of shiny new extensions that make it perform better)

@cgutman
Copy link
Collaborator

cgutman commented Mar 30, 2024

Since we are setting a minimum baseline with these protocols, we could probably get rid of our own internal EGL presentation workaround hack needed for when the window is occluded and the frame callback stops as well.

I'd prefer not to do this because it will be regressing apps that already opt-in to the Wayland backend today. Users may also override SDL_VIDEODRIVER for their own reasons and we don't want to make their experience worse.

@slouken
Copy link
Collaborator

slouken commented Mar 30, 2024

Since we are setting a minimum baseline with these protocols, we could probably get rid of our own internal EGL presentation workaround hack needed for when the window is occluded and the frame callback stops as well.

I'd prefer not to do this because it will be regressing apps that already opt-in to the Wayland backend today. Users may also override SDL_VIDEODRIVER for their own reasons and we don't want to make their experience worse.

Agreed.

@Kontrabant
Copy link
Contributor

Kontrabant commented Mar 30, 2024

Potential solution to the problem where one video driver now has multiple names:
Kontrabant@4083c58

Allows multiple entries to have the same name by trying all of the entries that match the string before reporting failure, and deduplicates the driver list available to client applications.

@slouken
Copy link
Collaborator

slouken commented Mar 30, 2024

Potential solution to the problem where one video driver now has multiple names: Kontrabant@4083c58

Allows multiple entries to have the same name by trying all of the entries that match the string before reporting failure, and deduplicates the driver list available to client applications.

This seems like a good solution to me.

Allow multiple bootstrap entries for a single video driver with the same name, which internally allows preferential and fallback init conditions while hiding the implementation details from applications (e.g. applications will just see "wayland", regardless of whether it's using the preferred or fallback driver list entry).

If a driver is requested, all instances of it in the list will be tried before reporting failure, and client applications programmatically enumerating the video drivers will be presented with a deduplicated list of entries.
Copy link
Collaborator

@flibitijibibo flibitijibibo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This idea is still good but I think we now have a better way to handled the "preferred" driver model: The bootstrap logic should be redone to look more like #9473's, which will avoid the string duplication issue.

@Kontrabant
Copy link
Contributor

I have the needed changes in a separate branch, I'll just push the changes onto this pull.

@Joshua-Ashton
Copy link
Contributor Author

Go for it

@Kontrabant
Copy link
Contributor

Pushed the commit to mirror the Pipewire-by-default changes.

@slouken
Copy link
Collaborator

slouken commented Apr 15, 2024

Great, let's hold on this for now and we'll discuss when in the SDL3 release timeline we want to merge it.

@slouken slouken modified the milestones: 3.2.0, 3.0 ABI May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants