Thread-safety for rendering related states #1393

Open
Drakulix wants to merge 7 commits into master from feature/thread-safe-rendering
Conversation

Drakulix (Member)

Best reviewed commit-by-commit.

The biggest commit is easily 5b2e919, which makes the CacheMap thread-safe. A lot of the double-buffered state is used to inform rendering, so I am not sure if this can be avoided.

The rest of the changes are most likely less controversial.

Drakulix force-pushed the feature/thread-safe-rendering branch 3 times, most recently from f3ec96f to cb9f536 on April 17, 2024 at 13:36
Drakulix force-pushed the feature/thread-safe-rendering branch from cb9f536 to 1fe3c70 on April 17, 2024 at 13:47
ids1024 (Member) commented Apr 17, 2024

Where RwLock is used, do we expect a lot more readers than writers? Otherwise it may perform worse than just a simple Mutex.

Though really, in a lot of cases it won't make much of a performance difference; nor will using a Mutex instead of RefCell. (It would be nice to have benchmarks, but that isn't really easy here.)

For CacheMap... if rendering is in a different thread from the thread handling Wayland requests and updating the cached state, would it make sense to have cached and pending in different locks? For the most part, the only place both are needed should be when applying the commit.

Though again, it might not make much difference if the lock isn't held for very long and we don't have a whole bunch of threads competing for it.

In general, using Mutex/RwLock instead of RefCell is a fairly uncontroversial change. Though multiple threads open up the possibility of deadlocks from inconsistent acquisition order of multiple locks, race conditions of various sorts, and all that.
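
To illustrate the split-lock idea, a minimal sketch with a hypothetical `DoubleBuffered` type (not the actual `CacheMap` API): pending and cached sit behind separate `Mutex`es, so the Wayland thread writing pending state and a render thread reading cached state only contend when a commit is applied.

```rust
use std::sync::Mutex;

/// Hypothetical double-buffered state with separate locks for the
/// pending (Wayland-thread) and cached (render-thread) copies.
struct DoubleBuffered<T: Clone> {
    pending: Mutex<T>,
    cached: Mutex<T>,
}

impl<T: Clone> DoubleBuffered<T> {
    fn new(initial: T) -> Self {
        Self {
            pending: Mutex::new(initial.clone()),
            cached: Mutex::new(initial),
        }
    }

    /// Wayland thread: mutate pending state only.
    fn update_pending(&self, f: impl FnOnce(&mut T)) {
        f(&mut self.pending.lock().unwrap());
    }

    /// Render thread: read cached state only.
    fn with_cached<R>(&self, f: impl FnOnce(&T) -> R) -> R {
        f(&self.cached.lock().unwrap())
    }

    /// Applying a commit is the only place both locks are held.
    /// Always acquire them in the same order to avoid deadlocks.
    fn apply(&self) {
        let pending = self.pending.lock().unwrap();
        let mut cached = self.cached.lock().unwrap();
        *cached = pending.clone();
    }
}
```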

YaLTeR (Contributor) commented Apr 18, 2024

Ngl all these locks throughout make me very uneasy. I feel like it can lead to a lot of issues like: rendering thread gets some value (lock/unlock), then main thread gets a surface commit and updates values, then rendering thread later in the function gets another value (lock/unlock) and expects it to "match" the first value it got earlier in the function, but it no longer does.
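
Roughly this pattern (a contrived sketch with a hypothetical per-surface state type, just to illustrate the hazard):

```rust
use std::sync::Mutex;

// Hypothetical per-surface state; a commit updates all fields atomically.
struct SurfaceState {
    geometry: (i32, i32, i32, i32),
    damage: Vec<(i32, i32, i32, i32)>,
}

fn render(state: &Mutex<SurfaceState>) {
    // Read 1: lock, copy, unlock.
    let geometry = state.lock().unwrap().geometry;

    // ... the main thread handles a commit here, replacing geometry AND damage ...

    // Read 2: lock, clone, unlock. This may observe a newer commit than
    // read 1 did, so `damage` no longer belongs to `geometry`.
    let damage = state.lock().unwrap().damage.clone();

    // Each access was individually safe, but the pair is inconsistent.
    let _ = (geometry, damage);
}
```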

cmeissl (Collaborator) commented Apr 18, 2024

I agree with @ids1024 that using RwLock might not yield any benefit here and a Mutex might be the easier option. RwLock also has a few footguns which can result in "unexpected" deadlocks.
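
One classic example (a sketch; whether it actually deadlocks depends on the platform's lock policy, which std's RwLock inherits):

```rust
use std::sync::RwLock;

fn recursive_read(lock: &RwLock<u32>) {
    let first = lock.read().unwrap();

    // If another thread requests lock.write() right here, a
    // writer-preferring implementation blocks new readers until the
    // writer is served, while the writer itself waits for `first`
    // to be dropped...

    let second = lock.read().unwrap(); // ...so this read can block forever.

    println!("{} {}", *first, *second);
}
```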

I also have similar concerns regarding atomicity of wayland operations as @YaLTeR. This could easily lead to unexpected behavior.

So instead of going into more detail on this PR's changes, I want to talk about the actual requirements and whether there might be an alternative way to solve them. AFAICT the motivation behind the changes is to allow concurrent rendering of multiple outputs (and probably also screencasts). I have also already thought about how we could support offloading screencopy to a separate thread, because this is currently one issue I see on a low-performance board.
I did something similar in an old non-Rust implementation: the main thread still processed the Wayland events, the rendering "sampled" the current state (mostly cloning stuff around), blocking Wayland processing only for a short time, and then offloaded the actual rendering to a separate thread.

I haven't thought that completely through, but from a quick look it should be possible to make the render elements (especially the WaylandSurfaceRenderElement) sample the state and implement Send. The biggest issue I see at the moment is how we handle importing memory/data into the renderer. This is currently done while realizing the render elements, so we would have to re-think that to keep time low during the "sample" phase. But maybe it is possible to split this into multiple smaller issues we can address. For example, "sampling" the state in WaylandSurfaceRenderElement only once would already yield better performance, because we could lock the state once instead of every time a function is invoked on the object (this is something that got visible in one of my last profiling sessions, after getting rid of a lot of other stuff :)).
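
Very roughly, the sampling could look like this (hypothetical types, not the actual WaylandSurfaceRenderElement API):

```rust
use std::sync::{Arc, Mutex};

// Hypothetical live surface state, owned by the main (Wayland) thread.
struct SurfaceData {
    location: (i32, i32),
    opaque_regions: Vec<(i32, i32, i32, i32)>,
}

// Immutable snapshot taken once per frame; plain data, so it is Send
// and the render thread never touches the live state again.
struct SurfaceSample {
    location: (i32, i32),
    opaque_regions: Vec<(i32, i32, i32, i32)>,
}

fn sample(data: &Arc<Mutex<SurfaceData>>) -> SurfaceSample {
    // One lock per surface per frame, instead of one per accessor call.
    let guard = data.lock().unwrap();
    SurfaceSample {
        location: guard.location,
        opaque_regions: guard.opaque_regions.clone(),
    }
}
```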

So some requirements to support this that come to my mind are:

  • Share Textures whenever possible (multiple renderers with a shared context)
  • Async upload of memory/buffers, or making it "lazy" and moving it to a render/upload thread (potentially only locking per renderer/id, or using some kind of optimistic concurrency and in the worst case wasting some bandwidth)
  • Keep atomicity of wayland operations
  • Avoid lock contention as far as possible

Drakulix (Member, Author) commented Apr 18, 2024

> Where RwLock is used, do we expect a lot more readers than writers? Otherwise it may perform worse than just a simple Mutex.

The idea was that this is the primary rendering-related state, which might be processed multiple times for rendering but is updated more rarely. Surely a Mutex wouldn't make a huge difference though.

> For CacheMap... if rendering is in a different thread from the thread handling Wayland requests and updating the cached state, would it make sense to have cached and pending in different locks? For the most part, the only place both are needed should be when applying the commit.

> Though again, it might not make much difference if the lock isn't held for very long and we don't have a whole bunch of threads competing for it.

I thought about this as well, came to the same conclusion, and thus went with the easier solution.

> Ngl all these locks throughout make me very uneasy. I feel like it can lead to a lot of issues like: rendering thread gets some value (lock/unlock), then main thread gets a surface commit and updates values, then rendering thread later in the function gets another value (lock/unlock) and expects it to "match" the first value it got earlier in the function, but it no longer does.

Yeah, this definitely needs to be tested and probably bug-fixed a lot once any compositor actually makes use of it. It might lead to a few imperfect frames in its current state. I can try to update and carry these patches until I have a proper implementation to show off (hopefully soon), to properly assess whether this really is an issue.

> I did something similar in an old non-Rust implementation: the main thread still processed the Wayland events, the rendering "sampled" the current state (mostly cloning stuff around), blocking Wayland processing only for a short time, and then offloaded the actual rendering to a separate thread.

Yes, that is the general idea; however, I feel like doing this after accumulating all the elements is the wrong approach. The idea of parallel rendering comes from the fact that you realistically can't drive two 240 Hz displays from one main loop, because the timings to commit in time are so tight. We actually profile and measure the "elements" step separately from "drawing" in cosmic-comp, and the former takes a good amount of time even after the last round of allocation optimizations.

I really want to move this step into a render thread as well, so while I like the theoretical idea of snapshotting the surface state, I don't feel like this would actually work as intended.

For what it's worth, cosmic-comp now puts its whole Shell state behind an RwLock to allow parallel read access from the rendering threads, which means all Wayland events that need to call back into that state are already blocked, most importantly commit. So I don't know if any of this would actually affect cosmic-comp. As such, I would be interested to hear whether any compositors are planning to implement this differently.
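
Condensed, that setup looks roughly like this (hypothetical Shell type, heavily simplified):

```rust
use std::sync::{Arc, RwLock};
use std::thread;

struct Shell; // stands in for the whole compositor state

fn spawn_render_threads(
    shell: Arc<RwLock<Shell>>,
    outputs: usize,
) -> Vec<thread::JoinHandle<()>> {
    (0..outputs)
        .map(|output| {
            let shell = Arc::clone(&shell);
            thread::spawn(move || {
                // One frame, for illustration; a real thread loops on vblank.
                // Readers run in parallel; a surface commit on the main
                // thread takes the write lock and blocks until all render
                // threads have released their read guards.
                let _guard = shell.read().unwrap();
                // ... build render elements and draw output `output` ...
                let _ = output;
            })
        })
        .collect()
}
```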

> The biggest issue I see at the moment is how we handle importing memory/data into the renderer.

And that problem runs deeper. The renderers mostly cache things using insert_if_missing, not insert_if_missing_threadsafe, which is why I didn't update the user-data structs of the GlesRenderer or the MultiRenderer yet. Sharing contexts with the MultiRenderer also isn't exactly straightforward. Right now I am instantiating a complete GpuManager per thread, even though I would really like to share textures.

I don't think we should handle the import on the main thread, btw. Importing with implicit sync might stall for quite a long time; with the MultiRenderer it might even involve a CPU copy. So locking EGLContexts to use them across multiple threads (except for creating shared contexts) is luckily out of the question, which is another reason to move element creation into the render threads.

> Async upload of memory/buffers, or making it "lazy" and moving it to a render/upload thread (potentially only locking per renderer/id, or using some kind of optimistic concurrency and in the worst case wasting some bandwidth)

So the idea would be to decouple import from element creation? That might be a good idea and would easily allow moving this to another thread. (And just continue using old buffers while the import thread is still busy?)
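
Something like this, perhaps (hypothetical types and channel plumbing, nothing from the actual API): element creation only queues the buffer, a dedicated thread imports it in the background, and rendering keeps the previously imported texture while the import is in flight.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-ins for a client buffer and an imported GPU texture.
struct Buffer(Vec<u8>);
struct Texture(u64);

fn import(buffer: &Buffer) -> Texture {
    // ... expensive upload, implicit-sync wait, possibly a CPU copy ...
    Texture(buffer.0.len() as u64)
}

fn main() {
    let (buf_tx, buf_rx) = mpsc::channel::<Buffer>();
    let (done_tx, done_rx) = mpsc::channel::<Texture>();

    // Dedicated upload thread: imports run off the main loop.
    thread::spawn(move || {
        for buffer in buf_rx {
            let _ = done_tx.send(import(&buffer));
        }
    });

    let mut current: Option<Texture> = None;

    // Per frame: queue the newest buffer, then render with whatever
    // texture is ready, keeping the old one while the import is busy.
    buf_tx.send(Buffer(vec![0; 4])).unwrap();
    if let Ok(texture) = done_rx.try_recv() {
        current = Some(texture);
    }
    match &current {
        Some(tex) => println!("draw with texture {}", tex.0),
        None => println!("no texture yet, reuse/skip this frame"),
    }
}
```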

For example "sampling" the state in WayandSurfaceRenderElement only once would already yield better performance because we could lock the state once instead of in every time a function is invoked on the object (This is something that got visible in one of my last profiling sessions after getting rid of a lot of other stuff :)).

In general I think this is a good idea, no matter on which thread this takes place. This ensures a consistent rendered state and might already solve 99% of the problems @YaLTeR has with this PR?

YaLTeR (Contributor) commented Apr 18, 2024

> I don't think we should handle the import on the main thread, btw. Importing with implicit sync might stall for quite a long time; with the MultiRenderer it might even involve a CPU copy.

SHM uploads are especially painful in the profiles. They happen every time you mouse over a different window (cursor surfaces are usually SHM), and also for clients like foot and maybe swww that can/do go fullscreen.

For example "sampling" the state in WayandSurfaceRenderElement only once would already yield better performance because we could lock the state once instead of in every time a function is invoked on the object (This is something that got visible in one of my last profiling sessions after getting rid of a lot of other stuff :)).

> In general I think this is a good idea, no matter on which thread this takes place. This ensures a consistent rendered state and might already solve 99% of the problems @YaLTeR has with this PR?

That will be very welcome regardless. I had to make my own poor man's implementation of this for animations (window resize, window close): https://github.com/YaLTeR/niri/blob/47f6c85f64a20a9e32bf402941ecb78065aff80a/src/render_helpers/surface.rs No opaque regions or damage there though.
