Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Another GPU API #9312

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Conversation

flibitijibibo
Copy link
Collaborator

@flibitijibibo flibitijibibo commented Mar 19, 2024

With 3.0 quickly on the horizon, we've decided to fast-track this so we can get more eyeballs sooner:

A close relative to the FNA project is the MoonWorks project, which intends to be more of a successor to XNA rather than a reimplementation like FNA. The graphics component is called Refresh, which is like FNA3D but targets modern APIs like Vulkan. With SDL_gpu still in the early stages, and with Refresh in the process of revamping its API for a 2.0 release anyway, we've decided to just turn Refresh into a possible candidate for the final GPU API.

Currently, Refresh supports Vulkan and the PS5 graphics API, with D3D11 deferred context support on the way. It is being used in production for Samurai Gunn 2 on PC and console. It also includes a shader system, but it takes a different approach by doing compilation offline, at least for now fixed!

@thatcosmonaut will be the main Point of Contact since he's the author, but the FNA core team will try to be involved as well if this seems okay to move forward with.

Things we would need to do to call this ready:

Known Issues:

  • The formatting doesn't match SDL's
  • Windows wasn't cooperating with CMake for some reason, so SDL_internal.h has some hardcoded defines

@darkerbit
Copy link

Would like to state my support for this, with the following points:

  1. Compute is a first-class feature of Refresh, and I feel SDL_GPU's previous draft not including it was a massive oversight. Compute is a feature in every modern graphics API (and even some older ones) and Refresh's API makes them delightfully simple to work with, which I feel is good to encourage. A more niche point is that compute shaders also solve every practical usecase of a GPU-timeline buffer->texture and texture->buffer copy, which is not a feature that is possible to implement in every graphics API (notably, D3D11)
  2. The shader formats being existing formats is a good thing in my opinion, as there exists standard tooling for creating and debugging them. The benefits are addressed in detail in Proposal: Consume Shaders in DXBC Format SDL_shader_tools#17, though that isn't the solution Refresh is using right now.
  3. Refresh already has an excellent Vulkan backend. This is going to save a tremendous amount of development time, as Refresh's Vulkan backend not only already exists but is also, since 2.0, incredibly performant.
  4. Refresh 2.0 puts great care into ensuring performant resource safety via cycling when possible. SDL_GPU's previous draft included APIs for this, however I don't think this is something that should be exposed to the user, as this is fundamentally backend-specific and complicates API usage. Refresh performs resource safety internally, with fantastic results.

@thatcosmonaut
Copy link

Hello, I'm the author so I'll take some time to explain this proposal.

First, the shader situation. This API's shader solution is a script called shaderbuild.py, it's essentially a frontend for offline shader building tools on the client's machine, and it globs various formats together so they can individually be sent to the appropriate render backends. I'm aware that the original proposal included online shader compilation. This solution doesn't forbid this, because future SDLSL source could just be included in the binary and the CreateShaderModule function can translate it into the desired backend's bytecode on the fly. We wouldn't have to break the public API to allow this, which is a nice plus, and it prevents the shader compiler from being a blocker on using this API. When authoring a shader the API expects certain set layouts depending on the resource and shader stage: vertex samplers are set 0, fragment samplers are set 1, vertex uniforms are set 2 and fragment uniforms are set 3.

This is a modern-style rendering API, so almost all tasks occur in a deferred context and are broken up into render passes, compute passes, and copy passes. All operations that write to a resource have the ability to cycle to avoid inter-frame dependencies - handles to graphics resources like GpuBuffers are just containers so we can cycle references to internal resources. There are some data quirks due to AMD D3D11 drivers being not great - this is why there are a few different WriteOptions enums. I'd like to get rid of these but I haven't fully been able to work around the fact that D3D11 data APIs do not work as advertised on AMD.

Presentation is handled via SDL_GpuAcquireSwapchainTexture, which associates the given command buffer with a swapchain image. When SDL_GpuSubmit is called with this command buffer, the presentation structures are automatically configured and submitted. SDL_GpuSubmit automatically handles submission fences, but the client can choose to explicitly synchronize by calling SDL_GpuSubmitAndAcquireFence and using the returned SDL_GpuFence handle.

The rest of the API is fairly bog-standard binding, render, and compute dispatch calls.

@thatcosmonaut
Copy link

I noticed that the CI treats all warnings as errors, which is fine, but there are some things like unused function variables or pointer size casts that I need to be able to work around. Are there macros for handling these warnings?

I was also having a difficult time getting preprocessor defines showing up correctly in the project from CMake, so I could use some help with that.

@slouken slouken requested review from icculus and madebr March 19, 2024 17:25
include/SDL3/SDL_gpu.h Outdated Show resolved Hide resolved
@icculus
Copy link
Collaborator

icculus commented Mar 19, 2024

This is not what I was expecting to see in my inbox this morning.

I'm going to take some time to study this and think on it. I want to be clear, this might get combined with the other branch somewhere between "this particular idea was better than mine so a pivot makes sense" to "this piece got copy/pasted into the my branch," to "this is better in all ways, so I put my branch in the trash." I don't know how this will shake out yet.

Also to be clear: there is no scenario where either this PR or my branch is landing in the 3.2.0 release in the name of expediency.

@slime73
Copy link
Contributor

slime73 commented Mar 19, 2024

This API's shader solution is a script called shaderbuild.py, it's essentially a frontend for offline shader building tools on the client's machine,

Some 2c from an outsider: I understand some of the reasons for purely offline shader compilation (runtime compilation isn't supported on every possible platform anyway, including third party cross-compile libraries might bloat the binary size considerably – or maintaining custom cross-compilation is a massive maintenance burden, optimization can be more aggressive, etc.) – but purely offline compilation incompatible with the engine I develop, so if SDL_gpu does this it immediately puts it off the table for me.

Also, the way it's set up right now seems to need a lot of setup from developers using it: an install of python in their path, an install of glslc in their path, an install of spirv-cross in their path, some per-project setup to run the script on the appropriate files if you don't want to manually run everything in a terminal, etc. If I did use this, that part would probably feel really frustrating to me.

@thatcosmonaut
Copy link

thatcosmonaut commented Mar 19, 2024

As I mentioned, I know there are perfectly good reasons to support online shader compilation and the proposed format intentionally does not preclude that from being possible. This was the best solution I can think of that does not involve pulling in an enormous dependency and works today. At the end of the day something has to compile a shader language to actual bytecode or else we are completely blocked. I mostly went with python because there are a few other scripts in SDL that use it and there's not really a precedent for SDL shipping something like this. But as long as the input format is well-defined it wouldn't really matter what developers use to generate it.

@slime73
Copy link
Contributor

slime73 commented Mar 19, 2024

and works today.

I don't want to speak for the SDL team too much but I don't get the sense this is really the top priority for SDL_gpu in the current moment - figuring out something that works cleanly and robustly tomorrow is likely more important than something simply working in any state immediately. If you are blocked by a lack of SDL_gpu right now then it probably makes sense to use something else in the meantime instead of hoping it ships in the very near future.

Given the rest of SDL's ecosystem, the most idiomatic offline shader compilation tool for SDL I can think of would be a separate satellite library, which could avoid needing system-wide packages like python/glslc for people using it, and might also let people ship it with their app if they want runtime or otherwise on-device compilation I suppose.

But maybe that's getting a bit too far into the weeds for now given Ryan's previous message, and I should just let him think on it without all the noise from me. :)

@1bsyl
Copy link
Contributor

1bsyl commented Mar 20, 2024

(if possible, SDL could run (with compilation flags) in different flavors: legacy SDL_Render, Ryan's SDL_GPU , etc.. )

@sezero
Copy link
Contributor

sezero commented Mar 20, 2024

Got type redefinition errors:

In file included from /tmp/SDL3-gpu-9312/src/gpu/../video/khronos/vulkan/vulkan.h:11,
                 from /tmp/SDL3-gpu-9312/src/gpu/SDL_gpu_vulkan.c:31:
/tmp/SDL3-gpu-9312/src/gpu/../video/khronos/vulkan/vulkan_core.h:101: error: redefinition of typedef ‘VkInstance’
/tmp/SDL3-gpu-9312/include/SDL3/SDL_vulkan.h:52: note: previous declaration of ‘VkInstance’ was here
In file included from /tmp/SDL3-gpu-9312/src/gpu/../video/khronos/vulkan/vulkan.h:11,
                 from /tmp/SDL3-gpu-9312/src/gpu/SDL_gpu_vulkan.c:31:
/tmp/SDL3-gpu-9312/src/gpu/../video/khronos/vulkan/vulkan_core.h:7513: error: redefinition of typedef ‘VkSurfaceKHR’
/tmp/SDL3-gpu-9312/include/SDL3/SDL_vulkan.h:53: note: previous declaration of ‘VkSurfaceKHR’ was here
make[2]: *** [CMakeFiles/SDL3-shared.dir/src/gpu/SDL_gpu_vulkan.c.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/SDL3-shared.dir/all] Error 2
make: *** [all] Error 2

Easy fix:

diff --git a/src/gpu/SDL_gpu_vulkan.c b/src/gpu/SDL_gpu_vulkan.c
index 7feafda..acf57d2 100644
--- a/src/gpu/SDL_gpu_vulkan.c
+++ b/src/gpu/SDL_gpu_vulkan.c
@@ -22,14 +22,14 @@
 #include "SDL_internal.h"
 
 #if SDL_GPU_VULKAN
-#include <SDL3/SDL_vulkan.h>
-
 /* Needed for VK_KHR_portability_subset */
 #define VK_ENABLE_BETA_EXTENSIONS
 
 #define VK_NO_PROTOTYPES
 #include "../video/khronos/vulkan/vulkan.h"
 
+#include <SDL3/SDL_vulkan.h>
+
 #include "SDL_gpu_driver.h"
 
 #define VULKAN_INTERNAL_clamp(val, min, max) SDL_max(min, SDL_min(val, max))

Minor annoying 'maybe used uninitialized' warning (seems bogus):

/tmp/SDL3-gpu-9312/src/gpu/SDL_gpu_vulkan.c: In function ‘VULKAN_INTERNAL_DeterminePhysicalDevice’:
/tmp/SDL3-gpu-9312/src/gpu/SDL_gpu_vulkan.c:11338: warning: ‘suitableQueueFamilyIndex’ may be used uninitialized in this function

Silencing:

diff --git a/src/gpu/SDL_gpu_vulkan.c b/src/gpu/SDL_gpu_vulkan.c
--- a/src/gpu/SDL_gpu_vulkan.c
+++ b/src/gpu/SDL_gpu_vulkan.c
@@ -11385,6 +11385,7 @@ static Uint8 VULKAN_INTERNAL_DeterminePhysicalDevice(
 
     /* Any suitable device will do, but we'd like the best */
     suitableIndex = -1;
+    suitableQueueFamilyIndex = 0;
     highestRank = 0;
     for (i = 0; i < physicalDeviceCount; i += 1)
     {

Noticed that dynapi isn't touched: Running gendynapi.py under src/dynapi/ will update it.
(It will emit lots of missing documentation warnings along the way..)

@meyraud705
Copy link
Contributor

As I am interested in SDL_gpu and am working on an OpenGL implementation of it (icculus#7), here what I think:

I like the idea of SDL_gpu handling cycles. I am not sure why some combination are not allowed: no safe for GPU buffer and no unsafe for texture.

Most of the structs are similar: Device, Pipeline... TransferBuffer which is like the CpuBuffer but looks useful. There is no *Pass struct, user can't do anything with them but they avoid mixing of render and copy command so I think it is better to have them.

For the functions I prefer @icculus version: "put this buffer/texture at this location". Also, while passing ten parameters to a function is ugly, pointer to struct for 2 parameters and nested struct (TextureRegion -> TextureSlice -> Texture) is not better.

I would like to see the spinning cube test (https://github.com/icculus/SDL/blob/5db1fd7491ccf1d9b209c91ba163da0b2fb54f6f/test/testgpu_spinning_cube.c) with this API (including its shader), especially how you handle the absence of binding location.

@flibitijibibo
Copy link
Collaborator Author

flibitijibibo commented Mar 26, 2024

Latest push starts the work on Metal support, and reworks the shader system slightly to make a clear distinction between "raw" shaders and "portable" shaders - in particular, we're using the FNA3D port as a stress test for supporting shader creation at runtime. We're currently using the MojoShader SPIR-V emitter, which can be passed directly when Vulkan is active, and for other backends we're using a separate (and more importantly, optional) library to translate:

https://github.com/thatcosmonaut/SDL_shader

We're using SPIR-V since MojoShader already supports it, but the same idea would apply to SDL's shader format: A separate library would take in the blob and emit what the active SDL_gpu backend needs, and SDL itself doesn't know or care where the shaders came from.

One thing we haven't really touched at all is the formatting - I seem to remember formatting tools getting involved when the decision was made to change some of SDL's style guidelines, but I can't find them anywhere. If that exists somewhere the next push should have the formatting cleaned up.

@flibitijibibo
Copy link
Collaborator Author

As of the latest push, we're in-game:

image

Still some stuff to iron out but we've now got this working with no offline shader compilation needed! Once we've taken care of all the FNA3D TODOs/FIXMEs we'll start cleaning up the SDL side next, and that should set the stage for porting to D3D12/Metal. I believe we have a test program ported from Ryan's draft as well, it just hasn't been added to this repo yet.

@flibitijibibo
Copy link
Collaborator Author

Activity on this will probably calm down over the weekend, so the code shouldn't move too much if anyone's waiting to review this... in the meantime, here's the current revision booting up Streets of Rage 4 (with some bugs on our end, but still!)

image

@thatcosmonaut
Copy link

The main thing I want to look at is streamlining the various WriteOptions. I'll look into that and writing up SDL-style doc comments on the header next week.

@flibitijibibo
Copy link
Collaborator Author

Latest push gets Wizorb running on Metal! We're still pushing forward via FNA3D but if anyone is interested in looking at Metal/D3D12 support in particular, please get in touch.

@flibitijibibo
Copy link
Collaborator Author

Latest push gets in-game for a good chunk of our trace database - for fun, here's Celeste:

image

Interestingly enough, the one component that's actually pretty close to done is the shader system; at this point I'm really happy with how the MojoShader implementation turned out and it shows a good example of supporting shaders without necessarily needing an SDLSL first:

https://github.com/FNA-XNA/FNA3D/blob/sdl_gpu/src/mojoshader_sdlgpu.c

There are some remaining features left before we start focusing on the backends, but we're now pretty much at a point where the overall design is there but may not have the minimum feature set needed just yet (for example, the next push will add hardware instancing support).

@thatcosmonaut
Copy link

Instancing and occlusion queries are in, just need to fill those in on Metal. We'll just keep going through our test cases to ferret out bugs.

@thatcosmonaut
Copy link

Boiled the cycle concept down to a bool and got rid of all the various WriteOption enums. I've documented how this works thoroughly in the code now.

Right now we're working on making the shader module creation struct take the respective IR format of each backend so that SDL itself doesn't have to wrap any shader compilers.

@corentin35000
Copy link

I can't wait for this to be ready, on PC / PS5 / Windows / Xbox.. with SDL3 <3

@flibitijibibo
Copy link
Collaborator Author

Latest push includes improved support for async readback as well as lots of regression fixes - we're back down to no known bugs at this point. We have one more expected change for transfer support, and then we'll be focused on Metal support and any changes that come along with that (mainly pipeline resource sets which will likely be simplified in the next major push), Alpha should be soon after that!

@thatcosmonaut
Copy link

After a lot of experimentation and consideration of a few different backends we decided to scrap the resource set idea and go back to the slot-based binding model - binding is accomplished via functions like BindVertexSamplers and BindComputeRWStorageBuffers, and pipelines must declare the number of resources in each category at creation time. I have also added an async readback API and a transfer buffer mapping API.

Now that we're back to slot-based binding, we are very close to Ryan's original API design. The only real deviations are compute pipelines, storage buffers/texture support, and the built-in cycle concept. Once we finish revising Metal and tracking down a few edge cases in our tests I think we'll be in a good spot for a serious review.

@flibitijibibo
Copy link
Collaborator Author

The latest push is mostly for Vulkan; @thatcosmonaut spent a ton of time on making sure the renderer passed sync validation, and so we had to do Yet Another Barrier Rewrite to accommodate those warnings.

Aside from maybe swapchain present mode this is expected to be the last API change until after Alpha, so I'll be renaming this thread to [RFC] in a moment. Alpha is still waiting on Metal, and maybe reintroducing a cleaner way to expose mailbox vsync in particular.

@flibitijibibo flibitijibibo changed the title Another GPU API [RFC] Another GPU API May 23, 2024
@flibitijibibo
Copy link
Collaborator Author

flibitijibibo commented May 24, 2024

Ended up mostly bringing back the old present mode system, it's just vsync/immediate/mailbox as an enum now. The common changes needed for Metal are done, so only the Metal files should change between now and Alpha; the public-facing API is ready for feedback.

@corentin35000
Copy link

corentin35000 commented May 24, 2024

Ended up mostly bringing back the old present mode system, it's just vsync/immediate/mailbox as an enum now. The common changes needed for Metal are done, so only the Metal files should change between now and Alpha; the public-facing API is ready for feedback.

Hello and congratulations on the progress! I know the alpha is coming and there is still a risk that the API will be broken until the Release Candidate, but remember that I am available to create the official GPU API documentation! I say this in relation to your comment "the public API is ready to receive comments.", apart from this PR we have almost no information on how to use it via SDL3. I can take my time to discuss together and then I build the documentation if necessary if ever from the alpha if the API no longer moves too much. A documentation of my game framework / library, a wrapper over SDL2, as an example if you wish: https://librc2d.crzcommon2.com/, develop with Vitepress, with only .md files, very maintainable!

@flibitijibibo
Copy link
Collaborator Author

Ended up mostly bringing back the old present mode system, it's just vsync/immediate/mailbox as an enum now. The common changes needed for Metal are done, so only the Metal files should change between now and Alpha; the public-facing API is ready for feedback.

Hello and congratulations on the progress! I know the alpha is coming and there is still a risk that the API will be broken until the Release Candidate, but remember that I am available to create the official GPU API documentation! I say this in relation to your comment "the public API is ready to receive comments.", apart from this PR we have almost no information on how to use it via SDL3. I can take my time to discuss together and then I build the documentation if necessary if ever from the alpha if the API no longer moves too much. A documentation of my game framework / library, a wrapper over SDL2, as an example if you wish: https://librc2d.crzcommon2.com/, develop with Vitepress, with only .md files, very maintainable!

I should have clarified: The GPU API does have comments in the headers, "comments" was referring to "early review", that's on me!

There are probably some bugs in our docs so any notes/fixes to SDL_gpu.h are appreciated; additionally we have testgpu in the PR which may help with starting examples, and we've also started a repo for more GPU examples over here (we're not sure where this will fit in the repo yet):

https://github.com/TheSpydog/SDL_gpu_examples

We have FNA3D and some internal stuff for testing but we could definitely use more bare-bones samples that are public, particularly ones that use compute shaders.

@corentin35000
Copy link

Ended up mostly bringing back the old present mode system, it's just vsync/immediate/mailbox as an enum now. The common changes needed for Metal are done, so only the Metal files should change between now and Alpha; the public-facing API is ready for feedback.

Hello and congratulations on the progress! I know the alpha is coming and there is still a risk that the API will be broken until the Release Candidate, but remember that I am available to create the official GPU API documentation! I say this in relation to your comment "the public API is ready to receive comments.", apart from this PR we have almost no information on how to use it via SDL3. I can take my time to discuss together and then I build the documentation if necessary if ever from the alpha if the API no longer moves too much. A documentation of my game framework / library, a wrapper over SDL2, as an example if you wish: https://librc2d.crzcommon2.com/, develop with Vitepress, with only .md files, very maintainable!

I should have clarified: The GPU API does have comments in the headers, "comments" was referring to "early review", that's on me!

There are probably some bugs in our docs so any notes/fixes to SDL_gpu.h are appreciated; additionally we have testgpu in the PR which may help with starting examples, and we've also started a repo for more GPU examples over here (we're not sure where this will fit in the repo yet):

https://github.com/TheSpydog/SDL_gpu_examples

We have FNA3D and some internal stuff for testing but we could definitely use more bare-bones samples that are public, particularly ones that use compute shaders.

Thank you for your message, ok if you do like SDL, the doc will be the comments in the code no problem. The examples are very good, but I think there is a real lack of documentation, many of the tools are incredible but the documentation is almost non-existent, I find that a shame. I understand that this is still in the development phase so it is completely understandable, but I would have seen a README.md which explains how to write shaders, the specifics for SDL in terms of back-end functions..etc and surely a lot of other information.

@corentin35000
Copy link

It's a bit like SDL2, SDL2 was incredibly good and powerful. But the documentation was almost non-existent with only the prototype of the function and we had to make do with that. I hope SDL3 has reviewed this for documentation as well.

@corentin35000
Copy link

I understand that for the moment by digging into the different PRs, that SDL_Render will not work with the current GPU API

@corentin35000
Copy link

I think there are a lot of people looking for the equivalent of SDL_gpu, which allowed us to use SDL_Render directly with any backend and was easily usable.

@thatcosmonaut
Copy link

It would definitely be possible to implement SDL_Render on top of this API. In fact we were thinking that it might be useful for a maintainer to implement it as part of the evaluation process.

@corentin35000
Copy link

Il serait certainement possible d'implémenter SDL_Render en plus de cette API. En fait, nous pensions qu'il pourrait être utile qu'un responsable l'implémente dans le cadre du processus d'évaluation.

I think few people use it if they haven't used SDL_Render directly, I think it should be a priority

@thatcosmonaut
Copy link

Implementing a tonemapping example has exposed some deficiencies in the compute API, so we're working on some changes right now.

@flibitijibibo
Copy link
Collaborator Author

Latest push is mostly focused on HDR and some changes needed to support it well - one notable exception is a SetHDRMetadata call, which isn't hard to write (just forwards directly to a single DXGI/Vulkan function) but just hasn't been written yet.

The example for HDR tonemapping is here, it handled internal color transforms but doesn't factor in the display metadata yet: https://github.com/TheSpydog/SDL_gpu_examples/blob/tonemap/Examples/ToneMapping.c

@thatcosmonaut
Copy link

The other major change was to BeginComputePass, which now binds all the writeable resources used in the compute pass. This fixes a lot of synchronization issues and brings compute passes more conceptually in line with render passes. As a result, BindComputeRWStorageTextures and BindComputeRWStorageBuffers are removed.

The tonemap example has been now merged into the main branch of the examples repo and can be found here: https://github.com/TheSpydog/SDL_gpu_examples/blob/main/Examples/ToneMapping.c

thatcosmonaut and others added 2 commits May 31, 2024 01:47
Co-authored-by: Ethan Lee <flibitijibibo@gmail.com>
Co-authored-by: Caleb Cornett <caleb.cornett@outlook.com>
Co-authored-by: Zakary Strange <zakarystrange@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet