Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GL::defaultFramebuffer behavior with GL::Context::Configuration::Flag::Windowless #493

Open
xqms opened this issue Feb 8, 2021 · 9 comments

Comments

@xqms
Copy link
Contributor

xqms commented Feb 8, 2021

Hey @mosra, I'm working on a VR application with low-latency JPEG streaming and I'm trying to reduce latency even further by using a background thread to upload the data. Not sure yet if I can gain something using that setup, but I noticed a problem along the way:

The GL::defaultFramebuffer instance is a global variable and it gets written to (at least) in

void DefaultFramebuffer::initializeContextBasedFunctionality(Context& context) {
Implementation::FramebufferState& state = *context.state().framebuffer;
/* Initial framebuffer size */
GLint viewport[4];
glGetIntegerv(GL_VIEWPORT, viewport);
defaultFramebuffer._viewport = state.viewport = Range2Di::fromSize({viewport[0], viewport[1]}, {viewport[2], viewport[3]});
CORRADE_INTERNAL_ASSERT(defaultFramebuffer._viewport != Implementation::FramebufferState::DisengagedViewport);
}
, which is called during GL::Context creation.

So setting up a second GL::Context will mess up the main context's defaultFramebuffer. In a multi-threading scenario, it might even end up with some completely bogus viewport range.

Since the docs seem to suggest that multi-threading with multiple contexts is supported (e.g. https://doc.magnum.graphics/magnum/platform.html#platform-windowless-contexts) I guess we should fix that somehow.

Some thoughts on a fix:

  • Will be rather difficult since defaultFramebuffer cannot be made thread-local and exported on windows. And even if we did that, we would need to overwrite it on GL::Context::makeCurrent().
  • We could make the default framebuffer a property of the context state (currentContext is already thread-local), and change defaultFramebuffer to a function returning currentContext->state_->defaultFramebuffer or similar). That would be a larger API change - is it worth it?
  • Maybe write a wrapper class that forwards all method calls to the correct framebuffer instance? Involves duplicating all methods...

I might have time to work on this later this week, but I guess I'll wait for your input on this ;)

@mosra
Copy link
Owner

mosra commented Feb 8, 2021

Hey Max! :)

At some point I was even considering making GL::defaultFramebuffer constexpr, because apart from this one there are no global constructors/destructors being run, but due to all its API being non-const it wasn't really possible.

Practically though, I don't see such a big issue here -- if there's multiple GL thread-local contexts, it's likely that only one of them has / should have access to the window framebuffer, with the others being windowless. (And then, in case of multiple windows, it doesn't make sense performance-wise to have a dedicated context for each, instead a single GL context should drive all of them. Not sure what exactly happen if you'd attempt to have multiple windows with thread-local GL contexts and thread-local event handling, but I bet that's not really possible on all platforms either.)

So, given that the extra GL contexts should be windowless, with either a trivial (32x32, as GLX/EGL does) or a completely nonexistent (CGL) default framebuffer, I'd say one way to fix this would be to only modify the defaultFramebuffer internals from the context that has a window attached.

I'm trying to reduce latency even further by using a background thread to upload the data.

This should be doable even without multiple GL contexts -- create a BufferImage, map its memory (or make it persistently mapped), pass the view to the other thread and write to it from there. Then you need some synchronization to avoid GL reading the data before they're made available to the GPU, and I'm not sure if all needed GL sync primitives are exposed, you may need to go with raw GL calls for that.

Or you could go with Vulkan, haha. But I still need a few weeks to complete texturing and shader uniform support (descriptor sets etc), until then you'd be forced to use a lot of raw Vulkan APIs to get the work done.

@mosra mosra added this to the 2020.0b milestone Feb 8, 2021
@xqms
Copy link
Contributor Author

xqms commented Feb 8, 2021

So, given that the extra GL contexts should be windowless, with either a trivial (32x32, as GLX/EGL does) or a completely nonexistent (CGL) default framebuffer, I'd say one way to fix this would be to only modify the defaultFramebuffer internals from the context that has a window attached.

Sounds good! So we would need some kind of flag that tells GL::Context::tryCreate not to call DefaultFramebuffer::initializeContextBasedFunctionality? Or is there some way to detect that there is no attached window?

This should be doable even without multiple GL contexts -- create a BufferImage, map its memory (or make it persistently mapped), pass the view to the other thread and write to it from there. Then you need some synchronization to avoid GL reading the data before they're made available to the GPU, and I'm not sure if all needed GL sync primitives are exposed, you may need to go with raw GL calls for that.

That's my baseline implementation - but it creates more lag spikes on the main thread, which is doing the GL::Texture2D::setSubImage(). Having a second thread with a queue of textures seems to eliminate that. I got the idea from
https://on-demand.gputechconf.com/gtc/2012/presentations/S0356-GTC2012-Texture-Transfers.pdf - a bit old but interesting.
My data is not coming from the CPU, but rather from CUDA (nvJPEG).

But in the end I'm just randomly trying different streaming architectures, hoping to squeeze out a few more ms ;)

@mosra
Copy link
Owner

mosra commented Feb 8, 2021

That's my baseline implementation

Haha okay, I should have expected you're already past that :) I remember that presentation, but only the PBR part of it.

So we would need some kind of flag that tells GL::Context::tryCreate not to call

Basically. I briefly considered the idea of initializeContextBasedFunctionality() checking if the viewport has size below certain threshold and assuming there's no default viewport based on that, but that feels kinda brittle. So an enum flag, then. Default would be assuming a windowed context to avoid breaking existing code, in most cases this would be done inside various [Windowless]*Application implementations anyway.

TIL about nvJPEG, thanks! Too bad it's tied to CUDA... I wonder how would that compare to having ASTC-encoded images as the input instead, for example.

@xqms
Copy link
Contributor Author

xqms commented Feb 8, 2021

TIL about nvJPEG, thanks! Too bad it's tied to CUDA... I wonder how would that compare to having ASTC-encoded images as the input instead, for example.

Hmm, nice idea! I haven't looked into texture compression formats yet... I'd guess the decoder is much faster (nvJPEG needs around 10ms for a 4K image). But JPEG has much better image quality at the same size - I'm streaming 2x 4K with 45fps over GBit Ethernet, so I'm kind of bandwidth-sensitive ;)

@mosra
Copy link
Owner

mosra commented Feb 8, 2021

Ah, then saving bandwidth in a GPU-GPU copy at the cost of needing more network bandwidth probably doesn't make sense :D

@mosra
Copy link
Owner

mosra commented Mar 3, 2021

Implemented in 46df57f, which was a variation of #494.

@mosra mosra closed this as completed Mar 3, 2021
GL automation moved this from TODO to Done Mar 3, 2021
@mosra
Copy link
Owner

mosra commented Oct 12, 2021

As of 673022b, the GL::defaultFramebuffer global isn't touched during context creation anymore, the default framebuffer viewport is instead stored in the (thread-local) state tracker. Because of that, the Flag::Windowless isn't really needed anymore -- it only prevents a single glGet(GL_VIEWPORT) call, which is harmless -- and I'm wondering if:

  • I should remove it again to reduce the API complexity
  • or if I should leave it there as a placeholder for various potential new features, such as adding assertions for when the default framebuffer is touched on a windowless context (which is almost always an error)

Thoughts? :)

@mosra mosra reopened this Oct 12, 2021
GL automation moved this from Done to In Progress Oct 12, 2021
@mosra mosra changed the title GL::defaultFramebuffer is neither multi-context nor multi-thread safe GL::defaultFramebuffer behavior with GL::Context::Configuration::Flag::Windowless Oct 12, 2021
@xqms
Copy link
Contributor Author

xqms commented Oct 20, 2021

Hm, I guess leaving the flag there even if it has no impact right now is OK. But I don't have a strong opinion on this 😅

@mosra
Copy link
Owner

mosra commented Oct 20, 2021

Given recent experiences where it took several days to figure out why nothing is rendered on iOS when multiple framebuffers are used (there's no default framebuffer and so there's no such thing as "binding back the framebuffer with ID 0"), I might eventually turn it into an assertion. Sounds like a good thing to do, after all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
GL
  
In Progress
Development

Successfully merging a pull request may close this issue.

2 participants