GL::defaultFramebuffer behavior with GL::Context::Configuration::Flag::Windowless #493

xqms · 2021-02-08T14:08:38Z

Hey @mosra, I'm working on a VR application with low-latency JPEG streaming and I'm trying to reduce latency even further by using a background thread to upload the data. Not sure yet if I can gain something using that setup, but I noticed a problem along the way:

The GL::defaultFramebuffer instance is a global variable and it gets written to (at least) in

magnum/src/Magnum/GL/DefaultFramebuffer.cpp

Lines 116 to 124 in 41b59d8

    
           void DefaultFramebuffer::initializeContextBasedFunctionality(Context& context) { 
        
               Implementation::FramebufferState& state = *context.state().framebuffer; 
        
               /* Initial framebuffer size */ 
        
               GLint viewport[4]; 
        
               glGetIntegerv(GL_VIEWPORT, viewport); 
        
               defaultFramebuffer._viewport = state.viewport = Range2Di::fromSize({viewport[0], viewport[1]}, {viewport[2], viewport[3]}); 
        
               CORRADE_INTERNAL_ASSERT(defaultFramebuffer._viewport != Implementation::FramebufferState::DisengagedViewport); 
        
           }

, which is called during GL::Context creation.

So setting up a second GL::Context will mess up the main context's defaultFramebuffer. In a multi-threading scenario, it might even end up with some completely bogus viewport range.

Since the docs seem to suggest that multi-threading with multiple contexts is supported (e.g. https://doc.magnum.graphics/magnum/platform.html#platform-windowless-contexts) I guess we should fix that somehow.

Some thoughts on a fix:

Will be rather difficult since defaultFramebuffer cannot be made thread-local and exported on windows. And even if we did that, we would need to overwrite it on GL::Context::makeCurrent().
We could make the default framebuffer a property of the context state (currentContext is already thread-local), and change defaultFramebuffer to a function returning currentContext->state_->defaultFramebuffer or similar). That would be a larger API change - is it worth it?
Maybe write a wrapper class that forwards all method calls to the correct framebuffer instance? Involves duplicating all methods...

I might have time to work on this later this week, but I guess I'll wait for your input on this ;)

The text was updated successfully, but these errors were encountered:

mosra · 2021-02-08T15:34:15Z

Hey Max! :)

At some point I was even considering making GL::defaultFramebuffer constexpr, because apart from this one there are no global constructors/destructors being run, but due to all its API being non-const it wasn't really possible.

Practically though, I don't see such a big issue here -- if there's multiple GL thread-local contexts, it's likely that only one of them has / should have access to the window framebuffer, with the others being windowless. (And then, in case of multiple windows, it doesn't make sense performance-wise to have a dedicated context for each, instead a single GL context should drive all of them. Not sure what exactly happen if you'd attempt to have multiple windows with thread-local GL contexts and thread-local event handling, but I bet that's not really possible on all platforms either.)

So, given that the extra GL contexts should be windowless, with either a trivial (32x32, as GLX/EGL does) or a completely nonexistent (CGL) default framebuffer, I'd say one way to fix this would be to only modify the defaultFramebuffer internals from the context that has a window attached.

I'm trying to reduce latency even further by using a background thread to upload the data.

This should be doable even without multiple GL contexts -- create a BufferImage, map its memory (or make it persistently mapped), pass the view to the other thread and write to it from there. Then you need some synchronization to avoid GL reading the data before they're made available to the GPU, and I'm not sure if all needed GL sync primitives are exposed, you may need to go with raw GL calls for that.

Or you could go with Vulkan, haha. But I still need a few weeks to complete texturing and shader uniform support (descriptor sets etc), until then you'd be forced to use a lot of raw Vulkan APIs to get the work done.

xqms · 2021-02-08T16:24:23Z

So, given that the extra GL contexts should be windowless, with either a trivial (32x32, as GLX/EGL does) or a completely nonexistent (CGL) default framebuffer, I'd say one way to fix this would be to only modify the defaultFramebuffer internals from the context that has a window attached.

Sounds good! So we would need some kind of flag that tells GL::Context::tryCreate not to call DefaultFramebuffer::initializeContextBasedFunctionality? Or is there some way to detect that there is no attached window?

This should be doable even without multiple GL contexts -- create a BufferImage, map its memory (or make it persistently mapped), pass the view to the other thread and write to it from there. Then you need some synchronization to avoid GL reading the data before they're made available to the GPU, and I'm not sure if all needed GL sync primitives are exposed, you may need to go with raw GL calls for that.

That's my baseline implementation - but it creates more lag spikes on the main thread, which is doing the GL::Texture2D::setSubImage(). Having a second thread with a queue of textures seems to eliminate that. I got the idea from
https://on-demand.gputechconf.com/gtc/2012/presentations/S0356-GTC2012-Texture-Transfers.pdf - a bit old but interesting.
My data is not coming from the CPU, but rather from CUDA (nvJPEG).

But in the end I'm just randomly trying different streaming architectures, hoping to squeeze out a few more ms ;)

mosra · 2021-02-08T16:44:10Z

That's my baseline implementation

Haha okay, I should have expected you're already past that :) I remember that presentation, but only the PBR part of it.

So we would need some kind of flag that tells GL::Context::tryCreate not to call

Basically. I briefly considered the idea of initializeContextBasedFunctionality() checking if the viewport has size below certain threshold and assuming there's no default viewport based on that, but that feels kinda brittle. So an enum flag, then. Default would be assuming a windowed context to avoid breaking existing code, in most cases this would be done inside various [Windowless]*Application implementations anyway.

TIL about nvJPEG, thanks! Too bad it's tied to CUDA... I wonder how would that compare to having ASTC-encoded images as the input instead, for example.

xqms · 2021-02-08T17:08:49Z

TIL about nvJPEG, thanks! Too bad it's tied to CUDA... I wonder how would that compare to having ASTC-encoded images as the input instead, for example.

Hmm, nice idea! I haven't looked into texture compression formats yet... I'd guess the decoder is much faster (nvJPEG needs around 10ms for a 4K image). But JPEG has much better image quality at the same size - I'm streaming 2x 4K with 45fps over GBit Ethernet, so I'm kind of bandwidth-sensitive ;)

mosra · 2021-02-08T17:15:33Z

Ah, then saving bandwidth in a GPU-GPU copy at the cost of needing more network bandwidth probably doesn't make sense :D

mosra · 2021-03-03T16:25:23Z

Implemented in 46df57f, which was a variation of #494.

mosra · 2021-10-12T11:00:30Z

As of 673022b, the GL::defaultFramebuffer global isn't touched during context creation anymore, the default framebuffer viewport is instead stored in the (thread-local) state tracker. Because of that, the Flag::Windowless isn't really needed anymore -- it only prevents a single glGet(GL_VIEWPORT) call, which is harmless -- and I'm wondering if:

I should remove it again to reduce the API complexity
or if I should leave it there as a placeholder for various potential new features, such as adding assertions for when the default framebuffer is touched on a windowless context (which is almost always an error)

Thoughts? :)

xqms · 2021-10-20T09:13:03Z

Hm, I guess leaving the flag there even if it has no impact right now is OK. But I don't have a strong opinion on this 😅

mosra · 2021-10-20T09:17:21Z

Given recent experiences where it took several days to figure out why nothing is rendered on iOS when multiple framebuffers are used (there's no default framebuffer and so there's no such thing as "binding back the framebuffer with ID 0"), I might eventually turn it into an assertion. Sounds like a good thing to do, after all.

mosra added this to the 2020.0b milestone Feb 8, 2021

xqms mentioned this issue Feb 10, 2021

GL::Context: pass InternalFlags as parameter #494

Closed

mosra added this to TODO in GL via automation Feb 27, 2021

mosra added the changelog mention added label Feb 27, 2021

mosra closed this as completed Mar 3, 2021

GL automation moved this from TODO to Done Mar 3, 2021

mosra reopened this Oct 12, 2021

GL automation moved this from Done to In Progress Oct 12, 2021

mosra changed the title ~~GL::defaultFramebuffer is neither multi-context nor multi-thread safe~~ GL::defaultFramebuffer behavior with GL::Context::Configuration::Flag::Windowless Oct 12, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GL::defaultFramebuffer behavior with GL::Context::Configuration::Flag::Windowless #493

GL::defaultFramebuffer behavior with GL::Context::Configuration::Flag::Windowless #493

xqms commented Feb 8, 2021

mosra commented Feb 8, 2021

xqms commented Feb 8, 2021 •

edited

mosra commented Feb 8, 2021

xqms commented Feb 8, 2021

mosra commented Feb 8, 2021

mosra commented Mar 3, 2021

mosra commented Oct 12, 2021

xqms commented Oct 20, 2021

mosra commented Oct 20, 2021

GL::defaultFramebuffer behavior with GL::Context::Configuration::Flag::Windowless #493

GL::defaultFramebuffer behavior with GL::Context::Configuration::Flag::Windowless #493

Comments

xqms commented Feb 8, 2021

mosra commented Feb 8, 2021

xqms commented Feb 8, 2021 • edited

mosra commented Feb 8, 2021

xqms commented Feb 8, 2021

mosra commented Feb 8, 2021

mosra commented Mar 3, 2021

mosra commented Oct 12, 2021

xqms commented Oct 20, 2021

mosra commented Oct 20, 2021

xqms commented Feb 8, 2021 •

edited