Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash creating and using OSLCompiler instances in mutliple threads #1427

Open
brechtvl opened this issue Oct 28, 2021 · 6 comments
Open

Crash creating and using OSLCompiler instances in mutliple threads #1427

brechtvl opened this issue Oct 28, 2021 · 6 comments

Comments

@brechtvl
Copy link
Contributor

Problem

In the Cycles renderer we are getting crashes doing different renders in different threads, each creating their own OSLCompiler instances. From #654 it appears that this is supposed to work.

I think I understand the cause and am willing to contribute a fix, but looking for advice on the best way to solve this.

What happens is that OSLCompiler ends up calling this in its destructor:

void
SymbolTable::delete_syms()
{
    for (auto& sym : m_allsyms)
        delete sym;
    m_allsyms.clear();
    TypeSpec::struct_list().clear();
}

This TypeSpec::struct_list() is a global variable used by OSLCompiler, and also ShadingSystem. So fully clearing it here is not safe.

I can think of a few solutions:

  1. Never clear this list and accept the memory usage. ShadingSystem does not clear it as far as I can tell, so you already have this when not compiling shaders.
  2. Add reference counting to only clear this list when there are zero classes using it. This also involves adding mutex protection in a bunch of places.
  3. Add a struct list per OSLCompiler and ShadingSystem, instead of making it global.

Solution (3) seems like the best to me, does that seem reasonable? Or is there some performance reason that this should be reused.

Versions

  • OSL branch/version: master branch, 6b6637e
  • OS: Ubuntu Linux
  • C++ compiler: Clang
@lgritz
Copy link
Collaborator

lgritz commented Oct 28, 2021

1 - seems reasonable to me; I don't imagine that there are enough unique struct definitions to worry about the memory "leak."

2 - seems like unnecessary work, and TypeSpec only stores the index (as a uint16!), not a pointer, so that would have to change to accommodate a ref-counted pointer.

3 - the problem here is that it's referenced by the TypeSpec (by index, as I said), and that's the whole reason why the struct table is a global singleton -- because the TypeSpec doesn't have a back-pointer to which OSLCompiler or ShadingSystem it belongs to. Having each TypeSpec be specific to one instance of a compiler or shading system doesn't seem fatal, but it does seem potentially error-prone for a renderer that's juggling multiple compilers, because the StructSpec that it refers to also contains a scope ID, which is specific to the compiler, ick.

I would be inclined to favor choice (1) as the short term solution to your dilemma and just get you unstuck. There's no down side except for a very small amount of leaked memory, I think. And then longer term, think about how we can disentangle it all without needing either ref-counted pointers or back-pointers to shading systems.

But before we move on...

Can you explain a bit about WHY you are using different compilers? I wonder if the correct long-term solution is just to fix OSLCompiler to be re-entrant for whatever operation you need to be doing in parallel.

@brechtvl
Copy link
Contributor Author

brechtvl commented Oct 28, 2021

We have a compiler as part of rendering, and multiple renders might be running in parallel (for example a viewport render, and a small preview render for a material). Within each render we might also be compiling multiple shaders in parallel. We also have a button to compile a shader from the text editor. Maybe in the future we use OSL for texture patterns for other parts of the application, like texture painting. Those might all run in parallel.

We're using multiple OSL compilers mainly because the code is simpler that way. We could have some mechanism share an OSLCompiler globally across the application, but it's extra work. The other reason is that OSLCompiler::compile is not re-entrant so it didn't work anyway.

We've been using multiple instances ofOSLCompiler and ShadingSystem in parallel for a long time, and it has been stable as far as I know. I think we just have not hit cases where multiple shaders are using structs much.

Making OSLCompiler re-entrant seems fine. But from my point of view as a user of the OSL API, it's easier to just create an OSLCompiler instance on the stack whenever I need it. So it's not clear to me that's actually worth spending time on.

I'm fine contributing a patch for (1).

@lgritz
Copy link
Collaborator

lgritz commented Aug 21, 2023

I'm rereading this all now, and I'm worried about solution (1) because it would also need additional mutex operations to make that singleton thread-safe, and that could be complicated.

Now I'm tempted to favor (3): make the struct_list be part of the SymbolTable (owned by the compiler) so it's owned and used by one and only one compiler at a time. So TypeSpec::structspec() would need a second parameter, a reference to the SymbolTable. That seems logical, I'm just not sure I know all the places that will in turn need to be modified. Maybe it's not so bad?

I'm tempted to give it a try and see how far I get. Unless somebody else has time and wants to try it.

@lgritz
Copy link
Collaborator

lgritz commented Sep 4, 2023

Update: I've been trying this and boy is it tricky. I've done like 3 rewrites so far and keep running into corners that are hard to back out of. It's kind of amazing how much rests on the assumption of a single structure table. But I'm still working at it.

@ZapAndersson
Copy link

You can also not compile twice with the same OSLCompiler object.
It seems like anything in an #include gets included twice when you do?
Do you want me to make a separate GitHub issue about this, or can we keep it here @lgritz ?

@ZapAndersson
Copy link

Update: I've been trying this and boy is it tricky. I've done like 3 rewrites so far and keep running into corners that are hard to back out of. It's kind of amazing how much rests on the assumption of a single structure table. But I'm still working at it.

We tested the 1 fix suggested by Brecht and it "seems to work". Isn't that the simplest?

Also, I am wondering, all these talk about a global struct list, does it mean that you can't have two different structs with the same name? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants