Running Ghost sample crashes with Segmentation fault from time to time #3484
Here is the stack trace from
|
The |
BTW, the correct English word is "examples" not "samples". |
Hmm. OK, I can reproduce the crash, but somehow you disabled the normal crash-processing code? I'm not able to debug this in the usual way; something is preventing a normal crash dump... |
So, in gdb:
which is confusing. I don't understand where the backtrace printing is coming from. Who is printing that? |
Hmm, I see this:
So |
Assorted checks in various places show consistently and reproducibly that both the TV and AV use-counts become too low, leading to access of deleted memory and other crazy symptoms. How exactly that comes to be remains mysterious to me, but is presumably because the |
Found it. But only after 8+ hours of difficult debugging. Someone .. not sure who -- either this example demo, or ghost itself, has issued the string
What is happening is that
Here's how I debugged this, in case you ever want to do the same thing. First, notice that the use-count of
So edit
and the custom routine
We now have a stack trace of whodunnit:
Why the heck would
OK, right. Someone types in |
Anyway .. I'm done debugging. I leave it up to you to figure out who is telling guile to quit and force-run exit handlers (thus shutting down not just truth values, but all of the atomspace) while the atomspace is still in use. |
I believe that the problem is the following. (ghost-run) calls (psi-run ghost-component), which runs a loop and uses call-with-new-thread to spawn a new thread for it.

    (use-modules
      (opencog)
      (opencog nlp)
      (opencog nlp relex2logic)
      (opencog openpsi)
      (opencog ghost)
      (opencog ghost procedures)
    )
    (ghost-parse "
    r: (where [do does] _* work) '_0 work in SomeCompany company.
    ")
    (ghost-run)

There is the method (ghost-halt), whose aim is to stop the running loop. It actually sets a false value on the (Predicate "run-loop") node. However, ghost-halt has the same flaw: it sets the flag to terminate the loop, and after that Guile can free the AtomSpace while the run-loop is still running. When the run-loop tries to read a value from the predicate node, that node may no longer exist. I can implement a solution where psi-halt sets the false value on the predicate node and then joins the loop thread to wait until it has finished.

    (define run #t)
    (define (repl)
      (display "run-loop\n")
      (if run
        (begin
          (sleep 1)
          (repl))
        (display "Exit!\n")))
    (define repl-thread
      (call-with-new-thread
        (lambda () (repl))))
    (sleep 3)
    (set! run #f)
    ;; join-thread takes an absolute time as its timeout,
    ;; so wait at most 5 seconds from now.
    (join-thread repl-thread (+ (current-time) 5))
    (sleep 3)

The solution requires that (ghost-halt) is always called before exiting Guile. |
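The halt-then-join idea described above can be packaged as a pair of procedures. This is only a minimal sketch under stated assumptions: `run-flag`, `loop-body`, `run-loop`, and `halt-loop` are hypothetical names for illustration, not the OpenPsi or GHOST API, and a plain counter stands in for the work done against the AtomSpace.

```scheme
(use-modules (ice-9 threads) (ice-9 atomic))

;; Hypothetical stand-ins; none of these names come from OpenPsi or GHOST.
(define run-flag (make-atomic-box #t))  ; plays the role of the "run-loop" predicate
(define steps 0)                        ; written only by the loop thread

(define (loop-body)
  ;; Stand-in for one iteration of work on shared state.
  (set! steps (+ steps 1))
  (usleep 10000))

(define (run-loop)
  (when (atomic-box-ref run-flag)
    (loop-body)
    (run-loop)))

(define loop-thread (call-with-new-thread run-loop))

(define (halt-loop)
  ;; Clear the flag, then block until the loop thread has really finished.
  (atomic-box-set! run-flag #f)
  (join-thread loop-thread))

(usleep 50000)
(halt-loop)
(define steps-at-halt steps)
(usleep 50000)  ; the loop must not advance after halt-loop returns
```

Because `halt-loop` returns only after `join-thread` does, shared state can be torn down afterwards without the loop thread touching it. Setting the flag alone gives no such guarantee.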
I'm guessing that ghost starts one or more threads. So, The problem with exit-hooks is that they are run in arbitrary order. There is no particular way to make one exit hook run before another; there's no priority ordering. To be clear: even if you did use |
@stellarspot, just want to make sure, are you getting this segfault only when exiting Guile when the loop was still running? Or does it crash even without exiting Guile? |
I prepared the suggested fix in #3487. It also requires that the user always call psi-halt for each thread created by the psi-run function.
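One way to make "halt every thread that was started" harder to forget is to record each loop thread when it is created. The sketch below is not taken from #3487; `loops`, `start-loop!`, `halt-loop!`, and `halt-all-loops!` are hypothetical names, and the loop body is a placeholder sleep.

```scheme
(use-modules (ice-9 threads) (ice-9 atomic))

;; Hypothetical registry: component name -> (run-flag . thread).
(define loops (make-hash-table))

(define (start-loop! name)
  ;; Spawn a loop thread with its own run flag and remember both.
  (let* ((flag (make-atomic-box #t))
         (thread (call-with-new-thread
                  (lambda ()
                    (let loop ()
                      (when (atomic-box-ref flag)
                        (usleep 10000)   ; placeholder for real work
                        (loop)))))))
    (hash-set! loops name (cons flag thread))))

(define (halt-loop! name)
  ;; Clear the flag, wait for the thread to finish, then forget it.
  (let ((entry (hash-ref loops name)))
    (when entry
      (atomic-box-set! (car entry) #f)
      (join-thread (cdr entry))
      (hash-remove! loops name))))

(define (halt-all-loops!)
  ;; Collect the names first so the table is not modified while iterating.
  (for-each halt-loop!
            (hash-map->list (lambda (name entry) name) loops)))

(start-loop! 'ghost)
(start-loop! 'other)
(halt-all-loops!)
(define remaining (hash-count (const #t) loops))
```

A script would call `halt-all-loops!` as its last step before exiting Guile, so that no loop thread outlives the data it reads.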
I have not seen the crash while working in Guile, only when it is exiting. |
I ran sample-ghost-notebook.scm, and it sometimes crashes with a segmentation fault.
Here is the result of calling
catchsegv guile samples/ghost/sample-ghost-notebook.scm