Job worker randomly restarts? #213
Comments
Hey @krschacht, how odd! The restart you see in the logs is the Supervisor realising one of the workers has terminated, and replacing it with another one (see solid_queue/lib/solid_queue/supervisor.rb, lines 150 to 151 at commit 1540864).
"Restarting" is the wrong word there, TBH, because it's actually replacing a terminated fork. I'll fix that in #208. In any case, what this indicates is that something outside Solid Queue is killing that worker 🤔 The lack of status code in your logs:
makes me think this might be a Is it possible you're running something that kills processes that go over a certain time or memory usage?
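To illustrate why a missing status code points at an external kill, here is a minimal Ruby sketch. It is not Solid Queue's supervisor code; the method name `reap_killed_child` is hypothetical, and the `Process.kill` call merely stands in for whatever outside mechanism (for example a platform memory limiter) terminates the worker. It relies on standard Ruby behaviour: a process terminated by a signal has no exit code, so `Process::Status#exitstatus` is `nil`.

```ruby
# Hypothetical illustration, not Solid Queue code.
# Forks a child, kills it from the outside with SIGKILL, and returns
# the Process::Status the parent collects when it reaps the child.
def reap_killed_child
  pid = fork { sleep }            # child blocks until it is signalled
  Process.kill("KILL", pid)       # stand-in for an external killer
  _, status = Process.wait2(pid)  # parent reaps the terminated fork
  status
end

status = reap_killed_child
puts "exitstatus=#{status.exitstatus.inspect}"  # no exit code for a signalled process
puts "signaled=#{status.signaled?}"
puts "termsig=#{status.termsig}"
```

A process that exits on its own (for example via `exit 1`) would instead report a numeric `exitstatus` and `signaled?` would be false, which is why a blank status in the log suggests a signal rather than a crash inside the job.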
Hi @rosa, thank you so much for the quick response! I recently moved from Heroku over to Render and I don't know this platform nearly as well, but I think your hunch is right that the app got killed because it exceeded its memory limit. I have a ticket in with Render support to confirm this, but that's my best read of my metrics. And indeed, if that's the case, then it looks like this issue is on me and has nothing to do with SolidQueue. I think it's safe to close out this Issue unless I get new clues that suggest otherwise. While you're fixing the "Restarting ..." error message, it might be worth even adding a suggested cause and maybe add a conditional on blank
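One way the logging suggestion above might look, sketched under assumptions: the blank value in question is taken to be the exit status, and `describe_termination` is a hypothetical helper, not a method that exists in Solid Queue. It only uses standard `Process::Status` and `Signal` calls from Ruby's core library.

```ruby
# Hypothetical helper, not part of Solid Queue.
# Builds a log line for a terminated fork and adds a suggested cause
# when the process has no exit status (i.e. it did not exit on its own).
def describe_termination(pid, status)
  message = "Replacing fork. pid=#{pid}"
  if status.exitstatus.nil?
    # A signalled process has no exit code; report the signal name instead.
    signal = status.signaled? ? Signal.signame(status.termsig) : "unknown"
    "#{message} status=none signal=#{signal} " \
      "(possibly killed externally, e.g. for exceeding a memory limit)"
  else
    "#{message} status=#{status.exitstatus}"
  end
end

# Usage: one child exits normally, another is killed from outside.
pid = fork { exit 3 }
_, status = Process.wait2(pid)
puts describe_termination(pid, status)

pid = fork { sleep }
Process.kill("KILL", pid)
_, status = Process.wait2(pid)
puts describe_termination(pid, status)
```

The hint is deliberately worded as a possibility, since a signal alone does not say who sent it or why.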
Going to close this one as I already merged #208, with improved logging so this is not as obscure. Thanks again for raising this one, @krschacht!
I've had SolidQueue set up and working for a while on a low-traffic app. But I sporadically get a job that takes a long time to complete (I just had one that took 7 minutes). When I check the logs, it looks like the job starts, then the worker gets killed and takes a bit to restart. After it restarts, it picks up the job again and completes it quickly (the usual 10 seconds). I'm struggling to figure out the root cause of this, and since my UI needs this job to update things, the user is just sitting there.
Notably, the pattern is the same: My app is warmed up and running. I'm taking a series of user actions on the front-end which each trigger a job to complete. I'm having success many times in a row, which I can tell because the UI quickly updates, and then randomly the UI stops updating. I check the logs and see it's doing this weird restart thing.
My database is Postgres. I'm deployed to Render. I'm running Solid Queue's supervisor together with Puma (using `plugin :solid_queue`). Here is the log output for a recent failure. I tried to delete lines that were definitely unrelated. Also, any lines beginning with "### " are simple `puts` messages from within my job: