SolidQueue crashes if database connection is lost, and takes Puma with it. #512

Open
darinwilson opened this issue Feb 7, 2025 · 5 comments

darinwilson commented Feb 7, 2025

We're running SolidQueue as a Puma plugin on a Rails 8 app, as our job processing load is currently quite small.

We recently had an incident where the server running Puma temporarily lost the connection to Postgres. This caused SolidQueue to crash with this message:

PQconsumeInput() FATAL:  terminating connection due to administrator command (PG::ConnectionBad)
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

and this in turn took down Puma:

Detected Solid Queue has gone away, stopping Puma...
- Gracefully stopping, waiting for requests to finish

I was able to reproduce this locally by shutting down Postgres after starting Rails.

When running Rails without the SolidQueue Puma plugin, if the database goes away, Rails throws an error when it tries to do something with the database, but Puma stays up and the connections recover when the database comes back online.

If I run SolidQueue separately, via bin/jobs, it also crashes if the database goes away.

Obviously SolidQueue can't be expected to do much without a database, but would it be reasonable for it to behave as Rails does when the db goes offline, i.e. pause its activity and reconnect when the db is available again?
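To make the "pause and reconnect" idea concrete, here is a minimal sketch of a retry loop with exponential backoff. It is not Solid Queue's code. `DatabaseUnavailable` and `with_db_retry` are hypothetical names, and the defaults are arbitrary. In a real app the rescued error would be something like `PG::ConnectionBad` or `ActiveRecord::ConnectionNotEstablished`.

```ruby
# Hypothetical stand-in for a real connection error such as PG::ConnectionBad.
class DatabaseUnavailable < StandardError; end

# Runs the block; on a connection error it waits and retries instead of
# letting the exception propagate and kill the process.
# All names and defaults here are illustrative assumptions.
def with_db_retry(max_attempts: 5, base_delay: 0.5, max_delay: 30,
                  sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue DatabaseUnavailable
    # Give up once the attempt budget is spent and re-raise the error.
    raise if attempts >= max_attempts

    # Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay.
    delay = [base_delay * (2**(attempts - 1)), max_delay].min
    sleeper.call(delay)
    retry
  end
end

# Usage: the first two attempts fail, the third succeeds.
# The sleeper is swapped out so the example does not actually wait.
delays = []
result = with_db_retry(sleeper: ->(seconds) { delays << seconds }) do |attempt|
  raise DatabaseUnavailable, "connection lost" if attempt < 3

  :polled
end
```

The injectable `sleeper` is only there so the waiting can be observed or skipped in tests. A long-running process would probably retry indefinitely rather than stop after `max_attempts`.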

Thanks for all your work on this - SolidQueue has been a fantastic addition to Rails!

rosa (Member) commented Feb 10, 2025

Oh, interesting. This happens for the supervisor only: if any of the supervised processes crashes, the supervisor makes sure a new one is started 🤔 I think the supervisor would need some kind of recovery mechanism for when the DB fails, although it could also crash for other reasons. I think it makes sense to do this, but I won't have time in the next couple of months at least, so if someone wants to submit a PR doing this, I'll be happy to review it.
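As a rough illustration of that split (a toy model, not Solid Queue's implementation), the sketch below shows a supervisor pass that replaces a crashed worker and records a failed database heartbeat instead of letting it terminate the supervisor. `ToySupervisor` and all of its parameters are hypothetical names chosen for this example.

```ruby
# Toy model only: every name here is hypothetical, not Solid Queue's API.
class ToySupervisor
  attr_reader :restarts, :last_error

  # heartbeat:      callable that touches the database (may raise)
  # worker_alive:   callable returning true while the worker is running
  # restart_worker: callable that starts a replacement worker
  def initialize(heartbeat:, worker_alive:, restart_worker:)
    @heartbeat = heartbeat
    @worker_alive = worker_alive
    @restart_worker = restart_worker
    @restarts = 0
    @last_error = nil
  end

  # True while the most recent heartbeat failed.
  def degraded?
    !@last_error.nil?
  end

  # One pass of the supervision loop.
  def tick
    # Replace a crashed worker.
    unless @worker_alive.call
      @restart_worker.call
      @restarts += 1
    end

    # Record a failed database heartbeat instead of letting it
    # propagate and terminate the supervisor.
    begin
      @heartbeat.call
      @last_error = nil
    rescue StandardError => e
      @last_error = e
    end
  end
end
```

Rescuing `StandardError` is deliberately broad for the sketch. A real fix would more likely rescue only connection-related errors, so that unrelated bugs still surface.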

darinwilson (Author) commented

Thanks for the feedback - that's good to know that it must be something at the supervisor level.

I'll dig into the code a bit, and see if I can find a solution that might work.

asgeo1 commented Apr 24, 2025

Had this same issue myself last week, where the DB was briefly unreachable, causing SolidQueue to shut down Puma and Rails.

What are the chances of the PR getting merged soon?

rosa (Member) commented Apr 24, 2025

Oh, I completely forgot about this one, sorry! I'll take a look at the PR.

rosa (Member) commented Apr 24, 2025

I'm struggling to reproduce this locally so I can test the PR and an alternative approach. In all cases both Puma and Solid Queue remain running 😕 This is quite strange. @darinwilson, you said:

"I was able to reproduce this locally by shutting down Postgres after starting Rails."

Are you running different PostgreSQL instances for your app and for Solid Queue? I haven't managed to reproduce with a single instance (and multiple DBs, basically the default you get in this repo). I've also tried different things with MySQL: dropping the DB, stopping the server...
