Rolling Restarts

feature-request

One feature I'd really love is the ability to execute a restart as a "rolling" restart. Right now, manual restarts (hitting the button, applying a config change, etc.) stop all instances at once, producing app downtime. And without a defined health check policy there's probably no way around that. But when a health check policy IS defined, I would love to be able to set the default restart method to a rolling restart, where each subsequent instance restart does not begin until the previous instance reaches healthy status. That functionality would be incredibly valuable in such a wide variety of situations...

This would also be really useful to me. I hadn't considered using the health check, that's a nice approach :+1:.
Posted by Thomas van der Pol
Jan 9 2026
Thanks Casey for the detail and the request, and Thomas for adding support here. You're both internally tagged for updates on this as it progresses.

Casey - a failing health check should cause the individual instance to restart. Are you saying that health checks failing on multiple instances should trigger a synchronous restart queue? If so, I don't know that I understand the use case for that specific scenario.
Posted by Christopher Aubuchon
Jan 9 2026
Hey Chris - the restart behavior when a health check fails seems fine to me. Let's ponder a different scenario - for example, say I need to update an environment variable on a container... When I save the update, Cycle will restart the instances to pick up the change. But in a production situation, I need to ensure that there is a functional application in a healthy state at all times. So my "ideal" would be for Cycle to identify that there is a defined health check, and when that is the case restart only a single instance, wait for that instance to return to a healthy state, then restart the next instance and repeat. This concept of a "rolling restart" ensures that the environment can be updated while remaining healthy.
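
To make the loop concrete, here is a minimal sketch in Python. It is not how Cycle implements anything; `restart` and `is_healthy` are hypothetical callbacks standing in for whatever the platform uses to restart an instance and read its health check, and the timeout and poll interval are assumed values.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    instance_ids: Iterable[str],
    restart: Callable[[str], None],      # hypothetical: restarts one instance
    is_healthy: Callable[[str], bool],   # hypothetical: reads the health check
    timeout_s: float = 60.0,             # assumed per-instance limit
    poll_interval_s: float = 1.0,        # assumed polling cadence
) -> List[str]:
    """Restart instances one at a time, waiting for each to become healthy.

    Returns the ids that were restarted and came back healthy. Raises
    TimeoutError if an instance does not recover in time, which leaves
    the remaining instances untouched and still serving traffic.
    """
    restarted: List[str] = []
    for instance_id in instance_ids:
        restart(instance_id)
        deadline = time.monotonic() + timeout_s
        # Block here until this instance is healthy before touching the next.
        while not is_healthy(instance_id):
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"instance {instance_id} did not become healthy "
                    f"within {timeout_s} seconds"
                )
            time.sleep(poll_interval_s)
        restarted.append(instance_id)
    return restarted
```

Stopping at the first instance that fails to recover is a design choice in this sketch: a bad config change then takes down at most one instance instead of all of them.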

A good example of when you might need to do this would be when an error is occurring and you are trying to rapidly make changes to resolve the issue. Going through a full build and deployment loop for every incremental modification (enabling more detailed logging, making an experimental configuration change, etc) massively slows the process vs being able to make a quick change and a quick restart. I traditionally try to expose a lot of control elements as environment variables on an app so that I have a lot of flexibility to explore and resolve issues, but right now in Production I cannot use any of those tools because saving changes causes downtime when all the instances restart concurrently.
Posted by Casey Dement
Jan 10 2026
That makes sense and would probably be helpful in a scenario where the service is only partially degraded but mostly working. You want to be able to resolve the issue for the partially degraded portion without downstream users losing access - especially if that degraded portion of the service is small.
Posted by Christopher Aubuchon
Jan 10 2026
Yep - exactly. The most common scenario is where a bug is happening and we're trying to identify the source, so we would temporarily escalate the logging level for that part of the application (normally WARN in Production so highly performant but not very verbose) to try to get more information regarding the issue. A quick reconfigure gets you verbose logging in a few seconds, then you execute the failing event to capture the log output and reconfigure back to WARN (and then go do some forensics on the log output).
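
As a rough illustration of that kind of control element, an app could read its log level from an environment variable at startup. This is only a sketch: `LOG_LEVEL` is a hypothetical variable name, and the fallback to WARNING mirrors the production default described above.

```python
import logging
import os
from typing import Mapping, Optional


def configure_logging(env: Optional[Mapping[str, str]] = None) -> int:
    """Set the root logger level from LOG_LEVEL (hypothetical variable name).

    Falls back to WARNING when the variable is unset or not a known level.
    Returns the numeric level that was applied.
    """
    if env is None:
        env = os.environ
    name = env.get("LOG_LEVEL", "WARNING").upper()
    # getLevelName returns an int for known level names (including "WARN")
    # and a string such as "Level FOO" for unknown ones.
    level = logging.getLevelName(name)
    if not isinstance(level, int):
        level = logging.WARNING
    logging.getLogger().setLevel(level)
    return level
```

With something like this in place, switching to verbose output is a matter of setting `LOG_LEVEL=DEBUG` and restarting, then setting it back to `WARN` afterwards.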

Having to push builds to accomplish that (factoring for test runs, code reviews, etc) really blows up the scope and slows you down.
Posted by Casey Dement
Jan 10 2026
This is an awesome idea. (so I get tagged, too)
Posted by Michael Third
Jan 12 2026
I started on this functionality today, it'll be set using <container>.config.deploy.ready_check (similar to health_check). More info soon
Posted by Jake Warner
Jan 12 2026
❤️❤️❤️❤️❤️
Posted by Casey Dement
Jan 12 2026
Fantastic; keep it coming team.
Posted by Jeff Klink
Jan 12 2026
Excellent news!
Posted by Thomas van der Pol
Jan 13 2026
