Liveness & Readiness Checks (with Rolling Restarts)
Ideally, this should work alongside rolling restarts as well:
https://cycle.io/community/threads/696096c156fd5c2c2fdcf882/rolling-restarts
Here are the key points:
Why Liveness and Readiness Matter
Instance startup times aren’t always predictable.
Liveness and readiness checks help by separating two concerns:
- Readiness: Can the instance do useful work?
Example: disk is available, dependencies are reachable, core functions succeed.
- Liveness: Is the instance healthy enough to keep running, or does it need a restart?
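To make the split concrete, here is a minimal Python sketch of the two checks as separate functions. The `InstanceState` fields and both function names are hypothetical, chosen only for illustration; they are not part of any existing platform API.

```python
from dataclasses import dataclass


@dataclass
class InstanceState:
    # Hypothetical signals an instance might expose about itself.
    disk_available: bool
    dependencies_reachable: bool
    process_responsive: bool  # the process still answers at all


def is_ready(state: InstanceState) -> bool:
    """Readiness: can the instance do useful work right now?"""
    return state.disk_available and state.dependencies_reachable


def is_live(state: InstanceState) -> bool:
    """Liveness: is the process still functioning, or is it wedged?"""
    return state.process_responsive


# An instance whose dependency is down is alive but not ready.
waiting = InstanceState(
    disk_available=True,
    dependencies_reachable=False,
    process_responsive=True,
)
print(is_live(waiting), is_ready(waiting))
```

The point of keeping them separate is that the two answers can differ: an instance can be alive and still not be a good target for traffic.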
Hard restarts are painful with many containers.
When restarting an instance (especially one running ~8 containers), a hard stop causes unnecessary disruption.
Rolling restarts help reduce impact. Health checks support this: liveness checks keep unhealthy instances from flapping, and readiness checks keep restarted instances from taking traffic too early.
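A rough sketch of how a rolling restart could be gated on readiness, in Python. The `restart` and `wait_until_ready` callables are hypothetical stand-ins for whatever the platform provides; this illustrates the idea and does not describe how Cycle implements it.

```python
from typing import Callable, Iterable, List


def rolling_restart(
    instances: Iterable[str],
    restart: Callable[[str], None],
    wait_until_ready: Callable[[str], bool],
) -> List[str]:
    """Restart instances one at a time, moving on only once each is ready.

    Returns the instances that were restarted and became ready.
    Stops at the first instance that does not become ready, so the
    remaining instances keep serving traffic untouched.
    """
    completed: List[str] = []
    for instance in instances:
        restart(instance)
        if not wait_until_ready(instance):
            break  # halt the rollout instead of taking down more capacity
        completed.append(instance)
    return completed
```

Because only one instance is out of rotation at a time, a bad restart costs one instance's worth of capacity rather than all of it.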
Temporary overload shouldn’t trigger termination.
Sometimes the system is simply under-provisioned or scaling up and briefly struggling to keep up.
A liveness check can effectively say: “Don’t kill me. I’m okay, I just need a moment.”
In practice, I’d rather have 2,000 clients served successfully and 50 unable to connect temporarily,
than 2,050 clients experience an outage because an instance keeps restarting repeatedly and gets saturated again and again.
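One common way to get this tolerance is to restart only after several consecutive liveness failures, so a brief stall under load does not count as a dead instance. A minimal Python sketch, assuming a hypothetical `LivenessTracker` class and a threshold of 3 (both are illustrative, not existing platform settings):

```python
class LivenessTracker:
    """Counts consecutive failed liveness probes for one instance."""

    def __init__(self, failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record(self, probe_passed: bool) -> bool:
        """Record one probe result; return True if a restart is warranted."""
        if probe_passed:
            # Any success resets the count, so short stalls are forgiven.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold


tracker = LivenessTracker(failure_threshold=3)
# Two missed probes during a load spike, then a recovery: no restart.
for passed in (False, False, True):
    print(tracker.record(passed))
```

With this in place, an instance that is saturated but still answering some probes keeps running, and only one that fails repeatedly is restarted.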
Operational Benefits
Safer debugging under load
A separate liveness check lets you debug a struggling instance: it stays running because liveness still passes, while a failing readiness check keeps additional traffic away during that time.
A strong complement to readiness
Readiness determines whether traffic should be routed to the instance.
Liveness determines whether the instance should continue running at all.
Together, they provide better stability during restarts and scaling events.
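Put together, the two signals give three outcomes. A small Python sketch of that decision, with a hypothetical `Action` enum and `decide` function used only for illustration:

```python
from enum import Enum


class Action(Enum):
    ROUTE_TRAFFIC = "route traffic"
    HOLD_TRAFFIC = "keep running, hold new traffic"
    RESTART = "restart"


def decide(live: bool, ready: bool) -> Action:
    """Map the two check results to what the platform should do."""
    if not live:
        # A dead or wedged instance is restarted regardless of readiness.
        return Action.RESTART
    return Action.ROUTE_TRAFFIC if ready else Action.HOLD_TRAFFIC


print(decide(live=True, ready=False).value)
```

The middle case (live but not ready) is the one a single combined health check cannot express, and it is the case that matters during startup, overload, and debugging.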