feature-request

Liveness/Readiness checks

Hey team, I'd love to see readiness checks added to stack! While the LBs do a good job of assessing packet latency, they can't tell whether a container is in trouble and just needs a moment to process or recover. A readiness check is a way for an instance to tell the deployment manager: don't reboot me, but I need a second, so stop sending me traffic. The readiness check is separate from the health check (which is really a liveness check), as it purely indicates whether the instance can serve traffic at the moment.

We all need a moment to compose ourselves sometimes, and so do our instances. Give them a fighting chance!

  • Seconded - this would be very useful 👍

  • Thanks Jeff and Thomas. I have this tagged, definitely on our radar.

    Jeff - in your case above, are you just looking for a mechanism that says "don't send any traffic to this container instance", or is there more to it?

    What would you see as the best behavior if there's only a single instance, or if all instances become unready at once?

    Who would dictate readiness? Is that something the application itself emits?

  • Liveness & Readiness Checks (with Rolling Restarts)

    Ideally, this should work alongside rolling restarts as well:
    https://cycle.io/community/threads/696096c156fd5c2c2fdcf882/rolling-restarts

    Here are the key points:

    Why Liveness and Readiness Matter

    Instance startup times aren’t always predictable.
    Liveness and readiness checks help by separating two concerns:

    • Readiness: Can the instance do useful work?
      Example: disk is available, dependencies are reachable, core functions succeed.
    • Liveness: Is the instance healthy enough to keep running at all?
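
    As a rough sketch of how an application could emit both signals itself: the snippet below assumes the platform would poll two HTTP paths on the container. The paths `/livez` and `/readyz`, the port 8080, and the names `ProbeState` and `probe_response` are all hypothetical and only illustrate the idea; none of this is an existing platform API.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading


class ProbeState:
    """Thread-safe flag the application flips as it warms up or falls behind."""

    def __init__(self) -> None:
        self._ready = threading.Event()

    def set_ready(self, ready: bool) -> None:
        if ready:
            self._ready.set()
        else:
            self._ready.clear()

    def is_ready(self) -> bool:
        return self._ready.is_set()


STATE = ProbeState()


def probe_response(path: str, state: ProbeState) -> tuple[int, str]:
    """Map a (hypothetical) probe path to an HTTP status code and body."""
    if path == "/livez":
        # Liveness: the process is up and able to answer at all.
        return 200, "alive"
    if path == "/readyz":
        # Readiness: 503 means "keep me running, but send no traffic for now".
        return (200, "ready") if state.is_ready() else (503, "not ready")
    return 404, "unknown probe"


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = probe_response(self.path, STATE)
        payload = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    STATE.set_ready(True)  # flip once disk and dependencies have been verified
    HTTPServer(("0.0.0.0", 8080), ProbeHandler).serve_forever()
```

    The application would call `STATE.set_ready(False)` when it falls behind and `STATE.set_ready(True)` once it has caught up, without ever failing the liveness path.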

    Hard restarts are painful with many containers.
    When restarting an instance (especially one running ~8 containers), a hard stop causes unnecessary disruption.
    Rolling restarts help reduce impact, and readiness checks support this by keeping instances that aren't ready yet from flapping or taking traffic too early.

    Temporary overload shouldn’t trigger termination.
    Sometimes the system is simply under-provisioned or scaling up and briefly struggling to keep up.
    A passing liveness check paired with a failing readiness check effectively says: “Don’t kill me. I’m okay, I just need a moment.”

    In practice, I’d rather have 2,000 clients served successfully and 50 unable to connect temporarily,
    than 2,050 clients experience an outage because an instance keeps restarting repeatedly and gets saturated again and again.

    Operational Benefits

    Safer debugging under load
    A failing readiness check lets you debug an instance that’s struggling while ensuring it does not take additional traffic during that time, and a passing liveness check keeps it from being restarted in the meantime.

    A strong complement to readiness
    Readiness determines whether traffic should be routed to the instance.
    Liveness determines whether the instance should continue running at all.
    Together, they provide better stability during restarts and scaling events.
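
    To make the split concrete, here is a minimal sketch of how a deployment manager might combine the two results. The `Action` enum and the `decide` function are hypothetical names for illustration, not an existing API, and the sketch deliberately leaves open what should happen when every instance is unready at once.

```python
from enum import Enum


class Action(Enum):
    ROUTE = "route traffic to the instance"
    DRAIN = "stop routing traffic, keep the instance running"
    RESTART = "restart the instance"


def decide(live: bool, ready: bool) -> Action:
    """Combine the two probe results into one action for a single instance."""
    if not live:
        # Liveness failed: the instance should not continue running as-is.
        return Action.RESTART
    if not ready:
        # Alive but not ready: give it a moment without sending it traffic.
        return Action.DRAIN
    return Action.ROUTE
```

    The key case is `decide(live=True, ready=False)`, which drains the instance instead of restarting it. In practice a platform would probably also want failure thresholds (for example, several consecutive failed probes) before acting.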
