
CSPs, Clusters, Servers and Instances

Hey team! A couple of questions about Cycle. We have an instance that we were testing in our Staging environment. For reasons we're still trying to figure out, it's running at 100% CPU and possibly 100% RAM as well.

So one of our servers is down in our cluster.

Questions about compute:

  1. If I restart the server, does the cluster rebalance itself?
  2. Is there a difference between restarting the server from Google Cloud and restarting it from Cycle?
  3. "Restart Compute" would kill all instances; is "minimal downtime" related to how instances are distributed and/or replicated inside the cluster?
  4. "Restart Compute Spawner" would kill instances and networking. Would this mean restarting the whole underlying Cycle configuration supporting the cluster? Am I right?
  5. Does "Restart Server" rebalance the cluster after it comes back online?
  6. Do those options work even if the server is "Offline" and unresponsive while using 100% of its resources?

Questions about instances:

  1. Why are the instances "live" even if a member of the cluster becomes unresponsive?
  2. If a container has a health check and becomes unresponsive, does Cycle spawn a new instance on another server, or on the same one?
  3. Are there best practices for monitoring our current stacks, like CPU/RAM utilization?
  4. Is there a way to retrieve logs, or do you suggest using log aggregators?
    Server Control & Restart Behavior

    1. If I restart the server, does the cluster rebalance itself?

    Not today. Cycle follows an active-active approach, where you run multiple instances across multiple regions or providers. If something goes wrong, other healthy instances pick up the slack.

    A new load balancer update is coming soon that will:

    • Automatically drop unresponsive instances from the pool
    • Re-add them when they come back online

    Cycle intentionally avoids "reactive failover" because bringing up instances after failure introduces risk. Instead, we assume healthy copies are already live and ready to scale.
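
    The active-active idea above can be sketched from the client side: every replica is assumed to be live already, so "failover" is just trying the next healthy copy rather than spinning one up. This is an illustrative Python sketch under that assumption; `first_healthy` and the replica callables are hypothetical names, not part of Cycle's API.

    ```python
    from typing import Callable, Iterable, TypeVar

    T = TypeVar("T")

    def first_healthy(replicas: Iterable[Callable[[], T]]) -> T:
        """Try each live replica in order; return the first successful result.

        Models active-active: a failure means moving on to the next
        healthy copy, not bringing up a new instance after the fact.
        """
        last_error = None
        for call in replicas:
            try:
                return call()
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
        raise RuntimeError("no healthy replica responded") from last_error
    ```

    In practice each callable would wrap an HTTP request to one instance; the point is that all candidates are already running when the first one fails.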


    2. Is there a difference between restarting the server from Google Cloud and from Cycle?

    Yes — critical difference.

    • You should only restart servers through Cycle.
    • Restarting from the cloud provider side causes Cycle to interpret it as a network failure or a crash.
    • This may result in inconsistencies or false failure states in Cycle.

    Always use the Cycle interface or API to manage server lifecycle.


    3. What does “Restart Compute” actually do?

    It restarts Cycle's compute process (used to manage containers).

    • Does NOT restart or stop containers themselves.
    • Zero downtime for applications.

    The compute process is logically separate from the containers running on the server.


    4. What about “Restart Compute Spawner”?

    This does result in downtime, but only affects one server. It restarts critical runtime components like:

    • Network bridges
    • Internal cluster plumbing

    It should be used rarely and with caution.

    5. Will restarting a server trigger cluster rebalancing?

    Same behavior as question 1 — no rebalance, but Cycle does automatically shut down the instances on the server before restart. This prevents false positives.


    6. Will these restart options work even if the server is offline or stuck using 100% resources?

    • Possibly. Newer Cycle agents reserve a small amount of memory for recovery tasks.
    • If the agent is too old or the server is completely unresponsive, restart commands might fail.
    • Always ensure your servers are up to date for the best recovery support.

    Instance Behavior & Health Checks

    1. Why are instances marked "live" even if their node is unresponsive?

    Cycle can’t assume an instance is dead just because the host is unreachable — it could be a temporary network issue.

    • Instead of marking them as failed, Cycle will soon show a "?" icon over these containers.
    • This signifies "unknown state" and is intended to help you debug issues more clearly.

    2. If a container has a health check and fails, does Cycle restart it on a new server?

    No automatic re-creation elsewhere. Instead:

    • You should run multiple copies across multiple servers.
    • Health checks inform you of failures but don’t auto-evacuate to other nodes.
    • Active-active is preferred, not reactive failover.
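
    Since health checks inform rather than evacuate, each instance typically exposes a liveness endpoint that the platform (or your own tooling) probes. Below is a minimal self-contained sketch using only Python's standard library; the `/healthz` path and the `probe` helper are illustrative conventions, not Cycle's implementation.

    ```python
    import threading
    import urllib.error
    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class HealthHandler(BaseHTTPRequestHandler):
        healthy = True  # the application would toggle this on failure

        def do_GET(self):
            if self.path == "/healthz" and self.healthy:
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"ok")
            else:
                self.send_response(503)
                self.end_headers()

        def log_message(self, *args):  # keep the sketch quiet
            pass

    def probe(url, timeout=2.0):
        """Return True if the endpoint answers 200 within the timeout."""
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            return False

    # Demo: serve the health endpoint on an ephemeral port and probe it.
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    print(probe(f"http://127.0.0.1:{port}/healthz"))
    server.shutdown()
    ```

    A probe like this tells you *that* an instance is failing; under the active-active model, the healthy copies on other servers are what keep traffic flowing while you act on that signal.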

    Monitoring & Logging

    1. How do we monitor stacks (CPU/RAM/etc)?

    Cycle tracks telemetry at:

    • Container level
    • Server level
    • Cluster level

    But the current UI doesn’t expose everything yet. Improvements are in progress.

    In the meantime:

    • You can expose /proc to containers
    • Install agents like Datadog or other tools to monitor resource usage
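
    If you do expose /proc, the two files most monitoring agents read are /proc/meminfo and /proc/stat. A minimal parsing sketch (the function names are ours; note that /proc/meminfo values are in kB, and CPU usage must be computed from *two* samples of /proc/stat, since its counters are cumulative):

    ```python
    def meminfo_used_fraction(text: str) -> float:
        """Parse /proc/meminfo text and return the fraction of memory in use."""
        fields = {}
        for line in text.splitlines():
            key, _, rest = line.partition(":")
            if rest:
                fields[key] = int(rest.split()[0])  # values are in kB
        total = fields["MemTotal"]
        available = fields.get("MemAvailable", fields["MemFree"])
        return (total - available) / total

    def cpu_busy_fraction(stat_before: str, stat_after: str) -> float:
        """Compare two /proc/stat 'cpu' lines and return the busy-time fraction."""
        def totals(line):
            parts = [int(v) for v in line.split()[1:]]
            idle = parts[3] + parts[4]  # idle + iowait jiffies
            return sum(parts), idle
        total0, idle0 = totals(stat_before)
        total1, idle1 = totals(stat_after)
        delta = total1 - total0
        return 1 - (idle1 - idle0) / delta if delta else 0.0
    ```

    Inside a container you would read the real files (e.g. `open("/proc/meminfo").read()`), sample /proc/stat twice a second or so apart, and ship the resulting fractions to whatever dashboard or agent you use.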

    2. How should we handle logs?

    Basic logging is available:

    • View logs under container panel > logs
    • Run a search with an empty query to get full logs

    You can also:

    • Send logs to external aggregators
    • Configure this under Container Config > Integrations > External Logging

    Cycle plans to significantly improve native logging & monitoring soon.
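
    When shipping logs to an external aggregator, a common pattern is to emit one JSON object per line, which most aggregators ingest natively. A standard-library sketch (the field names here are an assumption for illustration, not a Cycle log format):

    ```python
    import json
    import logging

    class JsonFormatter(logging.Formatter):
        """Emit one JSON object per log line, easy for aggregators to parse."""

        def format(self, record: logging.LogRecord) -> str:
            return json.dumps({
                "ts": self.formatTime(record),
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            })

    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("app")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("container started")  # emits a single JSON line
    ```

    Writing structured lines to stdout/stderr like this keeps the container itself simple; the aggregator configured under External Logging then only has to parse JSON rather than guess at free-form text.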
