Latest Threads

announcement

June 9-13th Cloud Hyperscaler Outages

Hey Everyone!

Many of you have reported hearing that several providers are experiencing outages.

So far we've heard reports of:

  • Google Cloud Platform
  • Amazon Web Services
  • Cloudflare

These have been corroborated by our team via Downdetector and other avenues, but aside from things like Google Meet not loading, we are not hearing of major interruptions to compute nodes running Cycle.

If you are having an issue with your compute, definitely let us know; we want to share that information across our ecosystem as much as possible and help each other.

If you go through this week and haven't even had to think about the word outage, consider posting something on LinkedIn about it and tag our official page.

platform
0
announcement

EC Key Compatibility — Named Curves vs Explicit Parameters

Note before reading: if you're using a resource that uses SSH auth, the following will not retroactively affect anything. If a key has been working, it will continue to work.

The official Cycle documentation suggests generating SSH keys using the following pattern.

ssh-keygen -t ecdsa -b 256 -m PEM -f your-filename.pem

This pattern will generate an ECDSA key, which we've recently found can cause compatibility issues with the Go x509 package if the ssh backend is using LibreSSL instead of OpenSSL.

LibreSSL is the default library used by ssh/ssh-keygen on macOS.

The Issue

  • OpenSSL-based keygens (most Linux) use named curves (like prime256v1) — minimal, portable, and widely supported.
  • LibreSSL-based keygens (macOS) default to explicit parameters — verbose, less compatible.

While the formats are functionally equivalent, they're not always compatible.

Checking Your Key

If you've created a key and added it to a stack or image source and are getting an x509 error, use the following pattern to check your key.

openssl ec -in YOURKEYFILENAME -text -noout

And check to see if there is a line: ASN1 OID: prime256v1

If you see that line, the key should work; if it's still not working, ping us on Slack or leave a comment here, as it's likely something else.

If you do not see that line and want to convert the private key from explicit parameters to a named curve, use the following pattern:

openssl ec -in YOURKEYFILENAME -out NEWFILENAME -param_enc named_curve

The other option is to use OpenSSL directly on macOS, or to generate the keys from within a container.
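
For example, if you have an OpenSSL-based openssl on your PATH (e.g. from Homebrew), something like the following should generate a named-curve key directly. This is only a sketch and the filenames are placeholders:

# Generate a P-256 (prime256v1) private key; openssl's default parameter encoding here is a named curve rather than explicit parameters
openssl ecparam -name prime256v1 -genkey -noout -out your-filename.pem

# Derive the matching OpenSSH-format public key, if you need it
ssh-keygen -y -f your-filename.pem > your-filename.pem.pub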

platform
0
announcement

Upcoming Monitoring Improvements: Streamlined Log Drain Configuration

Hey everyone,

The next series of updates is all about improving monitoring across the platform.

To kick things off, we're cleaning up how log drains work. Starting with the next update, log drain configuration is moving from the container level to the environment level.

This means that instead of setting up a log drain for each container, you'll be able to drop in a single log destination for the entire environment.

Impact:

This will require everyone using log drains to update their config after the release. And if you've been using a custom URL format to tell downstream systems which container is sending logs, you'll need to rework that part a bit.

While this change is minor, it will help us lay the foundation for much stronger monitoring features coming soon.

platform
0
random

Gotcha with redis and IPv6

Just a heads up for those who might run into the same issue:

I have an environment with a redis container named redis. In another container in the same environment, I had a nodejs server trying to connect to the redis server via the ioredis library. Basically I had a file like

import { Redis } from "ioredis";
export const redis = new Redis("redis://redis:6379");

On server start, I was seeing a stream of errors along the lines of

[ioredis] Unhandled error event: Error: getaddrinfo ENOTFOUND redis
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:109:26)
    at GetAddrInfoReqWrap.callbackTrampoline (node:internal/async_hooks:130:17)
[ioredis] Unhandled error event: Error: getaddrinfo ENOTFOUND redis
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:109:26)
    at GetAddrInfoReqWrap.callbackTrampoline (node:internal/async_hooks:130:17)
[ioredis] Unhandled error event: Error: getaddrinfo ENOTFOUND redis
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:109:26)
    at GetAddrInfoReqWrap.callbackTrampoline (node:internal/async_hooks:130:17)

This was very strange, as the two containers were on the same network and redis should have been (and, it turns out, was) a valid hostname. Furthermore, I was able to run the following via SSH on the node server instance:

# redis-cli -h redis PING
PONG

So the hostname was valid. What was going on?

After some discussion with the Cycle team (thanks!!), it turns out the issue is that the internal networking within the environment is all done via IPv6, and by default, the ioredis client doesn't resolve hostnames to IPv6 addresses. For whatever reason 🤷🏻‍♂️. But the fix was very simple. This works:

import { Redis } from "ioredis";
export const redis = new Redis({ host: "redis", family: 6 });

Explicitly tell the client to use IPv6, and it will.
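
If you want to confirm this is what's happening in your own environment, a quick check from a shell inside the consuming container is to compare the IPv6 and IPv4 lookups. A sketch only; getent ships with glibc-based images but may be missing from very minimal ones:

# The environment hostname should resolve over IPv6...
getent ahostsv6 redis

# ...while an IPv4-only lookup should come back empty, which is what a client defaulting to family 4 trips over
getent ahostsv4 redis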

Perhaps this will come in handy for someone later 😄 Again, thanks to the Cycle team for finding the solution to this!

3
announcement

RELEASE: v2025.05.21.02

This release (2025.05.21.02) is a smaller quality of life improvement patch.

It's mainly focused on:

  • Billing access improvements (download invoice via email without logging in)
  • Container networking visibility
  • A fix for custom DNS resolvers

Billing Access Improvements

Sometimes invoices need to be collected by users who do not want a Cycle login. We've added the ability to download the invoice directly from the email without the need for the user to log in to do so!

Container Instance Attached Networks

Container instance networking has always been front and center on the containers modal and corresponding instances page, but some details weren't shown when it came to SDN networks. You'll see that we've added a section under the instance console that shows all attached networks for the container instance.

Custom Resolvers

A bug was found that would cause custom DNS resolvers to only work with CNAME records. This has been resolved.

VPN Configs Over IPv6

Our team has always been a proponent of IPv6 adoption and most of the platform is built with an IPv6 native attitude (where possible). There was a case where, if a load balancer only had IPv6 enabled, VPN files could fail to download. So we added some new functionality that allows users to download the VPN config files through load balancers that only have IPv6 enabled. One more step in the right direction!

platform
0
feature-request

Show All IP Addresses in the portal

I have found that 50% of the time I connect to the container SSH endpoint, it's just to find an IP address on one of the interfaces. Most of my containers don't have the ip command, so I have to install that, too. It would be great if we could see all interface IP assignments directly in the portal.
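
In the meantime, one workaround that avoids installing anything is reading procfs directly. Since environment networking is IPv6, this covers most of what I'm usually after. A sketch, assuming /proc is mounted (it normally is):

# List IPv6 address assignments without iproute2; each line is the address as 32 hex characters, followed by ifindex, prefix length, scope, flags, and the interface name
cat /proc/net/if_inet6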

3
random

Debian slim-bookworm - intermittent DNS failures

Just a friendly note to the community after talking it through with the support guys - slim-bookworm is having intermittent DNS resolver issues. We've narrowed one of our stack issues down to the image itself and wanted to warn the community to save yourselves a bit of aggravation. If you were debating moving everything to Alpine, let this be a final kick in that direction.

1
feature-request

Built in HTTP Health Check

Our containers are generally built with minimal dependencies so as to minimize the attack surface. This means they don't normally have curl/wget/netcat. There is a funky shell trick, but it's... ugly. Would it be possible to add a Cycle-native HTTP/HTTPS health check?

Ugly Script

exec 3<>/dev/tcp/localhost/5000 && \
  echo -en "GET /_ah HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n" >&3 && \
  cat <&3 | grep -q " 200 "
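
Until something native exists, here's a slightly more robust sketch of the same trick that is easier to wire in as a health check command. It assumes bash and coreutils' timeout are present in the image; port 5000 and the /_ah path are just the placeholders from above:

#!/usr/bin/env bash
# Curl-less HTTP health check using bash's built-in /dev/tcp.
# Exits 0 only when the status line reports 200; anything else, including a connect timeout, counts as a failure.
set -euo pipefail

port=5000
path="/_ah"

status=$(timeout 5 bash -c "
  exec 3<>/dev/tcp/localhost/${port}
  printf 'GET ${path} HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n' >&3
  head -n 1 <&3
")

echo "$status" | grep -q ' 200 '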
5
question

CSPs, Clusters, Servers and Instances

Hey team! A couple of questions about Cycle. We currently have an instance that we were testing in our Staging environment. For some reason that we are trying to figure out, it's running at 100% CPU and maybe 100% RAM as well.

So one of our servers is down in our cluster.

Questions about compute:

  1. If I restart the server, does the cluster rebalance itself?
  2. Is there a difference between restarting the server from Google Cloud and restarting it from Cycle?
  3. "Restart Compute" would kill all instances; when it says "minimal downtime", is that related to how instances are distributed and/or replicated inside the cluster?
  4. "Restart Compute Spawner" would kill instances and networking; does that mean it restarts the underlying Cycle configuration supporting the cluster, am I right?
  5. Does "Restart Server" rebalance the cluster after the server comes back online?
  6. Do those options work even if the server is "Offline" and unresponsive while using 100% of its resources?

Questions about instances:

  1. Why are the instances "live" even if a member of the cluster becomes unresponsive?
  2. If the container has a health check and it becomes unresponsive, does it spawn a new instance on another server or on the same one?
  3. Are there best practices for monitoring our current stacks, like CPU / RAM utilization?
  4. Is there any way to retrieve logs, or do you suggest using log aggregators?
1
discussion

Race Conditions in glibc (Debian based containers)

Some of you may have run into DNS issues when using a Debian based container.

This thread is a place to discuss:

  • Race conditions or other issues found in glibc (technical details)
  • Different approaches to mitigation (including using Alpine)
  • Reasons for avoiding Debian in the first place

Per my research:

From inside a container on Cycle I ran tcpdump -i any port 53 -vvv. This gave me the following interesting information.

  • Every DNS query for someother.domain.com resulted in both A and AAAA requests being sent in parallel.
  • The resolver returned correct responses (CNAME + A/AAAA records).
  • Despite this, the container still saw intermittent failures.

So at this point I knew the internal resolver was working correctly and that the failure was happening inside the container's DNS client logic.

So I dove deeper into glibc, specifically getaddrinfo() since it handles DNS resolution, and found that:

  • It does in fact send A and AAAA queries simultaneously.
  • If AAAA returns first (and fails with NXDOMAIN, SERVFAIL, or is empty), glibc may prematurely fail the entire resolution, even if a valid A record arrives milliseconds later.

And the second part, where it prematurely fails, seems to be the major issue.

Luckily, musl libc (the Alpine resolver) performs the same actions serially and predictably, which has so far eliminated any occurrence of this error. So if you're in a position to use Alpine, it's more reliable (and generally more secure).
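
If switching to Alpine isn't an option, another mitigation worth discussing is glibc's single-request resolver option, which tells getaddrinfo() to send the A and AAAA queries sequentially instead of in parallel. A sketch only; on Cycle the resolver config may be managed for you, so test before relying on it:

# Force glibc to serialize A/AAAA lookups inside the container, e.g. from an entrypoint script before the app starts
echo "options single-request" >> /etc/resolv.conf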

Looking forward to hearing some insights and opinions here!

platform
2
announcement

NEW RELEASE: v2025.04.24.02

Hey everyone! We're trying something new for this release by creating a place for discussion around updates we push out for Cycle. It's a more discussion-oriented version of our changelog, where we can engage with all of you about what's new.

This release (2025.04.24.02) is a huge release that has been in the works for nearly a month now. It brings with it a lot of stability and performance improvements, but also tons of feature requests we've received from all of you.

New Stuff

We've added a few new goodies to the platform based on your feedback.

A New Network Telemetry Graph

We've added a new graph to the server dashboard that shows network traffic on a per-interface basis. It's now possible to see data transmitted over the private network, public network, or even SDNs.

You Can Now Restart a Container

Finally, right? Well, you could always stop and start them, but there was one major issue with this method - Cycle would restart all of them at once...

With the new "restart" functionality, the platform will respect the stagger set in the container's configuration, preventing downtime while your restart is in progress.

Instance State Uncertainty

Instance states have a 'normal state', such as running, but also have health checks, migration state, traffic draining, and more. One thing we've heard from our users is that sometimes their instance state will still say 'running', but the server that instance was running on went offline. What gives?

Well, the TL;DR is that we don't actually know that it went offline. Cycle relies on checkins from the underlying host to know what state that instance is in, and if it misses a checkin, or the network drops, the instance may still be running, even if Cycle can't prove it.

This led to some uncertainty, but we didn't want to alert people that an instance was offline just because of a network hiccup. In this release, we've tackled the issue by introducing an 'uncertainty' marker on top of container instances where the underlying host has missed a couple checkins.

Now, you'll be alerted that something may be off about an instance even if we're not sure what state it might be in anymore. Here's what that looks like:

instance state unknown

Server Nicknames

Last but certainly not least, we've added the ability to set a custom name on your servers that will be visible throughout the interface. It will appear anywhere a hostname previously did. If no nickname is set, you'll still see the same hostname as before.

(we wouldn't want to hide Michael T's latest server hostname).

Improvements

Along with the new features, we've improved a handful of things as well.

Source IP Routing

We've introduced a new load balancer routing mode, dubbed Source IP. This mode will attempt to provide sticky sessions for all requests coming from a specific IP address.

Better SFTP Lockdown Intelligence

Cycle has had SFTP lockdown intelligence for over a year now, but some clients would open up dozens of new connections when navigating or transferring files, possibly for better throughput. These clients would quickly put the SFTP connection into lockdown, blocking all new connections.

In this release, we've made it smarter - lockdown will not count new connections from a recently authenticated IP address toward the lockdown criteria. Clients can be greedy with new connections, while bad actors still get locked out.

Scoped Variable Files: Users, Groups, and Permissions

We've added support for a UID, GID, and file permissions to be set on scoped variable files that are injected into the container. Some applications require specific permissions on files to play nice, and this alleviates the need for any funky workarounds that were previously required.

Load Balancer IP Display

Prior to this release, the environment dashboard would show the CIDR (the entire address space) allocated to a load balancer instance. While useful in some circumstances, most people (ourselves included) just wanted to see the specific IP attached to that load balancer instance. Now, when you go to an environment dashboard, you'll see the correct IP.


There were quite a few other minor tweaks and bug fixes, along with a LOT of work on something we'll be revealing very soon. Leave a comment with your thoughts on the latest update, questions you may have, or any issues you run into. (You can also message any of our team in Slack.)

Our next release will be historic...you won't want to miss it.

platform
0
feature-request

Deployment Scoped Variables

One of the deployment patterns we have been using from K8S is to generate unique configmaps per deployment of a service so that we can version variables with the code (but outside of the image). We have been able to achieve that using the existing Stack spec (nice work on this, btw), but it would be great if we could clean them up when the deployments get removed in the pipeline step.

1
random

Should I remove this from quarantine?

What's the worst that could happen?

2
feature-request

Specify volume filesystem

Microsoft recommends the XFS filesystem for SQL Server on Linux data volumes. Would it be possible to allow us to specify which filesystem should be used when provisioning volumes?

From https://learn.microsoft.com/en-us/sql/linux/sql-server-linux-performance-best-practices?view=sql-server-ver16:

SQL Server supports both ext4 and XFS filesystems to host the database, transaction logs, and additional files such as checkpoint files for in-memory OLTP in SQL Server. Microsoft recommends using XFS filesystem for hosting the SQL Server data and transaction log files.

1
feature-request

Base image monitoring breakout

The current server view depicts base storage usage and its trend over time; however, finding out what's consuming that space currently isn't possible. As you reach your threshold, there are no granular views to figure out what might be consuming the space.

Since base storage can be expanded but not decreased, we're thinking a way to determine whether we need to expand is necessary, so we can tell whether a machine has runaway logs consuming base storage or images that are simply too large for that machine.

1
feature-request

Having a re-run button on pipelines would be helpful

When pipelines make use of parameters, it is sometimes cumbersome to fill in all the info to re-run a failed pipeline.

This is especially true when debugging pipelines that were triggered by automatic processes that make use of parameters to identify builds.

1
feature-request

Server Storage Provisioning View

For server storage, it would be nice to see the potentially allocated (thin-provisioned) volumes that exist for container volumes residing on that server. If we have an instance A that has thin-provisioned a 30GB disk, and another instance B with 100GB of thin-provisioned space, then we could have overprovisioned relative to what physically exists on disk. This is especially crucial if the workload is placed on a server where other running volumes, which we're unaware of, are claiming chunks of the disk that sit unused.

While thin-provisioning is the way to go, it presents problems if it isn't visible once the disk begins to fill quickly. This can starve other resources or make the entire machine unresponsive.

1
question

Load Balancer - Public IP Addresses

Hey all, we were wondering whether every load balancer instance gets a separate public IP address. If we have multiple LB instances, do they all have separate public IPs, or are all instances available via the same IP address?

2
feature-request

Slack feed for status updates

Hey all. We've got a Slack channel with feeds from our main providers, where any service degradation or outages get posted, like in the attached image.

Given Cycle's a pretty load-bearing part of our infrastructure, I'd appreciate a Slack feed like the ones we have for several other providers, as this channel is my first port of call when something weird is happening.

1
feature-request

Add cluster ID/name to more data sources

In DNS -> Zones and Cluster Drain JSON data (as well as other places) it would be super helpful to have cluster ID information. Since environments can be replicated, it is important for the API to carry the cluster ID / name to differentiate environments.

1