Dec 19th, 2024 - Chris Aubuchon, Head of Customer Success

Distributed WordPress on Cycle and GCP

Recently I've had the great privilege of building a distributed WordPress deployment that leverages GCP compute and services alongside containers running on the Cycle platform.

This blog dives into a bit of the history of why WordPress is difficult to deploy in a distributed way, how we approached it, some really interesting things we found, and finally, the solution we put in place.

History and Challenges

WordPress is a legacy application that still quietly dominates the web, powering over 450 million websites globally.

Its monolithic architecture makes it incredibly easy to stand up. The basic WordPress deployment includes an instance of WordPress and a relational database such as MySQL or MariaDB.

Vertical scaling by increasing the amount of resources available to a single install is somewhat trivial, but it does not solve the problem of availability. In modern cloud environments, it's almost universally required that the application be highly available and recoverable from disaster. So the goal with distributed WordPress is an architecture that allows for seamless scaling of the application across regions or even globally.

WordPress doesn't make that type of horizontal deployment and scaling easy.

Here are some of the pieces that get in the way:

  1. Stateful storage for uploads and plugins.
  2. Relational database scaling and management.
  3. Tightly coupled, monolithic architecture.
  4. Hard-coded configurations.

The database piece isn't as difficult as it once was. As long as you're willing to pay for something like RDS or GCP's Cloud SQL, it shouldn't be difficult to just offload that entire piece to the cloud provider.

WordPress stores uploaded media files, themes, and plugins directly to the file system, typically within the wp-content directory. In a distributed setup, with multiple instances spread across geographically diverse compute nodes, ensuring file changes are effectively synced to each node can become tricky.

One of the things that's really interesting about WordPress is how it handles domains. You can't just point a domain at a running instance of WordPress and expect it to resolve. You have to define both a site_url and home. These values are configuration options that tell WordPress about the domain that will be pointing to the application.

In many cases, these values are the same. However, if they do not match the incoming domain, the user is redirected to the saved domain, which can cause a whole bucket of issues. The really striking thing here is that the value for both home and site_url is stored in the database. While it can also, optionally, live in wp-config.php, once set in the database, the config file alone does not seem to override it.
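
To make that concrete, here's a minimal sketch of inspecting and aligning those values with WP CLI, assuming WP CLI is available in the container and using a placeholder domain:

```bash
# Inspect what WordPress currently has stored in the database
wp option get home
wp option get siteurl

# Point both options at the domain that will actually front the deployment
# (https://example.com is a placeholder)
wp option update home "https://example.com"
wp option update siteurl "https://example.com"
```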

So with tightly coupled configurations that make updates less straightforward, stateful data that has to be offloaded from the main application, and relational databases that can be clunky, how should we approach deploying WordPress in a modern way?

Low-Hanging Fruit

  • Solve the database piece with a managed database solution, in this case Cloud SQL, deployed within the same VPC and globally accessible to the containers Cycle deploys to the compute nodes.
  • Offload the media files using the WP Offload Media plugin, pairing it with GCP's object storage and CDN for faster load times, reduced server load, and much higher reliability.
  • Use GCP's Filestore to mount a network filesystem on each server, allowing the WordPress instances to share a single filesystem.

Because containers deployed on Cycle can directly access out-of-band networks, there is no need to add a public IP address to the Cloud SQL instance, an immediate win for security. While GCP does support VPC peering, the database can be added to the Cycle VPC as long as the service doesn't consume the same network space that the compute nodes pull from. In this case, using a 10.x.x.x address space worked perfectly.
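
For reference, creating a private-IP-only Cloud SQL instance attached to that network looks roughly like the sketch below. The instance name, tier, region, and network path are placeholders, and private services access (an allocated range plus a service networking connection) must already be set up on the VPC:

```bash
# Create a Cloud SQL (MySQL) instance with no public IP, attached to the
# VPC the Cycle compute nodes use. All names below are placeholders.
gcloud sql instances create wp-primary-db \
  --database-version=MYSQL_8_0 \
  --tier=db-custom-2-7680 \
  --region=us-central1 \
  --network=projects/my-project/global/networks/cycle-vpc \
  --no-assign-ip
```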

Offloading the media was all done through the WordPress backend, and hooking it up to GCP's object storage and CDN was relatively easy. The big wins here were:

  • Faster load times - serving the media files through the CDN gets the content as close to the user device as possible.
  • Reduced server load - the WordPress instances don't have to actually serve any of the media files back to the requesting device.
  • Reliability - the object storage and CDN are both automatically backed up and replicated across regions.

For the network filesystem, attaching the NFS mount to the server on Cycle was really simple, but the read and write speeds were abysmal. This was really surprising because the GCP Filestore tier I had deployed was over $500 a month.

GCP Filestore Configuration Summary

  • Service Tier: Regional
  • Location: us-central1
  • Capacity Range: 1 - 9.75 TiB (in increments of 0.25 TiB)
  • Cost Estimate: $460.80/month ($0.63/TiB/hr)
  • Read IOPS: 12,000
  • Write IOPS: 4,000
  • Read Throughput: 120 MiB/s
  • Write Throughput: 100 MiB/s

The way the WP Offload Media plugin works, media is first uploaded to the site and then migrated to object storage, which then distributes it through the CDN. With a write throughput of 100 MiB/s (about 75x slower than my M3 Mac), there was no way this tier would keep up with the demands of a reasonable system.

Given this was already a steep price for a WordPress backend, even if it could manage multiple sites, I decided to re-architect the approach and see if I could create a primary/secondary setup that would let the compute nodes' local disks be utilized.

As an aside, the GCP Filestore tier I saw with an acceptable I/O spec (>1,000 MiB/s for writes) was over $25k a month. This seems to be because, on GCP, the only way to scale Filestore network throughput is to scale its capacity.

Moving to the Primary and Secondary Architecture

The plan was simple: create a primary container that would serve as the source of truth for all WP files, and secondary containers that would sync against the primary to stay up to date.

While simple, this approach also frees us from most of the remaining constraints that come along with WordPress:

  1. The secondaries can be stateless because they are syncing files on startup and then progressively syncing over time in the background based on changes.
  2. The primary can still be updated in a way that is simple and straightforward for both developers and editors.

And not to digress, but that second point carries a lot of weight. With WordPress you have non-technical editors maintaining and creating posts/pages/etc within the backend. The last thing you'd want to do is make them learn new technical skills just to maintain the application.

Spinning up the main, stateful WordPress instance and connecting it to Cloud SQL is a snap. On Cycle, it was as easy as:

  1. Creating the container.
  2. Making sure it had the appropriate environment variables for connecting to the database.
  3. Going through the 5-minute install (which can also be automated through WP CLI, as sketched below).
  4. Uploading/importing the backup (or you can start here from scratch and build things out).
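
For reference, step 3 can be scripted with WP CLI along these lines. The database values assume the environment variables from step 2 (the names here follow the official WordPress image; adjust to whatever your container actually uses), and the site values are placeholders:

```bash
# Generate wp-config.php from the database environment variables
# provided to the container (variable names are placeholders).
wp config create \
  --dbname="$WORDPRESS_DB_NAME" \
  --dbuser="$WORDPRESS_DB_USER" \
  --dbpass="$WORDPRESS_DB_PASSWORD" \
  --dbhost="$WORDPRESS_DB_HOST"

# Run the 5-minute install non-interactively
wp core install \
  --url="https://example.com" \
  --title="Distributed WordPress" \
  --admin_user=admin \
  --admin_email=admin@example.com \
  --admin_password="change-me"
```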

The primary container would need to run an rsync daemon that the secondary instances can sync against. Rsync works best here because its delta-transfer algorithm reduces the amount of data sent over the network by sending only the differences between the source files and the existing files at the destination. It's also light on both CPU and memory.
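
A minimal sketch of that daemon, exposing wp-content as a read-only rsync module (paths, module name, and allowed network are illustrative):

```bash
# Write a minimal rsyncd.conf exposing wp-content read-only.
# The hosts allow range should match the private network the
# secondaries live on; 10.0.0.0/8 here is illustrative.
cat > /etc/rsyncd.conf <<'EOF'
uid = www-data
gid = www-data
use chroot = yes

[wp-content]
    path = /var/www/html/wp-content
    read only = yes
    hosts allow = 10.0.0.0/8
EOF

# Run the daemon alongside the normal WordPress process
rsync --daemon --config=/etc/rsyncd.conf
```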

With the primary instance set up to run the rsync daemon in the background, the secondary nodes can now run a startup command to sync against the primary, plus an additional background script that picks up subsequent changes from the primary.
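
The startup sync on a secondary can then be a single rsync pull against that module. This is a sketch; wp-primary is a placeholder for however the primary container is addressed on the private network:

```bash
#!/bin/sh
# One-time startup sync: pull the full wp-content tree from the primary
# before the web server starts serving traffic.
rsync -az --delete \
  rsync://wp-primary/wp-content/ \
  /var/www/html/wp-content/
```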

Restarting the secondary nodes whenever they need updates would be ideal, but WordPress is so tightly coupled that a change to plugins and other components also creates database entries and changes that can cause the secondaries to crash if their files aren't updated in a timely manner. Therefore, the background sync client is needed on the secondary nodes.

It's essential, when implementing the secondary nodes' sync algorithm, to make sure that no writes are taking place during the sync.
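
One way to handle both concerns is a background loop that uses flock to avoid overlapping runs on the secondary and skips a cycle when the primary signals it is mid-write. The .writing sentinel file below is an assumption about how the primary could signal that, not something WordPress provides:

```bash
#!/bin/sh
# Background sync loop for a secondary node (a sketch, not a hardened script).
# /tmp/wp-sync.lock prevents overlapping rsync runs on this node.
# ".writing" is a hypothetical marker the primary creates while installing
# plugins or otherwise writing, telling secondaries to wait a cycle.
while true; do
  flock -n /tmp/wp-sync.lock sh -c '
    if rsync --list-only rsync://wp-primary/wp-content/.writing >/dev/null 2>&1; then
      echo "primary is mid-write, skipping this sync cycle"
    else
      rsync -az --delete rsync://wp-primary/wp-content/ /var/www/html/wp-content/
    fi
  '
  sleep 30
done
```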

The Result

One of the major benefits of doing this on Cycle is the immediate scale of the network. The secondary WordPress instances can be distributed by Cycle to compute nodes in any data center on GCP, and because of how GCP treats its regions (they're all connected into a single VPC), the compute nodes can talk to the database regardless of which data center they're in.

It was surprising to find out just how expensive and slow the network filesystem solution would be. Luckily, using simple Linux mechanics and a bit of magic on top of rsync, we were able to get rid of the NFS mounts and, in the process, unlock an architecture that's much more appealing without any added cost.

If you haven't heard of Cycle before, we're a Kubernetes alternative that simplifies container orchestration and infrastructure management.

💡 Interested in trying the Cycle platform? Create your account today! Want to drop in and have a chat with the Cycle team? We'd love to have you join our public Cycle Slack community!