Adobe's Platform Story: A New Platform Paradigm for Scaling the Kubernetes Fleet

by Alex Petrică, Adobe

📍 Atlas 1 · Platform Engineering · Intermediate

14:00 – 14:30

What does an ideal Kubernetes platform look like? You likely think of one that maintains high delivery speed, consistency, and reliability, while being easy to maintain and quick to recover. However, as a platform grows from a handful of clusters to a global fleet spanning AWS, Azure, and on-premises data centers, these qualities often fade. To prevent this, we must stop managing clusters as a collection of individual tasks and start orchestrating them as a unified, declarative fleet. The goal is to manage a complex, multi-cloud fleet with the simplicity and consistency of a single cluster.

At Adobe, our legacy Python-based system for provisioning and managing clusters reached its limits at 500+ clusters: pipelines took approximately a week to deliver a new cluster, and upgrades were painful because of dependencies and manual steps. We took the bold step of rebuilding the platform from its core, shifting to a GitOps model built on Cluster API, AWS Controllers for Kubernetes (ACK), and ArgoCD. These components form the backbone of our new platform, each with a dedicated role:
- Cluster API provides a declarative, consistent foundation for provisioning across environments.
- ArgoCD continuously reconciles cluster configuration.
- AWS Controllers for Kubernetes allows us to provision dependent cloud resources using the same declarative API as the clusters themselves.

Within Adobe, we call it Ethos Core, a modular platform that separates infrastructure from configuration and allows us to scale reliably. This approach has considerably cut delivery time, reduced upgrades to a single configuration change, and enabled safe, parallel operations across hundreds of clusters.
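
To make the idea of an upgrade as a single configuration change concrete, here is a minimal sketch in Python. It is not Ethos Core code and does not use the Cluster API, ACK, or ArgoCD interfaces; the names `ClusterSpec`, `declare_fleet`, and `plan_upgrades`, along with the cluster names and versions, are hypothetical and chosen only for illustration. It assumes the desired state of the fleet is declared in one place and that something (in a real platform, the controllers) compares it with what is observed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical declared state for one cluster (illustration only)."""
    name: str
    provider: str             # e.g. "aws", "azure", "onprem"
    kubernetes_version: str


def declare_fleet(names_by_provider, version):
    """Build the desired state for every cluster from one shared version value."""
    return [
        ClusterSpec(name, provider, version)
        for provider, names in names_by_provider.items()
        for name in names
    ]


def plan_upgrades(desired, observed_versions):
    """Return the sorted names of clusters whose observed version differs
    from the declared one. Clusters with no observed version are included."""
    return sorted(
        spec.name
        for spec in desired
        if observed_versions.get(spec.name) != spec.kubernetes_version
    )


# Hypothetical fleet membership and observed state.
fleet_members = {"aws": ["prod-us-1", "prod-eu-1"], "azure": ["prod-ap-1"]}
observed = {"prod-us-1": "1.29", "prod-eu-1": "1.30", "prod-ap-1": "1.29"}

# The "single configuration change": only the declared version is edited.
desired = declare_fleet(fleet_members, version="1.30")
pending = plan_upgrades(desired, observed)
print(pending)
```

In this sketch, changing the `version` argument is the only edit needed to put every out-of-date cluster into the pending set. Each cluster is evaluated independently of the others, which is the property that lets a reconciler work on many clusters in parallel.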

In this session, we will share the lessons we learned from putting Cluster API and cloud provider controllers at the center of our fleet: why decoupling infrastructure from configuration is critical, how to retire long-running pipelines, and the pitfalls to avoid when scaling to hundreds of clusters.