The story behind our Container Deployment Pipeline and dev-friendly operations


This week we announced a set of features and open source integrations that will help you deploy your applications onto any Kubernetes cluster with ease and confidence.

But before I get into what those features are, let me tell you why we felt the need for them.

Kubernetes day 1: making infra easier

Around a year ago, we moved our entire infrastructure from AWS (EC2, RDS, S3 and our own orchestrator) to Google and Kubernetes, as you may have read in our GCP case study. Why the move?

We had been running on over 40 servers with six or seven different "flavors" (frontend, backend, Redis, NAS, and so on) for over three years. While this was working fine for our skilled Ops team, we needed our Devs to innovate faster without disrupting Ops. Adding new components to our infrastructure was one example: even a minor addition like ElasticSearch had to be deployed and maintained in its own special way, and added new servers to our monthly AWS bill. Surely as a container Ops company we could do better!

After migrating, setting up a Kubernetes cluster became easy and efficient: we ended up with a small cluster of only 10 servers to take on the load we were previously running on 40 servers. Even better, we had no more snowflakes. Adding a new server to the infrastructure was just as simple as adding a new server to the cluster, and adding services was hassle-free since they were pre-configured containers themselves. This reduced our bills, simplified our infrastructure and allowed us to move much faster and improve our service quality. (On a macro level, it made me ponder wider changes in the industry, as you may have seen on InfoWorld.)

Where is the catch, you might ask? It was waiting for us in the pipeline.

Kubernetes day 2: a pipeline that moves as fast as the code

Before moving to Kubernetes, we had a very normal build and deployment pipeline:

Git -> CI (unit tests) -> Servers -> DB migrations -> Traffic switch (from old code to new) -> Post deployment tasks (APM, metric annotations, etc).

Thinking about our deployment process for Kubernetes, we knew we needed to add a step to the flow for building the Docker images, but that was not all: we also needed to think about ready-built images (like Redis and RabbitMQ), Kubernetes configuration files, environments, management of configuration values and secrets, and more.

By the end of this mapping exercise, our deployment flow looked very different:

Step 1: Building and Retagging

Services with our own code:
Git -> CI (Unit tests) -> Builder -> Docker Registry

Services with ready-built images:
Docker image -> Retagging -> Docker Registry

The result of this became BuildGrid, our build service, which is part of Skycap. By the end of step 1, we had all the images we needed to deploy the application, all with the same unique Deployment Tag.
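To illustrate the retagging idea, here is a minimal sketch (not BuildGrid's actual implementation) that works out the target reference for each ready-built image so that everything ends up in one registry under the same Deployment Tag. The function names, the registry name and the tag value are all assumptions made for this example, and image references pinned by digest are not handled.

```python
from typing import Dict, List, Tuple


def parse_image(ref: str) -> Tuple[str, str]:
    """Split an image reference into (repository, tag); tag defaults to 'latest'."""
    # A colon only separates the tag if it comes after the last slash,
    # so a registry port such as localhost:5000/redis is left alone.
    last_slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > last_slash:
        return ref[:colon], ref[colon + 1:]
    return ref, "latest"


def retag(ref: str, registry: str, deployment_tag: str) -> str:
    """Return the reference the image should get in our own registry."""
    repository, _old_tag = parse_image(ref)
    name = repository.rsplit("/", 1)[-1]  # keep only the image name
    return f"{registry}/{name}:{deployment_tag}"


def plan_retags(images: List[str], registry: str, deployment_tag: str) -> Dict[str, str]:
    """Map every source image to its retagged target."""
    return {ref: retag(ref, registry, deployment_tag) for ref in images}


if __name__ == "__main__":
    # Hypothetical registry and Deployment Tag, for illustration only.
    plan = plan_retags(["redis:5.0", "rabbitmq"], "registry.example.com/acme", "d-42")
    for source, target in plan.items():
        print(f"{source} -> {target}")
```

Each pair in the plan would then be pulled, tagged and pushed with the usual Docker commands, so that a deployment can refer to every service by one tag.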

As part of this step, we also needed to keep secrets out of the Docker images to protect our IP and comply with security requirements, and to reduce each image's vulnerability to attack by shrinking the runtime image to the bare minimum. That's how we came to write and open source Habitus.

Oh, and another takeaway of this effort? We figured out that the build phase is the least-complex part of a container pipeline...

Step 2: Tackling Configuration Files

As we've mentioned elsewhere, one side effect of deploying multiple instances of your application is that the role of environments becomes less clear: all of a sudden you might have three stacks all running as production, with the configuration read from the same section of database.yml or production.rb. What we realized is that we needed templates that allowed some parts of these configuration files to be replaced with values relating to the specific deployment.

Moreover, depending on the environment we were deploying to, the configuration files needed to be slightly different: in production, we could run our database on RDS (externally) and have log-shipping sidecar containers on each pod, while in staging, we would have a database in a container and maybe a different base Docker image (with some debugging packages installed, for example).

But generating the files was only one challenge. Others were the upkeep of existing files, clear visibility in version control, and fine-grained user access control.

All these small variations of configuration made us rethink the concept of application environments and come up with Formations (video demo here). A Formation is a single instance of an application deployed to a cluster: a combination of environment and deployment destination, like "production on AWS" or "staging on bare metal".

Formations rely on Stencils (more on those below) and are a perfect complement to Helm charts. Helm is a great open source project that benefits from an amazing community, and where release cycles are predictable and configuration pre-baked (e.g., Redis), we use it extensively. Where release pace, configuration and security are more dynamic, and where RBAC and version control are key (like for an internal service), we use Formations and Stencils. This also works well when you want to keep configuration files "native" and don't want to use a scripting language (Go Templates) for a structured file type (like YAML).

In short, we wanted to offload the grunt work of configuration files from our Devs, and ensure easy maintainability and compliance with Ops policies. Today, Stencils and Formations take care of our own Kubernetes configuration file generation and maintenance.

Step 3: Freeing Our Devs from Secrets and Configuration

With environments being a more flexible concept in the Kubernetes world, we needed a tool to help us configure our application without having to hand-hold each deployment variation. As an example, this meant we needed to rotate an API key only once for all Formations. Our tool also needed to support secrets, but we didn't want to write yet another secret manager when most clouds have a perfectly good KMS. This led us to building an open source project we call Gifnoc (Config, turned on its head!), which is a cloud-friendly configuration tool that supports Vault or AWS/Google KMS.

We wanted to keep our secrets in Kubernetes configuration files and use them the Kubernetes way (environment variables or tmpfs mounts) without worrying about inadvertently leaving them in a file checked into the repository. While there are solutions like git-crypt to encrypt the secret files, they don't have fine-grained user access control. We also wanted to leave the files partially open for developers to modify, without exposing them to the secrets themselves. Just like a stencil you use when you spray paint a wall, our Stencils feature provides selective access: for example, it can pull the secrets from an external place (e.g., through Gifnoc and Vault) right before applying them to Kubernetes on a secure server. Stencils are essentially vanilla Kubernetes configuration files with simple, single-word placeholders and no control flow (no if/then/else and no for loops).
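As a rough illustration of the "placeholders, no control flow" idea, the sketch below fills single-word placeholders in an otherwise plain Kubernetes file. The `${name}` placeholder syntax, the `render_stencil` function and the example values are assumptions made for this sketch, not the actual Stencils syntax; in a real setup the values would be fetched from a store such as Gifnoc or Vault at apply time rather than passed in as a literal dictionary.

```python
import re

# Hypothetical placeholder syntax: ${name}, where name is a single word.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

# A plain Kubernetes Secret with two placeholders and no control flow.
STENCIL = """\
apiVersion: v1
kind: Secret
metadata:
  name: web-secrets
  namespace: ${formation}
stringData:
  API_KEY: ${api_key}
"""


def render_stencil(stencil: str, values: dict) -> str:
    """Replace every ${name} in the stencil with values[name].

    Raises KeyError if a placeholder has no value, so a half-rendered
    file is never produced.
    """

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder {key!r}")
        return str(values[key])

    return PLACEHOLDER.sub(substitute, stencil)


if __name__ == "__main__":
    # Illustrative values only; the secret would normally never be hard-coded.
    print(render_stencil(STENCIL, {"formation": "production-aws", "api_key": "example-key"}))
```

Because the template carries no logic, the file checked into the repository stays valid-looking YAML that developers can edit, while the secret value only exists in the rendered output.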

Step 4: Unit-Testing Our Configuration Files

To enhance the self-service experience (curated by Ops, used by Devs) in our team, we wanted to allow Devs to use the full Kubernetes API without any limitations. We didn't want to wrap the API in another one to limit their access to some features like storage classes or fixed IP assignments, but we also didn't want our Ops to lose sleep over every change that might bring the entire cluster down. Kubernetes RBAC is a great feature to control access to resources, and we wanted to augment that with something that looked like unit tests for our Kubernetes configuration files.

This is why we built Copper, an open source project that allows us to write and apply policies to our Kubernetes files before they hit the cluster. Examples of those policies could be banning the use of latest as an image tag, or requiring Ops to sign off on a deployment if the IP address for a load balancer has changed.
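To make the "unit tests for configuration" idea concrete, here is a small sketch of the first policy, banning latest as an image tag. It is written in plain Python against an already-parsed manifest and does not use Copper's own policy language; the function names and the example manifest are made up for illustration. An image with no tag is treated as latest, and an image pinned by digest is allowed.

```python
from typing import Iterator, List


def iter_images(node) -> Iterator[str]:
    """Yield every string value stored under an 'image' key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "image" and isinstance(value, str):
                yield value
            else:
                yield from iter_images(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_images(item)


def uses_latest(image: str) -> bool:
    """True if the image has no tag or is explicitly tagged 'latest'."""
    if "@" in image:  # pinned by digest, so the tag is irrelevant
        return False
    last_slash = image.rfind("/")
    colon = image.rfind(":")
    if colon <= last_slash:  # no tag: the runtime would pull 'latest'
        return True
    return image[colon + 1:] == "latest"


def check_no_latest(manifest: dict) -> List[str]:
    """Return the offending images; an empty list means the policy passes."""
    return [image for image in iter_images(manifest) if uses_latest(image)]


if __name__ == "__main__":
    # Hypothetical Deployment, already parsed from YAML into a dict.
    deployment = {
        "kind": "Deployment",
        "spec": {"template": {"spec": {"containers": [
            {"name": "web", "image": "nginx:latest"},
            {"name": "cache", "image": "redis:5.0"},
        ]}}},
    }
    print(check_no_latest(deployment))
```

A check like this can run in the pipeline before anything is applied, and fail the deployment when the returned list is not empty.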

Step 5: Bringing it all together

Now we have configuration files generated based on the specifics of a Formation, with values coming from version controlled code repositories or Docker registries, and configuration files that are also git-version controlled. All the Devs need is to apply those files to the cluster and make sure the deployment has gone through successfully.

The last component in our deployment is a small Kubernetes ReplicaSet we call Event Relay. Event Relay sits on a Kubernetes cluster and listens to its events. Using the deploy tag it then reports back to each Formation about the success or failure of the entire deployment (which could trigger a rollback by simply applying the previous version of the Stencils from the same Formation).
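The reporting logic can be pictured with the sketch below, which reduces a stream of events to one status per deployment. This is only an illustration under stated assumptions: the event shape (`deploy_tag`, `workload`, `status`), the status names and the `deployment_status` function are hypothetical, and the code neither talks to the Kubernetes API nor reflects how Event Relay is actually built.

```python
from typing import Dict, Iterable, List


def deployment_status(events: Iterable[dict], deploy_tag: str,
                      expected_workloads: List[str]) -> str:
    """Summarize one deployment as 'succeeded', 'failed' or 'in_progress'."""
    latest: Dict[str, str] = {}
    for event in events:
        if event["deploy_tag"] != deploy_tag:
            continue  # event belongs to another deployment
        # Later events overwrite earlier ones for the same workload.
        latest[event["workload"]] = event["status"]

    if any(status == "failed" for status in latest.values()):
        return "failed"  # one failed workload fails the whole deployment
    if all(latest.get(name) == "ready" for name in expected_workloads):
        return "succeeded"
    return "in_progress"


if __name__ == "__main__":
    # Hypothetical events, in the order they were observed.
    events = [
        {"deploy_tag": "d-42", "workload": "web", "status": "progressing"},
        {"deploy_tag": "d-41", "workload": "web", "status": "failed"},
        {"deploy_tag": "d-42", "workload": "web", "status": "ready"},
        {"deploy_tag": "d-42", "workload": "worker", "status": "ready"},
    ]
    print(deployment_status(events, "d-42", ["web", "worker"]))
```

A "failed" result is the point at which a rollback could be triggered by applying the previous version of the Stencils.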

Skycap, A CDP That Speaks Config and Security

After completing this part of our journey, we built all of these elements into Cloud 66 Skycap, our container deployment pipeline. Today I am proud to announce the GA release of Formations, Stencils, Copper integration and Event Relay as Cloud 66 Skycap features. All other features and projects have been live for a while.

Using these powerful projects and features makes it easy for us to build and deploy our stack, which powers more than 3,500 customer workloads, onto any Kubernetes cluster with confidence and following best practices. By making them available to our users, we hope to simplify DevOps for them, so that they can accelerate their operations too.