Everybody knows the benefits of scripted infrastructure. Tools like Chef, Puppet, Ansible and SaltStack all help you script your infrastructure.
You can go further and say that not only is scripting your infrastructure the right way to go, but immutable infrastructure is what we need: re-creatable, disposable infrastructure where you don’t modify its state to reflect outside change, but rebuild it from scratch every time.
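As a minimal sketch of the difference, here is roughly what mutable versus immutable infrastructure looks like in practice. The tool choices, image IDs and instance IDs below are placeholders for illustration, not anything from the original text:

```shell
# Mutable approach: log in to a running server and change its state in place.
ssh admin@web-01 'sudo apt-get update && sudo apt-get upgrade -y'

# Immutable approach: bake a fresh image from scratch, then replace running
# instances with new ones built from that image. (Template name, AMI and
# instance IDs are hypothetical placeholders.)
packer build web-server.json
aws ec2 run-instances --image-id ami-PLACEHOLDER ...
aws ec2 terminate-instances --instance-ids i-PLACEHOLDER ...
```

The point is that in the immutable model no server is ever patched; a drifted or broken server is simply thrown away and rebuilt from the image.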
Although I agree with both of these practices, I think this model is broken, and it breaks a little more every day.
To explain why, it helps to align our understanding of the definition of “datacenter”, or more precisely its evolution over time into what I call the “New Datacenter”.
The New Datacenter
There was a time when your datacenter was your servers, in your rack, in your own building. You had to arrange the power, cooling and network your datacenter needed. Over time this changed: after a while it was still your servers, but in somebody else's rack and somebody else's building.
Nowadays, with the cloud, none of it is “yours”, and this goes beyond whose asset sheet all of this appears on to who is “responsible” for it. In the IaaS world, it certainly is not us anymore, and everybody is thankful for that.
I believe this goes beyond the servers, disks and network:
If you think about the datacenter as the infrastructure that powers your application, naturally you have to think about everything that makes the application work but is not part of the code.
Hosted source control providers, databases as a service, storage services, error handling services, alert and notification providers and much more all power our applications, and I would consider all of them part of the New Datacenter.
Scripting is broken
So why is infrastructure scripting broken? In an ideal world you write a bunch of scripts that define your infrastructure and take you from nothing to A, where your app is up and running. After that you have to react to the changes happening around you: code changes, version updates and new business requirements all mean writing migration scripts to take you from A to B, and from B to C.
Even if you and your team are extremely disciplined and keep your scripts clean and up to date, there is always a high chance they will not work the next time you run them. Not because your scripts are broken, but because of all the external dependencies you cannot control: Linux kernel updates from your cloud provider, apt repository changes or new library releases with loose dependency definitions are just a few of the things you cannot control and that nobody warns you about in advance. They always get you at the worst possible time: when there is a fire.
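To make the loose-dependency problem concrete, here is a hedged sketch of the difference between a provisioning script that takes whatever its external dependencies serve today and one that at least pins what it can. The package names and version numbers are invented for the example:

```shell
# Fragile: installs whatever version the repository happens to serve today,
# so the script's behaviour silently changes when the repo or upstream does.
apt-get install -y nginx
pip install requests

# More defensive: pin the versions you actually tested against, so a new
# release fails loudly instead of drifting silently. (Versions are examples.)
apt-get install -y nginx=1.24.0-1
pip install 'requests==2.31.0'

# Even then, a version removed from the mirror or a changed repository can
# still break this script -- the dependency remains outside your control.
```

Pinning narrows the window for surprises but, as the closing comment notes, it cannot eliminate them: the repositories themselves are part of the New Datacenter and are not under your control.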
As the number of external dependencies in the new datacenter grows, the probability of your scripts breaking grows with it. As we lose full control over the entire stack, we become more exposed to unwanted and unannounced changes we are not prepared for. Legal SLAs and contract-based development are not going to solve most of those issues either.
Tools like Chef and Puppet have their roots in the pre-cloud days and are great at describing our desired infrastructure. But as we buy more of our infrastructure as services, scripting our desires means less and less, and the old tools that suited the days of full-stack control are not going to cut it anymore.
When I look at our customers and how we help them grow and stay up and running, I see that what we do is not writing clever scripts or keeping them up to date. Instead, we stay on top of the changes happening in this growing network of external dependencies and make sure everything we do remains compatible with the rest of what our customers use. By staying ahead of our customers we can be sure that the next time they scale or rebuild their servers, they will not be caught out by a broken image or an out-of-date library.
Building tools for humans
For the most part, this is not a problem that automation can solve. There is a limit to what computers and automation can do. The rest lies with standards, documentation, agreements and contracts, but most importantly with applying human intelligence when all else fails. That is why, as the number of nodes in our network of interdependent infrastructure components grows, so will our dependence on human intelligence to identify and resolve issues. And that is an expensive problem to solve.
The tools of the new datacenter are not only automation tools built for the cloud era or runtime-bundled containers, but tools that make it easier for humans to detect and rectify issues at a larger scale, and at a cost that doesn’t grow linearly with the size of the infrastructure.
The sooner we leave the world of scripting, the better we can all embrace the true power of elastic infrastructure as a service!