Infrastructure-as-code cost-benefit analysis
So many hyphens! And there go half my subscribers :) But seriously, at some point you may need to make the case for infrastructure-as-code (IaC) at a new job or to new leadership.
Hey there, friend. It’s Ash with yet another analysis of an SRE-related topic, infrastructure-as-code (IaC).
My full post will have a primer on IaC, but I’m guessing that you, like most of my newsletter subscribers, are already across what it is.
If not, send me an email and I’ll share the primer with you. Let’s begin…
The time, money and energy costs of adopting IaC are outweighed by its benefits, especially if you’re running at a reasonable scale, say in the high tens of VMs.
If you’re just starting out with IaC, your transformation cost will be minimised by two factors:
your primary infrastructure is already in the cloud and
there’s institutional or team knowledge of IaC tools and methods
The former is already the case in most modern organisations, and the latter can be built up quickly with a deliberate plan for developing IaC skills.
Here’s a quick rundown of the benefits:
code integrity - auditable, version-controlled code lowers the technical debt that each change adds
lower cost - less engineer time ($$$) is spent on “toil” (repetitive, manual tasks)
faster deployments - new VMs spin up with little lag once code changes are deployed
lower human error - fewer manual handoffs means less risk of the human errors that lead to downtime and performance degradation
higher availability - less infrastructure unavailability during spikes in demand, because capacity can be added from code rather than by hand
Another sellable benefit is that IaC supports DevOps, which is very in right now. Because infrastructure is defined in easy-to-share code, developers can get more involved in configuration and collaborate more closely with production-focused engineers.
Now, let’s cover some costs and benefits of IaC in more detail.
Cost: IaC takes time to learn
IaC is a new paradigm for engineers who may be used to SSHing into a server and making modifications directly. With IaC, engineers will notice an additional step between writing a change and that change reaching the infrastructure.
The engineer makes the necessary code additions or adjustments and pushes them to the provisioning tool, which then applies the changes to the infrastructure. Engineers first need to learn the code, and then need to keep the habit and resist the temptation to make direct “dial-in” changes to infrastructure, since every such change leaves the running infrastructure out of step with what the code describes.
The engineering group will need to invest in the ongoing development of engineers to make sure this happens. One path is a culture change that fosters continuous development, which could take the form of ongoing feedback and learning loops.
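To make the push-to-provisioning-tool workflow in this section more concrete, here is a minimal Python sketch of what such a tool does with your code: it compares the state described in code with what is actually running, plans create, update and delete actions, and applies them. The resource model and the `plan`/`apply` names are hypothetical; they loosely echo the plan-then-apply workflow of tools like Terraform but are not any real tool's API. The update branch also shows why “dial-in” changes hurt: they surface as drift, which the next apply overwrites.

```python
from dataclasses import dataclass
from typing import Dict, List

# Toy resource model (an assumption for illustration): resource name -> attributes.
# Real provisioning tools use their own schemas, state files and APIs.
Resources = Dict[str, Dict[str, str]]


@dataclass(frozen=True)
class Action:
    kind: str  # "create", "update" or "delete"
    name: str
    detail: str = ""  # for updates, the attributes that differ


def plan(desired: Resources, actual: Resources) -> List[Action]:
    """Diff the state described in code against what is actually running."""
    actions: List[Action] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(Action("create", name))
        elif desired[name] != actual[name]:
            # A mismatch here is drift, e.g. left behind by a manual "dial-in" change.
            changed = sorted(
                key
                for key in desired[name].keys() | actual[name].keys()
                if desired[name].get(key) != actual[name].get(key)
            )
            actions.append(Action("update", name, ", ".join(changed)))
    # Anything running that the code no longer describes gets removed.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(Action("delete", name))
    return actions


def apply(desired: Resources, actual: Resources) -> Resources:
    """Carry out the plan, returning the new actual state."""
    state = {name: dict(attrs) for name, attrs in actual.items()}
    for action in plan(desired, actual):
        if action.kind == "delete":
            del state[action.name]
        else:  # create or update: converge onto the attributes in code
            state[action.name] = dict(desired[action.name])
    return state


if __name__ == "__main__":
    desired = {
        "web-1": {"size": "medium", "os": "ubuntu-22.04"},
        "web-2": {"size": "medium", "os": "ubuntu-22.04"},
    }
    # Someone resized web-1 by hand, and old-vm is no longer in the code.
    actual = {
        "web-1": {"size": "large", "os": "ubuntu-22.04"},
        "old-vm": {"size": "small", "os": "ubuntu-20.04"},
    }
    for action in plan(desired, actual):
        print(action.kind, action.name, action.detail)
    print(plan(desired, apply(desired, actual)))  # empty plan: no drift left
```

Here the plan would update `web-1` (its size drifted), create `web-2` and delete `old-vm`; after applying, a fresh plan is empty because the running state matches the code again.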
Benefit: IaC is flexible to many kinds of infrastructure
Infrastructure-as-code isn’t limited to public cloud use cases. You can also use it to define the physical infrastructure you have on-premises.
The benefit of using IaC in this situation is that every application can be assigned a distinct set of resources from the outset, giving you greater visibility into, and finer-grained control over, how resources are allocated to applications.
Benefit: IaC ensures consistency across environments
Anecdotally, I’ve seen quite a lot of code crumble in production. Sometimes the cause was as simple as differing environments between stages of the software development lifecycle.
Developers were testing on a different environment (“localhost”) from what production would be. The localhost machine was often souped up in comparison with the production environment planned by operators.
Having a single source of infrastructure code for all stages reduces the risk of different resource allocations, and consequently different performance, for the same feature or story.
This works all the way down to the granular level of matching OS versions, patch levels and so on. Differences in these granular properties are often the culprit behind code that works well in testing but not in production.
A live environment clone, created using the exact same IaC as the live environment, has the absolute guarantee that if it works in the cloned environment it will work in live. — Dan Merron & Shanika Wickramasinghe, DevOps consultants at BMC
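The single-source idea can be sketched in a few lines of Python. In this minimal sketch, every stage is derived from one hypothetical base definition, so the OS image, patch level and VM size cannot drift apart, and a small check flags any property that does differ. The `Environment` fields, stage names and values are assumptions for illustration, not taken from any particular IaC tool.

```python
from dataclasses import dataclass, fields, replace
from typing import Dict, List


@dataclass(frozen=True)
class Environment:
    name: str
    os_image: str
    patch_level: str
    vm_size: str
    vm_count: int


# Hypothetical single source of truth shared by every stage.
BASE = Environment(
    name="base",
    os_image="ubuntu-22.04",
    patch_level="2024-05",
    vm_size="medium",
    vm_count=1,
)

# Design choice (an assumption): stages may differ only in how many VMs they run.
STAGE_VM_COUNTS = {"dev": 1, "test": 2, "prod": 6}


def build_environments(base: Environment, counts: Dict[str, int]) -> Dict[str, Environment]:
    """Derive every stage from the same base definition."""
    return {stage: replace(base, name=stage, vm_count=n) for stage, n in counts.items()}


def mismatched_properties(envs: Dict[str, Environment]) -> List[str]:
    """List properties that differ between stages, ignoring name and VM count."""
    allowed_to_differ = {"name", "vm_count"}
    return [
        f.name
        for f in fields(Environment)
        if f.name not in allowed_to_differ
        and len({getattr(env, f.name) for env in envs.values()}) > 1
    ]


if __name__ == "__main__":
    envs = build_environments(BASE, STAGE_VM_COUNTS)
    print(mismatched_properties(envs))  # [] because every stage shares one definition
    # A hand-built dev box on a newer OS breaks the guarantee.
    envs["dev"] = replace(envs["dev"], os_image="ubuntu-24.04")
    print(mismatched_properties(envs))
```

Letting only the VM count vary is one possible choice: production can run more VMs than dev or test while each VM still matches, which keeps per-VM resource allocation, and therefore performance, comparable across stages.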
IaC also lets you define each layer of the infrastructure supporting your code to suit your production requirements. These layers include:
IaaS artefacts like VMs, load balancers, databases
On-premises hardware
Platforms like Kubernetes
Cost: IaC needs constant maintenance
The IaC you have today may not be viable in the near future, because the underlying infrastructure is constantly changing. Kubernetes releases updates regularly. Operating systems need constant patching. New security recommendations keep appearing.
IaC is a constantly moving target.
Consequently, the code needed to control your infrastructure keeps changing too. This calls for a consistent testing routine, so you can be sure the code stays current with all your IaaS artefacts and platforms.
It also goes without saying that the engineers responsible for IaC will need to stay on top of the changes that occur — all the time.
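One piece of that testing routine could be an automated check, run in CI, that flags versions pinned in your IaC which have fallen behind what you still consider supported. This is a minimal sketch under stated assumptions: the component names, pinned versions and minimum-supported policy are hypothetical, and it only handles plain dotted numeric versions (no suffixes like `-rc.1`).

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as "1.27.3" into (1, 27, 3) for comparison."""
    return tuple(int(part) for part in text.split("."))


def outdated_pins(pinned: Dict[str, str], minimum: Dict[str, str]) -> List[str]:
    """Names whose pinned version is older than the oldest version still supported."""
    return sorted(
        name
        for name, version in pinned.items()
        if name in minimum and parse_version(version) < parse_version(minimum[name])
    )


# Hypothetical versions pinned in your IaC code.
PINNED = {"kubernetes": "1.27.3", "ubuntu": "22.04", "postgres": "16.2"}
# Hypothetical policy: oldest versions your team still treats as supported.
# Components without a policy entry (postgres here) are not checked.
MINIMUM_SUPPORTED = {"kubernetes": "1.28.0", "ubuntu": "22.04"}


if __name__ == "__main__":
    stale = outdated_pins(PINNED, MINIMUM_SUPPORTED)
    print("outdated pins:", stale or "none")
```

In a pipeline, you would fail the build whenever `outdated_pins` returns a non-empty list, which turns staying on top of upstream changes into a routine signal rather than something engineers have to remember.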
You are now armed with insight for the next time someone questions the viability of transitioning to, or sticking with, IaC.