Infrastructure as code: stop clicking and start governing 🌍

Infrastructure as Code, or IaC, means defining infrastructure and operational configuration as versioned, reviewable, repeatable code. Practical translation: fewer console clicks, fewer “I only changed one tiny rule” incidents, and fewer surprises at 6:47 PM on a Friday.

IaC is not only about creating VMs or clusters. More importantly, it is about making infrastructure explicit, repeatable, and verifiable instead of leaving it scattered across tickets, screenshots, tribal memory, and other historically unreliable storage systems.

What actually belongs to IaC 🧱

With IaC you can manage as code:

  • networks, subnets, security groups, and firewall rules
  • clusters, nodes, load balancers, and storage
  • technical identities, roles, and permissions
  • DNS, certificates, and cloud integrations
  • policy, tagging, naming conventions, and guardrails

The point is not “write Terraform because everyone else does”. The point is to treat infrastructure with the same discipline you expect from application software.

Lifecycle of an IaC change 🔄

When IaC works well, an infrastructure change stops being a handcrafted gesture and becomes a readable, reviewable, repeatable flow.

    flowchart LR
      A[Change module or stack] --> B[Lint and validate]
      B --> C[Plan]
      C --> D[Pull request and review]
      D --> E[Merge]
      E --> F[Controlled apply]
      F --> G[Remote state update]
      G --> H[Drift detection and observability]

If your process lacks a readable plan, review, and protected remote state, you are not doing mature IaC. You are only moving risk from a web console into a Git repository.

Main models: declarative vs imperative ⚖️

In practice you will find two main approaches:

  • declarative: you describe the desired state and the tool figures out how to get there
  • imperative: you describe the exact steps to execute in sequence

For provisioning and state management, declarative usually wins on readability and governance. For bootstrap, targeted configuration, or procedural orchestration, imperative still has a role. The trick is not to use a hammer for every kind of screw.
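
To make the declarative idea concrete, here is a minimal Python sketch of what "the tool figures out how to get there" can look like: compare a desired state with the actual one and derive the steps. The resource names, the `cidr` attribute, and the `plan` function are hypothetical and only illustrate the principle; real tools also handle dependencies, ordering, and provider APIs.

```python
from typing import Dict, List, Tuple

Resource = Dict[str, object]  # attribute name -> value


def plan(desired: Dict[str, Resource], actual: Dict[str, Resource]) -> List[Tuple[str, str]]:
    """Return the (action, resource_name) steps needed to move actual to desired."""
    steps: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            steps.append(("create", name))
        elif desired[name] != actual[name]:
            steps.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            steps.append(("delete", name))
    return steps


# Hypothetical example data: what we declare versus what currently exists.
desired = {
    "subnet-a": {"cidr": "10.0.1.0/24"},
    "subnet-b": {"cidr": "10.0.2.0/24"},
}
actual = {
    "subnet-a": {"cidr": "10.0.9.0/24"},
    "subnet-old": {"cidr": "10.0.3.0/24"},
}

for action, name in plan(desired, actual):
    print(f"{action:<6} {name}")
```

An imperative script would instead spell out "change subnet-a, create subnet-b, remove subnet-old" by hand, and would have to be rewritten whenever the starting point differs.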

Common tools and useful differences 🛠️

  • Terraform / OpenTofu: the de facto standard for declarative multi-cloud infrastructure provisioning.
  • Pulumi: an IaC approach using general-purpose languages, useful when you want more expressive composition.
  • Ansible: excellent for configuration and operational automation, less ideal as the only source of truth for all cloud provisioning.
  • Crossplane: manages cloud infrastructure through Kubernetes custom resources and controllers, useful when you want infrastructure reconciled the same way as your workloads.

Choosing the right tool is useful. Choosing it and then using it without conventions, policy, or review is just an elegant way to create versioned chaos.

The opposite is also true: spending months debating the “perfect” tool while the team keeps clicking around in consoles is just procrastination with technical vocabulary.

How to separate layers and responsibilities 🏗️

One of the most effective ways to avoid chaos is not throwing everything into the same stack. A simple and robust model is to split by responsibility layer:

    flowchart TD
      A[Foundation layer] --> B[Platform layer]
      B --> C[Application layer]
      A --> A1[Network, accounts, IAM, state backend]
      B --> B1[Cluster, shared services, observability]
      C --> C1[App-specific config and integrations]

That separation reduces blast radius, clarifies ownership, and prevents an application change from accidentally touching half of your cloud foundation.
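
One way to keep that separation honest is to check that dependencies only point downward, toward the foundation. The following Python sketch assumes you can list each stack's layer and its dependencies; the stack names and the `upward_dependencies` helper are hypothetical.

```python
from typing import Dict, List

# Lower number = closer to the foundation. Layer names follow the diagram above.
LAYER_ORDER = {"foundation": 0, "platform": 1, "application": 2}


def upward_dependencies(stack_layers: Dict[str, str], depends_on: Dict[str, List[str]]) -> List[str]:
    """Return violations where a stack depends on a stack in a higher layer."""
    violations = []
    for stack, deps in sorted(depends_on.items()):
        for dep in deps:
            if LAYER_ORDER[stack_layers[dep]] > LAYER_ORDER[stack_layers[stack]]:
                violations.append(
                    f"{stack} ({stack_layers[stack]}) -> {dep} ({stack_layers[dep]})"
                )
    return violations


# Hypothetical stacks and dependencies.
stack_layers = {"network": "foundation", "cluster": "platform", "shop-app": "application"}
depends_on = {
    "cluster": ["network"],    # fine: platform depends on foundation
    "shop-app": ["cluster"],   # fine: application depends on platform
    "network": ["shop-app"],   # violation: foundation depends on an application
}

for violation in upward_dependencies(stack_layers, depends_on):
    print("upward dependency:", violation)
```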

Best practices that actually matter ✅

1. Version everything that matters 📚

Not just the main files: modules, policy, shared variables, validation workflows, and operational documentation matter too. If an infrastructure change is important but not in Git, sooner or later it becomes a sad story.

2. Separate environments and responsibilities clearly 🧭

Production, staging, and development should not differ because of magic or oral tradition. Define clear conventions for:

  • repository structure
  • resource naming
  • promotion across environments
  • module ownership
  • boundaries between platform teams and application teams

3. Treat state as a critical asset 🗃️

If you use tools with remote state, protect it properly. Locking, backup, encryption, least-privilege access, and audit logging are not optional. Losing or corrupting state is a very efficient way to turn provisioning into digital archaeology.
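
To illustrate why locking matters, here is a small Python sketch of an exclusive lock that refuses a second concurrent run. It uses a local lock file purely for demonstration; real remote backends implement locking on the server side, and the `state_lock` helper, the `StateLockedError` class, and the file name are assumptions made for this example.

```python
import os
import tempfile
from contextlib import contextmanager


class StateLockedError(RuntimeError):
    """Raised when another run already holds the lock."""


@contextmanager
def state_lock(lock_path: str, owner: str):
    """Hold an exclusive lock file for the duration of a run."""
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise StateLockedError(f"state is locked: {lock_path}") from None
    with os.fdopen(fd, "w") as handle:
        handle.write(owner)  # record who holds the lock, useful for audit
    try:
        yield
    finally:
        os.remove(lock_path)


lock_file = os.path.join(tempfile.mkdtemp(), "state.lock")  # hypothetical location
with state_lock(lock_file, "ci-run-1"):
    try:
        with state_lock(lock_file, "someones-laptop"):
            pass
    except StateLockedError:
        print("second run refused while the first holds the lock")
print("lock released:", not os.path.exists(lock_file))
```

The point of the sketch is the behavior, not the mechanism: a second writer is rejected loudly instead of silently overwriting state.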

4. Design small, readable modules 🧩

A good module reduces duplication and clarifies responsibility. A gigantic module that creates half a cloud region and 27 environment exceptions is just a monolith with different syntax. Prefer modules that are:

  • built around clear interfaces
  • limited in input surface and well documented
  • easy to test
  • easy to version and reuse
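
As a rough illustration of a "clear interface" with a limited input surface, the sketch below models a module's inputs and outputs as small typed contracts that validate themselves. It is written in Python rather than any specific IaC language, and the `NetworkModuleInputs` and `NetworkModuleOutputs` names, fields, and allowed environments are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

ALLOWED_ENVIRONMENTS = {"dev", "staging", "prod"}  # assumed convention


@dataclass(frozen=True)
class NetworkModuleInputs:
    """Hypothetical input contract for a small network module."""
    name: str
    cidr: str
    environment: str

    def __post_init__(self) -> None:
        if self.environment not in ALLOWED_ENVIRONMENTS:
            raise ValueError(f"unknown environment: {self.environment!r}")
        # Raises ValueError if the CIDR is malformed or has host bits set.
        ipaddress.ip_network(self.cidr)


@dataclass(frozen=True)
class NetworkModuleOutputs:
    """Values that callers may rely on as a stable contract."""
    network_id: str
    cidr: str


ok = NetworkModuleInputs(name="core", cidr="10.0.0.0/16", environment="prod")
print("accepted:", ok)

try:
    NetworkModuleInputs(name="core", cidr="10.0.0.1/16", environment="prod")
except ValueError as err:
    print("rejected:", err)
```

Three inputs and two outputs are easy to document, test, and version. Twenty-seven optional flags are not.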

5. Validate, test, and review plans 🧪

IaC without validation is just hope with a different file extension. Always include:

  • linting and formatting
  • validation or syntax checks
  • plan output in pull requests
  • policy as code
  • module tests where the tooling supports them
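
For the "plan output in pull requests" item, a short summary is often more readable than raw output. The Python sketch below assumes the plan has been exported as JSON with a top-level `resource_changes` list whose entries carry an `address` and `change.actions`, which is the shape Terraform documents for `terraform show -json`; the resource addresses and helper names are illustrative only.

```python
import json
from typing import Dict, List


def summarize_plan(plan_json: str) -> Dict[str, List[str]]:
    """Group resource addresses by planned action, skipping no-ops."""
    plan = json.loads(plan_json)
    summary: Dict[str, List[str]] = {}
    for change in plan.get("resource_changes", []):
        actions = change["change"]["actions"]
        if not actions or actions == ["no-op"]:
            continue
        # A replacement is reported as a delete plus a create.
        label = "replace" if set(actions) == {"create", "delete"} else actions[0]
        summary.setdefault(label, []).append(change["address"])
    return summary


def is_destructive(summary: Dict[str, List[str]]) -> bool:
    """True if the plan deletes or replaces anything."""
    return bool(summary.get("delete") or summary.get("replace"))


# Hand-written sample in the assumed shape, not real tool output.
sample = json.dumps({
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["create"]}},
        {"address": "aws_instance.web", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
    ]
})

summary = summarize_plan(sample)
for action, addresses in summary.items():
    print(f"{action}: {', '.join(addresses)}")
print("needs extra review:", is_destructive(summary))
```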

6. Handle drift explicitly 🛰️

If someone changes a resource manually, you need to know. Drift should be detected, analyzed, and corrected intentionally. Self-healing is not always the right answer, but silent drift is almost always a bad one.
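
A drift check boils down to comparing what the code declares with what actually exists. The following Python sketch assumes both sides are available as plain dictionaries; in practice the observed side would come from your provider's API or your tool's refresh step, and the `detect_drift` function and sample data here are hypothetical.

```python
from typing import Dict, List

Attributes = Dict[str, object]


def detect_drift(declared: Dict[str, Attributes], observed: Dict[str, Attributes]) -> List[str]:
    """Report declared attributes whose live value differs or is missing."""
    findings: List[str] = []
    for name, attrs in sorted(declared.items()):
        live = observed.get(name)
        if live is None:
            findings.append(f"{name}: missing from live environment")
            continue
        # Only declared attributes are compared; extra live attributes are ignored.
        for key, want in sorted(attrs.items()):
            have = live.get(key)
            if have != want:
                findings.append(f"{name}.{key}: declared {want!r}, observed {have!r}")
    return findings


# Hypothetical data: someone opened port 22 by hand and removed a bucket.
declared = {
    "sg-web": {"ingress_port": 443, "source": "10.0.0.0/8"},
    "bucket-logs": {"versioning": True},
}
observed = {
    "sg-web": {"ingress_port": 22, "source": "10.0.0.0/8", "id": "sg-123"},
}

for finding in detect_drift(declared, observed):
    print("drift:", finding)
```

Note that the sketch only reports. Whether to revert the manual change or adopt it into code is the intentional decision mentioned above.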

7. Integrate IaC with security and compliance 🔒

IaC is the perfect place to codify repeatable controls such as:

  • mandatory encryption
  • consistent tagging
  • network restrictions
  • limits on public exposure
  • constraints on regions, SKUs, and sensitive configuration
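
Controls like these can be expressed as small, testable checks. The Python sketch below is not a real policy engine; it assumes each resource is a dictionary with `tags`, `encrypted`, `public`, and `region` keys, and the required tags and allowed regions are invented for the example.

```python
from typing import Dict, List

REQUIRED_TAGS = {"owner", "environment"}          # assumed tagging convention
ALLOWED_REGIONS = {"eu-west-1", "eu-central-1"}   # assumed region constraint


def check_resource(resource: Dict[str, object]) -> List[str]:
    """Return the list of policy violations for one resource."""
    violations: List[str] = []
    missing = REQUIRED_TAGS - set(resource.get("tags", {}))
    if missing:
        violations.append("missing tags: " + ", ".join(sorted(missing)))
    if not resource.get("encrypted", False):
        violations.append("encryption is not enabled")
    if resource.get("public", False):
        violations.append("resource is publicly exposed")
    if resource.get("region") not in ALLOWED_REGIONS:
        violations.append(f"region {resource.get('region')!r} is not allowed")
    return violations


# Hypothetical resources.
compliant = {
    "tags": {"owner": "platform", "environment": "prod"},
    "encrypted": True,
    "public": False,
    "region": "eu-west-1",
}
risky = {"tags": {"owner": "someone"}, "public": True, "region": "us-east-1"}

print("compliant:", check_resource(compliant))
for violation in check_resource(risky):
    print("risky:", violation)
```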

8. Design a minimal but rigorous pipeline ⚙️

You do not need a theatrical pipeline with 19 stages just to feel enterprise-grade. You need a sober pipeline that reliably does the important things:

  • format and lint
  • validate
  • publish plan output in pull requests
  • run policy and security checks
  • allow apply only on authorized branches or workflows
  • keep audit history and notify after apply

The value is not in the number of stages. It is in their reliability and in the fact that they behave the same way even when urgency starts knocking.
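
The "allow apply only on authorized branches or workflows" rule can be sketched as a single gate function. This Python example is an illustration under stated assumptions: the `PipelineRun` fields, the `AUTHORIZED_BRANCHES` set, and the `may_apply` function are hypothetical and not tied to any CI system.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical policy: only runs from these branches may apply.
AUTHORIZED_BRANCHES = {"main"}


@dataclass(frozen=True)
class PipelineRun:
    branch: str
    checks_passed: bool   # format, lint, validate, policy, and security checks
    plan_published: bool  # plan output was attached to the pull request


def may_apply(run: PipelineRun) -> Tuple[bool, str]:
    """Decide whether this run is allowed to apply, with a reason for the audit log."""
    if not run.checks_passed:
        return False, "checks failed"
    if not run.plan_published:
        return False, "no published plan to review"
    if run.branch not in AUTHORIZED_BRANCHES:
        return False, f"branch {run.branch!r} is not authorized to apply"
    return True, "apply allowed"


for run in (
    PipelineRun("main", True, True),
    PipelineRun("hotfix/friday-evening", True, True),
    PipelineRun("main", True, False),
):
    allowed, reason = may_apply(run)
    print(f"{run.branch}: {'APPLY' if allowed else 'BLOCK'} ({reason})")
```

The gate has no "urgent" parameter on purpose: it should answer the same way on a quiet Tuesday and during an incident.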

9. Document module inputs, outputs, and assumptions 📝

Many IaC repositories degrade not because the tool is weak, but because nobody can remember anymore:

  • which inputs are mandatory
  • which outputs are stable reusable contracts
  • which conventions are required
  • which security or naming constraints are implicit

If a module only works because “the original author knows how”, you have not standardized anything. You have only moved cognitive dependency from a console into a file.

IaC, GitOps, and CI/CD: who does what 🔗

The useful distinction is this:

  • IaC creates and governs infrastructure
  • CI/CD builds, tests, and publishes artifacts
  • GitOps reconciles application configuration and desired operational state from Git

In many contexts the best flow is: IaC for clusters, network, managed databases, and identity; CI/CD for images and packages; GitOps for application configuration and continuous rollout behavior.

Put more bluntly:

  • IaC creates and protects the operational context
  • CI/CD produces the software
  • GitOps governs what should run inside that context

Common mistakes to avoid 🚫

Modules that are too generic or too smart: if nobody can tell what resources they really create, you have already lost transparency.

Local state shared through optimism: if coordination depends on “nobody else will run it today”, you are already negotiating with disaster.

Variables without governance: random naming, obscure defaults, and endless environment overrides are a short road to hard-to-trace errors.

Messy mix of provisioning and runtime configuration: creating a cluster and installing half the universe in the same layer often makes the system harder to evolve.

No pull request checks: if the first time you inspect the plan is after apply, that is not review. That is post-incident archaeology.

In short 🧾

IaC is not just a more elegant way to create cloud resources. It is a way to bring discipline, review, security, and repeatability into infrastructure work. Done well, it reduces drift, manual clicking, and dependence on individual memory. Done badly, it only creates more YAML or HCL to decipher at 2 AM. Aim for the first option.
