Field Report
The Idempotency Audit: When Scripts Run Twice
Jan 17, 2026 · 6 min read
A key differentiator for senior engineers is the focus on idempotency. This report tells the story of a script that ran twice and broke production, highlighting why "check-then-act" logic is fragile compared to declarative state.
Outcome: Automated scripts replaced with declarative state enforcement, preventing race conditions and duplicate resources.
At a glance
- Goal: Automate VM provisioning in a brownfield environment.
- Constraint: Existing legacy network configuration must be preserved.
- Reality: The script added duplicate NICs and corrupted routing tables when retried.
Engineering standards used
- Idempotency is a code property. It's not the runner's job to be safe; the code must handle re-runs.
- Destructive actions need checks. Explicitly verify state before modifying or deleting resources.
- Distinguish change from no-op. Logs must clearly show when no action was taken vs. when a change occurred.
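To make the third standard concrete, here is a minimal Python sketch of an operation that reports whether it changed anything. The `ensure_route` helper, the `OK`/`CHANGED` labels, and the in-memory `routes` set are all hypothetical stand-ins chosen for illustration, not part of the original script.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("provision")


def ensure_route(table: set, route: str) -> bool:
    """Make sure `route` is in `table`. Returns True only if something changed."""
    # `table` is a local, in-memory set (an assumption for this sketch),
    # so the membership test and the write cannot be separated by a stale read.
    if route in table:
        log.info("OK      route %s already present, no action taken", route)
        return False
    table.add(route)
    log.info("CHANGED route %s added", route)
    return True


routes = {"10.0.0.0/24"}
ensure_route(routes, "10.0.1.0/24")  # logs a CHANGED line
ensure_route(routes, "10.0.1.0/24")  # logs an OK line; the re-run is a no-op
```

Returning a boolean alongside the log line lets a caller count changes per run, so a second run that reports zero changes is easy to spot.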
The "Smart" Script
The incident started with a well-intentioned script designed to provision VMs. It included logic to check if a VM already existed before attempting to create it:
if (!exists(vm)) create(vm);
This logic works perfectly in isolation. However, in a distributed system, or even a slow one, the gap between the check and the act is a danger zone.
The Race Condition
During a deployment, the API response for the creation request timed out. The system, interpreting this as a failure, retried the script.
The first request had actually succeeded on the backend but failed to report back in time. The retried script checked for existence, but because of eventual consistency or simple timing, the new VM wasn't yet visible in the query result.
The script proceeded to "create" the resources again. Since the VM ID was reused, it attached a second network interface to the existing VM instead of failing or updating it. This duplicate NIC grabbed a new IP via DHCP, creating a routing loop that took the application offline.
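The sequence above can be reproduced in a few lines of Python. This is a simulation under stated assumptions, not the original script: `FakeCloud` is a hypothetical in-memory backend whose query view lags behind writes, and its `create` models one plausible reading of the incident, in which a reused VM ID gets another NIC attached rather than being rejected.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """Hypothetical in-memory backend whose query view lags behind writes."""
    vms: dict = field(default_factory=dict)    # vm_id -> list of NIC names (true state)
    visible: set = field(default_factory=set)  # vm_ids the query API currently returns

    def exists(self, vm_id: str) -> bool:
        # Reads the lagging view, not the true state.
        return vm_id in self.visible

    def create(self, vm_id: str) -> None:
        # Assumed backend behavior: a reused VM ID gets another NIC
        # attached instead of the request being rejected.
        nics = self.vms.setdefault(vm_id, [])
        nics.append(f"nic{len(nics)}")

    def sync(self) -> None:
        # The query view catches up with the true state.
        self.visible = set(self.vms)


def provision(cloud: FakeCloud, vm_id: str) -> None:
    # Check-then-act: the decision rests on a possibly stale read.
    if not cloud.exists(vm_id):
        cloud.create(vm_id)


cloud = FakeCloud()
provision(cloud, "vm-app-01")  # first run: create succeeds, the response is lost
provision(cloud, "vm-app-01")  # retry: the VM is not yet visible, so create runs again
cloud.sync()
print(cloud.vms["vm-app-01"])  # ['nic0', 'nic1']
```

Nothing in `provision` is wrong in isolation. The duplicate appears only because the check reads a view that has not yet caught up with the write.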
The Fix: Declarative State
The solution wasn't to write better checks. It was to stop writing check-then-act logic entirely.
We moved the provisioning logic to a declarative tool (Terraform/Ansible). Instead of saying "create this," we defined the end state: "This VM exists, and it has exactly one NIC."
When the declarative engine runs, it queries the *actual* state of the resource and compares it with the definition. If it sees two NICs, it removes one to match the definition. If the VM already matches the definition, it does nothing. The outcome is always the same, no matter how many times you run it.
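The reconcile idea can be sketched in Python. This is a toy model, not how Terraform or Ansible are implemented: the `VM` class, the `reconcile` function, and the in-memory `state` dictionary are hypothetical, and a real engine would read actual state from the provider's API.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    nics: list = field(default_factory=list)


def reconcile(actual: dict, vm_id: str, desired_nics: int = 1) -> list:
    """Drive `actual` toward the desired state and return the actions taken.

    An empty list means the resource already matched (a no-op).
    """
    actions = []
    vm = actual.get(vm_id)
    if vm is None:
        vm = actual[vm_id] = VM()
        actions.append(f"create {vm_id}")
    # Too few NICs: attach until the count matches the definition.
    while len(vm.nics) < desired_nics:
        vm.nics.append(f"nic{len(vm.nics)}")
        actions.append(f"attach {vm.nics[-1]}")
    # Too many NICs: detach the extras.
    while len(vm.nics) > desired_nics:
        actions.append(f"detach {vm.nics.pop()}")
    return actions


# Start from the broken state the incident left behind: one VM, two NICs.
state = {"vm-app-01": VM(nics=["nic0", "nic1"])}
print(reconcile(state, "vm-app-01"))  # ['detach nic1']
print(reconcile(state, "vm-app-01"))  # []
```

The function never asks "should I create?". It compares what exists with what is defined and closes the gap, so a retry after a timeout converges on the same result instead of stacking a second NIC.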
Takeaway
If you can't run it twice safely, don't run it once automatically.
Idempotency is the foundation of automation that lets you sleep at night.