AI in infrastructure operations
Agentic ops, copilot-driven runbooks, and where automated reasoning actually pays off in the NOC.
By DM Cyber Solutions · 2026
The hype, briefly
"AI for IT operations" has been a vendor pitch since at least 2017. Most of what got sold as AIOps was statistical anomaly detection wearing a sweater. That changed in the last 18 months -- not because the math got better, but because language models can now read a stack trace, a config diff, and a runbook in the same context window, and reason across them.
Where it actually pays off
From the engagements we ran in 2025-2026, three categories of real value emerged:
- Runbook authoring & review. Senior engineers use LLM-assisted tools to draft, format, and quality-check runbooks. The model is bad at writing a runbook from scratch but excellent at reformatting a Slack thread into a structured procedure with rollback steps. Output still needs review -- but the floor on documentation quality has risen considerably.
- Triage co-pilots. The assistant takes the first pass on a P2 ticket: it parses the alert, pulls related dashboards, summarizes the last 48 hours of change history, and suggests two or three hypotheses. This saves 10-20 minutes per ticket and makes on-call far less daunting for a junior engineer.
- Config diff review. Multi-thousand-line Terraform plans are effectively unreadable for a human reviewer. LLMs are very good at "here are the 7 things that matter in this 4000-line plan, ranked by blast radius." Pair review with a human is still required.
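To make the "ranked by blast radius" idea concrete, here is a minimal Python sketch of a deterministic pre-sort a reviewer (or an assistant) could start from. It assumes the plan has been exported with `terraform show -json` and reads only the `resource_changes[].address` and `resource_changes[].change.actions` fields of that output. The weights in `ACTION_WEIGHT` and the function name `rank_changes` are our own illustrative assumptions, and counting actions is only a crude proxy for real blast radius, which depends on what each resource does in your environment.

```python
from typing import Any

# Hypothetical weights (an assumption for illustration): destructive
# actions score higher than additive ones. A replace appears in the plan
# as ["delete", "create"] or ["create", "delete"], so it scores 3 + 1.
ACTION_WEIGHT = {"delete": 3, "update": 2, "create": 1}


def rank_changes(plan: dict[str, Any], top_n: int = 7) -> list[tuple[int, str, list[str]]]:
    """Return the top_n resource changes as (score, address, actions)."""
    ranked = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        # Skip entries that change nothing or only refresh a data source.
        if actions in (["no-op"], ["read"]):
            continue
        score = sum(ACTION_WEIGHT.get(a, 0) for a in actions)
        ranked.append((score, rc["address"], actions))
    # Highest score first; ties broken alphabetically by address.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked[:top_n]
```

A sort like this does not replace the model's reading of the plan. It gives the human reviewer a fixed, repeatable ordering to check the assistant's summary against.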
Where it does not pay off (yet)
- Fully autonomous remediation. The model that confidently writes a CLI command is the same model that confidently writes the wrong CLI command. We have not yet seen an autonomous-action setup we trust in production OT environments. Read-only investigation, yes. Write actions, no.
- Novel architecture decisions. Models trained on the public internet are great at generic patterns and bad at your environment. They will confidently recommend Kubernetes for a workload that should be three EC2 instances.
- Compliance evidence. Regulators want deterministic, auditable, repeatable processes. LLM-generated evidence is not (yet) any of those.
What we are recommending right now
- Use AI assistants for authoring, review, and triage. Treat them as a senior pair-programmer who is fast, opinionated, and occasionally wrong.
- Keep humans on the commit and apply boundary. Read-only investigation can be AI-driven; mutation should require human review.
- Build the context layer the LLM needs to be useful: a clean source-of-truth (NetBox), good runbook hygiene, structured alerts, and consistent naming. Without those, the assistant is just guessing.
- Pick one tool per workflow, not a buffet. Engineers adapt to the assistant they use daily. Half a dozen partial integrations are worse than one good one.
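The "humans on the commit and apply boundary" recommendation can be sketched as a small default-deny gate. The Python below is an illustration under stated assumptions, not a production control: the allowlist `READ_ONLY_PREFIXES`, the `ProposedAction` type, and the `approve` callback are all hypothetical names, and the listed command prefixes are examples only. The sketch decides whether a proposed command may run. It does not execute anything.

```python
import shlex
from dataclasses import dataclass
from typing import Callable

# Hypothetical allowlist of read-only command prefixes. Anything not
# listed here is treated as a mutation and needs human approval.
READ_ONLY_PREFIXES = {
    ("kubectl", "get"),
    ("kubectl", "describe"),
    ("terraform", "plan"),
    ("terraform", "show"),
}


@dataclass(frozen=True)
class ProposedAction:
    command: str    # command line suggested by the assistant
    rationale: str  # the assistant's stated reason, shown to the reviewer


def is_read_only(action: ProposedAction) -> bool:
    """True only if the first two tokens match an allowlisted prefix."""
    try:
        tokens = tuple(shlex.split(action.command))
    except ValueError:
        # Unbalanced quotes and similar parse failures are never auto-allowed.
        return False
    return tokens[:2] in READ_ONLY_PREFIXES


def gate(action: ProposedAction, approve: Callable[[ProposedAction], bool]) -> bool:
    """Allow read-only actions; defer everything else to a human reviewer."""
    if is_read_only(action):
        return True
    return approve(action)
```

The design choice worth noting is the direction of the default: the gate enumerates what is safe rather than what is dangerous, so a command the allowlist has never seen lands in front of a human instead of running.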