Running Methods of Procedure in Production Without Praying: The MOP Pattern That Actually Works

Post-mortems keep turning up the same two phrases: "no MOP was followed" and "the MOP was incomplete." MOPs (Methods of Procedure) are supposed to be the structured, pre-approved, step-by-step change plans that prevent production disasters. Yet in practice they fail far more often than they should.

Network engineers understand the need for structured change procedures. The problem is that most MOP tooling reduces them to 40-line Word documents that nobody reads, that nobody can execute directly, and that provide no safety net when step 23 produces output the author didn't anticipate.

Why MOPs Fail in the Real World

A proper MOP has three sections that most tools get wrong:

Pre-checks verify the current state matches your assumptions before you touch anything. Is the BGP session up? Are the routes present? Is the target device reachable? If any pre-check fails, the MOP should stop immediately — but most teams run these manually and just "assume things are fine."

Change steps are the actual configuration commands. They need to be sequenced, validated, and — critically — each step needs to define what success looks like. "Add the route-map" isn't a complete step. "Add the route-map and verify it appears in show run | section route-map" is.

Post-checks confirm the change achieved what it was supposed to. Not "the device is reachable" — "the new next-hop is installed in the FIB and traffic is flowing through it."

The gap between what a MOP should be and what most teams actually use is enormous. Most MOPs live in Confluence, get copy-pasted into SSH sessions, and have zero programmatic validation of results.

The Approval Workflow Nobody Builds

Here's a gap most automation platforms ignore: MOPs need approval workflows. Not just "someone reviewed the Word doc." Actual, programmatic, pre-change approval gates where:

The MOP author defines who needs to approve
Approvers review it against the change window
Each step requires sign-off before execution
Rollback is pre-approved and pre-defined

Without this, you're just running scripts. The MOP isn't a script — it's a governance boundary that protects production.
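
To make "programmatic approval gate" concrete, here is a minimal sketch in Python. The ApprovalGate class, its fields, and the blockers method are hypothetical names made up for illustration; they are not taken from any particular platform. It gates the MOP as a whole, and the same idea could be attached to each step for per-step sign-off.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ApprovalGate:
    """Hypothetical approval record attached to one MOP."""
    required_approvers: set[str]
    window_start: datetime
    window_end: datetime
    rollback_approved: bool = False
    approvals: set[str] = field(default_factory=set)

    def approve(self, user: str) -> None:
        # Only the approvers named by the MOP author count.
        if user not in self.required_approvers:
            raise PermissionError(f"{user} is not an approver for this MOP")
        self.approvals.add(user)

    def blockers(self, now: datetime) -> list[str]:
        """Return every reason execution must not start; empty means go."""
        reasons = []
        missing = sorted(self.required_approvers - self.approvals)
        if missing:
            reasons.append("missing approval from: " + ", ".join(missing))
        if not self.rollback_approved:
            reasons.append("rollback plan not approved")
        if not (self.window_start <= now <= self.window_end):
            reasons.append("outside the approved change window")
        return reasons
```

An executor built on this would refuse to start while blockers() returns anything, which is what turns "someone reviewed the doc" into something the tooling enforces.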

How NetStacks Handles This Differently

NetStacks treats MOPs as first-class automation objects, not scripts or documents. A MOP in NetStacks has structured pre-checks that run automatically and halt execution if conditions don't match. Each change step has its own success criteria, which are evaluated against the device output in real time. Post-checks aren't an afterthought; they're baked into the execution flow.

The approval workflow means that before any MOP runs in production, the right people review and authorize it. No bypassing, no "I'll just run it quickly." The system enforces the process.

And when something goes wrong — because something always goes wrong — the rollback is already defined and tested. You don't figure out how to undo the change while the NOC is watching. You hit rollback and the MOP executes the reverse steps in order.
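
The "reverse steps in order" idea can be sketched generically. This is not NetStacks code; Step, run_with_rollback, and the send callable (standing in for whatever sends a command to a device and returns its output) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    apply: str    # command that makes the change
    verify: str   # show command run right after it
    expect: str   # substring that must appear in the verify output
    undo: str     # reverse command, written before execution


def run_with_rollback(steps: list[Step], send: Callable[[str], str]) -> bool:
    """Apply steps in order; on the first failed verification, undo newest-first."""
    attempted: list[Step] = []
    for step in steps:
        attempted.append(step)
        send(step.apply)
        if step.expect not in send(step.verify):
            # The failed step is undone too, since it may have partly applied.
            for done in reversed(attempted):
                send(done.undo)
            return False
    return True
```

Undoing the failed step as well is a design choice for this sketch. A real executor would also want to verify each undo and decide what to do when the rollback itself fails.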

The MOP Pattern Teams Should Copy

Even if you're not using NetStacks, here's the pattern every team should adopt:

Every MOP gets pre-checks, change steps with success criteria, post-checks, and rollback — all structured, not free-text.
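
As a rough illustration of "structured, not free-text," the four parts can be modeled as data and checked before anyone is asked to approve them. The class names and the structural_problems helper below are hypothetical, one possible shape rather than a standard schema.

```python
from dataclasses import dataclass, field


@dataclass
class Check:
    command: str   # show command to run
    expect: str    # text that must appear in its output


@dataclass
class ChangeStep:
    command: str
    success: Check   # what success looks like for this step
    rollback: str    # how to undo this step


@dataclass
class Mop:
    title: str
    pre_checks: list[Check] = field(default_factory=list)
    steps: list[ChangeStep] = field(default_factory=list)
    post_checks: list[Check] = field(default_factory=list)


def structural_problems(mop: Mop) -> list[str]:
    """List what is missing; an empty list means the MOP is complete in form."""
    problems = []
    if not mop.pre_checks:
        problems.append("no pre-checks")
    if not mop.steps:
        problems.append("no change steps")
    if not mop.post_checks:
        problems.append("no post-checks")
    for number, step in enumerate(mop.steps, start=1):
        if not step.success.expect.strip():
            problems.append(f"step {number}: no success criteria")
        if not step.rollback.strip():
            problems.append(f"step {number}: no rollback")
    return problems
```

A MOP that fails this check never reaches review, so "the MOP was incomplete" gets caught before the change window instead of in the post-mortem.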

Pre-checks run automatically. If BGP isn't established on the peer you're modifying, the MOP stops. No human judgment call needed.
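
A minimal sketch of that BGP pre-check, assuming the session states have already been parsed into a dictionary of peer address to state. How that parsing happens is platform-specific and left out here, and PreCheckFailed and require_bgp_established are illustrative names.

```python
class PreCheckFailed(Exception):
    """Raised to stop a MOP before any change step runs."""


def require_bgp_established(peer: str, peer_states: dict[str, str]) -> None:
    """Halt unless the peer is known and its session is Established.

    peer_states is assumed to be already-parsed data,
    for example {"192.0.2.1": "Established"}.
    """
    state = peer_states.get(peer)
    if state is None:
        raise PreCheckFailed(f"BGP peer {peer} not found")
    if state.lower() != "established":
        raise PreCheckFailed(f"BGP peer {peer} is {state}, expected Established")
```

Raising an exception rather than returning False is deliberate: the caller has to handle the failure explicitly, so a failed pre-check cannot be silently ignored on the way to the first change step.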

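Each change step defines its own validation. After "ip route 10.0.0.0/8 192.168.1.1", the MOP immediately runs "show ip route 10.0.0.0/8" and verifies the next-hop. (Exact syntax varies by platform; classic Cisco IOS, for example, takes a dotted mask rather than a prefix length.)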
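
The next-hop verification for that step might look something like the sketch below. It assumes the route output puts the word "via" in front of each next-hop address, which is common but not universal, so the pattern would need adjusting per platform. The function names are made up for this example.

```python
import re


def next_hops(route_output: str) -> set[str]:
    """Collect every IPv4 address that follows the word 'via' in the output."""
    return set(re.findall(r"\bvia\s+(\d{1,3}(?:\.\d{1,3}){3})\b", route_output))


def step_succeeded(route_output: str, expected_next_hop: str) -> bool:
    # Success here means the expected next-hop is the only one present.
    return next_hops(route_output) == {expected_next_hop}


# Illustrative output line only; real formatting differs between platforms.
sample = "S    10.0.0.0/8 [1/0] via 192.168.1.1\n"
print(step_succeeded(sample, "192.168.1.1"))
```

Requiring an exact match is stricter than "the address appears somewhere in the output": a leftover second next-hop fails the step. For routes that are meant to be multipath, the comparison would need to accept a set of expected next-hops instead.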

Post-checks are independent verification, not just re-running the change command. They answer: "Is traffic actually flowing the way we intended?"

Rollback is defined before execution, not after failure. Every successful MOP should end with "post-check passed, no rollback needed."

The teams that take MOPs seriously don't just write better documents. They build enforcement into the tooling so the process can't be skipped. That's the difference between "we have MOPs" and "our MOPs actually prevent outages."