NOC Agents
Teams, Enterprise
Autonomous AI agents that monitor, triage, and respond to network events using the ReAct pattern with tool approval workflows.
Overview
NOC Agents are autonomous AI agents that monitor, triage, and respond to network events without constant human intervention. They combine the reasoning capabilities of large language models with the ability to execute network commands, query knowledge bases, and create incidents, all governed by a tool approval system that prevents unauthorized changes.
Agent Types
| Type | Purpose | Example Use Case |
|---|---|---|
| Triage | Initial event assessment and categorization | Receive a BGP neighbor down alert, check neighbor status, classify severity, and create an incident with initial findings |
| Protocol-Specific | Deep expertise in a specific protocol (BGP, OSPF, STP, MPLS) | Investigate OSPF adjacency failures by checking interface status, MTU mismatches, authentication, and area configuration |
| Custom | User-defined for specific workflows | Validate post-maintenance state by running a predefined set of checks and comparing to a baseline |
The ReAct Pattern
NOC Agents use the ReAct (Reasoning + Acting) pattern, a structured loop where the agent observes the current state, reasons about what to do next, takes an action, and then observes the result. This loop continues until the agent has enough information to produce a final answer or reaches its iteration limit.
Agents can be configured to require human approval before executing destructive commands (configuration changes, interface shutdowns, device reloads). Read-only commands like show can be auto-approved while write operations require explicit confirmation.
How It Works
Agent Architecture
Each agent is defined with a system prompt, a set of available tools, an LLM provider and model, and execution constraints (max iterations, tool timeout, total execution timeout). When triggered, the agent enters the ReAct loop:
1. Observe — The agent receives the trigger event (alert, scheduled task, or manual invocation) along with relevant context from the knowledge base.
2. Think — The LLM analyzes the observation and decides what action to take next. It explains its reasoning in the conversation log.
3. Act — The agent requests execution of a tool (SSH command, API call, knowledge base query, incident creation). If the tool requires approval, execution pauses until approved.
4. Observe — The result of the action is fed back to the agent. The loop continues from step 2 until the agent produces a final answer or reaches its iteration limit.
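The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: `Step`, `Execution`, `run_react_loop`, and the `decide` callback (a stand-in for the LLM call) are hypothetical names, approval handling is left out, and the state recorded when the iteration limit is reached is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    """One decision from the model: either a tool call or a final answer."""
    thought: str
    tool: Optional[str] = None
    tool_input: str = ""
    final_answer: Optional[str] = None


@dataclass
class Execution:
    log: list = field(default_factory=list)
    iterations_used: int = 0
    state: str = "running"
    final_answer: Optional[str] = None


def run_react_loop(task, decide, tools, max_iterations=10):
    """Observe, think, act, observe again, until an answer or the iteration limit.

    decide(task, log) stands in for the LLM; tools maps a tool name to a callable.
    """
    run = Execution(log=[("observe", task)])
    for _ in range(max_iterations):
        run.iterations_used += 1
        step = decide(task, run.log)                 # Think
        run.log.append(("think", step.thought))
        if step.final_answer is not None:
            run.final_answer = step.final_answer
            run.state = "completed"
            return run
        result = tools[step.tool](step.tool_input)   # Act
        run.log.append(("act", f"{step.tool}: {step.tool_input}"))
        run.log.append(("observe", result))          # Observe the result
    # Assumption: the state used for an exhausted iteration budget is not documented.
    run.state = "failed"
    return run


def scripted_decide(task, log):
    """Stand-in for the model: run one read-only command, then answer."""
    if not any(kind == "act" for kind, _ in log):
        return Step("Check the neighbor status first.", tool="ssh_read",
                    tool_input="show ip bgp neighbor 10.0.2.2")
    return Step("Enough evidence collected.",
                final_answer="Neighbor 10.0.2.2 is in Active state.")


tools = {"ssh_read": lambda command: "BGP state = Active"}
run = run_react_loop("BGP neighbor 10.0.2.2 down", scripted_decide, tools)
print(run.state, run.iterations_used)  # completed 2
```

Keeping the conversation log as a list of `(kind, text)` pairs mirrors the logged reasoning described above: every thought, action, and result is appended in order.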
Trigger Types
Agents can be activated through three trigger mechanisms:
- Alert-Triggered — When an alert matches a configured pattern, the agent is automatically invoked. Agents with auto_triage_enabled are evaluated against incoming alerts using their auto_triage_alert_patterns.
- Scheduled — Agents can run on a schedule for periodic health checks, compliance audits, or baseline comparisons.
- Manual — Users can invoke agents directly from the NOC Agents dashboard, providing a specific task or question as input.
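As a rough illustration of alert-triggered activation, the sketch below checks an alert message against each agent's patterns. The helper name `matching_agents` is hypothetical, and the matching semantics (unanchored, case-sensitive `re.search`) are an assumption; the product's actual matching rules are not described here.

```python
import re


def matching_agents(alert_message, agents):
    """Return the names of agents whose auto-triage patterns match the alert."""
    matched = []
    for agent in agents:
        if not agent.get("auto_triage_enabled", False):
            continue  # agents without auto-triage are never evaluated
        patterns = agent.get("auto_triage_alert_patterns", [])
        # Assumption: a pattern matches if it is found anywhere in the message.
        if any(re.search(pattern, alert_message) for pattern in patterns):
            matched.append(agent["name"])
    return matched


agents = [
    {
        "name": "BGP Triage Agent",
        "auto_triage_enabled": True,
        "auto_triage_alert_patterns": [
            "BGP.*neighbor.*down", "BGP.*state.*Active", "NOTIFICATION.*BGP",
        ],
    },
    {
        "name": "OSPF Agent",
        "auto_triage_enabled": False,
        "auto_triage_alert_patterns": ["OSPF.*"],
    },
]

alert = "BGP neighbor 10.0.2.2 (AS 65004) state changed to Active"
print(matching_agents(alert, agents))  # ['BGP Triage Agent']
```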
Tool Approval System
The tool approval system controls what agents can do on your network. Each agent has a require_approval_for_destructive setting and a list of allowed_tool_categories that determine its permissions:
- Auto-Approved — Read-only commands (show, display, get) execute immediately without human intervention.
- Requires Approval — Configuration changes, interface modifications, and device reloads pause execution and notify an approver. The agent resumes after approval or terminates on denial.
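The split between auto-approved and approval-required commands could look roughly like the following. This is a sketch under stated assumptions: `needs_approval` is a hypothetical helper, and classification by the first word of the command is a simplification. The real classifier is not documented and likely covers more cases; the example transcript later in this document, for instance, runs a ping without approval.

```python
# Verbs listed as read-only in the text above; anything else is treated as destructive.
READ_ONLY_VERBS = ("show", "display", "get")


def needs_approval(command, require_approval_for_destructive=True):
    """Return True when the command should pause for a human decision."""
    words = command.strip().lower().split()
    verb = words[0] if words else ""
    is_read_only = verb in READ_ONLY_VERBS
    # An empty or unrecognized command is not read-only, so it fails closed.
    return require_approval_for_destructive and not is_read_only


print(needs_approval("show ip bgp neighbor 10.0.2.2"))        # False
print(needs_approval("clear counters GigabitEthernet0/0/2"))  # True
```

Using an allowlist of read-only verbs rather than a blocklist of dangerous ones means an unfamiliar command defaults to requiring approval.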
Execution History and Auditing
Every agent execution is recorded with full details: trigger source, input task, conversation log (the complete ReAct loop), tool calls made, iterations used, final answer, execution state (running, completed, failed, timeout), and timestamps. Execution history supports filtering by agent, state, trigger type, and date range.
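A filtered history request can be assembled as below. The endpoint and the parameter names `agent_id`, `state`, `triggered_by`, and `limit` are taken from the curl example later in this document; the helper `executions_url` is hypothetical, and date-range parameters are omitted because their names are not shown.

```python
from urllib.parse import urlencode

BASE_URL = "https://controller.example.com/api/v1/agents/executions"
VALID_STATES = {"running", "completed", "failed", "timeout"}
VALID_TRIGGERS = {"alert", "scheduled", "manual"}


def executions_url(agent_id=None, state=None, triggered_by=None, limit=None):
    """Build the query URL, rejecting filter values this document does not list."""
    if state is not None and state not in VALID_STATES:
        raise ValueError(f"unknown state: {state}")
    if triggered_by is not None and triggered_by not in VALID_TRIGGERS:
        raise ValueError(f"unknown trigger type: {triggered_by}")
    params = {
        "agent_id": agent_id,
        "state": state,
        "triggered_by": triggered_by,
        "limit": limit,
    }
    # Drop unset filters so only the requested ones appear in the query string.
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return f"{BASE_URL}?{query}" if query else BASE_URL


print(executions_url(agent_id="AGENT_ID", state="completed", triggered_by="alert", limit=10))
```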
Creating Agents
Follow these steps to create and deploy a NOC Agent for automated network event response.
- Navigate to AI > NOC Agents — Open the NOC Agents dashboard from the main navigation menu.
- Create a new agent — Click “New Agent” and provide a name, description, and agent type (Triage, Protocol, or Custom). For protocol-specific agents, select the target protocol (BGP, OSPF, STP, MPLS).
- Configure the LLM — Select the LLM provider and model for this agent. Agents performing complex analysis benefit from more capable models (e.g., GPT-4o, Claude Sonnet), while simple triage agents can use faster models.
- Write the system prompt — Define the agent's personality, expertise area, and instructions. The system prompt guides how the agent reasons about problems and what approach it takes.
- Configure triggers — Set up alert patterns that activate this agent, enable auto-triage for automatic activation, or configure a schedule for periodic execution.
- Define available tools — Specify which tool categories the agent can use: SSH commands, API calls, knowledge base queries, incident management.
- Set tool approval policy — Enable require_approval_for_destructive to ensure configuration changes require human approval. Set execution limits: max iterations, tool timeout (seconds per tool call), and total execution timeout.
- Test the agent — Run the agent manually with a simulated event. Review the conversation log to verify the agent's reasoning and actions are correct.
- Enable for production — Toggle the agent to enabled state. It will now respond to configured triggers automatically.
When deploying a new agent, start with read-only tool permissions and destructive approval required. Monitor the agent's execution history for several cycles before expanding its capabilities.
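Before enabling an agent, a definition can be checked offline with a small script like the one below. It is only a sketch: `validate_agent` is a hypothetical helper, the field names follow the example configuration in this document, and which fields the product actually requires is an assumption.

```python
import re

# Assumption: these fields are treated as required for the purposes of this check.
REQUIRED_FIELDS = ("name", "agent_type", "provider", "model", "system_prompt")
LIMIT_FIELDS = ("max_iterations", "tool_timeout_seconds", "max_total_seconds")


def validate_agent(config):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    for key in REQUIRED_FIELDS:
        if not config.get(key):
            problems.append(f"missing field: {key}")
    for key in LIMIT_FIELDS:
        value = config.get(key)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{key} must be a positive integer")
    for pattern in config.get("auto_triage_alert_patterns", []):
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append(f"invalid pattern {pattern!r}: {exc}")
    categories = config.get("allowed_tool_categories", [])
    if "ssh_write" in categories and not config.get("require_approval_for_destructive", False):
        problems.append("ssh_write is allowed without require_approval_for_destructive")
    return problems


draft = {
    "name": "BGP Triage Agent",
    "agent_type": "protocol",
    "provider": "anthropic",
    "model": "claude-sonnet-4-20250514",
    "system_prompt": "You are a BGP expert NOC agent.",
    "max_iterations": 10,
    "tool_timeout_seconds": 30,
    "max_total_seconds": 300,
    "require_approval_for_destructive": False,
    "allowed_tool_categories": ["ssh_read", "ssh_write"],
    "auto_triage_alert_patterns": ["BGP.*(neighbor"],  # unbalanced parenthesis
}

for problem in validate_agent(draft):
    print(problem)
```

The draft above is flagged twice: once for the pattern that does not compile, and once for allowing write access without the approval setting.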
Code Examples
Agent Definition: BGP Triage Agent
Here is a complete agent configuration for a BGP triage agent that automatically investigates BGP neighbor state changes:
{
  "name": "BGP Triage Agent",
  "description": "Automatically triages BGP neighbor state change alerts",
  "agent_type": "protocol",
  "protocol": "bgp",
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514",
  "system_prompt": "You are a BGP expert NOC agent. When triggered by a BGP alert, investigate the neighbor state, check for common issues (authentication, reachability, AS mismatch), and create an incident with your findings. Always check both sides of the peering if accessible.",
  "max_iterations": 10,
  "tool_timeout_seconds": 30,
  "max_total_seconds": 300,
  "require_approval_for_destructive": true,
  "allowed_tool_categories": ["ssh_read", "knowledge_query", "incident_create"],
  "auto_triage_enabled": true,
  "auto_triage_alert_patterns": ["BGP.*neighbor.*down", "BGP.*state.*Active", "NOTIFICATION.*BGP"]
}
ReAct Loop in Action
This shows the agent's reasoning and actions during an actual triage of a BGP neighbor down event:
[Trigger] Alert: BGP neighbor 10.0.2.2 (AS 65004) state changed to Active
on device core-rtr-01
[Iteration 1]
Think: BGP neighbor 10.0.2.2 changed to Active state. I need to check the
detailed neighbor status to understand why the session dropped.
Act: ssh core-rtr-01 "show ip bgp neighbor 10.0.2.2"
Result: BGP neighbor is 10.0.2.2, remote AS 65004
BGP state = Active
Last read 00:05:23, Last write 00:05:23
Last reset 00:05:23, due to BGP Notification sent
Notification sent: hold time expired
[Iteration 2]
Think: The session was reset due to hold time expiration. This typically
indicates a connectivity issue or the remote router stopped sending
keepalives. I should verify IP reachability.
Act: ssh core-rtr-01 "ping 10.0.2.2 source Loopback0 repeat 5"
Result: Success rate is 0 percent (0/5)
[Iteration 3]
Think: No IP reachability to 10.0.2.2. Let me check the route to this
neighbor to see if there is a routing issue.
Act: ssh core-rtr-01 "show ip route 10.0.2.2"
Result: 10.0.2.0/30 is directly connected, GigabitEthernet0/0/2
[Iteration 4]
Think: The route exists as directly connected via Gi0/0/2. Let me check
the interface status.
Act: ssh core-rtr-01 "show interface GigabitEthernet0/0/2 | include line protocol"
Result: GigabitEthernet0/0/2 is up, line protocol is up
[Iteration 5]
Think: Interface is up/up but pings fail. This could be an issue on the
remote side. Let me check for any error counters.
Act: ssh core-rtr-01 "show interface GigabitEthernet0/0/2 | include errors|CRC|drops"
Result: 0 input errors, 0 CRC, 0 frame
847 output errors, 0 collisions, 0 interface resets
Input queue: 0/75/0/0, Total output drops: 312
[Final Answer]
BGP neighbor 10.0.2.2 (AS 65004) is down due to hold time expiration.
Root cause: IP connectivity failure despite interface up/up status.
Evidence: Ping fails, 847 output errors and 312 output drops on Gi0/0/2.
Severity: HIGH - Possible physical layer or remote-side issue.
Recommendation: Check physical connectivity and remote device status.
Incident created: INC-2026-0342
Tool Approval Flow
When an agent requests a destructive operation, execution pauses until approved:
[Iteration 6]
Think: The interface has high output errors. I should clear the counters
and monitor for new errors to determine if the issue is ongoing.
Act: ssh core-rtr-01 "clear counters GigabitEthernet0/0/2"
[APPROVAL REQUIRED]
Agent "BGP Triage Agent" is requesting to execute a write command:
Device: core-rtr-01
Command: clear counters GigabitEthernet0/0/2
Reason: Clear interface counters to monitor for ongoing errors
Status: Waiting for approval...
Approved by: admin@netstacks.io at 14:23:05 UTC
Result: Clear "show interface" counters on interface GigabitEthernet0/0/2 [OK]
Execution History Query
Execution history can be filtered by agent, state, trigger type, and date range:
# List recent executions for the BGP Triage Agent
curl -s https://controller.example.com/api/v1/agents/executions \
  -H "Authorization: Bearer $TOKEN" \
  -G -d "agent_id=a1b2c3d4-..." \
  -d "state=completed" \
  -d "triggered_by=alert" \
  -d "limit=10" | jq '.[] | {id, state, input_task, iterations_used, tool_calls_made}'

# Example output
{
  "id": "exec-9f8e7d6c-...",
  "state": "completed",
  "input_task": "BGP neighbor 10.0.2.2 state changed to Active",
  "iterations_used": 6,
  "tool_calls_made": 5
}
Questions & Answers
- What types of NOC agents are available?
- NetStacks provides three agent types: Triage agents perform initial event assessment and categorization, routing alerts to the right team with initial findings. Protocol-specific agents (BGP, OSPF, STP, MPLS) have deep expertise in a single protocol and can perform detailed investigation of protocol-related issues. Custom agents are user-defined and can be tailored to any workflow, such as post-maintenance validation, compliance checks, or capacity planning.
- What is the ReAct pattern?
- ReAct (Reasoning + Acting) is a structured execution pattern where the agent alternates between thinking and acting. In each iteration, the agent observes the current state, reasons about what to do next (visible in the conversation log), executes a tool action, and then observes the result. This loop continues until the agent has enough information to produce a final answer or reaches its configured iteration limit (max_iterations). The pattern ensures transparency because the agent's reasoning is logged at every step.
- How do tool approvals work?
- Each agent has a require_approval_for_destructive setting. When enabled, the agent can execute read-only commands (show, display, get) automatically, but any command that modifies configuration, shuts down interfaces, or reloads devices requires explicit human approval. When the agent requests a destructive command, execution pauses and a notification is sent to configured approvers. The agent resumes after approval or terminates if the request is denied.
- Can agents execute commands on devices?
- Yes. Agents can execute SSH commands on managed devices as part of their investigation workflow. The allowed_tool_categories setting controls which types of tools the agent can use, including ssh_read (show commands), ssh_write (configuration changes), knowledge_query (knowledge base search), and incident_create (incident management). Combined with the tool approval system, this provides controlled automation.
- How do I view agent execution history?
- Navigate to AI > NOC Agents and select an agent to view its execution history. Each execution shows the trigger source, input task, full conversation log (the ReAct loop with reasoning and actions), tool calls made, iterations used, final answer, and execution state. You can filter by state (running, completed, failed, timeout), trigger type (alert, scheduled, manual), and date range.
- Can I create custom agent types?
- Yes. Select “Custom” as the agent type when creating a new agent. Custom agents accept a free-form system prompt that defines their behavior, expertise area, and approach. You can create agents for any workflow: post-maintenance validation, security compliance audits, capacity utilization checks, or multi-vendor configuration comparison. Custom agents have access to the same tools and approval system as built-in agent types.
Troubleshooting
| Issue | Possible Cause | Solution |
|---|---|---|
| Agent not triggering on alerts | Trigger patterns not matching or agent disabled | Verify the agent is enabled and auto_triage_enabled is set to true. Check that auto_triage_alert_patterns contain regex patterns that match your alert messages. Test patterns against sample alerts before deploying. |
| Agent stuck waiting for approval | Approval notification not received or no approver available | Check that approval notifications are configured correctly (email, webhook, or in-app). Verify approver users have the required permissions. Consider adjusting the tool timeout if approvals take longer than expected. |
| Agent making incorrect decisions | System prompt needs refinement or insufficient knowledge base | Review the agent's conversation log to understand its reasoning. Refine the system prompt with more specific instructions. Add relevant documentation to the Knowledge Base so the agent has organizational context for its decisions. |
| Agent timing out | Too many iterations or slow tool responses | Increase max_total_seconds if the investigation legitimately requires more time. Increase tool_timeout_seconds if individual commands are slow. Check network connectivity to managed devices. Reduce max_iterations if the agent is looping without progress. |
| Agent execution failed | LLM provider error or tool execution failure | Check the execution error message in the history view. Verify the LLM provider is reachable and the API key is valid. Check that the agent's configured model is available on the provider. Review device connectivity for SSH tool failures. |
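When tuning the limits mentioned in the timeout row, it can help to compare the worst-case time spent in tool calls with the total budget. The sketch below assumes at most one tool call per iteration, as in the example transcript; `budget_report` is a hypothetical helper, and how the product accounts for LLM latency or approval waits is not documented here.

```python
def budget_report(config):
    """Compare worst-case tool time against the total execution budget."""
    # Worst case: every iteration makes one tool call that runs to its timeout.
    worst_case = config["max_iterations"] * config["tool_timeout_seconds"]
    total = config["max_total_seconds"]
    return {
        "worst_case_tool_seconds": worst_case,
        "max_total_seconds": total,
        "headroom_seconds": total - worst_case,
    }


# Limits from the BGP Triage Agent example in this document.
limits = {"max_iterations": 10, "tool_timeout_seconds": 30, "max_total_seconds": 300}
print(budget_report(limits))
# {'worst_case_tool_seconds': 300, 'max_total_seconds': 300, 'headroom_seconds': 0}
```

Zero headroom means that if every tool call ran to its timeout, no time would remain for the model's own reasoning, so an agent that regularly times out with such settings may need a larger max_total_seconds or a smaller max_iterations.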
Related Features
- AI Chat — Interactive AI assistance for manual network operations with credential sanitization and copilot panel.
- Knowledge Base — Build a searchable document repository that NOC Agents use for organizational context during investigations.
- LLM Configuration — Configure the AI providers and models that power NOC Agents, including model selection and token budgets.
- Methods of Procedure — Define structured maintenance procedures that agents can reference or execute as part of automated workflows.
- Task Monitoring — Monitor running agent tasks, view execution progress, and manage scheduled agent invocations.
- MCP Servers — Model Context Protocol servers that extend agent capabilities with additional tools and data sources.