LLM Configuration
Set up AI providers (OpenAI, Anthropic, Ollama), configure BYOK models, manage token budgets, and deploy self-hosted LLMs.
Overview
LLM Configuration controls how NetStacks connects to AI language model providers. It covers setting up providers, managing API keys securely, selecting models for different AI features, configuring token budgets to control costs, and deploying self-hosted models for maximum privacy.
Supported Providers
| Provider | Models | API Key Required | Notes |
|---|---|---|---|
| OpenAI | GPT-4o, GPT-4, GPT-4o-mini | Yes | Widely available, fast inference |
| Anthropic | Claude Sonnet, Claude Opus | Yes | Strong reasoning, recommended for agents |
| Ollama | Llama, Mistral, Qwen, and others | No | Self-hosted, data never leaves your network |
| OpenRouter | Access to many models via unified API | Yes | OpenAI-compatible API, model marketplace |
| Custom | Any OpenAI-compatible endpoint | Varies | Azure OpenAI, vLLM, text-generation-inference |
NetStacks uses a BYOK (Bring Your Own Key) model. You provide your own API keys for cloud providers, giving you full control over costs, rate limits, and data handling. API keys are stored encrypted in the credential vault and are never exposed to Terminal clients.
How It Works
Provider Abstraction Layer
NetStacks provides a unified interface across all LLM providers. AI features (Chat, Suggestions, Agents, Knowledge Base embeddings) interact with a single API, and the provider abstraction layer routes requests to the configured provider. This means you can switch providers without changing any other configuration.
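The routing behavior described above can be sketched as follows. This is a simplified illustration, not the NetStacks internals; the class, field, and function names are hypothetical:

```python
# Sketch of priority-based provider selection with failover
# (hypothetical names, not the actual NetStacks implementation).
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    priority: int   # lower numbers are tried first
    enabled: bool
    healthy: bool   # stands in for a real connectivity check

def route_request(providers: list[Provider]) -> str:
    """Return the name of the first enabled, reachable provider."""
    for p in sorted(providers, key=lambda p: p.priority):
        if p.enabled and p.healthy:
            return p.name
    raise RuntimeError("no enabled LLM provider is reachable")

providers = [
    Provider("OpenAI Fallback", priority=2, enabled=True, healthy=True),
    Provider("Anthropic Primary", priority=1, enabled=True, healthy=False),
]
print(route_request(providers))  # Anthropic is down, so "OpenAI Fallback" is chosen
```

Because callers only see the unified interface, swapping the priority-1 provider changes nothing else in the configuration.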
Provider Configuration
Each provider configuration includes:
- Provider Type — The provider platform (OpenAI, Anthropic, Ollama, OpenRouter, or Custom).
- Display Name — A human-readable name for identification (e.g., “OpenAI Production”, “Local Ollama”).
- API Key — Stored as an encrypted credential reference. The actual key is never exposed in API responses. Ollama without authentication does not require a key.
- Base URL — Custom API endpoint for Ollama, Azure OpenAI, or other self-hosted deployments.
- Default Model — The model used when no specific model is requested (e.g., gpt-4o, claude-sonnet-4-20250514, llama3.3).
- Priority — Failover ordering. Lower priority numbers are tried first. If the primary provider is unavailable, requests automatically route to the next enabled provider.
- Model Parameters — Max input tokens, max output tokens, temperature (0.0–2.0), and top_p for nucleus sampling.
Token Budget System
The token budget system helps control AI costs by setting monthly limits on token consumption:
- Soft Limit — A warning threshold. When usage exceeds the soft limit, administrators receive a notification but AI features continue to operate.
- Hard Limit — A strict cap. When usage reaches the hard limit, AI features are disabled until the next billing period or the limit is raised.
- Warning Threshold — A percentage of the hard limit (default 80%) at which a warning notification is sent.
- Usage Tracking — Every LLM request logs input tokens, output tokens, estimated cost, provider, model, feature (chat, suggestions, agents), and user. Usage can be viewed as time-series charts, broken down by feature, user, or provider.
- Projection — The system projects month-end usage based on current consumption rate, helping administrators anticipate budget overruns before they happen.
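The budget arithmetic can be sketched as a linear extrapolation from mid-month usage. This is an illustration only: the function name and the non-"ok" status strings are assumptions, not the NetStacks API:

```python
# Illustrative token-budget math (names are not part of the NetStacks API).
def budget_status(used: int, soft: int, hard: int, warn_pct: int,
                  day_of_month: float, days_in_month: int):
    usage_percent = round(100 * used / hard, 1)
    # Linear projection: assume the current daily rate holds for the month.
    projected = round(used * days_in_month / day_of_month)
    if used >= hard:
        status = "hard_limit_exceeded"   # assumed label
    elif usage_percent >= warn_pct:
        status = "warning"               # assumed label
    elif used >= soft:
        status = "soft_limit_exceeded"   # assumed label
    else:
        status = "ok"
    return usage_percent, projected, status

# 3.25M tokens consumed by day 12.5 of a 30-day month, 10M hard limit
pct, projected, status = budget_status(
    used=3_250_000, soft=5_000_000, hard=10_000_000,
    warn_pct=80, day_of_month=12.5, days_in_month=30)
print(pct, projected, status)  # 32.5 7800000 ok
```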
Model Routing
Different AI features have different requirements. You can assign different models to different features to optimize cost and performance:
| Feature | Recommended Model | Reason |
|---|---|---|
| Command Suggestions | GPT-4o-mini / Llama 3.3 | Speed matters more than reasoning depth |
| AI Chat | GPT-4o / Claude Sonnet | Balance of speed and accuracy |
| NOC Agents | Claude Sonnet / GPT-4o | Complex reasoning and multi-step investigation |
| Knowledge Base Embeddings | text-embedding-3-small | Optimized for vector search, low cost |
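A feature-to-model assignment like the table above amounts to a lookup with a fallback to the provider's default model. A minimal sketch, with illustrative model IDs and a hypothetical helper name:

```python
# Hypothetical per-feature model routing table; feature keys follow the
# table above, model IDs are examples only.
FEATURE_MODELS = {
    "suggestions": "gpt-4o-mini",
    "chat": "gpt-4o",
    "agents": "claude-sonnet-4-20250514",
    "embeddings": "text-embedding-3-small",
}
DEFAULT_MODEL = "gpt-4o"

def model_for(feature: str) -> str:
    """Fall back to the provider's default model for unmapped features."""
    return FEATURE_MODELS.get(feature, DEFAULT_MODEL)

print(model_for("agents"))     # claude-sonnet-4-20250514
print(model_for("summarize"))  # gpt-4o (default)
```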
Configuring LLM Providers
Follow these steps to configure LLM providers for your NetStacks deployment.
Terminal (Standalone Mode)
- Open Settings > AI Assistant — Access the AI configuration from the Terminal's settings panel.
- Select provider — Choose from OpenAI, Anthropic, Ollama, OpenRouter, or Custom.
- Enter API key — Paste your API key. For Ollama, enter the local endpoint URL instead (e.g., http://localhost:11434).
- Choose model — Select the default model for AI features.
- Test connection — Click “Test Connection” to verify the provider is reachable and the API key is valid.
- Save — Save the configuration. AI features are now active.
Controller (Enterprise Mode)
- Navigate to Admin > AI Configuration — Open the AI management page in the Controller admin UI.
- Add LLM provider — Click “Add Provider” and select the provider type.
- Configure connection — Enter the API key (stored encrypted in the credential vault), set the base URL if using a custom endpoint, and select the default model.
- Set priority — Assign a priority number for failover ordering (lower = higher priority). Configure multiple providers for automatic failover.
- Test connection — Verify the provider is reachable and the API key is valid.
- Configure token budgets — Set monthly soft and hard limits for token consumption. Configure the warning threshold percentage.
- Assign models to features — Optionally assign different models to different AI features (chat, suggestions, agents, embeddings) to optimize cost and performance.
Self-Hosted with Ollama
- Install Ollama — Follow the Ollama installation guide for your platform.
- Download a model — Run ollama pull llama3.3 to download a model.
- Configure in NetStacks — Add Ollama as a provider with the base URL set to your Ollama instance (e.g., http://localhost:11434 or http://ollama.internal:11434).
- Set default model — Enter the model name as shown by ollama list (e.g., llama3.3, mistral).
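Before saving the configuration, you can confirm the instance is reachable by querying Ollama's /api/tags endpoint, which lists locally available models. A sketch with a parsing helper of our own (not part of any NetStacks tooling), run here against a sample response body:

```python
# Parse an Ollama GET /api/tags response to list locally available models.
import json

def installed_models(tags_json: str) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

# Against a live instance you would fetch the body with, e.g.:
#   urllib.request.urlopen("http://localhost:11434/api/tags").read()
# Sample response body for illustration:
sample = '{"models": [{"name": "llama3.3:latest"}, {"name": "mistral:latest"}]}'
print(installed_models(sample))  # ['llama3.3:latest', 'mistral:latest']
```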
In Enterprise mode, API keys are stored encrypted in the Controller's credential vault and are never sent to Terminal clients. All AI requests are proxied through the Controller, ensuring API keys remain secure and centralized.
Code Examples
OpenAI Provider Configuration
{
"provider_type": "openai",
"name": "OpenAI Production",
"enabled": true,
"priority": 1,
"api_key_credential_id": "cred-a1b2c3d4-...",
"default_model": "gpt-4o",
"max_input_tokens": 128000,
"max_output_tokens": 4096,
"temperature": 0.7,
"top_p": 0.9
}
Anthropic Provider Configuration
{
"provider_type": "anthropic",
"name": "Anthropic - Agent Reasoning",
"enabled": true,
"priority": 2,
"api_key_credential_id": "cred-e5f6g7h8-...",
"default_model": "claude-sonnet-4-20250514",
"max_input_tokens": 200000,
"max_output_tokens": 8192,
"temperature": 0.3
}
Ollama Self-Hosted Configuration
{
"provider_type": "ollama",
"name": "Local Ollama - Privacy Mode",
"enabled": true,
"priority": 3,
"base_url": "http://ollama.internal:11434",
"default_model": "llama3.3",
"max_output_tokens": 4096,
"temperature": 0.7
}
Token Budget Configuration
{
"monthly_soft_limit_tokens": 5000000,
"monthly_hard_limit_tokens": 10000000,
"warning_threshold_percent": 80
}
// Budget status response
{
"current_usage_tokens": 3250000,
"soft_limit_tokens": 5000000,
"hard_limit_tokens": 10000000,
"usage_percent": 32.5,
"status": "ok",
"projected_month_end_tokens": 7800000
}
Usage Analytics by Feature
# Get token usage breakdown by feature for the current month
curl -s https://controller.example.com/api/v1/ai/usage/by-feature \
-H "Authorization: Bearer $TOKEN" \
-G -d "start=2026-03-01T00:00:00Z" \
-d "end=2026-04-01T00:00:00Z" | jq '.'
# Example response
[
{
"feature": "chat",
"total_tokens": 1850000,
"total_cost_cents": 925,
"request_count": 342
},
{
"feature": "agents",
"total_tokens": 980000,
"total_cost_cents": 490,
"request_count": 45
},
{
"feature": "suggestions",
"total_tokens": 320000,
"total_cost_cents": 32,
"request_count": 1205
},
{
"feature": "embeddings",
"total_tokens": 100000,
"total_cost_cents": 1,
"request_count": 89
}
]
Multi-Provider Failover Setup
# Configure primary provider (priority 1)
curl -X POST https://controller.example.com/api/v1/ai/providers \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"provider_type": "anthropic",
"name": "Anthropic Primary",
"priority": 1,
"api_key_credential_id": "cred-primary-...",
"default_model": "claude-sonnet-4-20250514"
}'
# Configure fallback provider (priority 2)
curl -X POST https://controller.example.com/api/v1/ai/providers \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"provider_type": "openai",
"name": "OpenAI Fallback",
"priority": 2,
"api_key_credential_id": "cred-fallback-...",
"default_model": "gpt-4o"
}'
# If Anthropic is unavailable, requests automatically route to OpenAI
Questions & Answers
- Which LLM providers does NetStacks support?
- NetStacks supports five provider types: OpenAI (GPT-4o, GPT-4, GPT-4o-mini), Anthropic (Claude Sonnet, Claude Opus), Ollama (Llama, Mistral, Qwen, and any locally hosted model), OpenRouter (access to many models via a unified OpenAI-compatible API), and Custom (any OpenAI-compatible endpoint including Azure OpenAI, vLLM, and text-generation-inference). Multiple providers can be configured simultaneously with priority-based failover.
- What is BYOK (Bring Your Own Key)?
- BYOK means you provide your own API keys for cloud LLM providers. NetStacks does not include any bundled AI credits or proxy your requests through its own infrastructure. You create API keys directly with OpenAI, Anthropic, or other providers and enter them into NetStacks. This gives you full control over costs, rate limits, usage policies, and data handling agreements with your provider.
- Can I use a self-hosted LLM?
- Yes. Configure Ollama as your LLM provider and point it to your locally hosted instance. When using Ollama, your data never leaves your network because all inference happens on your own hardware. You can also use the Custom provider type to connect to any OpenAI-compatible endpoint, including vLLM, text-generation-inference, or Azure OpenAI deployed in your own cloud subscription.
- How do token budgets work?
- Token budgets set monthly limits on LLM token consumption. You configure a soft limit (warning only, AI continues) and a hard limit (AI features disabled when exceeded). A warning threshold (default 80% of the hard limit) triggers a notification before you reach the hard limit. The system tracks usage per request with input/output token counts and estimated costs, and projects month-end usage based on the current consumption rate.
- Which model should I use for each feature?
- For Command Suggestions, use a fast, smaller model (GPT-4o-mini, Llama 3.3) because speed matters more than reasoning depth. For AI Chat, use a balanced model (GPT-4o, Claude Sonnet) that provides good accuracy with reasonable response times. For NOC Agents, use a powerful reasoning model (Claude Sonnet, GPT-4o) because agents need to perform multi-step analysis. For embeddings, use an embedding-specific model (text-embedding-3-small) optimized for vector search.
- Is my data sent to external AI providers?
- When using cloud providers (OpenAI, Anthropic, OpenRouter), your prompts and terminal context are sent to the provider's API. However, all data passes through the credential sanitization pipeline first, which removes passwords, API keys, SNMP communities, and other sensitive information. For maximum privacy, use Ollama with a locally hosted model so no data leaves your network. In Enterprise mode, the Controller proxies all AI requests, ensuring Terminal clients never communicate directly with external providers.
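The sanitization step described above can be pictured as a redaction pass over outbound text. The patterns and replacement token below are illustrative only, not the actual NetStacks pipeline:

```python
# Minimal illustration of credential redaction before prompts leave the network.
# Patterns and the [REDACTED] token are examples, not NetStacks internals.
import re

REDACTIONS = [
    (re.compile(r"(password\s*[:=]\s*)\S+", re.IGNORECASE), r"\1[REDACTED]"),
    (re.compile(r"(snmp-server community\s+)\S+"), r"\1[REDACTED]"),
    (re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"), "[REDACTED]"),  # API-key-shaped strings
]

def sanitize(text: str) -> str:
    for pattern, repl in REDACTIONS:
        text = pattern.sub(repl, text)
    return text

print(sanitize("username admin password = hunter2"))
# username admin password = [REDACTED]
print(sanitize("snmp-server community public RO"))
# snmp-server community [REDACTED] RO
```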
Troubleshooting
| Issue | Possible Cause | Solution |
|---|---|---|
| Provider connection failing | Invalid API key, wrong endpoint, or firewall blocking | Use the “Test Connection” button to diagnose. Verify the API key has not expired or been revoked. For custom endpoints, confirm the base URL is correct and accessible from the Controller. Check firewall rules for outbound HTTPS to the provider's API endpoint. |
| Token budget exceeded | Monthly usage hit the hard limit | Check the budget status page for current usage and projected month-end. Increase the hard limit if budget allows, or wait for the next month to reset. Review usage by feature and user to identify the highest consumers. Consider using a cheaper model for high-volume features like suggestions. |
| Ollama not responding | Service not running or model not loaded | Verify the Ollama service is running: ollama list should show available models. If the model is not listed, run ollama pull <model>. Check that the base URL in NetStacks matches the Ollama listen address. On first request, Ollama loads the model into memory which can take 30–60 seconds. |
| Slow AI responses | Large model, insufficient hardware, or network latency | For cloud providers, check the provider's status page for outages. For Ollama, ensure the host has sufficient RAM and GPU for the model size. Consider using a smaller model for latency-sensitive features. Review max_output_tokens setting — lower limits produce faster responses. |
| Failover not working | Backup provider not enabled or priority misconfigured | Verify that backup providers are enabled and have valid API keys. Check priority ordering — lower numbers are tried first. Test each provider individually to confirm they work. Failover triggers on connection errors and timeouts, not on content-level errors. |
Related Features
- AI Chat — Interactive AI assistance that uses the configured LLM providers for conversational network operations support.
- NOC Agents — Autonomous agents that use configured LLM providers for ReAct reasoning during network event investigation.
- Knowledge Base — Uses embedding models from configured providers to generate vector embeddings for document search.
- Command Suggestions — Context-aware command autocomplete powered by the configured LLM provider and model.
- System Settings — Global system configuration including credential vault settings where API keys are stored encrypted.