External AI Gateway
STING uses an External AI Gateway (external_ai_service) as the central interface between the platform and language models. Instead of connecting directly to Ollama, OpenAI, or other providers, all AI requests route through this gateway — providing unified model management, failover, and provider abstraction.
Architecture
┌────────────┐ ┌────────────┐ ┌────────────┐
│ Bee Chat │ │ Reports │ │ Knowledge │
└─────┬──────┘ └─────┬──────┘ └─────┬──────┘
│ │ │
└───────────────┼───────────────┘
│
┌───────▼────────┐
│ External AI │ ← Provider Registry
│ Gateway │ ← Model routing
│ (port 8091) │ ← Health checks
└───────┬────────┘
│
┌──────────┼──────────┐
│ │ │
┌──────▼──┐ ┌─────▼────┐ ┌──▼───────┐
│ Ollama │ │ MiniMax │ │ OpenAI │
│ (local) │ │ (cloud) │ │ (cloud) │
└─────────┘ └──────────┘ └──────────┘
Key Concepts
- Provider Registry — singleton that manages all configured LLM providers, their endpoints, API keys, and model lists
- Nginx LLM Proxy — `nginx-llm-proxy.conf` provides upstream failover between providers with streaming support (`proxy_buffering off`, 300s timeouts)
- Gateway endpoints — unified API at `/api/external-ai/*` that the frontend and other services call
- No direct LLM coupling — services never call Ollama/OpenAI directly; they go through the gateway
Supported Providers
| Provider | Type | Configuration |
|---|---|---|
| Ollama | Local or remote | Self-hosted, any Ollama-compatible endpoint |
| MiniMax | Cloud API | API key required |
| OpenAI | Cloud API | API key required |
| Anthropic | Cloud API | API key required |
| vLLM | Local or remote | OpenAI-compatible endpoint |
| LM Studio | Local | OpenAI-compatible endpoint |
Configuration
config.yml
The primary LLM configuration lives in conf/config.yml:
ai:
# Primary provider
provider: minimax # or: ollama, openai, anthropic, vllm
# Ollama / local LLM settings
ollama:
host: dev-ubuntu.tail4e263b.ts.net # Hostname or IP
port: 11434
model: llama3.1:8b # Default model
# Cloud provider API keys (stored in Vault)
minimax:
api_key: vault:sting/minimax # Vault path
model: MiniMax-Text-01
openai:
api_key: vault:sting/openai
model: gpt-4
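The `vault:` prefix in the config above marks a value that must be fetched from Vault at load time rather than read literally. A minimal sketch of that resolution step, assuming the config has already been parsed into a dict; `fetch_from_vault` is a stand-in for a real Vault client call:

```python
def fetch_from_vault(path: str) -> str:
    # Demo stand-in: a real implementation would call the Vault API.
    secrets = {"sting/minimax": "mm-secret", "sting/openai": "sk-secret"}
    return secrets[path]

def resolve_secrets(cfg: dict) -> dict:
    """Replace any 'vault:<path>' string value with the secret it points to."""
    out = {}
    for key, value in cfg.items():
        if isinstance(value, dict):
            out[key] = resolve_secrets(value)  # recurse into nested sections
        elif isinstance(value, str) and value.startswith("vault:"):
            out[key] = fetch_from_vault(value[len("vault:"):])
        else:
            out[key] = value
    return out

ai_cfg = {
    "provider": "minimax",
    "minimax": {"api_key": "vault:sting/minimax", "model": "MiniMax-Text-01"},
}
resolved = resolve_secrets(ai_cfg)
print(resolved["minimax"]["api_key"])  # mm-secret
```

This keeps secrets out of config files on disk while letting the rest of the gateway treat the resolved config as plain values.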
Vault-Managed API Keys
API keys are stored securely in HashiCorp Vault, not in config files:
# Store an API key
sudo msting vault-secret openai sk-your-api-key-here
# Store MiniMax key
sudo msting vault-secret minimax your-minimax-key
# List stored providers
sudo msting vault-secret list
Environment Variables
The gateway reads from env/external_ai.env:
| Variable | Description |
|---|---|
| `AI_PROVIDER` | Primary provider (`ollama`, `minimax`, `openai`) |
| `OLLAMA_HOST` | Ollama server hostname |
| `OLLAMA_PORT` | Ollama server port (default: `11434`) |
| `OPENAI_API_KEY` | OpenAI API key (from Vault) |
| `MINIMAX_API_KEY` | MiniMax API key (from Vault) |
| `LLM_TIMEOUT` | Request timeout in seconds (default: `300`) |
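Reading these variables with the defaults from the table might look like the sketch below. The `localhost` fallback for `OLLAMA_HOST` is an assumption; the documented defaults are only `OLLAMA_PORT` and `LLM_TIMEOUT`:

```python
import os

def gateway_settings(env=os.environ) -> dict:
    """Collect gateway settings from the environment, applying defaults."""
    return {
        "provider": env.get("AI_PROVIDER", "ollama"),
        "ollama_host": env.get("OLLAMA_HOST", "localhost"),  # fallback is an assumption
        "ollama_port": int(env.get("OLLAMA_PORT", "11434")),
        "timeout": int(env.get("LLM_TIMEOUT", "300")),
    }

settings = gateway_settings({"AI_PROVIDER": "minimax", "LLM_TIMEOUT": "120"})
print(settings["provider"], settings["timeout"])  # minimax 120
```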
Gateway API Endpoints
All endpoints are prefixed with /api/external-ai/:
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Gateway health and provider status |
| `/models` | GET | List available models across all providers |
| `/generate` | POST | Generate text (streaming supported) |
| `/chat` | POST | Chat completion |
| `/embeddings` | POST | Generate embeddings for knowledge sync |
| `/pull` | POST | Pull a model (Ollama only) |
| `/restart` | POST | Restart the gateway service |
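A thin client wrapping this unified API could look like the sketch below. Only request construction is shown (an HTTP library would actually send it), and the chat payload shape is an assumption modeled on common message-list APIs, not STING's documented schema:

```python
class GatewayClient:
    """Hypothetical helper that builds requests against /api/external-ai/*."""

    def __init__(self, base_url: str):
        self.base = base_url.rstrip("/") + "/api/external-ai"

    def url(self, endpoint: str) -> str:
        return f"{self.base}/{endpoint.lstrip('/')}"

    def chat_request(self, messages: list[dict], stream: bool = False) -> tuple[str, dict]:
        # Payload shape is an assumption (OpenAI-style message list)
        return self.url("/chat"), {"messages": messages, "stream": stream}

client = GatewayClient("https://localhost:5050")
url, body = client.chat_request([{"role": "user", "content": "hello"}])
print(url)  # https://localhost:5050/api/external-ai/chat
```

Because every service talks to the same prefix, swapping the underlying provider never changes caller code.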
Example: Check Gateway Health
curl -s https://localhost:5050/api/external-ai/health | python3 -m json.tool
{
"status": "ready",
"provider": "minimax",
"models_available": 3,
"ollama_reachable": true
}
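A service could gate its startup on this response. A small sketch, using the sample payload above (the readiness rule, requiring `status == "ready"` plus at least one model, is an assumption):

```python
import json

def gateway_ready(raw: str) -> bool:
    """Return True if the /health payload indicates a usable gateway."""
    health = json.loads(raw)
    return health.get("status") == "ready" and health.get("models_available", 0) > 0

sample = '{"status": "ready", "provider": "minimax", "models_available": 3, "ollama_reachable": true}'
print(gateway_ready(sample))  # True
```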
Example: List Models
curl -s https://localhost:5050/api/external-ai/models
Nginx LLM Proxy
The `nginx-llm-proxy.conf` provides load balancing and failover between LLM backends:
upstream llm_backend {
server ollama-host:11434;
server minimax-gateway:8091 backup;
}
Key settings:
- Streaming: `proxy_buffering off` for real-time token streaming
- Timeouts: 300s for long-running generation requests
- Failover: Automatic fallback if primary provider is unavailable
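Nginx's `backup` directive means the second server is only tried when every non-backup server is down. That selection logic can be sketched as follows; the health-check function here stands in for nginx's passive failure detection:

```python
def pick_backend(servers, is_up):
    """Pick a backend the way nginx's upstream 'backup' flag does.

    servers: list of (host, is_backup) tuples, in config order.
    is_up:   callable host -> bool (stand-in for passive health checks).
    """
    primaries = [host for host, backup in servers if not backup]
    backups = [host for host, backup in servers if backup]
    for host in primaries + backups:  # backups only after all primaries fail
        if is_up(host):
            return host
    raise RuntimeError("no LLM backend available")

servers = [("ollama-host:11434", False), ("minimax-gateway:8091", True)]
print(pick_backend(servers, lambda h: h.startswith("minimax")))  # minimax-gateway:8091
```

With both hosts healthy, requests always go to the primary Ollama backend; the cloud gateway only absorbs traffic during an outage.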
Using Tailscale for Remote LLM
If your LLM server (e.g., Ollama) runs on a different machine, Tailscale MagicDNS provides a stable hostname that survives IP changes:
ai:
ollama:
host: your-machine.tailnet-name.ts.net
port: 11434
This is preferred over raw Tailscale IPs (100.x.x.x), which can change when devices reconnect.
Model Management
Pulling Models (Ollama)
From the STING admin UI (Bee Settings page), or via CLI:
# On the Ollama host
ollama pull llama3.1:8b
ollama pull nomic-embed-text # For embeddings
Switching Providers
Update config.yml and regenerate:
sudo msting regenerate-env
sudo msting restart external-ai
Troubleshooting
Gateway reports “No providers available”
- Check if the LLM host is reachable: `curl http://ollama-host:11434/api/tags`
- Verify API keys are in Vault: `sudo msting vault-secret list`
- Check gateway logs: `sudo docker logs sting-ce-external-ai --tail 50`
Slow responses
- Check if the model is loaded: first request after idle may take 30-60s to load
- Verify network latency to remote LLM hosts
- Consider using a smaller model for faster responses
Streaming not working
Ensure the nginx LLM proxy has `proxy_buffering off` and the client supports SSE (Server-Sent Events).
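On the client side, SSE delivers events as `data:` lines separated by blank lines. A minimal parser for a token stream, useful for verifying what actually arrives over the wire (the `[DONE]` terminator is an illustrative convention, not a documented STING behavior):

```python
def parse_sse(stream: str) -> list[str]:
    """Extract payloads from an SSE stream, stopping at a [DONE] sentinel."""
    tokens = []
    for line in stream.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":  # illustrative end-of-stream marker
                break
            tokens.append(payload)
    return tokens

sample = "data: Hello\n\ndata: world\n\ndata: [DONE]\n"
print(parse_sse(sample))  # ['Hello', 'world']
```

If this parser sees the whole response arrive in one burst instead of incrementally, buffering is still enabled somewhere in the proxy chain.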