External AI Gateway

STING uses an External AI Gateway (external_ai_service) as the central interface between the platform and language models. Instead of connecting directly to Ollama, OpenAI, or other providers, all AI requests route through this gateway — providing unified model management, failover, and provider abstraction.

Architecture

┌────────────┐  ┌────────────┐  ┌────────────┐
│  Bee Chat  │  │  Reports   │  │  Knowledge │
└─────┬──────┘  └─────┬──────┘  └─────┬──────┘
      │               │               │
      └───────────────┼───────────────┘
                      │
              ┌───────▼────────┐
              │  External AI   │  ← Provider Registry
              │   Gateway      │  ← Model routing
              │  (port 8091)   │  ← Health checks
              └───────┬────────┘
                      │
           ┌──────────┼──────────┐
           │          │          │
    ┌──────▼──┐ ┌─────▼────┐ ┌──▼───────┐
    │ Ollama  │ │ MiniMax  │ │ OpenAI   │
    │ (local) │ │ (cloud)  │ │ (cloud)  │
    └─────────┘ └──────────┘ └──────────┘

Key Concepts

  • Provider Registry — singleton that manages all configured LLM providers, their endpoints, API keys, and model lists
  • Nginx LLM Proxy — nginx-llm-proxy.conf provides upstream failover between providers with streaming support (proxy_buffering off, 300s timeouts)
  • Gateway endpoints — unified API at /api/external-ai/* that the frontend and other services call
  • No direct LLM coupling — services never call Ollama/OpenAI directly; they go through the gateway
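
The registry concept above can be sketched in a few lines. This is an illustrative Python sketch, not STING's actual implementation — the `Provider` and `ProviderRegistry` names, fields, and failover rule are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Provider:
    """One configured LLM backend (fields are illustrative)."""
    name: str
    endpoint: str
    models: list
    healthy: bool = True

class ProviderRegistry:
    """Singleton that tracks configured providers and picks a healthy one."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.providers = []
        return cls._instance

    def register(self, provider: Provider):
        self.providers.append(provider)

    def pick(self):
        # Failover: return the first provider that reports healthy
        for p in self.providers:
            if p.healthy:
                return p
        return None

registry = ProviderRegistry()
registry.register(Provider("ollama", "http://ollama-host:11434", ["llama3.1:8b"], healthy=False))
registry.register(Provider("minimax", "https://api.minimax.io", ["MiniMax-Text-01"]))
print(registry.pick().name)  # "minimax" — ollama is marked unhealthy
```

Because every service resolves providers through the same registry instance, switching or failing over a provider never requires touching the calling services.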

Supported Providers

Provider     Type              Configuration
Ollama       Local or remote   Self-hosted, any Ollama-compatible endpoint
MiniMax      Cloud API         API key required
OpenAI       Cloud API         API key required
Anthropic    Cloud API         API key required
vLLM         Local or remote   OpenAI-compatible endpoint
LM Studio    Local             OpenAI-compatible endpoint

Configuration

config.yml

The primary LLM configuration lives in conf/config.yml:

ai:
  # Primary provider
  provider: minimax          # or: ollama, openai, anthropic, vllm
  
  # Ollama / local LLM settings
  ollama:
    host: dev-ubuntu.tail4e263b.ts.net  # Hostname or IP
    port: 11434
    model: llama3.1:8b                   # Default model
    
  # Cloud provider API keys (stored in Vault)
  minimax:
    api_key: vault:sting/minimax         # Vault path
    model: MiniMax-Text-01
    
  openai:
    api_key: vault:sting/openai
    model: gpt-4
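
Values prefixed with vault: are references to Vault paths rather than literal secrets. A minimal sketch of that resolution step — the `resolve_secret` helper and its return shape are assumptions for illustration, not STING's actual loader:

```python
def resolve_secret(value: str) -> dict:
    """Classify a config value as a literal or a Vault lookup.

    'vault:sting/minimax' -> {'source': 'vault', 'path': 'sting/minimax'}
    anything else         -> {'source': 'literal', 'value': value}
    """
    if value.startswith("vault:"):
        return {"source": "vault", "path": value[len("vault:"):]}
    return {"source": "literal", "value": value}

print(resolve_secret("vault:sting/minimax"))
print(resolve_secret("sk-plaintext-key"))
```

The actual secret is fetched from Vault at runtime, so config.yml never contains key material.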

Vault-Managed API Keys

API keys are stored securely in HashiCorp Vault, not in config files:

# Store an API key
sudo msting vault-secret openai sk-your-api-key-here

# Store MiniMax key
sudo msting vault-secret minimax your-minimax-key

# List stored providers
sudo msting vault-secret list

Environment Variables

The gateway reads from env/external_ai.env:

Variable           Description
AI_PROVIDER        Primary provider (ollama, minimax, openai)
OLLAMA_HOST        Ollama server hostname
OLLAMA_PORT        Ollama server port (default: 11434)
OPENAI_API_KEY     OpenAI API key (from Vault)
MINIMAX_API_KEY    MiniMax API key (from Vault)
LLM_TIMEOUT        Request timeout in seconds (default: 300)
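
A sketch of how a service might read these variables, applying the documented defaults. The fallbacks for AI_PROVIDER and OLLAMA_HOST are assumptions; the OLLAMA_PORT and LLM_TIMEOUT defaults come from the table above:

```python
import os

def gateway_settings(env=os.environ) -> dict:
    """Read gateway settings from the environment with fallback defaults."""
    return {
        "provider": env.get("AI_PROVIDER", "ollama"),        # assumed fallback
        "ollama_host": env.get("OLLAMA_HOST", "localhost"),  # assumed fallback
        "ollama_port": int(env.get("OLLAMA_PORT", "11434")), # documented default
        "timeout": int(env.get("LLM_TIMEOUT", "300")),       # documented default
    }

print(gateway_settings({}))
```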

Gateway API Endpoints

All endpoints are prefixed with /api/external-ai/:

Endpoint       Method   Description
/health        GET      Gateway health and provider status
/models        GET      List available models across all providers
/generate      POST     Generate text (streaming supported)
/chat          POST     Chat completion
/embeddings    POST     Generate embeddings for knowledge sync
/pull          POST     Pull a model (Ollama only)
/restart       POST     Restart the gateway service
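
A sketch of preparing a /chat request from Python. The table above only names the endpoints, so the payload fields (messages, model, stream) are assumptions modeled on common chat-completion APIs:

```python
import json
import urllib.request

BASE = "https://localhost:5050/api/external-ai"

def build_chat_request(messages, model=None, stream=False):
    """Build a POST request for the gateway's /chat endpoint.

    The payload schema here is assumed, not confirmed by the gateway docs.
    """
    payload = {"messages": messages, "stream": stream}
    if model:
        payload["model"] = model
    return urllib.request.Request(
        f"{BASE}/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request([{"role": "user", "content": "Hello"}], model="llama3.1:8b")
print(req.full_url, req.get_method())
# Sending is omitted here; urllib.request.urlopen(req) would perform the call.
```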

Example: Check Gateway Health

curl -s https://localhost:5050/api/external-ai/health | python3 -m json.tool
{
  "status": "ready",
  "provider": "minimax",
  "models_available": 3,
  "ollama_reachable": true
}
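
The health payload can also be checked programmatically. A sketch that parses the response shown above and decides whether the gateway is usable — the field names come from the example output, and the readiness rule is an assumption:

```python
import json

# Sample response, copied from the health-check example above
health_json = """
{
  "status": "ready",
  "provider": "minimax",
  "models_available": 3,
  "ollama_reachable": true
}
"""

def gateway_ready(payload: dict) -> bool:
    # Usable when the gateway reports "ready" and at least one model exists
    return payload.get("status") == "ready" and payload.get("models_available", 0) > 0

health = json.loads(health_json)
print(gateway_ready(health))  # True
```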

Example: List Models

curl -s https://localhost:5050/api/external-ai/models

Nginx LLM Proxy

The nginx-llm-proxy.conf provides load balancing and failover between LLM backends:

upstream llm_backend {
    server ollama-host:11434;
    server minimax-gateway:8091 backup;
}

Key settings:

  • Streaming: proxy_buffering off for real-time token streaming
  • Timeouts: 300s for long-running generation requests
  • Failover: Automatic fallback if primary provider is unavailable
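
Putting those settings together, the proxy location might look like the sketch below. It is consistent with the upstream block shown above, but directive values beyond proxy_buffering and the 300s timeouts are assumptions, not the shipped nginx-llm-proxy.conf:

```nginx
location / {
    proxy_pass http://llm_backend;

    # Streaming: deliver tokens as they arrive; never buffer whole responses
    proxy_buffering off;

    # Long-running generation requests
    proxy_read_timeout 300s;
    proxy_send_timeout 300s;

    # Failover: try the backup server when the primary errors out
    proxy_next_upstream error timeout http_502 http_503;
}
```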

Using Tailscale for Remote LLM

If your LLM server (e.g., Ollama) runs on a different machine, Tailscale MagicDNS provides a stable hostname that survives IP changes:

ai:
  ollama:
    host: your-machine.tailnet-name.ts.net
    port: 11434

This is preferred over raw Tailscale IPs (100.x.x.x), which can change when devices reconnect.

Model Management

Pulling Models (Ollama)

From the STING admin UI (Bee Settings page), or via CLI:

# On the Ollama host
ollama pull llama3.1:8b
ollama pull nomic-embed-text    # For embeddings

Switching Providers

Update config.yml and regenerate:

sudo msting regenerate-env
sudo msting restart external-ai

Troubleshooting

Gateway reports “No providers available”

  1. Check if the LLM host is reachable: curl http://ollama-host:11434/api/tags
  2. Verify API keys are in Vault: sudo msting vault-secret list
  3. Check gateway logs: sudo docker logs sting-ce-external-ai --tail 50

Slow responses

  • Check if the model is loaded: first request after idle may take 30-60s to load
  • Verify network latency to remote LLM hosts
  • Consider using a smaller model for faster responses

Streaming not working

Ensure the nginx LLM proxy has proxy_buffering off and the client supports SSE (Server-Sent Events).
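
When streaming works, tokens arrive as SSE data: lines. A minimal parser sketch for debugging a raw stream — the exact event format (one chunk per data: line, a [DONE] sentinel) is an assumption modeled on OpenAI-compatible APIs, not confirmed by the gateway docs:

```python
def parse_sse_tokens(raw: str) -> list:
    """Extract payloads from Server-Sent Events 'data:' lines.

    Assumes one 'data: <chunk>' line per token and a final
    'data: [DONE]' sentinel, as many OpenAI-compatible APIs emit.
    """
    tokens = []
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines, comments, other fields
        chunk = line[len("data:"):].strip()
        if chunk == "[DONE]":
            break
        tokens.append(chunk)
    return tokens

stream = "data: Hel\n\ndata: lo\n\ndata: [DONE]\n"
print(parse_sse_tokens(stream))  # ['Hel', 'lo']
```

If this parser receives one giant data line instead of incremental chunks, buffering is still enabled somewhere between the gateway and the client.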
