Overview

STING CE provides AI capabilities through a set of loosely coupled services. LLM hosting (Ollama, LM Studio, vLLM, etc.) is external to the stack — configured during installation as either local or remote. STING services act as gateways and orchestrators.

┌──────────────────────────────────────────────────┐
│                 Application Layer                 │
│    app · chatbot · knowledge · public-bee         │
└──────────┬───────────────────────────────────────┘
┌──────────▼───────────────────────────────────────┐
│            External AI Service :8091              │
│     FastAPI gateway · ProviderRegistry            │
└──────────┬───────────────────────────────────────┘
┌──────────▼───────────────────────────────────────┐
│          LLM Gateway Proxy :8085                  │
│     Nginx reverse proxy · upstream failover       │
└──────────┬───────────────────────────────────────┘
┌──────────▼───────────────────────────────────────┐
│           External LLM Providers                  │
│  Ollama · OpenAI · Anthropic · LM Studio · vLLM  │
└──────────────────────────────────────────────────┘

External AI Service

The external AI service (container sting-ce-external-ai, port 8091) is a FastAPI application that acts as the unified LLM gateway for the entire platform.

ProviderRegistry

A singleton ProviderRegistry manages all configured LLM providers:

  • Provider discovery — reads provider configuration from environment variables and Vault-stored API keys.
  • Routing — directs requests to the appropriate provider based on model name or explicit provider selection.
  • Streaming — supports streaming responses for real-time token delivery to the frontend.
  • Error handling — catches provider-specific errors and returns normalized error responses.
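The registry's routing responsibilities can be sketched in miniature. This is an illustrative singleton with hypothetical class and method names, not the actual STING CE implementation:

```python
# Hypothetical sketch of ProviderRegistry routing; names are illustrative.
class ProviderRegistry:
    _instance = None

    def __new__(cls):
        # Singleton: every caller shares one registry instance.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._providers = {}
        return cls._instance

    def register(self, name, model_prefixes):
        # model_prefixes: model-name prefixes this provider serves.
        self._providers[name] = model_prefixes

    def route(self, model, provider=None):
        # Explicit provider selection wins; otherwise match by model name.
        if provider is not None:
            return provider
        for name, prefixes in self._providers.items():
            if any(model.startswith(p) for p in prefixes):
                return name
        raise LookupError(f"no provider configured for model {model!r}")


registry = ProviderRegistry()
registry.register("ollama", ["llama", "mistral"])
registry.register("openai", ["gpt-"])
print(registry.route("gpt-4o"))   # routes to openai
print(registry.route("llama3"))   # routes to ollama
```

Because the registry is a singleton, providers registered at startup are visible to every request handler without passing the registry around explicitly.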

Supported Providers

| Provider  | Type         | Notes                                                          |
|-----------|--------------|----------------------------------------------------------------|
| Ollama    | Local        | Default for self-hosted deployments; runs on same host or LAN  |
| OpenAI    | Cloud        | GPT models via API key                                         |
| Anthropic | Cloud        | Claude models via API key                                      |
| LM Studio | Local        | OpenAI-compatible API on local machine                         |
| vLLM     | Local/Remote | High-throughput serving for self-hosted models                 |
| MiniMax   | Cloud        | MiniMax models via API key                                     |

API keys for cloud providers are stored in HashiCorp Vault and managed via sudo msting vault-secret.

Demo AI Service

The demo AI service (container sting-ce-demo-ai, port 8095) provides mock LLM responses for demonstrations and testing without requiring a real LLM provider.

LLM Gateway Proxy

The LLM gateway proxy (port 8085) is an Nginx reverse proxy that sits between the external AI service and the actual LLM providers. It provides:

  • Upstream failover — if the primary LLM provider is unavailable, requests are routed to a configured backup.
  • Streaming support — proxy_buffering off ensures tokens are delivered to clients as they are generated.
  • Long timeouts — 300-second proxy timeouts accommodate large model inference times.
  • Connection pooling — persistent upstream connections reduce latency.
# Key proxy settings for LLM streaming
proxy_buffering off;
proxy_read_timeout 300s;
proxy_send_timeout 300s;
proxy_connect_timeout 60s;
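The upstream failover and connection-pooling behaviors map to an Nginx upstream block along these lines. The server names and ports here are examples, assuming an Ollama-style primary; the actual upstream addresses are set at installation:

```nginx
# Illustrative upstream block (names and addresses are examples)
upstream llm_backend {
    server host.docker.internal:11434;        # primary provider endpoint
    server backup-llm.internal:11434 backup;  # used only when the primary fails
    keepalive 16;                             # persistent upstream connections
}
```

The `backup` flag keeps the secondary endpoint idle until Nginx marks the primary as unavailable, and `keepalive` maintains the pooled connections mentioned above.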

Knowledge Service (Honey Jars)

The knowledge service (container sting-ce-knowledge, port 8090) powers Honey Jars — STING’s knowledge base system. Built with FastAPI, it integrates PostgreSQL for metadata and ChromaDB for vector search.

Capabilities

  • Document ingestion — accepts uploads (PDF, text, markdown, etc.), extracts content, and chunks it for embedding.
  • Semantic search — queries are embedded and compared against stored vectors using cosine similarity in ChromaDB.
  • Hybrid search — combines vector similarity with keyword matching for improved recall.
  • Collection management — each Honey Jar maps to a ChromaDB collection with isolated search scope.
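The ingestion step above splits documents into overlapping windows before embedding. A minimal sketch, assuming character-based chunking (the real service's chunk sizes and strategy are configuration details not shown here):

```python
# Illustrative chunker for document ingestion; sizes are example values.
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
    return chunks


parts = chunk_text("a" * 500, chunk_size=200, overlap=50)
print(len(parts))  # 4 windows, starting at offsets 0, 150, 300, 450
```

The overlap ensures a sentence falling on a chunk boundary still appears intact in at least one window, which improves recall at search time.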

Search Flow

1. User submits query
2. Knowledge service embeds query (via external-ai-service)
3. ChromaDB returns top-k similar chunks
4. PostgreSQL enriches results with document metadata
5. Ranked results returned to caller
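The flow above can be sketched in miniature. Plain cosine similarity over an in-memory dict stands in for ChromaDB here, and the toy vectors stand in for real embeddings (which STING CE obtains from the external-ai-service):

```python
import math

# Cosine-similarity top-k search, standing in for ChromaDB's vector query.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, stored, k=2):
    # stored: {chunk_id: vector}; returns chunk ids ranked by similarity.
    ranked = sorted(stored, key=lambda cid: cosine(query_vec, stored[cid]),
                    reverse=True)
    return ranked[:k]


stored = {
    "doc1#0": [0.9, 0.1, 0.0],
    "doc1#1": [0.1, 0.9, 0.0],
    "doc2#0": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.1], stored))  # doc1#0 ranks first
```

In the real flow, the returned chunk ids (`doc1#0` style identifiers are hypothetical) are then joined against PostgreSQL to attach document metadata before ranking results for the caller.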

SearXNG — Web Research

SearXNG (container sting-ce-searxng, internal port 8080, not externally exposed) is a privacy-respecting metasearch engine used by the Bee chatbot for web research.

  • No tracking — SearXNG does not log queries or share data with search engines.
  • Aggregation — queries multiple search engines and deduplicates results.
  • Internal only — accessible only from within the Docker network; not exposed to users directly.

Report Pipeline

STING CE generates structured reports through a multi-stage pipeline:

┌──────────┐   ┌──────────┐   ┌───────────┐   ┌───────────┐   ┌─────────┐
│ Request  │──▶│ Research │──▶│    LLM    │──▶│ ReviewBee │──▶│  Render │
│Classify  │   │ Gather   │   │ Generate  │   │  Quality  │   │   PDF   │
└──────────┘   └──────────┘   └───────────┘   └───────────┘   └─────────┘
| Stage                  | Service             | Description                                                    |
|------------------------|---------------------|----------------------------------------------------------------|
| Request classification | app                 | Determines report type and required data sources               |
| Research gathering     | knowledge + searxng | Pulls relevant content from Honey Jars and optional web search |
| LLM generation         | external-ai-service | Generates report content using the configured LLM provider     |
| Quality review         | report-bee          | ReviewBee evaluates output quality and flags issues            |
| PDF rendering          | report-worker       | Converts final content to formatted PDF output                 |

The report-worker (container sting-ce-report-worker) acts as a thin proxy to the Flask app for report generation tasks. The report-bee service performs automated quality review before final output.
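The staged flow can be sketched as a chain of functions, each enriching a shared context. The stage bodies below are placeholders, not the actual app/knowledge/report-bee implementations:

```python
# Placeholder stages illustrating the report pipeline's shape.
def classify(ctx):
    return {**ctx, "report_type": "summary"}

def gather(ctx):
    return {**ctx, "sources": ["honey-jar:example", "web:searxng"]}

def generate(ctx):
    draft = f"{ctx['report_type']} report from {len(ctx['sources'])} sources"
    return {**ctx, "draft": draft}

def review(ctx):
    # ReviewBee-style quality gate: block empty drafts from rendering.
    return {**ctx, "approved": bool(ctx.get("draft"))}

def render(ctx):
    return ctx["draft"] if ctx["approved"] else None


STAGES = [classify, gather, generate, review]

def run_pipeline(request):
    ctx = request
    for stage in STAGES:
        ctx = stage(ctx)
    return render(ctx)


print(run_pipeline({"topic": "quarterly threats"}))
```

Keeping each stage a pure function of the context makes it easy to insert a quality gate (as report-bee does) without the other stages knowing about it.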

Health Monitoring

Every STING CE service includes a Docker HEALTHCHECK directive:

HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
  CMD curl -f http://localhost:<port>/health || exit 1
| Parameter    | Value | Purpose                                  |
|--------------|-------|------------------------------------------|
| interval     | 30s   | Time between health checks               |
| timeout      | 10s   | Maximum time for a health check response |
| start-period | 40s   | Grace period for container startup       |
| retries      | 3     | Failures before marking unhealthy        |

Docker Compose uses health status for:

  • Startup ordering — depends_on with condition: service_healthy ensures dependencies are ready.
  • Restart policy — unhealthy containers are restarted automatically.
  • Status reporting — sudo msting status shows health state of all services.

Service Startup Resilience

STING CE is designed to start reliably even in constrained environments:

Dependency Ordering

Services declare dependencies with health conditions:

app:
  depends_on:
    db:
      condition: service_healthy
    vault:
      condition: service_healthy
    redis:
      condition: service_healthy

Retry Logic

Application services implement retry loops for transient failures during startup:

  • Database connection retries with exponential backoff
  • Vault unsealing detection with polling
  • Redis connection retries
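A minimal sketch of such a retry loop, with example delays and attempt counts rather than STING CE's actual settings:

```python
import time

# Illustrative startup retry with exponential backoff; values are examples.
def connect_with_retry(connect, attempts=5, base_delay=0.5):
    """Call connect() until it succeeds or attempts are exhausted."""
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, 4s, ...
            time.sleep(delay)
```

The same loop shape covers all three cases above: the `connect` callable can open a database connection, poll Vault's seal status, or ping Redis.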

Graceful Degradation

If a non-critical service is unavailable, the platform continues operating with reduced functionality:

| Unavailable Service | Impact             | Behavior                                         |
|---------------------|--------------------|--------------------------------------------------|
| external-ai-service | No LLM responses   | Chatbot returns “AI unavailable” message         |
| knowledge service   | No semantic search | Honey Jar listing still works; search disabled   |
| searxng             | No web research    | Bee skips web sources; uses local knowledge only |
| redis               | No caching         | Requests go directly to PostgreSQL (slower)      |
| mailpit             | No dev emails      | Auth flows that require email will fail in dev   |

Resource Limits

Production deployments can set resource constraints via Docker Compose deploy configuration:

deploy:
  resources:
    limits:
      memory: 512M
      cpus: '0.5'
    reservations:
      memory: 256M

These prevent any single service from consuming all host resources.

Hive-only features: Nectar Worker (autonomous bot management), Beeacon (observability and monitoring dashboards), and ChatOps connectors (Slack, Teams, Discord) are available in STING Hive but are not included in Community Edition.
