Overview

STING CE provides AI capabilities through a set of loosely coupled services. LLM hosting (Ollama, LM Studio, vLLM, etc.) is external to the stack — configured during installation as either local or remote. STING services act as gateways and orchestrators.

┌──────────────────────────────────────────────────┐
│                 Application Layer                 │
│    app · chatbot · knowledge · public-bee         │
└──────────┬───────────────────────────────────────┘
┌──────────▼───────────────────────────────────────┐
│            External AI Service :8091              │
│     FastAPI gateway · ProviderRegistry            │
└──────────┬───────────────────────────────────────┘
┌──────────▼───────────────────────────────────────┐
│          LLM Gateway Proxy :8085                  │
│     Nginx reverse proxy · upstream failover       │
└──────────┬───────────────────────────────────────┘
┌──────────▼───────────────────────────────────────┐
│           External LLM Providers                  │
│  Ollama · OpenAI · Anthropic · LM Studio · vLLM  │
└──────────────────────────────────────────────────┘

External AI Service

The external AI service (container sting-ce-external-ai, port 8091) is a FastAPI application that acts as the unified LLM gateway for the entire platform.

ProviderRegistry

A singleton ProviderRegistry manages all configured LLM providers:

  • Provider discovery — reads provider configuration from environment variables and Vault-stored API keys.
  • Routing — directs requests to the appropriate provider based on model name or explicit provider selection.
  • Streaming — supports streaming responses for real-time token delivery to the frontend.
  • Error handling — catches provider-specific errors and returns normalized error responses.
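The registry's routing responsibilities can be sketched in miniature. This is an illustrative singleton with hypothetical class and method names, not the actual STING CE implementation:

```python
# Hypothetical sketch of ProviderRegistry routing; names are illustrative.
class ProviderRegistry:
    _instance = None

    def __new__(cls):
        # Singleton: every caller shares one registry instance.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._providers = {}
        return cls._instance

    def register(self, name, model_prefixes):
        # model_prefixes: model-name prefixes this provider serves.
        self._providers[name] = model_prefixes

    def route(self, model, provider=None):
        # Explicit provider selection wins; otherwise match by model name.
        if provider is not None:
            return provider
        for name, prefixes in self._providers.items():
            if any(model.startswith(p) for p in prefixes):
                return name
        raise LookupError(f"no provider configured for model {model!r}")


registry = ProviderRegistry()
registry.register("ollama", ["llama", "mistral"])
registry.register("openai", ["gpt-"])
print(registry.route("gpt-4o"))   # routes to openai
print(registry.route("llama3"))   # routes to ollama
```

Because the registry is a singleton, providers registered at startup are visible to every request handler without passing the registry around explicitly.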

Supported Providers

| Provider  | Type         | Notes                                                          |
|-----------|--------------|----------------------------------------------------------------|
| Ollama    | Local        | Default for self-hosted deployments; runs on same host or LAN  |
| OpenAI    | Cloud        | GPT models via API key                                         |
| Anthropic | Cloud        | Claude models via API key                                      |
| LM Studio | Local        | OpenAI-compatible API on local machine                         |
| vLLM     | Local/Remote | High-throughput serving for self-hosted models                 |
| MiniMax   | Cloud        | MiniMax models via API key                                     |

API keys for cloud providers are stored in HashiCorp Vault and managed via sudo msting vault-secret.

Demo AI Service

The demo AI service (container sting-ce-demo-ai, port 8095) provides mock LLM responses for demonstrations and testing without requiring a real LLM provider.

LLM Gateway Proxy

The LLM gateway proxy (port 8085) is an Nginx reverse proxy that sits between the external AI service and the actual LLM providers. It provides:

  • Upstream failover — if the primary LLM provider is unavailable, requests are routed to a configured backup.
  • Streaming support — proxy_buffering off ensures tokens are delivered to clients as they are generated.
  • Long timeouts — 300-second proxy timeouts accommodate large model inference times.
  • Connection pooling — persistent upstream connections reduce latency.
# Key proxy settings for LLM streaming
proxy_buffering off;
proxy_read_timeout 300s;
proxy_send_timeout 300s;
proxy_connect_timeout 60s;
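The upstream failover and connection-pooling behaviors map to an Nginx upstream block along these lines. The server names and ports here are examples, assuming an Ollama-style primary; the actual upstream addresses are set at installation:

```nginx
# Illustrative upstream block (names and addresses are examples)
upstream llm_backend {
    server host.docker.internal:11434;        # primary provider endpoint
    server backup-llm.internal:11434 backup;  # used only when the primary fails
    keepalive 16;                             # persistent upstream connections
}
```

The `backup` flag keeps the secondary endpoint idle until Nginx marks the primary as unavailable, and `keepalive` maintains the pooled connections mentioned above.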

Knowledge Service (Honey Jars)

The knowledge service (container sting-ce-knowledge, port 8090) powers Honey Jars — STING’s knowledge base system. Built with FastAPI, it integrates PostgreSQL for metadata and ChromaDB for vector search.

Capabilities

  • Document ingestion — accepts uploads (PDF, text, markdown, etc.), extracts content, and chunks it for embedding.
  • Semantic search — queries are embedded and compared against stored vectors using cosine similarity in ChromaDB.
  • Hybrid search — combines vector similarity with keyword matching for improved recall.
  • Collection management — each Honey Jar maps to a ChromaDB collection with isolated search scope.
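The ingestion step above splits documents into overlapping windows before embedding. A minimal sketch, assuming character-based chunking (the real service's chunk sizes and strategy are configuration details not shown here):

```python
# Illustrative chunker for document ingestion; sizes are example values.
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
    return chunks


parts = chunk_text("a" * 500, chunk_size=200, overlap=50)
print(len(parts))  # 4 windows, starting at offsets 0, 150, 300, 450
```

The overlap ensures a sentence falling on a chunk boundary still appears intact in at least one window, which improves recall at search time.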

Search Flow

1. User submits query
2. Knowledge service embeds query (via external-ai-service)
3. ChromaDB returns top-k similar chunks
4. PostgreSQL enriches results with document metadata
5. Ranked results returned to caller
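The flow above can be sketched in miniature. Plain cosine similarity over an in-memory dict stands in for ChromaDB here, and the toy vectors stand in for real embeddings (which STING CE obtains from the external-ai-service):

```python
import math

# Cosine-similarity top-k search, standing in for ChromaDB's vector query.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, stored, k=2):
    # stored: {chunk_id: vector}; returns chunk ids ranked by similarity.
    ranked = sorted(stored, key=lambda cid: cosine(query_vec, stored[cid]),
                    reverse=True)
    return ranked[:k]


stored = {
    "doc1#0": [0.9, 0.1, 0.0],
    "doc1#1": [0.1, 0.9, 0.0],
    "doc2#0": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.1], stored))  # doc1#0 ranks first
```

In the real flow, the returned chunk ids (`doc1#0` style identifiers are hypothetical) are then joined against PostgreSQL to attach document metadata before ranking results for the caller.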

SearXNG — Web Research

SearXNG (container sting-ce-searxng, internal port 8080, not externally exposed) is a privacy-respecting metasearch engine used by the Bee chatbot for web research.

  • No tracking — SearXNG does not log queries or share data with search engines.
  • Aggregation — queries multiple search engines and deduplicates results.
  • Internal only — accessible only from within the Docker network; not exposed to users directly.

Report Pipeline

STING CE generates structured reports through a multi-stage pipeline:

┌──────────┐   ┌──────────┐   ┌───────────┐   ┌───────────┐   ┌─────────┐
│ Request  │──▶│ Research │──▶│    LLM    │──▶│ ReviewBee │──▶│  Render │
│Classify  │   │ Gather   │   │ Generate  │   │  Quality  │   │   PDF   │
└──────────┘   └──────────┘   └───────────┘   └───────────┘   └─────────┘
| Stage                  | Service             | Description                                                    |
|------------------------|---------------------|----------------------------------------------------------------|
| Request classification | app                 | Determines report type and required data sources               |
| Research gathering     | knowledge + searxng | Pulls relevant content from Honey Jars and optional web search |
| LLM generation         | external-ai-service | Generates report content using the configured LLM provider     |
| Quality review         | report-bee          | ReviewBee evaluates output quality and flags issues            |
| PDF rendering          | report-worker       | Converts final content to formatted PDF output                 |

The report-worker (container sting-ce-report-worker) acts as a thin proxy to the Flask app for report generation tasks. The report-bee service performs automated quality review before final output.
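The staged flow can be sketched as a chain of functions, each enriching a shared context. The stage bodies below are placeholders, not the actual app/knowledge/report-bee implementations:

```python
# Placeholder stages illustrating the report pipeline's shape.
def classify(ctx):
    return {**ctx, "report_type": "summary"}

def gather(ctx):
    return {**ctx, "sources": ["honey-jar:example", "web:searxng"]}

def generate(ctx):
    draft = f"{ctx['report_type']} report from {len(ctx['sources'])} sources"
    return {**ctx, "draft": draft}

def review(ctx):
    # ReviewBee-style quality gate: block empty drafts from rendering.
    return {**ctx, "approved": bool(ctx.get("draft"))}

def render(ctx):
    return ctx["draft"] if ctx["approved"] else None


STAGES = [classify, gather, generate, review]

def run_pipeline(request):
    ctx = request
    for stage in STAGES:
        ctx = stage(ctx)
    return render(ctx)


print(run_pipeline({"topic": "quarterly threats"}))
```

Keeping each stage a pure function of the context makes it easy to insert a quality gate (as report-bee does) without the other stages knowing about it.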

Health Monitoring

Every STING CE service includes a Docker HEALTHCHECK directive:

HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
  CMD curl -f http://localhost:<port>/health || exit 1
| Parameter    | Value | Purpose                                  |
|--------------|-------|------------------------------------------|
| interval     | 30s   | Time between health checks               |
| timeout      | 10s   | Maximum time for a health check response |
| start-period | 40s   | Grace period for container startup       |
| retries      | 3     | Failures before marking unhealthy        |

Docker Compose uses health status for:

  • Startup ordering — depends_on with condition: service_healthy ensures dependencies are ready.
  • Restart policy — unhealthy containers are restarted automatically.
  • Status reporting — sudo msting status shows health state of all services.

Service Startup Resilience

STING CE is designed to start reliably even in constrained environments:

Dependency Ordering

Services declare dependencies with health conditions:

app:
  depends_on:
    db:
      condition: service_healthy
    vault:
      condition: service_healthy
    redis:
      condition: service_healthy

Retry Logic

Application services implement retry loops for transient failures during startup:

  • Database connection retries with exponential backoff
  • Vault unsealing detection with polling
  • Redis connection retries
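A minimal sketch of such a retry loop, with example delays and attempt counts rather than STING CE's actual settings:

```python
import time

# Illustrative startup retry with exponential backoff; values are examples.
def connect_with_retry(connect, attempts=5, base_delay=0.5):
    """Call connect() until it succeeds or attempts are exhausted."""
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, 4s, ...
            time.sleep(delay)
```

The same loop shape covers all three cases above: the `connect` callable can open a database connection, poll Vault's seal status, or ping Redis.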

Graceful Degradation

If a non-critical service is unavailable, the platform continues operating with reduced functionality:

| Unavailable Service | Impact             | Behavior                                         |
|---------------------|--------------------|--------------------------------------------------|
| external-ai-service | No LLM responses   | Chatbot returns “AI unavailable” message         |
| knowledge service   | No semantic search | Honey Jar listing still works; search disabled   |
| searxng             | No web research    | Bee skips web sources; uses local knowledge only |
| redis               | No caching         | Requests go directly to PostgreSQL (slower)      |
| mailpit             | No dev emails      | Auth flows that require email will fail in dev   |

Resource Limits

Production deployments can set resource constraints via Docker Compose deploy configuration:

deploy:
  resources:
    limits:
      memory: 512M
      cpus: '0.5'
    reservations:
      memory: 256M

These prevent any single service from consuming all host resources.

Hive-only features: Nectar Worker (autonomous bot management), Beeacon (observability and monitoring dashboards), and ChatOps connectors (Slack, Teams, Discord) are available in STING Hive but are not included in Community Edition.
