Community Edition
This documentation covers STING Community Edition (CE) — the free, open-source deployment. Enterprise features available in STING Hive are noted where relevant but are not included in CE.
Overview
STING CE provides AI capabilities through a set of loosely coupled services. LLM hosting (Ollama, LM Studio, vLLM, etc.) is external to the stack — configured during installation as either local or remote. STING services act as gateways and orchestrators.
┌──────────────────────────────────────────────────┐
│ Application Layer │
│ app · chatbot · knowledge · public-bee │
└──────────┬───────────────────────────────────────┘
│
┌──────────▼───────────────────────────────────────┐
│ External AI Service :8091 │
│ FastAPI gateway · ProviderRegistry │
└──────────┬───────────────────────────────────────┘
│
┌──────────▼───────────────────────────────────────┐
│ LLM Gateway Proxy :8085 │
│ Nginx reverse proxy · upstream failover │
└──────────┬───────────────────────────────────────┘
│
┌──────────▼───────────────────────────────────────┐
│ External LLM Providers │
│ Ollama · OpenAI · Anthropic · LM Studio · vLLM │
└──────────────────────────────────────────────────┘
External AI Service
The external AI service (container sting-ce-external-ai, port 8091) is a FastAPI application that acts as the unified LLM gateway for the entire platform.
ProviderRegistry
A singleton ProviderRegistry manages all configured LLM providers:
- Provider discovery — reads provider configuration from environment variables and Vault-stored API keys.
- Routing — directs requests to the appropriate provider based on model name or explicit provider selection.
- Streaming — supports streaming responses for real-time token delivery to the frontend.
- Error handling — catches provider-specific errors and returns normalized error responses.
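The registry's routing behavior can be sketched as follows. This is a minimal illustration, not STING's actual implementation: class names, model-prefix matching, and the `generate` signature are all assumptions.

```python
# Illustrative ProviderRegistry sketch: providers register under a name,
# and requests are routed by explicit provider selection or by model-name
# prefix. Names and signatures are assumptions, not STING's real code.
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

@dataclass
class Provider:
    name: str
    models: Tuple[str, ...]               # model names/prefixes this provider serves
    generate: Callable[[str, str], str]   # (model, prompt) -> completion

@dataclass
class ProviderRegistry:
    _providers: Dict[str, Provider] = field(default_factory=dict)

    def register(self, provider: Provider) -> None:
        self._providers[provider.name] = provider

    def route(self, model: str, provider: Optional[str] = None) -> Provider:
        # Explicit provider selection wins; otherwise match by model name.
        if provider is not None:
            return self._providers[provider]
        for p in self._providers.values():
            if any(model.startswith(prefix) for prefix in p.models):
                return p
        raise KeyError(f"no provider configured for model {model!r}")

registry = ProviderRegistry()
registry.register(Provider("ollama", ("llama3", "mistral"), lambda m, p: f"[{m}] ok"))
registry.register(Provider("openai", ("gpt-",), lambda m, p: f"[{m}] ok"))

result = registry.route("gpt-4o").generate("gpt-4o", "hello")
```

A normalized error response (as in the error-handling bullet) would wrap the `KeyError` and provider-specific exceptions into one error shape before returning to the caller.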
Supported Providers
| Provider | Type | Notes |
|---|---|---|
| Ollama | Local | Default for self-hosted deployments; runs on same host or LAN |
| OpenAI | Cloud | GPT models via API key |
| Anthropic | Cloud | Claude models via API key |
| LM Studio | Local | OpenAI-compatible API on local machine |
| vLLM | Local/Remote | High-throughput serving for self-hosted models |
| MiniMax | Cloud | MiniMax models via API key |
API keys for cloud providers are stored in HashiCorp Vault and managed via sudo msting vault-secret.
Demo AI Service
The demo AI service (container sting-ce-demo-ai, port 8095) provides mock LLM responses for demonstrations and testing without requiring a real LLM provider.
LLM Gateway Proxy
The LLM gateway proxy (port 8085) is an Nginx reverse proxy that sits between the external AI service and the actual LLM providers. It provides:
- Upstream failover — if the primary LLM provider is unavailable, requests are routed to a configured backup.
- Streaming support — proxy_buffering off ensures tokens are delivered to clients as they are generated.
- Long timeouts — 300-second proxy timeouts accommodate large model inference times.
- Connection pooling — persistent upstream connections reduce latency.
# Key proxy settings for LLM streaming
proxy_buffering off;
proxy_read_timeout 300s;
proxy_send_timeout 300s;
proxy_connect_timeout 60s;
Knowledge Service (Honey Jars)
The knowledge service (container sting-ce-knowledge, port 8090) powers Honey Jars — STING’s knowledge base system. Built with FastAPI, it integrates PostgreSQL for metadata and ChromaDB for vector search.
Capabilities
- Document ingestion — accepts uploads (PDF, text, markdown, etc.), extracts content, and chunks it for embedding.
- Semantic search — queries are embedded and compared against stored vectors using cosine similarity in ChromaDB.
- Hybrid search — combines vector similarity with keyword matching for improved recall.
- Collection management — each Honey Jar maps to a ChromaDB collection with isolated search scope.
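Document chunking for embedding can be sketched with a simple fixed-size, overlapping splitter. The chunk size and overlap below are illustrative assumptions, not STING's configured defaults.

```python
# Sketch of fixed-size chunking with overlap, as applied to extracted
# document text before embedding. Parameters are assumptions.
def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list:
    """Split text into overlapping chunks of at most `size` characters."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap   # step forward, keeping `overlap` chars of context
    return chunks

chunks = chunk_text("a" * 500, size=200, overlap=40)
```

Overlap preserves sentence context across chunk boundaries, which improves recall when a query matches text near a split point.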
Search Flow
1. User submits query
2. Knowledge service embeds query (via external-ai-service)
3. ChromaDB returns top-k similar chunks
4. PostgreSQL enriches results with document metadata
5. Ranked results returned to caller
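The steps above can be sketched end to end with toy in-memory data. The embedding call is a stand-in for the request to external-ai-service, and the store/metadata shapes are illustrative, not the actual ChromaDB or PostgreSQL schemas.

```python
# Toy version of the search flow: embed query, rank chunks by cosine
# similarity, enrich the top-k with document metadata.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Stand-in vector store: chunk id -> (embedding, text)
store = {
    "c1": ([1.0, 0.0], "bees communicate by dancing"),
    "c2": ([0.0, 1.0], "nginx proxies upstream requests"),
}
metadata = {"c1": {"doc": "bees.pdf"}, "c2": {"doc": "ops.md"}}

def search(query_vec, k=1):
    ranked = sorted(store, key=lambda cid: cosine(query_vec, store[cid][0]),
                    reverse=True)
    return [{"id": cid, "text": store[cid][1], **metadata[cid]}
            for cid in ranked[:k]]

results = search([0.9, 0.1], k=1)
```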
SearXNG — Web Research
SearXNG (container sting-ce-searxng, internal port 8080, not externally exposed) is a privacy-respecting metasearch engine used by the Bee chatbot for web research.
- No tracking — SearXNG does not log queries or share data with search engines.
- Aggregation — queries multiple search engines and deduplicates results.
- Internal only — accessible only from within the Docker network; not exposed to users directly.
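A service inside the Docker network would query SearXNG over its JSON API. The `/search` endpoint with `format=json` is standard SearXNG; the hostname below is an assumption based on the container name.

```python
# Build a SearXNG query URL as a service on the internal Docker network
# might. The host is assumed from the container name; no request is sent.
from urllib.parse import urlencode
from typing import List, Optional

SEARXNG_URL = "http://sting-ce-searxng:8080/search"

def build_search_request(query: str, engines: Optional[List[str]] = None) -> str:
    params = {"q": query, "format": "json"}   # format=json is SearXNG's API mode
    if engines:
        params["engines"] = ",".join(engines)
    return f"{SEARXNG_URL}?{urlencode(params)}"

url = build_search_request("honey bee dance", engines=["duckduckgo"])
```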
Report Pipeline
STING CE generates structured reports through a multi-stage pipeline:
┌──────────┐ ┌──────────┐ ┌───────────┐ ┌───────────┐ ┌─────────┐
│ Request │──▶│ Research │──▶│ LLM │──▶│ ReviewBee │──▶│ Render │
│Classify │ │ Gather │ │ Generate │ │ Quality │ │ PDF │
└──────────┘ └──────────┘ └───────────┘ └───────────┘ └─────────┘
| Stage | Service | Description |
|---|---|---|
| Request classification | app | Determines report type and required data sources |
| Research gathering | knowledge + searxng | Pulls relevant content from Honey Jars and optional web search |
| LLM generation | external-ai-service | Generates report content using the configured LLM provider |
| Quality review | report-bee | ReviewBee evaluates output quality and flags issues |
| PDF rendering | report-worker | Converts final content to formatted PDF output |
The report-worker (container sting-ce-report-worker) acts as a thin proxy to the Flask app for report generation tasks. The report-bee service performs automated quality review before final output.
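The five stages above can be sketched as a simple sequential pipeline. In STING CE each stage is a separate service; here they are placeholder functions, and all field names are illustrative.

```python
# Toy sequential version of the report pipeline; each function stands in
# for a service call (app, knowledge/searxng, external-ai-service,
# report-bee, report-worker respectively).
def classify(request):  return {**request, "type": "summary"}
def research(report):   return {**report, "sources": ["honey-jar:c1"]}
def generate(report):   return {**report, "draft": "generated report text"}
def review(report):     return {**report, "approved": True}
def render(report):     return {**report, "pdf": b"%PDF-stub"}

def run_pipeline(request):
    report = request
    for stage in (classify, research, generate, review, render):
        report = stage(report)
    return report

report = run_pipeline({"query": "quarterly summary"})
```

In the real pipeline, a rejection from ReviewBee would loop back to generation rather than proceed to rendering.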
Health Monitoring
Every STING CE service includes a Docker HEALTHCHECK directive:
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
CMD curl -f http://localhost:<port>/health || exit 1
| Parameter | Value | Purpose |
|---|---|---|
| interval | 30s | Time between health checks |
| timeout | 10s | Maximum time for a health check response |
| start-period | 40s | Grace period for container startup |
| retries | 3 | Failures before marking unhealthy |
Docker Compose uses health status for:
- Startup ordering — depends_on with condition: service_healthy ensures dependencies are ready.
- Restart policy — unhealthy containers are restarted automatically.
- Status reporting — sudo msting status shows the health state of all services.
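The /health contract that the HEALTHCHECK curl relies on is simple: return 200 when the service's dependencies respond, non-200 otherwise. A minimal sketch, with stand-in check functions rather than real database or cache probes:

```python
# Sketch of a /health endpoint's logic: run each named dependency check
# and map the aggregate result to an HTTP status code.
def health_status(checks: dict):
    """Run each named check; return (http_status, body)."""
    results = {name: bool(check()) for name, check in checks.items()}
    ok = all(results.values())
    body = {"status": "ok" if ok else "degraded", "checks": results}
    return (200 if ok else 503), body

status, body = health_status({"db": lambda: True, "redis": lambda: True})
```

With curl's `-f` flag, any non-2xx status (such as the 503 above) makes the health check command exit non-zero, which is what Docker counts as a failure.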
Service Startup Resilience
STING CE is designed to start reliably even in constrained environments:
Dependency Ordering
Services declare dependencies with health conditions:
app:
depends_on:
db:
condition: service_healthy
vault:
condition: service_healthy
redis:
condition: service_healthy
Retry Logic
Application services implement retry loops for transient failures during startup:
- Database connection retries with exponential backoff
- Vault unsealing detection with polling
- Redis connection retries
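The retry pattern above can be sketched as a generic loop with exponential backoff. Attempt counts and delays here are assumptions, not STING's configured values.

```python
# Retry a startup-time call with exponential backoff: delays double on
# each failure (base_delay, 2*base_delay, 4*base_delay, ...).
import time

def retry(fn, attempts: int = 5, base_delay: float = 0.1):
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise                     # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)

# Simulated flaky dependency that succeeds on the third attempt.
calls = {"n": 0}
def flaky_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("db not ready")
    return "connected"

result = retry(flaky_connect, base_delay=0.01)
```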
Graceful Degradation
If a non-critical service is unavailable, the platform continues operating with reduced functionality:
| Unavailable Service | Impact | Behavior |
|---|---|---|
| external-ai-service | No LLM responses | Chatbot returns “AI unavailable” message |
| knowledge service | No semantic search | Honey Jar listing still works; search disabled |
| searxng | No web research | Bee skips web sources; uses local knowledge only |
| redis | No caching | Requests go directly to PostgreSQL (slower) |
| mailpit | No dev emails | Auth flows that require email will fail in dev |
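The degradation pattern in the table reduces to one idea: wrap the dependency call so that failure yields a fallback instead of an error. Service names and the fallback message below are illustrative.

```python
# Generic fallback wrapper: return the dependency's result if it is
# reachable, otherwise a degraded-mode value instead of raising.
def with_fallback(call, fallback):
    try:
        return call()
    except Exception:
        return fallback

# Simulated outage of external-ai-service.
def ai_unavailable():
    raise ConnectionError("external-ai-service down")

reply = with_fallback(ai_unavailable, "AI unavailable; please try again later")
```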
Resource Limits
Production deployments can set resource constraints via Docker Compose deploy configuration:
deploy:
resources:
limits:
memory: 512M
cpus: '0.5'
reservations:
memory: 256M
These prevent any single service from consuming all host resources.
Hive-only features: Nectar Worker (autonomous bot management), Beeacon (observability and monitoring dashboards), and ChatOps connectors (Slack, Teams, Discord) are available in STING Hive but are not included in Community Edition.