Observability & Monitoring

Set up and use the Beeacon observability stack — Grafana dashboards, Loki log aggregation, and Promtail log collection with built-in PII sanitization.

Beeacon Observability Stack

STING includes a built-in observability stack called Beeacon — named after the bee navigation concept. It provides centralized log aggregation, real-time dashboards, and privacy-aware log collection for your entire STING deployment.

Architecture

The Beeacon stack consists of four components, all running as Docker containers alongside the main STING services:

| Component | Image | Purpose |
|---|---|---|
| Loki | grafana/loki:3.0.0 | Log aggregation and storage engine |
| Promtail | Custom (based on grafana/promtail:3.0.0) | Log collector with PII sanitization pipeline |
| Grafana | grafana/grafana:11.0.0 | Dashboard visualization and querying |
| Log Forwarder | alpine:3.18 | Streams container logs to files for Promtail |

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  STING App  │    │   Kratos    │    │  Knowledge  │
│  Chatbot    │    │   Vault     │    │  ChromaDB   │
└──────┬──────┘    └──────┬──────┘    └──────┬──────┘
       │                  │                  │
       └──────────────────┼──────────────────┘
                          │ Docker logs
                   ┌──────▼──────┐
                   │  Promtail   │  ← PII sanitization
                   │  (collect)  │
                   └──────┬──────┘
                          │ Push
                   ┌──────▼──────┐
                   │    Loki     │  ← 7-day retention
                   │  (storage)  │
                   └──────┬──────┘
                          │ Query
                   ┌──────▼──────┐
                   │   Grafana   │  ← 4 dashboards
                   │ (visualize) │
                   └─────────────┘

Enabling the Stack

Beeacon is disabled by default and can be enabled in config.yml:

monitoring:
  observability:
    enabled: true
    grafana:
      enabled: true
    loki:
      enabled: true
    promtail:
      enabled: true

Then regenerate your environment and start the services:

sudo msting regenerate-env
sudo msting start loki
# Wait for Loki to become healthy, then:
sudo msting start promtail grafana log-forwarder
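
The "wait for Loki to become healthy" step can be scripted. A minimal sketch in Python that polls Loki's standard `/ready` endpoint (the base URL and timeout here are assumptions for a default local install):

```python
import time
import urllib.request

def wait_for_loki(url="http://localhost:3100/ready", timeout=120):
    """Poll Loki's readiness endpoint until it responds with HTTP 200."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # Loki not up yet; keep polling
        time.sleep(3)
    return False
```

Run this between the two `msting start` commands and proceed only when it returns `True`.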

Accessing Grafana

Grafana is exposed on port 3001 by default. If you run a reverse proxy (recommended), serve Grafana at a sub-path:

| Access Method | URL |
|---|---|
| Direct (internal) | http://localhost:3001/grafana/ |
| Via reverse proxy | https://your-domain.com/grafana/ |

Anonymous Viewer Access

By default, Grafana allows anonymous read-only access — visitors can view dashboards without logging in. This is ideal for demos and shared monitoring. Admin operations (editing dashboards, managing data sources) require authentication.

Nginx Reverse Proxy

Add this to your nginx configuration to expose Grafana at /grafana/:

location /grafana/ {
    proxy_pass http://127.0.0.1:3001/grafana/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    # WebSocket support for Grafana Live
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}

Pre-Built Dashboards

Beeacon ships with four dashboards, automatically provisioned in the HIVE folder:

🐝 STING System Overview

The primary operations dashboard showing platform-wide health at a glance.

  • Service Activity Table — every STING service with event counts and tier classification (application, infrastructure, auth, AI, workers)
  • Total Events / Error Rate / Warning Rate — stat panels with color-coded thresholds
  • Log Volume by Service — stacked bar chart showing which services are most active
  • Log Volume by Level — color-coded (green=INFO, yellow=WARNING, red=ERROR)
  • Tier Distribution — donut chart breaking down log volume by service tier
  • Error Log Stream — real-time filtered view of ERROR/CRITICAL/FATAL entries
  • All Logs — full log explorer with service and level filters

🤖 Bee AI & Reports

Monitors the AI pipeline: chatbot interactions, LLM gateway traffic, and report generation.

  • Bee Chat / AI Gateway / Reports / AI Errors — stat panels per service
  • AI Service Activity — per-service timeseries (Chatbot, AI Gateway, LLM Proxy, Demo AI)
  • Report Worker Pipeline — job lifecycle tracking (started, completed, failed)
  • Chatbot Logs / AI Gateway Logs — split log panels for debugging

🔒 Authentication & Security

Tracks authentication events and PII compliance.

  • Auth Events / Failed Attempts / PII Detections / Log Redactions — stat panels
  • Authentication Events Over Time — login requests, registrations, errors, all Kratos events
  • Security & PII Events — PII scans, compliance checks, log redactions, app errors
  • Kratos Auth Logs / Security Event Logs — filtered log streams

📚 Knowledge Service

Monitors Honey Jar operations and vector store activity.

  • Knowledge Events / Uploads / Searches / ChromaDB Events — stat panels
  • Knowledge Service Activity — uploads, searches, sync/embedding operations over time
  • Vector Store Activity — ChromaDB event volume and errors
  • Knowledge Logs / ChromaDB Logs — service-specific log streams

PII Sanitization in Logs

A key differentiator of Beeacon is automatic PII redaction before logs are stored. Promtail’s pipeline sanitizes the following patterns:

| Pattern | Replacement |
|---|---|
| Email addresses | [EMAIL_REDACTED] |
| Phone numbers | [PHONE_REDACTED] |
| SSN patterns | [SSN_REDACTED] |
| Credit card numbers | [CC_REDACTED] |
| API keys (sk_...) | [API_KEY_REDACTED] |
| Bearer tokens | Bearer [TOKEN_REDACTED] |
| Passwords in logs | [PASSWORD_REDACTED] |

This ensures that even if application code accidentally logs sensitive data, it never reaches persistent storage.
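
To illustrate the technique, here is a minimal Python sketch of regex-based redaction of this kind. The patterns are approximations for illustration only; the authoritative rules live in Promtail's pipeline configuration:

```python
import re

# Hypothetical regexes approximating the redaction patterns above.
# The real rules are defined in observability/promtail/config/promtail.yml.
PII_RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL_REDACTED]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN_REDACTED]"),
    (re.compile(r"\bsk_[A-Za-z0-9_]+"), "[API_KEY_REDACTED]"),
    (re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [TOKEN_REDACTED]"),
]

def sanitize(line: str) -> str:
    """Apply each redaction rule in order to a single log line."""
    for pattern, replacement in PII_RULES:
        line = pattern.sub(replacement, line)
    return line
```

Promtail applies equivalent substitutions per log line before pushing to Loki, so the raw values never leave the collector.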

Log Labels and Querying

Promtail automatically labels every log entry with metadata from Docker:

| Label | Description | Example Values |
|---|---|---|
| service | Docker Compose service name | app, chatbot, kratos, knowledge |
| container | Docker container name | sting-ce-app, sting-ce-chatbot |
| tier | Service category | application, infrastructure, auth, ai, workers |
| project | Compose project | sting-ce |
| level | Log level (when parseable) | INFO, WARNING, ERROR, CRITICAL |

Example LogQL Queries

# All errors across the platform
{project="sting-ce", level=~"ERROR|CRITICAL"}

# Chatbot activity
{service="chatbot"}

# Authentication failures
{service="kratos"} |~ "level=error|failed|denied"

# Report generation events
{service="report-worker"} |~ "processed|completed|Processing"

# PII-related events in the app
{service="app"} |~ "pii|compliance"

# All logs from a specific tier
{tier="ai"}
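
These queries can also be run programmatically against Loki's HTTP API (`/loki/api/v1/query_range` is Loki's standard query endpoint; the base URL is an assumption for a default install). A minimal URL-builder sketch:

```python
from urllib.parse import urlencode

def loki_query_url(base: str, query: str, limit: int = 100) -> str:
    """Build a query_range URL for a LogQL expression against Loki's HTTP API."""
    params = urlencode({"query": query, "limit": limit})
    return f"{base}/loki/api/v1/query_range?{params}"

# Example: recent chatbot errors
url = loki_query_url("http://localhost:3100", '{service="chatbot", level="ERROR"}')
```

Pass the resulting URL to `curl` or `urllib.request.urlopen` to retrieve matching log streams as JSON.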

Resource Usage

The Beeacon stack is designed to be lightweight:

| Component | Memory Limit | CPU Limit | Typical Usage |
|---|---|---|---|
| Loki | 512 MB | 0.5 cores | ~100-200 MB |
| Promtail | 256 MB | 0.25 cores | ~60-80 MB |
| Grafana | 512 MB | 0.5 cores | ~100-150 MB |
| Log Forwarder | 256 MB | 0.1 cores | ~10-20 MB |
| Total | 1.5 GB | 1.35 cores | ~300-450 MB |

Configuration Reference

Loki

Stored at observability/loki/config/loki.yml:

  • Retention: 7 days (configurable via limits_config.retention_period)
  • Storage: Local filesystem at /loki/chunks
  • Schema: TSDB v13 with 24h index periods
  • Rate limits: 4 MB/s ingestion, 6 MB/s burst, 256 KB max line size
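
The settings above map to standard Loki options; a sketch of the relevant loki.yml fragment (key names are real Loki configuration options, values mirror the defaults listed above — note that `retention_period` only takes effect when the compactor has retention enabled):

```yaml
limits_config:
  retention_period: 168h       # 7 days
  ingestion_rate_mb: 4
  ingestion_burst_size_mb: 6
  max_line_size: 256KB

compactor:
  retention_enabled: true      # required for retention_period to apply
```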

Promtail

Stored at observability/promtail/config/promtail.yml:

  • Collection: Docker socket discovery, auto-discovers all sting-ce containers
  • Pipeline: JSON parsing → level extraction → PII sanitization → health check filtering
  • Refresh: 15-second container discovery interval
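
For orientation, the pipeline described above corresponds to a Promtail `pipeline_stages` chain. An illustrative sketch (the stage names — `json`, `labels`, `replace`, `drop` — are standard Promtail stages; the expressions here are simplified placeholders, not the shipped rules):

```yaml
pipeline_stages:
  - json:
      expressions:
        level: level             # extract log level from JSON logs
  - labels:
      level:                     # promote the level to a Loki label
  - replace:
      expression: '([\w.+-]+@[\w-]+\.[\w.-]+)'
      replace: '[EMAIL_REDACTED]'
  - drop:
      expression: '.*GET /health.*'   # filter health-check noise
```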

Grafana

Stored at observability/grafana/config/grafana.ini:

  • Sub-path: Served at /grafana/ for reverse proxy compatibility
  • Auth: Anonymous viewer access enabled, admin login available
  • Provisioning: Dashboards and Loki datasource auto-provisioned from files
  • Security: Embedding allowed, HSTS enabled, analytics/telemetry disabled

Troubleshooting

Services not appearing in dashboards

Promtail discovers containers via Docker socket. Verify it can access the socket:

sudo docker logs sting-ce-promtail 2>&1 | tail -20

Loki showing “too many outstanding requests”

Reduce query parallelism or increase limits in loki.yml:

limits_config:
  max_query_parallelism: 4
  max_query_series: 10000

Grafana shows “No Data”

  1. Verify Loki has data: curl -s http://localhost:3100/loki/api/v1/labels
  2. Check the time range — ensure it covers the period when logs were collected
  3. Verify the dashboard’s datasource is pointing to Loki

Checking log flow

# Verify Loki is receiving data
curl -s http://localhost:3100/loki/api/v1/label/service/values

# Check Promtail targets
curl -s http://localhost:9080/targets
