STING Queuing Architecture & Memory Optimization
Current Memory Limits Applied
All services now have memory limits to prevent swap usage:
| Service | Memory Limit | CPU Limit | Purpose |
|---|---|---|---|
| PostgreSQL | 1GB | 1.0 | Database with optimized settings |
| Redis | 512MB | 0.5 | Caching & job queues |
| Vault | 512MB | 0.5 | Secrets management |
| Kratos | 512MB | 0.5 | Authentication |
| App (Flask) | 1GB | 1.0 | Backend API |
| Frontend | 1GB | 1.0 | React development server |
| Messaging | 1GB | 1.0 | Message queuing service |
| Chatbot | 3GB | 2.0 | phi3 model hosting |
| LLM Gateway | 6GB | 4.0 | Model management |
| Knowledge | 3GB | 2.0 | Vector database processing |
| Chroma | 2GB | 1.0 | Vector storage |
Total Memory Budget: ~20GB (vs unlimited before)
Current Queuing Infrastructure
Already Implemented:
- Redis - Job queue backend (optimized for LRU caching)
- Messaging Service - Custom message processing
- PostgreSQL - Message persistence and storage
Redis Configuration:
```yaml
REDIS_MAXMEMORY: 512mb
REDIS_MAXMEMORY_POLICY: allkeys-lru
REDIS_SAVE: "900 1 300 10 60 10000"  # Optimized persistence
```
Recommended Queue Enhancements
1. Background Job Processing
Add Celery for distributed task processing:
```yaml
# Add to docker-compose.yml
celery-worker:
  build:
    context: .
    dockerfile: ./workers/Dockerfile.celery
  environment:
    - CELERY_BROKER_URL=redis://redis:6379/0
    - CELERY_RESULT_BACKEND=redis://redis:6379/1
  deploy:
    resources:
      limits:
        memory: 1G
        cpus: '1.0'
      reservations:
        memory: 256M
  depends_on:
    - redis
    - db
```
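The worker image needs a Celery application to run. A minimal sketch, assuming the image's entrypoint imports a module such as `workers/celery_app.py` (the module path and package names are illustrative, chosen to match the compose snippet above):

```python
# workers/celery_app.py -- minimal Celery bootstrap (illustrative module path)
import os

from celery import Celery

app = Celery(
    "sting",
    broker=os.environ.get("CELERY_BROKER_URL", "redis://redis:6379/0"),
    backend=os.environ.get("CELERY_RESULT_BACKEND", "redis://redis:6379/1"),
)

# Pick up task modules automatically (package names are placeholders)
app.autodiscover_tasks(["chat", "knowledge", "analytics", "models", "embeddings"])
```

Keeping the broker and result backend on separate Redis databases (0 and 1, as in the compose snippet) lets queue traffic and stored results be inspected and flushed independently.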
2. Queue Types Needed:
High Priority Queues:
- Chat Processing - Real-time user interactions
- Model Loading - phi3 initialization and warm-up
- Knowledge Ingestion - Document processing for Honey Pots
Medium Priority Queues:
- Embedding Generation - Vector creation for search
- System Maintenance - Cleanup and optimization tasks
- Notification Dispatch - User alerts and updates
Low Priority Queues:
- Analytics Processing - Usage statistics and reporting
- Backup Operations - Data persistence tasks
- Audit Log Processing - Security and compliance logging
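To make the tiers concrete, here is a sketch of how tasks in each tier might be declared; the task names are assumptions chosen to line up with the routing table in the next section:

```python
# tasks.py -- illustrative task definitions; names match the routing table below
from workers.celery_app import app  # assumed app module from the sketch above

@app.task(name="chat.process_message", acks_late=True)
def process_message(conversation_id, text):
    """High priority: handle a real-time user message."""
    ...

@app.task(name="knowledge.process_document")
def process_document(document_id):
    """Medium priority: ingest a document into a Honey Pot."""
    ...

@app.task(name="analytics.generate_report")
def generate_report(day):
    """Low priority: nightly usage statistics."""
    ...
```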
3. Task Distribution Strategy:
```python
# Example queue configuration
CELERY_ROUTES = {
    'chat.process_message': {'queue': 'high_priority'},
    'knowledge.process_document': {'queue': 'medium_priority'},
    'analytics.generate_report': {'queue': 'low_priority'},
    'models.load_phi3': {'queue': 'high_priority'},
    'embeddings.generate_batch': {'queue': 'medium_priority'},
}
```
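A sketch of wiring this routing table into the Celery app and dispatching work; the default-queue choice is an assumption:

```python
# Apply the routing table to the Celery app (sketch)
app.conf.task_routes = CELERY_ROUTES
app.conf.task_default_queue = "medium_priority"  # assumption: unrouted tasks land here

# Callers can also pin a queue explicitly at dispatch time
from tasks import process_message
process_message.apply_async(args=["conv-123", "Hello"], queue="high_priority")
```

Workers can then be pinned to tiers at startup (e.g. `celery -A workers.celery_app worker -Q high_priority`), so a backlog of low-priority jobs cannot starve chat processing.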
Queue Monitoring & Management
1. Queue Health Monitoring:
```bash
# Redis queue monitoring
redis-cli info memory          # broker memory headroom
redis-cli llen high_priority   # depth of each Celery queue (stored as a Redis list)
redis-cli llen medium_priority
redis-cli llen low_priority
```
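The same checks can run from Python for automated alerting. A minimal sketch using the `redis` client; queue names follow the routing table above and the thresholds are assumptions:

```python
# Queue-depth check (sketch): warn when any Celery queue backs up
import redis

QUEUES = {"high_priority": 100, "medium_priority": 500, "low_priority": 2000}

r = redis.Redis(host="redis", port=6379, db=0)
for queue, threshold in QUEUES.items():
    depth = r.llen(queue)  # Celery's Redis broker keeps each queue as a list
    if depth > threshold:
        print(f"WARNING: {queue} depth {depth} exceeds {threshold}")
```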
2. Worker Scaling:
```yaml
# Scale workers based on queue depth
celery-worker:
  deploy:
    replicas: 3  # Start with 3 workers
    restart_policy:
      condition: on-failure
    update_config:
      parallelism: 1
      delay: 10s
```
3. Queue Persistence:
```yaml
# Redis persistence for job reliability
redis:
  environment:
    - REDIS_APPENDONLY=yes
    - REDIS_APPENDFSYNC=everysec
    - REDIS_AUTO_AOF_REWRITE_PERCENTAGE=100
```
Performance Optimizations
Memory Management:
- Queue Size Limits - Prevent memory exhaustion
- Job TTL - Auto-expire old jobs
- Result Cleanup - Remove completed job results
- Memory Monitoring - Alert on high memory usage
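Several of these policies map directly onto Celery settings. A sketch continuing the app module above; the numbers are illustrative assumptions, not tuned values:

```python
# Celery settings supporting the policies above (values are illustrative)
app.conf.update(
    result_expires=3600,                  # Result cleanup: drop results after 1 hour
    task_time_limit=600,                  # Hard per-task limit; kills runaway jobs
    task_soft_time_limit=540,             # Soft limit raised inside the task first
    worker_max_tasks_per_child=100,       # Recycle workers to release leaked memory
    worker_max_memory_per_child=512_000,  # In KiB; restart a child beyond ~512MB
)
```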
Redis Optimizations:
```
# Redis memory optimization commands (run via redis-cli)
CONFIG SET maxmemory-policy allkeys-lru
CONFIG SET tcp-keepalive 60
CONFIG SET timeout 300
```
PostgreSQL Queue Tables:
```sql
-- Efficient job queue table
CREATE TABLE job_queue (
    id SERIAL PRIMARY KEY,
    queue_name VARCHAR(50) NOT NULL,
    payload JSONB NOT NULL,
    status VARCHAR(20) DEFAULT 'pending',
    priority INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    started_at TIMESTAMP,
    completed_at TIMESTAMP
);

-- Indexes for performance
CREATE INDEX idx_job_queue_status ON job_queue(status, priority);
CREATE INDEX idx_job_queue_created ON job_queue(created_at);
```
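Workers claiming jobs from this table can use `FOR UPDATE SKIP LOCKED` so concurrent pollers never block each other. A sketch using `psycopg2`; the connection DSN is an assumption:

```python
# Claim one pending job from the job_queue table above (sketch)
import psycopg2

conn = psycopg2.connect("dbname=sting user=sting host=db")  # DSN is an assumption
with conn, conn.cursor() as cur:
    cur.execute(
        """
        UPDATE job_queue
           SET status = 'running', started_at = NOW(), updated_at = NOW()
         WHERE id = (SELECT id FROM job_queue
                      WHERE status = 'pending'
                      ORDER BY priority DESC, created_at
                      FOR UPDATE SKIP LOCKED
                      LIMIT 1)
        RETURNING id, queue_name, payload;
        """
    )
    job = cur.fetchone()  # None when no pending job is available
```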
Queue Implementation Options
Core Queue Components
Consider implementing these components based on system needs:
- Celery worker containers for distributed processing
- Redis configuration for job persistence
- Task routing for priority-based execution
Advanced Queue Features
Available options for enhanced queue functionality:
- Dead letter queues for failed job handling
- Job retry mechanisms with exponential backoff (see the sketch after this list)
- Queue monitoring dashboard integration
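For the retry mechanism mentioned above, Celery's task options cover exponential backoff directly. A sketch; the exception types and limits are assumptions:

```python
# Retry with exponential backoff (sketch)
from workers.celery_app import app  # assumed app module from earlier sketches

@app.task(
    bind=True,
    autoretry_for=(ConnectionError, TimeoutError),
    retry_backoff=True,     # 1s, 2s, 4s, ... between attempts
    retry_backoff_max=300,  # cap the delay at 5 minutes
    retry_jitter=True,      # randomize delays to avoid thundering herds
    max_retries=5,          # after this, the task is marked failed
)
def sync_external_system(self, record_id):
    ...
```

Jobs that exhaust their retries could then be re-published to a dedicated dead-letter queue for inspection rather than being silently dropped.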
Enterprise Queue Capabilities
Scalable options for larger deployments:
- Multi-tenant queue isolation
- Job scheduling and cron-like task management (see the sketch after this list)
- Queue metrics and alerting systems
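For cron-like scheduling, Celery beat can drive recurring jobs from configuration. A sketch; the task names and schedule times are assumptions:

```python
# Cron-like scheduling with Celery beat (sketch)
from celery.schedules import crontab

app.conf.beat_schedule = {
    "nightly-analytics": {
        "task": "analytics.generate_report",
        "schedule": crontab(hour=2, minute=0),  # every night at 02:00
        "options": {"queue": "low_priority"},
    },
    "hourly-cleanup": {
        "task": "maintenance.cleanup_results",  # hypothetical task name
        "schedule": crontab(minute=0),
        "options": {"queue": "medium_priority"},
    },
}
```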
Queue Use Cases for STING
Current Applications
- Document Processing - Honey Jar ingestion pipeline
- Model Management - phi3 loading and optimization
- User Notifications - Real-time alerts
Scalable Applications
- Multi-user Chat - Concurrent Bee conversations
- Batch Processing - Large document collections
- Enterprise Integration - LDAP/SAML sync jobs
Memory vs Performance Trade-offs
With the new memory limits:
- Benefit: No more 40GB swap usage.
- Trade-off: May need smarter queue management.
- Solution: Efficient job batching and prioritization.
The queue system becomes more important with memory constraints since we need to:
- Process jobs efficiently without memory spikes
- Batch operations to reduce memory overhead (sketched below)
- Clean up completed jobs promptly
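Batching in practice can be as simple as chunking a large collection before enqueueing it, so no single task materializes the whole set in memory. A sketch; the chunk size and batch task are assumptions:

```python
# Batching sketch: split a large document collection into fixed-size chunks
from celery import group
from workers.celery_app import app  # assumed app module from earlier sketches

@app.task(name="knowledge.process_batch")  # hypothetical batch task
def process_batch(document_ids):
    ...

def enqueue_collection(document_ids, chunk_size=25):
    """Enqueue one task per chunk so memory use stays bounded per worker."""
    batches = [document_ids[i:i + chunk_size]
               for i in range(0, len(document_ids), chunk_size)]
    return group(process_batch.s(b) for b in batches).apply_async(queue="medium_priority")
```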