STING Queuing Architecture & Memory Optimization

Current Memory Limits Applied

All services now have memory limits to prevent swap usage:

Service        Memory Limit   CPU Limit   Purpose
PostgreSQL     1GB            1.0         Database with optimized settings
Redis          512MB          0.5         Caching & job queues
Vault          512MB          0.5         Secrets management
Kratos         512MB          0.5         Authentication
App (Flask)    1GB            1.0         Backend API
Frontend       1GB            1.0         React development server
Messaging      1GB            1.0         Message queuing service
Chatbot        3GB            2.0         phi3 model hosting
LLM Gateway    6GB            4.0         Model management
Knowledge      3GB            2.0         Vector database processing
Chroma         2GB            1.0         Vector storage

Total Memory Budget: ~20GB (vs unlimited before)

Current Queuing Infrastructure

Already Implemented:

  1. Redis - Job queue backend (optimized for LRU caching; note that allkeys-lru can evict queued jobs under memory pressure, so a noeviction policy or a separate Redis database for queues is safer)
  2. Messaging Service - Custom message processing
  3. PostgreSQL - Message persistence and storage

Redis Configuration:

REDIS_MAXMEMORY: 512mb
REDIS_MAXMEMORY_POLICY: allkeys-lru
REDIS_SAVE: "900 1 300 10 60 10000"  # Optimized persistence

1. Background Job Processing

Add Celery for distributed task processing:

# Add to docker-compose.yml
celery-worker:
  build:
    context: .
    dockerfile: ./workers/Dockerfile.celery
  environment:
    - CELERY_BROKER_URL=redis://redis:6379/0
    - CELERY_RESULT_BACKEND=redis://redis:6379/1
  deploy:
    resources:
      limits:
        memory: 1G
        cpus: '1.0'
      reservations:
        memory: 256M
  depends_on:
    - redis
    - db
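The compose service above needs a matching Celery application module in the worker image. A minimal sketch, assuming it is copied in as workers/celery_app.py (module and setting values here are illustrative, not part of the existing codebase):

```python
# workers/celery_app.py - minimal Celery app wired to the compose environment
import os

from celery import Celery

app = Celery(
    "sting",
    broker=os.environ.get("CELERY_BROKER_URL", "redis://redis:6379/0"),
    backend=os.environ.get("CELERY_RESULT_BACKEND", "redis://redis:6379/1"),
)

# Keep workers lean under the 1G memory limit
app.conf.update(
    worker_max_tasks_per_child=100,  # recycle worker processes to bound memory growth
    worker_prefetch_multiplier=1,    # don't hoard jobs in worker memory
    task_acks_late=True,             # re-deliver jobs if a worker dies mid-task
)
```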

2. Queue Types Needed:

High Priority Queues:

  • Chat Processing - Real-time user interactions
  • Model Loading - phi3 initialization and warm-up
  • Knowledge Ingestion - Document processing for Honey Pots

Medium Priority Queues:

  • Embedding Generation - Vector creation for search
  • System Maintenance - Cleanup and optimization tasks
  • Notification Dispatch - User alerts and updates

Low Priority Queues:

  • Analytics Processing - Usage statistics and reporting
  • Backup Operations - Data persistence tasks
  • Audit Log Processing - Security and compliance logging

3. Task Distribution Strategy:

# Example queue configuration
CELERY_ROUTES = {
    'chat.process_message': {'queue': 'high_priority'},
    'knowledge.process_document': {'queue': 'medium_priority'},
    'analytics.generate_report': {'queue': 'low_priority'},
    'models.load_phi3': {'queue': 'high_priority'},
    'embeddings.generate_batch': {'queue': 'medium_priority'},
}
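With that routing in place, each priority tier gets its own worker pool. A sketch of the invocations (assuming the Celery app lives at workers.celery_app; concurrency values are illustrative):

```shell
# One worker pool per tier; lower concurrency for lower priorities
celery -A workers.celery_app worker -Q high_priority   --concurrency=4 -n high@%h
celery -A workers.celery_app worker -Q medium_priority --concurrency=2 -n medium@%h
celery -A workers.celery_app worker -Q low_priority    --concurrency=1 -n low@%h
```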

Queue Monitoring & Management

1. Queue Health Monitoring:

# Redis queue monitoring
redis-cli info memory        # memory usage and eviction stats
redis-cli llen high_priority
redis-cli llen medium_priority
redis-cli llen low_priority
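Raw llen output still has to be interpreted. A small pure-Python helper for turning queue depths into alerts (threshold values here are placeholders, not tuned numbers):

```python
# Flag queues whose backlog exceeds a per-queue threshold.
DEFAULT_THRESHOLDS = {
    "high_priority": 100,     # real-time work should never back up far
    "medium_priority": 1000,
    "low_priority": 10000,
}

def queues_over_threshold(depths, thresholds=DEFAULT_THRESHOLDS):
    """Return {queue: depth} for queues whose depth exceeds the threshold."""
    return {
        name: depth
        for name, depth in depths.items()
        if depth > thresholds.get(name, 0)
    }

# Depths would come from `redis-cli llen <queue>` or redis-py's llen().
```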

2. Worker Scaling:

# Scale workers based on queue depth
celery-worker:
  deploy:
    replicas: 3  # Start with 3 workers
    restart_policy:
      condition: on-failure
    update_config:
      parallelism: 1
      delay: 10s

3. Queue Persistence:

# Redis persistence for job reliability
redis:
  environment:
    - REDIS_APPENDONLY=yes
    - REDIS_APPENDFSYNC=everysec
    - REDIS_AUTO_AOF_REWRITE_PERCENTAGE=100

Performance Optimizations

Memory Management:

  1. Queue Size Limits - Prevent memory exhaustion
  2. Job TTL - Auto-expire old jobs
  3. Result Cleanup - Remove completed job results
  4. Memory Monitoring - Alert on high memory usage
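Items 2 and 3 in this list map onto standard Celery settings. A hedged sketch (values are illustrative, and `app` stands in for the project's Celery instance):

```python
from celery import Celery

app = Celery("sting")  # stands in for the project's Celery instance
app.conf.update(
    result_expires=3600,       # result cleanup: drop completed results after 1 hour
    task_time_limit=600,       # job TTL: hard-kill tasks running longer than 10 minutes
    task_soft_time_limit=540,  # raise SoftTimeLimitExceeded shortly before the hard kill
)
```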

Redis Optimizations:

# Redis memory optimization commands
CONFIG SET maxmemory-policy allkeys-lru
CONFIG SET tcp-keepalive 60
CONFIG SET timeout 300

PostgreSQL Queue Tables:

-- Efficient job queue table
CREATE TABLE job_queue (
    id SERIAL PRIMARY KEY,
    queue_name VARCHAR(50) NOT NULL,
    payload JSONB NOT NULL,
    status VARCHAR(20) DEFAULT 'pending',
    priority INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    started_at TIMESTAMP,
    completed_at TIMESTAMP
);

-- Indexes for performance
CREATE INDEX idx_job_queue_status ON job_queue(status, priority);
CREATE INDEX idx_job_queue_created ON job_queue(created_at);
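If multiple workers poll this table, rows should be claimed atomically so two workers never grab the same job. A sketch using FOR UPDATE SKIP LOCKED (available in PostgreSQL 9.5+; queue name is illustrative):

```sql
-- Claim the next pending job without blocking other workers
UPDATE job_queue
SET status = 'running', started_at = NOW(), updated_at = NOW()
WHERE id = (
    SELECT id FROM job_queue
    WHERE status = 'pending' AND queue_name = 'high_priority'
    ORDER BY priority DESC, created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING id, payload;
```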

Queue Implementation Options

Core Queue Components

Consider implementing these components based on system needs:

  • Celery worker containers for distributed processing
  • Redis configuration for job persistence
  • Task routing for priority-based execution

Advanced Queue Features

Available options for enhanced queue functionality:

  • Dead letter queues for failed job handling
  • Job retry mechanisms with exponential backoff
  • Queue monitoring dashboard integration
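The retry-with-exponential-backoff item can be handled by Celery's retry options or computed directly. A minimal pure-Python sketch of the backoff math (base and cap values are illustrative):

```python
import random

def backoff_delay(retries, base=2.0, cap=300.0, jitter=False):
    """Exponential backoff: base * 2^retries seconds, capped at `cap`.

    With jitter=True, return a random delay in [0, computed] ("full
    jitter") so many failed jobs don't all retry at the same instant.
    """
    delay = min(base * (2 ** retries), cap)
    return random.uniform(0, delay) if jitter else delay

# retries 0, 1, 2, 3 -> 2, 4, 8, 16 seconds; capped at 300
```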

Enterprise Queue Capabilities

Scalable options for larger deployments:

  • Multi-tenant queue isolation
  • Job scheduling and cron-like task management
  • Queue metrics and alerting systems

Queue Use Cases for STING

Current Applications

  • Document Processing - Honey Jar ingestion pipeline
  • Model Management - phi3 loading and optimization
  • User Notifications - Real-time alerts

Scalable Applications

  • Multi-user Chat - Concurrent Bee conversations
  • Batch Processing - Large document collections
  • Enterprise Integration - LDAP/SAML sync jobs

Memory vs Performance Trade-offs

With the new memory limits:

  • Benefit: No more 40GB swap usage.
  • Trade-off: May need smarter queue management.
  • Solution: Efficient job batching and prioritization.

The queue system becomes more important with memory constraints since we need to:

  1. Process jobs efficiently without memory spikes
  2. Batch operations to reduce memory overhead
  3. Clean up completed jobs promptly
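Item 2 above is mostly a matter of never materializing a whole workload at once. A minimal chunking helper as a sketch:

```python
def batched(items, size):
    """Yield successive lists of at most `size` items.

    Processing a stream in fixed-size batches bounds peak memory:
    only one batch of payloads is held in memory at a time.
    """
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# e.g. enqueue embedding generation in batches of 32 documents
# instead of one job per document
```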

Last updated: