STING Queuing Architecture & Memory Optimization
Current Memory Limits Applied
All services now have memory limits to prevent swap usage:
| Service | Memory Limit | CPU Limit | Purpose |
|---|---|---|---|
| PostgreSQL | 1GB | 1.0 | Database with optimized settings |
| Redis | 512MB | 0.5 | Caching & job queues |
| Vault | 512MB | 0.5 | Secrets management |
| Kratos | 512MB | 0.5 | Authentication |
| App (Flask) | 1GB | 1.0 | Backend API |
| Frontend | 1GB | 1.0 | React development server |
| Messaging | 1GB | 1.0 | Message queuing service |
| Chatbot | 3GB | 2.0 | phi3 model hosting |
| LLM Gateway | 6GB | 4.0 | Model management |
| Knowledge | 3GB | 2.0 | Vector database processing |
| Chroma | 2GB | 1.0 | Vector storage |
Total Memory Budget: ~20GB (vs unlimited before)
Current Queuing Infrastructure
Already Implemented:
- Redis - Job queue backend (optimized for LRU caching)
- Messaging Service - Custom message processing
- PostgreSQL - Message persistence and storage
Redis Configuration:
```yaml
REDIS_MAXMEMORY: 512mb
REDIS_MAXMEMORY_POLICY: allkeys-lru
REDIS_SAVE: "900 1 300 10 60 10000"  # Optimized persistence
```
Recommended Queue Enhancements
1. Background Job Processing
Add Celery for distributed task processing:
```yaml
# Add to docker-compose.yml
celery-worker:
  build:
    context: .
    dockerfile: ./workers/Dockerfile.celery
  environment:
    - CELERY_BROKER_URL=redis://redis:6379/0
    - CELERY_RESULT_BACKEND=redis://redis:6379/1
  deploy:
    resources:
      limits:
        memory: 1G
        cpus: '1.0'
      reservations:
        memory: 256M
  depends_on:
    - redis
    - db
```
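The worker image needs a Celery application to run. A minimal sketch, assuming the image's entrypoint imports a module such as `workers/celery_app.py` (the module path and package names are illustrative, chosen to match the compose snippet above):

```python
# workers/celery_app.py -- minimal Celery bootstrap (illustrative module path)
import os

from celery import Celery

app = Celery(
    "sting",
    broker=os.environ.get("CELERY_BROKER_URL", "redis://redis:6379/0"),
    backend=os.environ.get("CELERY_RESULT_BACKEND", "redis://redis:6379/1"),
)

# Pick up task modules automatically (package names are placeholders)
app.autodiscover_tasks(["chat", "knowledge", "analytics", "models", "embeddings"])
```

Keeping the broker and result backend on separate Redis databases (0 and 1, as in the compose snippet) lets queue traffic and stored results be inspected and flushed independently.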
2. Queue Types Needed:
High Priority Queues:
- Chat Processing - Real-time user interactions
- Model Loading - phi3 initialization and warm-up
- Knowledge Ingestion - Document processing for Honey Pots
Medium Priority Queues:
- Embedding Generation - Vector creation for search
- System Maintenance - Cleanup and optimization tasks
- Notification Dispatch - User alerts and updates
Low Priority Queues:
- Analytics Processing - Usage statistics and reporting
- Backup Operations - Data persistence tasks
- Audit Log Processing - Security and compliance logging
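To make the tiers concrete, here is a sketch of how tasks in each tier might be declared; the task names are assumptions chosen to line up with the routing table in the next section:

```python
# tasks.py -- illustrative task definitions; names match the routing table below
from workers.celery_app import app  # assumed app module from the sketch above

@app.task(name="chat.process_message", acks_late=True)
def process_message(conversation_id, text):
    """High priority: handle a real-time user message."""
    ...

@app.task(name="knowledge.process_document")
def process_document(document_id):
    """Medium priority: ingest a document into a Honey Pot."""
    ...

@app.task(name="analytics.generate_report")
def generate_report(day):
    """Low priority: nightly usage statistics."""
    ...
```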
3. Task Distribution Strategy:
```python
# Example queue configuration
CELERY_ROUTES = {
    'chat.process_message': {'queue': 'high_priority'},
    'knowledge.process_document': {'queue': 'medium_priority'},
    'analytics.generate_report': {'queue': 'low_priority'},
    'models.load_phi3': {'queue': 'high_priority'},
    'embeddings.generate_batch': {'queue': 'medium_priority'},
}
```
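A sketch of wiring this routing table into the Celery app and dispatching work; the default-queue choice is an assumption:

```python
# Apply the routing table to the Celery app (sketch)
app.conf.task_routes = CELERY_ROUTES
app.conf.task_default_queue = "medium_priority"  # assumption: unrouted tasks land here

# Callers can also pin a queue explicitly at dispatch time
from tasks import process_message
process_message.apply_async(args=["conv-123", "Hello"], queue="high_priority")
```

Workers can then be pinned to tiers at startup (e.g. `celery -A workers.celery_app worker -Q high_priority`), so a backlog of low-priority jobs cannot starve chat processing.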
Queue Monitoring & Management
1. Queue Health Monitoring:
```bash
# Redis queue monitoring
redis-cli info memory          # broker memory headroom
redis-cli llen high_priority   # depth of each Celery queue (stored as a Redis list)
redis-cli llen medium_priority
redis-cli llen low_priority
```
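The same checks can run from Python for automated alerting. A minimal sketch using the `redis` client; queue names follow the routing table above and the thresholds are assumptions:

```python
# Queue-depth check (sketch): warn when any Celery queue backs up
import redis

QUEUES = {"high_priority": 100, "medium_priority": 500, "low_priority": 2000}

r = redis.Redis(host="redis", port=6379, db=0)
for queue, threshold in QUEUES.items():
    depth = r.llen(queue)  # Celery's Redis broker keeps each queue as a list
    if depth > threshold:
        print(f"WARNING: {queue} depth {depth} exceeds {threshold}")
```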
2. Worker Scaling:
```yaml
# Scale workers based on queue depth
celery-worker:
  deploy:
    replicas: 3  # Start with 3 workers
    restart_policy:
      condition: on-failure
    update_config:
      parallelism: 1
      delay: 10s
```
3. Queue Persistence:
```yaml
# Redis persistence for job reliability
redis:
  environment:
    - REDIS_APPENDONLY=yes
    - REDIS_APPENDFSYNC=everysec
    - REDIS_AUTO_AOF_REWRITE_PERCENTAGE=100
```
Performance Optimizations
Memory Management:
- Queue Size Limits - Prevent memory exhaustion
- Job TTL - Auto-expire old jobs
- Result Cleanup - Remove completed job results
- Memory Monitoring - Alert on high memory usage
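Several of these policies map directly onto Celery settings. A sketch continuing the app module above; the numbers are illustrative assumptions, not tuned values:

```python
# Celery settings supporting the policies above (values are illustrative)
app.conf.update(
    result_expires=3600,                  # Result cleanup: drop results after 1 hour
    task_time_limit=600,                  # Hard per-task limit; kills runaway jobs
    task_soft_time_limit=540,             # Soft limit raised inside the task first
    worker_max_tasks_per_child=100,       # Recycle workers to release leaked memory
    worker_max_memory_per_child=512_000,  # In KiB; restart a child beyond ~512MB
)
```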
Redis Optimizations:
```
# Redis memory optimization commands (run via redis-cli)
CONFIG SET maxmemory-policy allkeys-lru
CONFIG SET tcp-keepalive 60
CONFIG SET timeout 300
```
PostgreSQL Queue Tables:
```sql
-- Efficient job queue table
CREATE TABLE job_queue (
    id SERIAL PRIMARY KEY,
    queue_name VARCHAR(50) NOT NULL,
    payload JSONB NOT NULL,
    status VARCHAR(20) DEFAULT 'pending',
    priority INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    started_at TIMESTAMP,
    completed_at TIMESTAMP
);

-- Indexes for performance
CREATE INDEX idx_job_queue_status ON job_queue(status, priority);
CREATE INDEX idx_job_queue_created ON job_queue(created_at);
```
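Workers claiming jobs from this table can use `FOR UPDATE SKIP LOCKED` so concurrent pollers never block each other. A sketch using `psycopg2`; the connection DSN is an assumption:

```python
# Claim one pending job from the job_queue table above (sketch)
import psycopg2

conn = psycopg2.connect("dbname=sting user=sting host=db")  # DSN is an assumption
with conn, conn.cursor() as cur:
    cur.execute(
        """
        UPDATE job_queue
           SET status = 'running', started_at = NOW(), updated_at = NOW()
         WHERE id = (SELECT id FROM job_queue
                      WHERE status = 'pending'
                      ORDER BY priority DESC, created_at
                      FOR UPDATE SKIP LOCKED
                      LIMIT 1)
        RETURNING id, queue_name, payload;
        """
    )
    job = cur.fetchone()  # None when no pending job is available
```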
Queue Implementation Options
Core Queue Components
Consider implementing these components based on system needs:
- Celery worker containers for distributed processing
- Redis configuration for job persistence
- Task routing for priority-based execution
Advanced Queue Features
Available options for enhanced queue functionality:
- Dead letter queues for failed job handling
- Job retry mechanisms with exponential backoff (see the sketch after this list)
- Queue monitoring dashboard integration
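For the retry mechanism mentioned above, Celery's task options cover exponential backoff directly. A sketch; the exception types and limits are assumptions:

```python
# Retry with exponential backoff (sketch)
from workers.celery_app import app  # assumed app module from earlier sketches

@app.task(
    bind=True,
    autoretry_for=(ConnectionError, TimeoutError),
    retry_backoff=True,     # 1s, 2s, 4s, ... between attempts
    retry_backoff_max=300,  # cap the delay at 5 minutes
    retry_jitter=True,      # randomize delays to avoid thundering herds
    max_retries=5,          # after this, the task is marked failed
)
def sync_external_system(self, record_id):
    ...
```

Jobs that exhaust their retries could then be re-published to a dedicated dead-letter queue for inspection rather than being silently dropped.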
Enterprise Queue Capabilities
Scalable options for larger deployments:
- Multi-tenant queue isolation
- Job scheduling and cron-like task management (see the sketch after this list)
- Queue metrics and alerting systems
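For cron-like scheduling, Celery beat can drive recurring jobs from configuration. A sketch; the task names and schedule times are assumptions:

```python
# Cron-like scheduling with Celery beat (sketch)
from celery.schedules import crontab

app.conf.beat_schedule = {
    "nightly-analytics": {
        "task": "analytics.generate_report",
        "schedule": crontab(hour=2, minute=0),  # every night at 02:00
        "options": {"queue": "low_priority"},
    },
    "hourly-cleanup": {
        "task": "maintenance.cleanup_results",  # hypothetical task name
        "schedule": crontab(minute=0),
        "options": {"queue": "medium_priority"},
    },
}
```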
Queue Use Cases for STING
Current Applications
- Document Processing - Honey Jar ingestion pipeline
- Model Management - phi3 loading and optimization
- User Notifications - Real-time alerts
Scalable Applications
- Multi-user Chat - Concurrent Bee conversations
- Batch Processing - Large document collections
- Enterprise Integration - LDAP/SAML sync jobs
Memory vs Performance Trade-offs
With the new memory limits:
- Benefit: No more 40GB swap usage.
- Trade-off: May need smarter queue management.
- Solution: Efficient job batching and prioritization.
The queue system becomes more important with memory constraints since we need to:
- Process jobs efficiently without memory spikes
- Batch operations to reduce memory overhead (sketched below)
- Clean up completed jobs promptly
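Batching in practice can be as simple as chunking a large collection before enqueueing it, so no single task materializes the whole set in memory. A sketch; the chunk size and batch task are assumptions:

```python
# Batching sketch: split a large document collection into fixed-size chunks
from celery import group
from workers.celery_app import app  # assumed app module from earlier sketches

@app.task(name="knowledge.process_batch")  # hypothetical batch task
def process_batch(document_ids):
    ...

def enqueue_collection(document_ids, chunk_size=25):
    """Enqueue one task per chunk so memory use stays bounded per worker."""
    batches = [document_ids[i:i + chunk_size]
               for i in range(0, len(document_ids), chunk_size)]
    return group(process_batch.s(b) for b in batches).apply_async(queue="medium_priority")
```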