# STING CE Model Modes Guide

## Overview

STING CE now supports two model modes to balance performance, quality, and resource usage:
- Small Models Mode (Default) - Fast, lightweight models ideal for most use cases
- Performance Mode - Large, state-of-the-art models for maximum quality

## Quick Start

### Using the Model Manager

The easiest way to manage model modes is using the model manager script:
```bash
# Check current status
./sting-model-manager.sh status

# Switch to small models (default)
./sting-model-manager.sh small

# Switch to performance models
./sting-model-manager.sh performance

# Download small models
./sting-model-manager.sh download
```

## Model Comparison

### Small Models Mode (Recommended)

| Model | Size | Use Case | Memory Usage |
|---|---|---|---|
| DeepSeek-R1-1.5B | 1.5GB | Reasoning & logic | ~3GB |
| TinyLlama-1.1B | 2.2GB | General chat | ~3GB |
| DialoGPT-medium | 345MB | Conversations | ~1GB |

- Total Download: ~5GB
- Total RAM Required: 8GB
- Startup Time: 30-60 seconds

### Performance Mode

| Model | Size | Use Case | Memory Usage |
|---|---|---|---|
| Llama-3.1-8B | 16GB | General-purpose, highest quality | ~16GB |
| Phi-3-medium-128k | 28GB | Long context | ~32GB |
| Zephyr-7B | 14GB | Technical tasks | ~16GB |

- Total Download: ~58GB
- Total RAM Required: 32GB+
- Startup Time: 5-10 minutes
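
If you're unsure which mode your hardware can handle, checking total RAM first is a quick filter. A minimal sketch for Linux hosts; the 8GB and 32GB thresholds come from the tables above, and the script itself is illustrative rather than part of STING CE:

```bash
#!/usr/bin/env bash
# Suggest a model mode based on total RAM (thresholds from this guide)
total_gb=$(awk '/MemTotal/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)

if [ "$total_gb" -ge 32 ]; then
  echo "${total_gb}GB RAM: performance mode is viable"
elif [ "$total_gb" -ge 8 ]; then
  echo "${total_gb}GB RAM: stick with small models mode"
else
  echo "${total_gb}GB RAM: below the 8GB small-models requirement"
fi
```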

## Installation

### Option 1: Install with Small Models (Recommended)

```bash
# Download small models first
./download_optimized_models.sh

# Install STING with small models
./install_sting.sh

# The system will use small models by default
```

### Option 2: Manual Model Download

```bash
# For small models only
./download_optimized_models.sh

# For large models (optional)
./manage_sting.sh download_models
```
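
Either way, you can confirm the files landed where the services expect them; the `~/Downloads/llm_models/` path below is the same one used in the Troubleshooting section:

```bash
# List the downloaded models and their on-disk sizes
ls -la ~/Downloads/llm_models/
du -sh ~/Downloads/llm_models/*
```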

## Switching Between Modes

### Method 1: Using the Model Manager (Easiest)

```bash
# Switch to small models
./sting-model-manager.sh small

# Switch to performance models
./sting-model-manager.sh performance
```

### Method 2: Using Docker Compose

```bash
# For small models
docker compose -f docker-compose.yml -f docker-compose.small-models.yml up -d

# For performance models
docker compose -f docker-compose.yml -f docker-compose.performance-models.yml up -d
```
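
To sanity-check which override is actually in effect, `docker compose config` prints the merged configuration. Grepping it for the gateway service is one quick check (the `llm-gateway` service name comes from Method 3 below):

```bash
# Show the merged configuration around the gateway service
docker compose -f docker-compose.yml -f docker-compose.small-models.yml config \
  | grep -A 5 'llm-gateway'
```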

### Method 3: Environment Variables

```bash
# Set the active model
export ACTIVE_MODEL=deepseek-1.5b  # or tinyllama, dialogpt, llama3, phi3, zephyr

# Recreate the service so the new value is applied
# (a plain restart keeps the container's old environment)
docker compose up -d llm-gateway
```
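
If the change doesn't seem to take effect, confirm the variable actually reached the container (this assumes the gateway receives `ACTIVE_MODEL` as a container environment variable):

```bash
# Check the variable inside the running container
docker compose exec llm-gateway env | grep ACTIVE_MODEL
```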

## Model Selection Guide

### When to Use Small Models

- Development and testing
- Resource-constrained environments (VMs, older hardware)
- Quick prototyping
- General chatbot conversations
- Workloads that need fast response times

### When to Use Performance Models

- Production deployments with ample resources
- Complex reasoning tasks
- Code generation
- Technical documentation
- Multi-language support
- Maximum quality requirements

## DeepSeek Models

We've added DeepSeek models as an excellent middle ground between the two modes:

- DeepSeek-R1-1.5B: Despite its small size, it delivers reasoning quality well above what a 1.5B-parameter model usually offers
- DeepSeek-7B-Chat: Larger variant for better quality (optional download)

### Why DeepSeek?

- Superior reasoning capabilities for their size
- Open source with commercial use allowed
- Optimized for both English and Chinese
- Excellent benchmark performance

## Troubleshooting

### Models Not Loading

```bash
# Check if models are downloaded
ls -la ~/Downloads/llm_models/

# Check service logs
docker logs sting-ce-llm-gateway-1

# Verify the model service is running
./sting-model-manager.sh status
```

### Out of Memory Errors

```bash
# Switch to small models
./sting-model-manager.sh small

# Or reduce memory pressure by capping CPU threads,
# then recreate so the new value is applied
export TORCH_NUM_THREADS=2
docker compose up -d
```
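
If the gateway keeps getting killed under memory pressure, capping its memory in a Compose override makes failures more predictable. A minimal sketch; the `docker-compose.memlimit.yml` filename and the 8g limit are assumptions to adjust for your deployment:

```bash
# Write a small override that caps the gateway's memory
cat > docker-compose.memlimit.yml <<'EOF'
services:
  llm-gateway:
    deploy:
      resources:
        limits:
          memory: 8g
EOF

# Apply it alongside the base file
docker compose -f docker-compose.yml -f docker-compose.memlimit.yml up -d
```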

### Slow Response Times

- Ensure you're using small models for development
- Check available RAM: `free -h`
- Consider using DialoGPT for the fastest responses
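
One rough way to put a number on latency is to `time` a request against the chat endpoint from the API Usage section below:

```bash
# Time a single chat request (endpoint and payload from this guide)
time curl -s -X POST http://localhost:8085/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!", "model": "dialogpt"}' > /dev/null
```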

## Performance Tips

### For Small Models

- Use `PERFORMANCE_PROFILE=vm_optimized`
- Enable response caching
- Use batch processing for multiple requests
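
Applying the profile follows the same environment-variable pattern as Method 3 above (this assumes the gateway reads `PERFORMANCE_PROFILE` at startup):

```bash
# Apply the VM-optimized profile and recreate the gateway so it takes effect
export PERFORMANCE_PROFILE=vm_optimized
docker compose up -d llm-gateway
```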

### For Large Models

- Use GPU acceleration if available
- Increase Docker memory limits
- Use quantization: `export QUANTIZATION=int8`

## API Usage

The API remains the same regardless of model mode:
```bash
# Test with any model
curl -X POST http://localhost:8085/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!", "model": "deepseek-1.5b"}'
```

Available model names:

- Small: `deepseek-1.5b`, `tinyllama`, `dialogpt`
- Performance: `llama3`, `phi3`, `zephyr`
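
To smoke-test every documented name in one pass, a simple loop works; models from the mode you are not currently running will fail, which the loop makes easy to spot:

```bash
# Send a test message to each documented model name
for model in deepseek-1.5b tinyllama dialogpt llama3 phi3 zephyr; do
  echo "--- $model ---"
  curl -s -X POST http://localhost:8085/chat \
    -H "Content-Type: application/json" \
    -d "{\"message\": \"Hello!\", \"model\": \"$model\"}"
  echo
done
```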