# LLM Health Check Instructions
The LLM services in STING can take some time to fully load their models and become operational. To confirm that your installation is working correctly, we've provided a health check script that verifies all LLM services are running properly.
## When to Use This Script
Run this script:
- After installation completes
- If you experience issues with the LLM functionality
- After upgrading or changing LLM models
- If you want to verify that your Hugging Face token is working
For initial LLM setup, see the Ollama Model Setup Guide. For performance optimization, see the Hardware Acceleration Guide.
## Usage
```bash
# Make the script executable (if not already)
chmod +x check_llm_health.sh

# Run the health check
./check_llm_health.sh
```
## What the Script Checks
The script performs several checks (rough manual equivalents are sketched after this list):
- Docker Status: Verifies that Docker is running
- Container Status: Checks if all LLM service containers are running
- Gateway Health: Tests the LLM gateway health endpoint
- Model Loading: Examines logs to see if models are loaded
- Model Testing: Sends a simple prompt to each model to test functionality
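If you need to reproduce these checks by hand (for example, when the script itself fails to run), the commands below are a rough sketch of manual equivalents. The gateway port (8080) and the `/health` endpoint path are assumptions here, not taken from the script; substitute the values from your STING configuration.

```bash
# Rough manual equivalents of the script's checks.

# Docker status: fails if the Docker daemon is not reachable
docker info > /dev/null 2>&1 && echo "Docker OK" || echo "Docker NOT running"

# Container status: list the LLM service containers and their states
docker ps --format '{{.Names}}\t{{.Status}}' | grep -E 'llama3|phi3|zephyr|llm-gateway'

# Gateway health: port and /health path are assumptions; adjust as needed
curl -sf http://localhost:8080/health && echo "Gateway OK" || echo "Gateway unreachable"

# Model loading: inspect recent logs of one model service
# (assumes exactly one container matches the name filter)
docker logs "$(docker ps -q --filter name=llama3-service)" 2>&1 | tail -n 20
```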
## How to Interpret Results
The health check will display colored output:
- 🟢 Green: Component is healthy and working correctly
- 🟡 Yellow: Warning or component still initializing
- 🔴 Red: Error or component not functioning
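For context, colored status lines like these are typically produced with ANSI escape codes. The sketch below shows how such a status printer might look; the function name and messages are illustrative, not taken from the actual script.

```bash
# Minimal sketch of colored status output using ANSI escape codes
# (illustrative only; not the actual check_llm_health.sh implementation)
GREEN='\033[0;32m'; YELLOW='\033[1;33m'; RED='\033[0;31m'; RESET='\033[0m'

status() {  # usage: status <green|yellow|red> <message>
  case "$1" in
    green)  printf "${GREEN}[OK]${RESET} %s\n" "$2" ;;
    yellow) printf "${YELLOW}[WAIT]${RESET} %s\n" "$2" ;;
    red)    printf "${RED}[FAIL]${RESET} %s\n" "$2" ;;
  esac
}

status green "LLM gateway is healthy"
status yellow "llama3-service still loading model"
status red "phi3-service container not running"
```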
## Troubleshooting
If issues are detected:
**Models Still Loading**: LLM models can take several minutes to load, especially on the first run.

```bash
# Check the logs of a specific model service
docker logs $(docker ps | grep llama3-service | awk '{print $1}')
```

**Services Not Running**: Restart the services.

```bash
./manage_sting.sh restart llama3-service phi3-service zephyr-service llm-gateway
```

**Gateway Connection Issues**: Check if the gateway is running on the expected port.

```bash
docker ps | grep llm-gateway
```

**Hugging Face Token Issues**: Verify your token is correctly set.

```bash
./setup_hf_token.sh
```
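If all services appear healthy but you want to confirm a model actually responds (the script's final "Model Testing" step), you can send a test prompt yourself. The example below assumes the gateway forwards Ollama-style requests on port 8080; the port, path, and payload shape are all assumptions, so adjust them to match your deployment.

```bash
# Send a simple test prompt through the gateway
# (port, /api/generate path, and payload shape are assumptions)
curl -s http://localhost:8080/api/generate \
  -d '{"model": "llama3", "prompt": "Reply with the single word: pong", "stream": false}'
```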
## Wait Times
- Initial Load: 3-10 minutes depending on your hardware
- Subsequent Starts: 1-3 minutes
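Rather than re-running the health check by hand during these windows, you can poll until the gateway reports healthy. A minimal sketch, assuming the same port and `/health` path as above and a 10-minute ceiling:

```bash
# Poll the gateway health endpoint until it responds or a timeout expires
# (port and /health path are assumptions; adjust to your deployment)
deadline=$((SECONDS + 600))   # allow up to 10 minutes for the initial load
until curl -sf http://localhost:8080/health > /dev/null; do
  if (( SECONDS >= deadline )); then
    echo "Timed out waiting for the LLM gateway" >&2
    exit 1
  fi
  echo "Waiting for LLM gateway..."
  sleep 10
done
echo "LLM gateway is healthy"
```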
## Additional Information
The LLM services are designed to initialize in the background to prevent blocking the installation process. This means your STING installation may report as successful even if the models are still loading.
For large models (like Llama 3), initialization can take longer depending on your system’s hardware resources. A machine with a GPU will load models faster than a CPU-only system.