Web Search Integration

STING-CE includes SearXNG, a self-hosted meta-search engine that provides Bee with real-time web search capabilities while maintaining privacy and avoiding external API dependencies.

Overview

When enabled, Bee can search the web to find current information, research topics, and gather context that may not be in your Honey Jars. This is particularly useful for:

  • Current events - Information that changes frequently
  • Technical documentation - Latest framework versions, API changes
  • Research topics - Broader context beyond your knowledge base
  • Report generation - Enriching reports with external sources

Architecture

graph LR
    subgraph "STING-CE"
        BEE[Bee Chat] --> CONTEXT[Context Manager]
        CONTEXT --> SEARCH[Web Search Provider]
        SEARCH --> SEARXNG[SearXNG]
    end
    
    subgraph "External Sources"
        SEARXNG --> DDG[DuckDuckGo]
        SEARXNG --> WIKI[Wikipedia]
        SEARXNG --> BRAVE[Brave Search]
        SEARXNG --> QWANT[Qwant]
    end
    
    SEARCH --> FETCH[Content Fetcher]
    FETCH --> RESULTS[Processed Results]
    RESULTS --> BEE

How It Works

  1. Query Sanitization - User queries are sanitized to remove PII (emails, IPs, API keys, SSNs, credit cards) before searching
  2. Meta-Search - SearXNG aggregates results from multiple privacy-respecting search engines
  3. Content Fetching - Optionally fetches and extracts content from result URLs
  4. Context Scaling - Automatically adjusts content size based on the LLM’s context window
  5. Response Enhancement - Web context is added to Bee’s prompt for more informed responses
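
Step 5 amounts to string assembly, and a rough sketch may make it concrete. Everything named here is an assumption for illustration: `build_web_context`, `enhance_prompt`, the result keys (`title`, `url`, `content`), and the prompt layout are not taken from STING's code.

```python
def build_web_context(results, max_chars_per_source=2000):
    """Format search results as a numbered, citable context block.

    `results` is assumed to be a list of dicts with optional
    "title", "url", and "content" keys (hypothetical shape).
    """
    lines = ["Web search results:"]
    for index, result in enumerate(results, start=1):
        # Collapse runs of whitespace, then cap the snippet length.
        snippet = " ".join(result.get("content", "").split())[:max_chars_per_source]
        lines.append(f"[{index}] {result.get('title', 'Untitled')} ({result.get('url', '')})")
        if snippet:
            lines.append(snippet)
    # Return an empty string when there is nothing to add.
    return "\n".join(lines) if len(lines) > 1 else ""


def enhance_prompt(user_message, results):
    """Prepend web context to the user's message when results exist."""
    context = build_web_context(results)
    if not context:
        return user_message
    return f"{context}\n\nUser question: {user_message}"
```

Numbering the sources lets the model cite them as `[1]`, `[2]`, and so on, which is one simple way to support URL references in responses.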

Configuration

Environment Variables

Variable                       Default              Description
WEB_SEARCH_ENABLED             false                Enable/disable web search
WEB_SEARCH_PROVIDER            searxng              Search provider (searxng, serper, brave, tavily)
SEARXNG_URL                    http://searxng:8080  SearXNG service URL
WEB_SEARCH_TIMEOUT             5                    Per-request timeout (seconds)
WEB_SEARCH_TOTAL_TIMEOUT       15                   Total operation timeout (seconds)
WEB_SEARCH_FETCH_CONTENT       true                 Fetch full page content from URLs
WEB_SEARCH_MAX_RESULTS         3                    Maximum search results to return
WEB_SEARCH_MAX_CONTENT_LENGTH  2000                 Max characters per source
WEB_SEARCH_API_KEY             (empty)              API key for external providers (not needed for SearXNG)

To enable web search, set the following in your environment file:

WEB_SEARCH_ENABLED=true
WEB_SEARCH_PROVIDER=searxng

Or via manage_sting.sh:

./manage_sting.sh config set WEB_SEARCH_ENABLED true
./manage_sting.sh restart external-ai
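
For readers wiring these variables into their own tooling, the following sketch shows one way to read them with the documented defaults. The `WebSearchConfig` class, `load_config` function, and the set of accepted truthy strings are assumptions for illustration, not STING's actual loader.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class WebSearchConfig:
    enabled: bool
    provider: str
    searxng_url: str
    timeout: float
    total_timeout: float
    fetch_content: bool
    max_results: int
    max_content_length: int
    api_key: str


def _as_bool(value: str) -> bool:
    # Assumption: these spellings count as "true"; the real parser may differ.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config(env=None) -> WebSearchConfig:
    """Build a config from an environment mapping, using documented defaults."""
    env = os.environ if env is None else env
    return WebSearchConfig(
        enabled=_as_bool(env.get("WEB_SEARCH_ENABLED", "false")),
        provider=env.get("WEB_SEARCH_PROVIDER", "searxng"),
        searxng_url=env.get("SEARXNG_URL", "http://searxng:8080"),
        timeout=float(env.get("WEB_SEARCH_TIMEOUT", "5")),
        total_timeout=float(env.get("WEB_SEARCH_TOTAL_TIMEOUT", "15")),
        fetch_content=_as_bool(env.get("WEB_SEARCH_FETCH_CONTENT", "true")),
        max_results=int(env.get("WEB_SEARCH_MAX_RESULTS", "3")),
        max_content_length=int(env.get("WEB_SEARCH_MAX_CONTENT_LENGTH", "2000")),
        api_key=env.get("WEB_SEARCH_API_KEY", ""),
    )
```

Passing a plain dict instead of `os.environ` makes it easy to check a proposed configuration before restarting the service.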

SearXNG Configuration

The SearXNG configuration is stored in searxng/settings.yml. Default settings are optimized for STING-CE:

Enabled Search Engines

Engine        Weight  Notes
DuckDuckGo    1.5     Primary - no tracking
Wikipedia     1.2     Factual information
Brave Search  1.0     Privacy-focused
Qwant         0.8     European privacy search
Bing          0.5     Fallback for coverage

Security Settings

server:
  limiter: false        # Disabled for internal service use
  public_instance: false # Not exposed publicly
  image_proxy: false     # Text-only results

Context-Aware Scaling

Web search automatically scales content fetching based on your LLM’s context window:

Context Tier  Max Tokens   Results  Content/Source  Total Budget
Small         < 8K         2        1,000 chars     2,000 chars
Medium        8K - 32K     3        2,000 chars     5,000 chars
Large         32K - 128K   5        4,000 chars     15,000 chars
Huge          > 128K       5        8,000 chars     30,000 chars

This ensures web search doesn’t overwhelm smaller models while taking full advantage of larger context windows.
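
The tier table can be read as a lookup followed by trimming. The sketch below is a hypothetical rendering of that logic: `SearchBudget`, `budget_for_context`, and `trim_sources` are illustrative names, and the exact boundaries (treating "8K" as 8,000 tokens and placing 32K and 128K in the lower tier) are assumptions, since the table does not say which side the edges fall on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchBudget:
    tier: str
    max_results: int
    chars_per_source: int
    total_chars: int


def budget_for_context(context_tokens: int) -> SearchBudget:
    """Map an LLM context window (in tokens) to a search budget."""
    # Assumption: K means 1,000 and upper bounds are inclusive.
    if context_tokens < 8_000:
        return SearchBudget("small", 2, 1_000, 2_000)
    if context_tokens <= 32_000:
        return SearchBudget("medium", 3, 2_000, 5_000)
    if context_tokens <= 128_000:
        return SearchBudget("large", 5, 4_000, 15_000)
    return SearchBudget("huge", 5, 8_000, 30_000)


def trim_sources(texts, budget: SearchBudget):
    """Apply the per-source cap, then stop once the total budget is spent."""
    trimmed = []
    remaining = budget.total_chars
    for text in texts[: budget.max_results]:
        if remaining <= 0:
            break
        piece = text[: min(budget.chars_per_source, remaining)]
        trimmed.append(piece)
        remaining -= len(piece)
    return trimmed
```

Note that in every tier except Small, results times content-per-source exceeds the total budget, so the total acts as the binding cap when all sources are long.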

Privacy Features

Query Sanitization

Before any query is sent to search engines, STING automatically removes:

  • Email addresses - user@example.com
  • IP addresses - 192.168.1.1
  • API keys - sk-..., api_key:...
  • Social Security Numbers - 123-45-6789
  • Credit card numbers - 4111-1111-1111-1111
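
A regex-based pass is one plausible way to implement this kind of redaction. The patterns below are illustrative assumptions, not STING's actual rules; in particular, the IPv4 pattern will also strip four-part version numbers such as `1.2.3.4`, and the API key pattern only covers the two shapes listed above.

```python
import re

# Hypothetical patterns for illustration; real-world PII detection
# usually needs more cases and more careful validation.
PII_PATTERNS = [
    re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),      # email addresses
    re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),            # 16-digit card numbers
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                 # US Social Security Numbers
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),           # IPv4 addresses
    re.compile(r"\bsk-[A-Za-z0-9_-]{8,}|\bapi_key\s*[:=]\s*\S+", re.IGNORECASE),
]


def sanitize_query(query: str) -> str:
    """Remove PII-like substrings and normalize the remaining whitespace."""
    for pattern in PII_PATTERNS:
        query = pattern.sub(" ", query)
    return " ".join(query.split())
```

Removing matches outright, rather than replacing them with a placeholder token, keeps placeholder text from skewing the search results.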

Blocked Domains

Content is not fetched from these domains, which typically sit behind paywalls or login walls:

  • linkedin.com, facebook.com, twitter.com, x.com
  • instagram.com, tiktok.com, youtube.com
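
A blocklist check of this kind is usually done on the URL's hostname rather than on the raw string. The sketch below assumes subdomains (for example `www.linkedin.com`) should be blocked too; `BLOCKED_DOMAINS` and `is_blocked` are illustrative names, not STING's identifiers.

```python
from urllib.parse import urlsplit

BLOCKED_DOMAINS = frozenset({
    "linkedin.com", "facebook.com", "twitter.com", "x.com",
    "instagram.com", "tiktok.com", "youtube.com",
})


def is_blocked(url: str) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    # Match on a dot boundary so "netflix.com" is not treated as "x.com".
    return any(host == domain or host.endswith("." + domain)
               for domain in BLOCKED_DOMAINS)
```

Matching on a dot boundary matters for short entries such as `x.com`, which a naive suffix or substring test would confuse with unrelated domains.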

Self-Hosted Advantage

Unlike external search APIs:

  • No API keys required
  • No rate limits (you control the infrastructure)
  • No query logging by third parties
  • Full control over which engines are used
  • Offline capable (if engines are cached)

External Provider Fallback

If SearXNG is unavailable or you prefer external APIs, STING supports:

Provider  API Key Env Var     Notes
Serper    WEB_SEARCH_API_KEY  Google results via API
Brave     WEB_SEARCH_API_KEY  Brave Search API
Tavily    WEB_SEARCH_API_KEY  AI-optimized search API

Configure with:

WEB_SEARCH_ENABLED=true
WEB_SEARCH_PROVIDER=serper
WEB_SEARCH_API_KEY=your-api-key

If the external provider fails or has no API key, STING automatically falls back to SearXNG.
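
The fallback rule can be sketched as a small dispatcher. This is a hypothetical illustration: `search_with_fallback` and the callable-based provider registry are assumptions, and catching every `Exception` is a simplification that a real implementation would likely narrow and log.

```python
from typing import Callable, Dict, List

SearchFn = Callable[[str], List[dict]]


def search_with_fallback(
    query: str,
    provider: str,
    api_key: str,
    external: Dict[str, SearchFn],
    searxng: SearchFn,
) -> List[dict]:
    """Try the configured external provider, falling back to SearXNG.

    SearXNG is used when the provider is "searxng", is unknown,
    has no API key, or raises an error.
    """
    backend = external.get(provider)
    if provider != "searxng" and backend is not None and api_key:
        try:
            return backend(query)
        except Exception:
            # Simplification: any failure triggers the fallback.
            pass
    return searxng(query)
```

Passing the backends in as callables keeps the fallback decision separate from any HTTP details and makes it straightforward to exercise with stubs.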

Troubleshooting

Web Search Not Working

  1. Check if enabled:

    docker exec sting-ce-external-ai env | grep WEB_SEARCH
    
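  2. Test SearXNG directly. The service is not exposed to the host by default, so run this from a container on sting-network that has curl and jq:

    curl -s "http://searxng:8080/search?q=test&format=json" | jq '.results[:2]'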
    
  3. Check SearXNG logs:

    docker logs sting-ce-searxng --tail 50
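
If curl and jq are not available where you are debugging, the same check can be sketched with the Python standard library. The helper names are illustrative, and the script assumes it runs somewhere that can reach `http://searxng:8080` and that the JSON output format is enabled in searxng/settings.yml.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str) -> str:
    """Build a SearXNG search URL that requests JSON output."""
    params = urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def top_results(payload: dict, limit: int = 2):
    """Extract title and URL from the first few entries of a JSON response."""
    return [
        {"title": item.get("title", ""), "url": item.get("url", "")}
        for item in payload.get("results", [])[:limit]
    ]


def check_searxng(base_url="http://searxng:8080", query="test", timeout=5.0):
    """Query SearXNG and return the top results; raises on network or HTTP errors."""
    with urlopen(build_search_url(base_url, query), timeout=timeout) as response:
        return top_results(json.load(response))
```

An empty list from `check_searxng()` suggests the service is up but the engines returned nothing, while an exception points to a connectivity or configuration problem.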
    

Slow Search Results

  • Reduce WEB_SEARCH_MAX_RESULTS to 2
  • Disable content fetching: WEB_SEARCH_FETCH_CONTENT=false
  • Lower timeout: WEB_SEARCH_TIMEOUT=3

Search Returns Empty

  • Verify SearXNG container is healthy: docker ps | grep searxng
  • Check enabled engines in searxng/settings.yml
  • Some engines may be rate-limited; try again after a few minutes

Integration with Reports

When Bee generates reports, web search provides additional context:

  1. Topic Research - Searches for current information on the report topic
  2. Source Citation - Includes URLs for references
  3. Fact Checking - Cross-references claims with web sources

Web search can be skipped for internal operations (like title generation) to improve performance.

Docker Compose Service

The SearXNG service is defined in docker-compose.yml:

searxng:
  container_name: sting-ce-searxng
  image: searxng/searxng:latest
  environment:
    - SEARXNG_BASE_URL=http://searxng:8080
  volumes:
    - ./searxng:/etc/searxng:ro
  networks:
    sting-network:
      aliases:
        - searxng
  restart: unless-stopped

The service is only accessible from within the Docker network - it’s not exposed to the host.

Best Practices

  1. Enable for research-heavy use cases - Reports, analysis, current events
  2. Disable for sensitive environments - Air-gapped or compliance-restricted deployments
  3. Monitor usage - Check logs if queries seem slow
  4. Update engines periodically - Search engines change; update settings.yml if needed
  5. Use Honey Jars first - Web search supplements but doesn’t replace your knowledge base
