🐝 ReviewBee - Unified Quality Assurance

Overview

ReviewBee is STING’s unified quality assurance system for all AI-generated content. It combines requirements validation, PII safety checks, and professional quality assurance into a single, streamlined reviewer.

Core Philosophy:

“Compare the final output against the original ask, while ensuring safety and quality.”

ReviewBee handles everything in one pass:

✅ Requirements Fulfillment - Does it answer what the user asked?
✅ PII Safety - Are all PII tokens properly resolved?
✅ Content Quality - Grammar, structure, completeness
✅ Format Validation - Proper sections, markdown, professional tone

Why Unified? Previously, STING split these checks across separate systems: QE Bee handled sanitization, and the other checks were scattered elsewhere. ReviewBee consolidates them into one reviewer that runs once, checks everything, and returns actionable feedback.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     User Request                                │
│  "Generate a report about X with 3 use cases"                   │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Primary LLM Generation                        │
│              (phi-4-reasoning-plus, etc.)                       │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    🐝 ReviewBee                                 │
│                  (unified reviewer)                             │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ 1. REQUIREMENTS CHECK                                    │   │
│  │    • Extract asks from original request                  │   │
│  │    • Compare output against requirements                 │   │
│  │    • Score fulfillment (YES/PARTIAL/NO)                  │   │
│  └─────────────────────────────────────────────────────────┘   │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ 2. PII SAFETY CHECK                                      │   │
│  │    • Detect unresolved [PII_*] tokens                    │   │
│  │    • Flag potential data leakage                         │   │
│  │    • Block if critical PII exposed                       │   │
│  └─────────────────────────────────────────────────────────┘   │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ 3. QUALITY CHECK                                         │   │
│  │    • Grammar and clarity                                 │   │
│  │    • Structure and formatting                            │   │
│  │    • Completeness (no truncation)                        │   │
│  └─────────────────────────────────────────────────────────┘   │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ 4. GENERATE TASK LIST (if issues found)                  │   │
│  │    • Specific, actionable improvements                   │   │
│  │    • Prioritized by severity                             │   │
│  └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
                              │
                    Overall Score < Threshold?
                              │
              ┌───────────────┴───────────────┐
              │ YES                           │ NO
              ▼                               ▼
┌─────────────────────────┐     ┌─────────────────────────┐
│   Regenerate with       │     │   Deliver to User       │
│   Task List Feedback    │     │   ✅                    │
└─────────────────────────┘     └─────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Quality Validation                            │
│  ✓ Revision ≥ 70% of original length                            │
│  ✓ No unexpected characters introduced                          │
│  ✓ Structure preserved                                          │
│  ✓ PII issues resolved                                          │
└─────────────────────────────────────────────────────────────────┘
              │
    Validation Passed? ───► NO ───► Keep Original
              │
              ▼ YES
       Use Revised Output
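
The flow in the diagram can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not STING's actual implementation: `generate`, `review`, and `validate` are hypothetical callables standing in for the primary LLM, the ReviewBee critic, and the quality validation gate, and the `Review` class is invented for the example. PII blocking is left out to keep the control flow visible.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Review:
    """Hypothetical container for one ReviewBee verdict."""
    score: float                                  # overall score in [0.0, 1.0]
    task_list: List[str] = field(default_factory=list)


def run_pipeline(
    request: str,
    generate: Callable[[str, List[str]], str],    # (request, feedback) -> draft
    review: Callable[[str, str], Review],         # (request, draft) -> Review
    validate: Callable[[str, str], bool],         # (original, revised) -> passed?
    threshold: float = 0.75,
    mode: str = "critique_and_revise",
) -> str:
    """One review pass: deliver, or regenerate once and gate the revision."""
    original = generate(request, [])
    verdict = review(request, original)

    # Score at or above the threshold, or critique-only mode: deliver as-is.
    if verdict.score >= threshold or mode == "critique_only":
        return original

    # Score below the threshold: regenerate with the task list as feedback.
    revised = generate(request, verdict.task_list)

    # Keep the original unless the revision passes the validation gate.
    return revised if validate(original, revised) else original
```

Passing the three stages in as callables keeps the sketch independent of any particular model or critic, so each branch of the diagram can be exercised with simple stand-ins.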

Key Features

1. Requirements Extraction & Validation

ReviewBee extracts what the user actually asked for:

| Requirement Type | Examples |
|---|---|
| Word count | “at least 1000 words”, “brief summary” |
| Sections | “include executive summary”, “add architecture” |
| Specific asks | “3 use cases”, “HIPAA compliance” |
| Questions | Any explicit questions that need answers |
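
To make the requirement types above concrete, here is a small pattern-based sketch. It is an assumption for illustration only: ReviewBee presumably relies on its critic model rather than regular expressions, and the `Requirements` class, the pattern names, and the list of countable items are all hypothetical. Qualitative asks such as "brief summary" are not covered.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical patterns; a real extractor would need far broader coverage.
WORD_COUNT = re.compile(r"(?:at least\s+)?(\d+)\s+words", re.IGNORECASE)
COUNTED_ITEM = re.compile(r"\b(\d+)\s+(use cases|examples|sections)\b", re.IGNORECASE)
QUESTION = re.compile(r"[^.?!]*\?")


@dataclass
class Requirements:
    min_words: Optional[int]                # e.g. 1000 from "at least 1000 words"
    counted_items: List[Tuple[int, str]]    # e.g. (3, "use cases")
    questions: List[str]                    # explicit questions in the request


def extract_requirements(request: str) -> Requirements:
    """Pull a few explicit asks out of the original request."""
    word_match = WORD_COUNT.search(request)
    return Requirements(
        min_words=int(word_match.group(1)) if word_match else None,
        counted_items=[(int(n), item.lower())
                       for n, item in COUNTED_ITEM.findall(request)],
        questions=[q.strip() for q in QUESTION.findall(request)],
    )
```

The extracted values give the reviewer something checkable: a word count to compare against, a number of use cases to count, and questions that the output must answer.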

2. PII Safety Checks

Ensures no sensitive data leaks through:

✓ Detect unresolved [PII_NAME_xyz] tokens
✓ Flag partial deserialization
✓ Block delivery if critical PII exposed
✓ Report exact token locations
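
A minimal sketch of the token scan, assuming tokens always follow the bracketed `[PII_NAME_xyz]` shape shown above. The exact token grammar is not specified here, so the pattern and the function names are assumptions, and truncated or partially resolved tokens are not detected by this simple version.

```python
import re
from typing import List, Tuple

# Assumed token shape, based on the [PII_NAME_xyz] example.
PII_TOKEN = re.compile(r"\[PII_[A-Za-z0-9_]+\]")


def find_unresolved_tokens(text: str) -> List[Tuple[str, int, int]]:
    """Return (token, line_number, column) for each unresolved token, 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in PII_TOKEN.finditer(line):
            hits.append((match.group(0), line_no, match.start() + 1))
    return hits


def should_block(text: str, max_unresolved_tokens: int = 0) -> bool:
    """Block delivery when more tokens remain than the configured maximum."""
    return len(find_unresolved_tokens(text)) > max_unresolved_tokens
```

Returning line and column positions matches the "report exact token locations" behavior, so a task such as "resolve the token on line 2" can point at a precise place in the output.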

3. Structured Task List

When issues are found, ReviewBee generates specific tasks:

**Your task list:**
  1. Add the requested deployment architecture section
  2. Resolve 2 unresolved PII tokens in paragraph 3
  3. Expand the third use case with more technical detail
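
The task list format above could be produced along these lines. The severity levels and the `Issue` class are hypothetical; the only details taken from this document are the `**Your task list:**` heading, the numbered layout, and ordering by severity.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical severity levels; lower rank is listed first.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


@dataclass
class Issue:
    description: str
    severity: str   # one of the keys in SEVERITY_RANK


def format_task_list(issues: List[Issue]) -> str:
    """Render issues as a numbered task list, most severe first."""
    # sorted() is stable, so issues of equal severity keep their input order.
    ordered = sorted(issues, key=lambda issue: SEVERITY_RANK[issue.severity])
    lines = ["**Your task list:**"]
    lines += [f"  {n}. {issue.description}" for n, issue in enumerate(ordered, start=1)]
    return "\n".join(lines)
```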

4. Quality Validation Gate

Before accepting any revision, ReviewBee validates that it is actually an improvement:

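| Check | Threshold | Purpose |
|---|---|---|
| Length ratio | ≥ 70% of original length | Prevent content loss |
| Unexpected chars | < original count + 5 | Catch encoding issues |
| Header count | ≥ 50% of original headers | Preserve structure |
| PII tokens | = 0 unresolved | Ensure safety |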
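
The four checks can be expressed as a single gate function. This sketch makes several assumptions that this document does not settle: length is measured in characters, headers are markdown `#` lines, and "unexpected characters" means the Unicode replacement character plus control characters other than newline, carriage return, and tab. The function names are hypothetical.

```python
import re
from typing import Dict

PII_TOKEN = re.compile(r"\[PII_[A-Za-z0-9_]+\]")   # assumed token shape
HEADER = re.compile(r"^#{1,6}\s", re.MULTILINE)    # assumes markdown headers


def count_unexpected(text: str) -> int:
    """Count characters that usually signal encoding damage (an assumption)."""
    return sum(
        1 for ch in text
        if ch == "\ufffd" or (ord(ch) < 32 and ch not in "\n\r\t")
    )


def validate_revision(
    original: str,
    revised: str,
    min_length_ratio: float = 0.7,
    min_structure_ratio: float = 0.5,
) -> Dict[str, bool]:
    """Run each check from the table and report the results individually."""
    original_headers = len(HEADER.findall(original))
    revised_headers = len(HEADER.findall(revised))
    checks = {
        "length": len(revised) >= min_length_ratio * len(original),
        "unexpected_chars": count_unexpected(revised) < count_unexpected(original) + 5,
        "structure": revised_headers >= min_structure_ratio * original_headers,
        "pii": len(PII_TOKEN.findall(revised)) == 0,
    }
    checks["passed"] = all(checks.values())
    return checks
```

Reporting each check separately, rather than a single boolean, makes it clear why a revision was rejected and the original kept.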

5. Security by Design

All data is ephemeral — dies with the request
No Redis/persistence — nothing stored
PII-aware — understands token format
Logs sanitized — no sensitive data in logs

Configuration

llm_service:
  review_bee:
    # Master toggle
    enabled: true
    
    # Mode: critique_only | critique_and_revise
    mode: "critique_and_revise"
    
    # Score threshold (0.0-1.0)
    revision_threshold: 0.75
    
    # Critic model (lightweight)
    critic:
      model: "phi4"
    
    # Safety settings
    safety:
      block_on_pii_leak: true
      max_unresolved_tokens: 0
    
    # Quality thresholds
    quality_validation:
      min_length_ratio: 0.7
      min_structure_ratio: 0.5

Environment Variables

REVIEW_BEE_ENABLED=true
REVIEW_BEE_MODE=critique_and_revise
REVIEW_BEE_THRESHOLD=0.75
REVIEW_BEE_CRITIC_MODEL=phi4
REVIEW_BEE_BLOCK_ON_PII=true
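
As an illustration of how a service might read these variables, the sketch below parses them into a typed settings object. The `ReviewBeeSettings` class and `load_settings` function are hypothetical, the defaults simply mirror the values shown above, and how STING resolves conflicts between environment variables and the YAML file is not stated here.

```python
import os
from dataclasses import dataclass
from typing import Mapping

VALID_MODES = {"critique_only", "critique_and_revise"}


@dataclass(frozen=True)
class ReviewBeeSettings:
    enabled: bool
    mode: str
    threshold: float
    critic_model: str
    block_on_pii: bool


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> ReviewBeeSettings:
    """Read REVIEW_BEE_* variables, validating the mode and threshold."""
    mode = env.get("REVIEW_BEE_MODE", "critique_and_revise")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown REVIEW_BEE_MODE: {mode!r}")

    threshold = float(env.get("REVIEW_BEE_THRESHOLD", "0.75"))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("REVIEW_BEE_THRESHOLD must be between 0.0 and 1.0")

    return ReviewBeeSettings(
        enabled=_as_bool(env.get("REVIEW_BEE_ENABLED", "true")),
        mode=mode,
        threshold=threshold,
        critic_model=env.get("REVIEW_BEE_CRITIC_MODEL", "phi4"),
        block_on_pii=_as_bool(env.get("REVIEW_BEE_BLOCK_ON_PII", "true")),
    )
```

Validating at load time means a mistyped mode or an out-of-range threshold fails at startup instead of silently changing review behavior.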

API Response

{
  "response": "...",
  "review_bee": {
    "enabled": true,
    "critic_model": "phi4",
    "mode": "critique_and_revise",
    "critique_score": 0.75,
    "requirements_met": "PARTIAL",
    "pii_check": {
      "passed": true,
      "unresolved_tokens": 0
    },
    "gaps_count": 2,
    "task_list_count": 3,
    "revision_applied": true,
    "quality_metrics": {
      "length_ratio": 1.38,
      "unexpected_chars": 0,
      "original_headers": 12,
      "revised_headers": 20
    }
  }
}
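
A client might condense the `review_bee` block into a few decisions, as in the sketch below. The interpretation of each field is inferred from its name in the example response and should be treated as an assumption; `summarize_review` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def summarize_review(payload: str) -> Dict[str, Any]:
    """Reduce a response body to the flags a caller is likely to act on."""
    data = json.loads(payload)
    bee = data.get("review_bee", {})
    pii = bee.get("pii_check", {})
    metrics = bee.get("quality_metrics", {})
    return {
        # Treat missing PII information as unsafe rather than safe.
        "safe_to_show": bool(pii.get("passed", False))
        and pii.get("unresolved_tokens", 0) == 0,
        "fully_met": bee.get("requirements_met") == "YES",
        "revised": bool(bee.get("revision_applied", False)),
        # A ratio above 1.0 means the revision is longer than the original.
        "grew": metrics.get("length_ratio", 1.0) > 1.0,
    }
```

Defaulting to "not safe" when `pii_check` is absent is a deliberate choice in this sketch, so that a malformed response never results in content being shown.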

Migration from QE Bee

ReviewBee replaces the previous QE Bee system. Key differences:

| Feature | QE Bee (Legacy) | ReviewBee (Unified) |
|---|---|---|
| Focus | PII sanitization only | Full quality assurance |
| Requirements check | ❌ | ✅ |
| PII detection | ✅ | ✅ |
| Content quality | Basic | Comprehensive |
| Regeneration | ❌ Flag only | ✅ Critic-Revise |
| Task lists | ❌ | ✅ Actionable tasks |
| Webhooks | ✅ | ⏳ Coming soon |

For existing QE Bee users: ReviewBee covers everything QE Bee did and more, with one exception: webhook notifications are still on the roadmap. To migrate, enable ReviewBee and disable QE Bee.

Best Practices

When to Enable

✅ Always enable for:

Production report generation
User-facing content
Any output that leaves the system

⚠️ Consider critique_only mode for:

Development/testing
High-volume, low-stakes content

Threshold Tuning

| Threshold | Behavior |
|---|---|
| `0.9` | Very strict: most outputs revised |
| `0.75` | Balanced: catches clear issues ✅ |
| `0.6` | Lenient: only major problems |
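
To see how the threshold changes the revision rate, the sketch below counts how many outputs would be sent for revision at each setting, using the "score below threshold" rule from the architecture diagram. The sample scores are invented for illustration and are not measurements.

```python
from typing import Dict, Iterable, List


def revisions_per_threshold(
    scores: List[float], thresholds: Iterable[float]
) -> Dict[float, int]:
    """Count outputs whose score falls below each threshold."""
    return {t: sum(1 for s in scores if s < t) for t in thresholds}


# Hypothetical critique scores for five outputs.
sample_scores = [0.55, 0.70, 0.80, 0.85, 0.95]
counts = revisions_per_threshold(sample_scores, (0.6, 0.75, 0.9))
print(counts)  # {0.6: 1, 0.75: 2, 0.9: 4}
```

Running the same count over a sample of real critique scores is one way to choose a threshold before enabling `critique_and_revise` in production.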

🚀 Future Roadmap

Custom ReviewBees

Specialized reviewers for different domains:

ComplianceBee — HIPAA, SOC2, GDPR checking
TechnicalBee — Code review and accuracy
ToneBee — Brand voice consistency
FactBee — Citation verification

Cloud Orchestration

Use cloud capacity for heavy loads while a local AI handles orchestration:

Local Orchestrator (always-on, lightweight)
    ├── Local GPU (fast, private)
    ├── Cloud API (powerful, scalable) 
    └── Edge Node (private, secure)

Benefits:

Local AI handles orchestration and sensitive decisions
Cloud bursts for heavy generation
Only anonymized content leaves the appliance
Cost-effective scaling

Webhook Notifications

Real-time alerts when ReviewBee takes action:

Review completion events
Revision applied/rejected
PII safety blocks
Configurable filters

ReviewBee is STING’s commitment to quality — one unified reviewer for all AI outputs.

Last updated: January 23, 2026