🐝 ReviewBee - Unified Quality Assurance

Overview

ReviewBee is STING’s unified quality assurance system for all AI-generated content. It combines requirements validation, PII safety checks, and professional quality assurance into a single, streamlined reviewer.

Core Philosophy:

“Compare the final output against the original ask, while ensuring safety and quality.”

ReviewBee handles everything in one pass:

  • Requirements Fulfillment - Does it answer what the user asked?
  • PII Safety - Are all PII tokens properly resolved?
  • Content Quality - Grammar, structure, completeness
  • Format Validation - Proper sections, markdown, professional tone

Why unified? Previously, STING split these checks across separate systems: QE Bee handled sanitization, and the other checks were scattered elsewhere. ReviewBee consolidates them into one reviewer that runs once, checks everything, and returns actionable feedback.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     User Request                                │
│  "Generate a report about X with 3 use cases"                   │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Primary LLM Generation                        │
│              (phi-4-reasoning-plus, etc.)                       │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    🐝 ReviewBee                                 │
│                  (unified reviewer)                              │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ 1. REQUIREMENTS CHECK                                    │   │
│  │    • Extract asks from original request                  │   │
│  │    • Compare output against requirements                 │   │
│  │    • Score fulfillment (YES/PARTIAL/NO)                  │   │
│  └─────────────────────────────────────────────────────────┘   │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ 2. PII SAFETY CHECK                                      │   │
│  │    • Detect unresolved [PII_*] tokens                    │   │
│  │    • Flag potential data leakage                         │   │
│  │    • Block if critical PII exposed                       │   │
│  └─────────────────────────────────────────────────────────┘   │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ 3. QUALITY CHECK                                         │   │
│  │    • Grammar and clarity                                 │   │
│  │    • Structure and formatting                            │   │
│  │    • Completeness (no truncation)                        │   │
│  └─────────────────────────────────────────────────────────┘   │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ 4. GENERATE TASK LIST (if issues found)                  │   │
│  │    • Specific, actionable improvements                   │   │
│  │    • Prioritized by severity                             │   │
│  └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
                              │
                    Overall Score < Threshold?
                              │
              ┌───────────────┴───────────────┐
              │ YES                           │ NO
              ▼                               ▼
┌─────────────────────────┐     ┌─────────────────────────┐
│   Regenerate with       │     │   Deliver to User       │
│   Task List Feedback    │     │   ✅                    │
└─────────────────────────┘     └─────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Quality Validation                            │
│  ✓ Revision ≥ 70% of original length                            │
│  ✓ No unexpected characters introduced                          │
│  ✓ Structure preserved                                          │
│  ✓ PII issues resolved                                          │
└─────────────────────────────────────────────────────────────────┘
              │
    Validation Passed? ───► NO ───► Keep Original
              │
              ▼ YES
       Use Revised Output
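
The control flow in the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not STING's actual implementation: `Review`, `review_and_revise`, and the three callables (`critique`, `regenerate`, `validate`) are hypothetical names, and the hard PII block is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Review:
    """Hypothetical critic result: an overall score plus a task list."""
    score: float  # overall score in [0.0, 1.0]
    tasks: List[str] = field(default_factory=list)


def review_and_revise(
    request: str,
    draft: str,
    critique: Callable[[str, str], Review],
    regenerate: Callable[[str, str, List[str]], str],
    validate: Callable[[str, str], bool],
    threshold: float = 0.75,
    mode: str = "critique_and_revise",
) -> str:
    """Return the text to deliver, following the diagram's branches."""
    review = critique(request, draft)

    # Score at or above the threshold, or critique-only mode: deliver as-is.
    if review.score >= threshold or mode == "critique_only":
        return draft

    # Score below the threshold: regenerate with the task list as feedback.
    revised = regenerate(request, draft, review.tasks)

    # Quality validation gate: keep the original unless the revision passes.
    return revised if validate(draft, revised) else draft
```

Passing the critic, generator, and validator in as callables keeps the sketch independent of any particular model or service.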

Key Features

1. Requirements Extraction & Validation

ReviewBee extracts what the user actually asked for:

| Requirement Type | Examples |
|---|---|
| Word count | “at least 1000 words”, “brief summary” |
| Sections | “include executive summary”, “add architecture” |
| Specific asks | “3 use cases”, “HIPAA compliance” |
| Questions | Any explicit questions that need answers |
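
To make the requirement types concrete, here is a small rule-based sketch in Python. ReviewBee presumably relies on its critic model for extraction; the regular expressions and the `extract_requirements` function below are illustrative assumptions that cover only a few of the phrasings in the table.

```python
import re
from typing import Dict, List

# Hypothetical patterns; a real extractor would need far broader coverage.
WORD_COUNT = re.compile(r"\bat least (\d+) words\b", re.IGNORECASE)
COUNTED_ASK = re.compile(r"\b(\d+) (use cases?|examples?|sections?)\b", re.IGNORECASE)
QUESTION = re.compile(r"[^.?!]+\?")


def extract_requirements(request: str) -> Dict[str, List[str]]:
    """Collect explicit asks from the original request, grouped by type."""
    return {
        "word_count": [m.group(0) for m in WORD_COUNT.finditer(request)],
        "specific_asks": [m.group(0) for m in COUNTED_ASK.finditer(request)],
        "questions": [q.strip() for q in QUESTION.findall(request)],
    }
```

Each extracted item can then be checked against the generated output and scored as YES, PARTIAL, or NO.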

2. PII Safety Checks

ReviewBee ensures that no sensitive data leaks through:

✓ Detect unresolved [PII_NAME_xyz] tokens
✓ Flag partial deserialization
✓ Block delivery if critical PII exposed
✓ Report exact token locations
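
A minimal Python sketch of the token scan follows. It assumes tokens look like `[PII_NAME_xyz]`, as in the example above; the exact token grammar is not specified here, so the pattern and the function names are assumptions.

```python
import re
from typing import List, Tuple

# Assumed token shape: "[PII_" + letters, digits, or underscores + "]".
PII_TOKEN = re.compile(r"\[PII_[A-Za-z0-9_]+\]")


def find_unresolved_tokens(text: str) -> List[Tuple[str, int, int]]:
    """Return (token, line, column) for each unresolved token, both 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in PII_TOKEN.finditer(line):
            hits.append((match.group(0), line_no, match.start() + 1))
    return hits


def should_block(text: str, max_unresolved_tokens: int = 0) -> bool:
    """Block delivery when more tokens remain than the configured maximum."""
    return len(find_unresolved_tokens(text)) > max_unresolved_tokens
```

Reporting line and column for each hit gives the regeneration step an exact location to fix.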

3. Structured Task List

When issues are found, ReviewBee generates specific tasks:

**Your task list:**
  1. Add the requested deployment architecture section
  2. Resolve 2 unresolved PII tokens in paragraph 3
  3. Expand the third use case with more technical detail
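
One way to produce such a list, ordered by severity as the architecture diagram describes, is sketched below in Python. The `Issue` type and the three severity levels are hypothetical; the document does not specify how ReviewBee ranks issues.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical severity levels; lower rank means higher priority.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


@dataclass
class Issue:
    severity: str  # one of the keys in SEVERITY_RANK
    task: str      # a specific, actionable instruction


def build_task_list(issues: List[Issue]) -> str:
    """Render issues as a numbered task list, most severe first."""
    # sorted() is stable, so issues of equal severity keep their input order.
    ordered = sorted(issues, key=lambda issue: SEVERITY_RANK[issue.severity])
    lines = ["**Your task list:**"]
    lines += [f"  {n}. {issue.task}" for n, issue in enumerate(ordered, start=1)]
    return "\n".join(lines)
```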

4. Quality Validation Gate

Before accepting any revision, ReviewBee validates that it is actually an improvement:

| Check | Threshold | Purpose |
|---|---|---|
| Length ratio | ≥ 70% | Prevent content loss |
| Unexpected chars | < original + 5 | Catch encoding issues |
| Header count | ≥ 50% | Preserve structure |
| PII tokens | = 0 | Ensure safety |
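
The four checks can be combined into a single gate, sketched here in Python. Several details are assumptions made for illustration: length is measured in characters, headers are counted as markdown `#` lines, and "unexpected" characters are taken to be control characters and the Unicode replacement character.

```python
import re

PII_TOKEN = re.compile(r"\[PII_[A-Za-z0-9_]+\]")    # assumed token shape
HEADER = re.compile(r"^#{1,6} ", re.MULTILINE)      # assumed header style


def count_unexpected(text: str) -> int:
    """Count control characters (except newline, CR, tab) and U+FFFD."""
    return sum(
        1 for ch in text
        if ch == "\ufffd" or (ord(ch) < 32 and ch not in "\n\r\t")
    )


def revision_passes(
    original: str,
    revised: str,
    min_length_ratio: float = 0.7,
    min_structure_ratio: float = 0.5,
) -> bool:
    """Accept a revision only if it clears all four checks in the table."""
    if not original:
        return False  # nothing to compare against
    length_ok = len(revised) / len(original) >= min_length_ratio
    chars_ok = count_unexpected(revised) < count_unexpected(original) + 5
    original_headers = len(HEADER.findall(original))
    headers_ok = len(HEADER.findall(revised)) >= min_structure_ratio * original_headers
    pii_ok = PII_TOKEN.search(revised) is None
    return length_ok and chars_ok and headers_ok and pii_ok
```

If any check fails, the original output is kept, matching the "Keep Original" branch in the architecture diagram.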

5. Security by Design

  • All data is ephemeral — dies with the request
  • No Redis/persistence — nothing stored
  • PII-aware — understands token format
  • Logs sanitized — no sensitive data in logs

Configuration

llm_service:
  review_bee:
    # Master toggle
    enabled: true
    
    # Mode: critique_only | critique_and_revise
    mode: "critique_and_revise"
    
    # Score threshold (0.0-1.0)
    revision_threshold: 0.75
    
    # Critic model (lightweight)
    critic:
      model: "phi4"
    
    # Safety settings
    safety:
      block_on_pii_leak: true
      max_unresolved_tokens: 0
    
    # Quality thresholds
    quality_validation:
      min_length_ratio: 0.7
      min_structure_ratio: 0.5

Environment Variables

REVIEW_BEE_ENABLED=true
REVIEW_BEE_MODE=critique_and_revise
REVIEW_BEE_THRESHOLD=0.75
REVIEW_BEE_CRITIC_MODEL=phi4
REVIEW_BEE_BLOCK_ON_PII=true

API Response

{
  "response": "...",
  "review_bee": {
    "enabled": true,
    "critic_model": "phi4",
    "mode": "critique_and_revise",
    "critique_score": 0.75,
    "requirements_met": "PARTIAL",
    "pii_check": {
      "passed": true,
      "unresolved_tokens": 0
    },
    "gaps_count": 2,
    "task_list_count": 3,
    "revision_applied": true,
    "quality_metrics": {
      "length_ratio": 1.38,
      "unexpected_chars": 0,
      "original_headers": 12,
      "revised_headers": 20
    }
  }
}
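
A client might inspect the `review_bee` block before showing the response to a user. The Python sketch below uses only the field names from the example above; the warning policy itself (what counts as worth flagging) is an assumption, not part of the API.

```python
from typing import Any, Dict, List


def review_warnings(payload: Dict[str, Any]) -> List[str]:
    """Return human-readable warnings derived from the review_bee metadata."""
    bee = payload.get("review_bee") or {}
    warnings: List[str] = []

    # Treat a missing or failed PII check as a warning.
    pii = bee.get("pii_check") or {}
    if not pii.get("passed", False):
        count = pii.get("unresolved_tokens", "unknown")
        warnings.append(f"PII check failed: {count} unresolved tokens")

    # Anything other than a full YES on requirements is worth surfacing.
    requirements = bee.get("requirements_met", "unknown")
    if requirements != "YES":
        warnings.append(f"requirements_met is {requirements}")

    gaps = bee.get("gaps_count", 0)
    if gaps:
        warnings.append(f"{gaps} gaps reported")
    return warnings
```

For the example response above, this returns two warnings: one for the PARTIAL requirements result and one for the two reported gaps.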

Migration from QE Bee

ReviewBee replaces the previous QE Bee system. Key differences:

| Feature | QE Bee (Legacy) | ReviewBee (Unified) |
|---|---|---|
| Focus | PII sanitization only | Full quality assurance |
| Requirements check | ❌ | ✅ |
| PII detection | ✅ | ✅ |
| Content quality | Basic | Comprehensive |
| Regeneration | ❌ Flag only | ✅ Critic-Revise |
| Task lists | ❌ | ✅ Actionable tasks |
| Webhooks | | ✅ (coming soon) |

For existing QE Bee users: ReviewBee is a superset — it does everything QE Bee did plus more. Simply enable ReviewBee and disable QE Bee.

Best Practices

When to Enable

Always enable for:

  • Production report generation
  • User-facing content
  • Any output that leaves the system

⚠️ Consider critique_only mode for:

  • Development/testing
  • High-volume, low-stakes content

Threshold Tuning

| Threshold | Behavior |
|---|---|
| 0.9 | Very strict: most outputs revised |
| 0.75 | Balanced: catches clear issues ✅ |
| 0.6 | Lenient: only major problems |
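
To see what a threshold change would do before applying it, one option is to replay critique scores collected from past requests and count how many would have triggered a revision. The Python sketch below assumes a revision is triggered when the score is strictly below the threshold, as in the architecture diagram; `revision_rate` is a hypothetical helper.

```python
from typing import Sequence


def revision_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of outputs whose score falls below the revision threshold."""
    if not scores:
        return 0.0
    return sum(score < threshold for score in scores) / len(scores)
```

A higher rate means more regeneration passes, and therefore more latency and compute per request, so the rate is worth checking alongside output quality.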

🚀 Future Roadmap

Custom ReviewBees

Specialized reviewers for different domains:

  • ComplianceBee — HIPAA, SOC2, GDPR checking
  • TechnicalBee — Code review and accuracy
  • ToneBee — Brand voice consistency
  • FactBee — Citation verification

Cloud Orchestration

Use cloud capacity for heavy loads while keeping orchestration on local AI:

Local Orchestrator (always-on, lightweight)
    ├── Local GPU (fast, private)
    ├── Cloud API (powerful, scalable) 
    └── Edge Node (private, secure)

Benefits:

  • Local AI handles orchestration and sensitive decisions
  • Cloud bursts for heavy generation
  • Only anonymized content leaves the appliance
  • Cost-effective scaling

Webhook Notifications

Real-time alerts when ReviewBee takes action:

  • Review completion events
  • Revision applied/rejected
  • PII safety blocks
  • Configurable filters

ReviewBee is STING’s commitment to quality — one unified reviewer for all AI outputs.
