From personal prompt to company tool
A harness is a curated, validated prompt. RUQA's harness lifecycle takes a prompt from one person's experiment to a versioned, governed tool the whole company uses.
Four stages
Personal harness
Write a prompt and run it in the sandbox until you like the output, then save it as a personal harness. Only you can see and run it.
// Personal: ruqa bug triage v3
// Owner: ey · Scope: personal
// Last sandbox score: 91 / 100
Sandbox validation
Test the harness against Claude, GPT, Gemini, and Llama with at least 5 representative inputs. RUQA computes a rubric score from five criteria: format, coherence, accuracy, brevity, and actionability.
// Test cases: 8
// Avg rubric: 91 / 100 (claude=94, gpt=89, gemini=92, llama=88)
// p95 latency: 2.4s · est. cost / run: $0.014
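The headline rubric number above is consistent with a plain mean of the per-model scores: (94 + 89 + 92 + 88) / 4 = 90.75, which rounds to 91. A minimal sketch of that aggregation, assuming an unweighted mean rounded to the nearest whole point (RUQA's actual weighting is not described here, and `average_rubric` is a hypothetical name):

```python
# Hypothetical helper, not RUQA's API: aggregate per-model rubric scores
# into one headline number. Assumes an unweighted mean, rounded to an int.


def average_rubric(scores_by_model: dict[str, float]) -> int:
    """Return the mean rubric score across models, rounded to the nearest int."""
    if not scores_by_model:
        raise ValueError("need at least one model score")
    mean = sum(scores_by_model.values()) / len(scores_by_model)
    return round(mean)


if __name__ == "__main__":
    scores = {"claude": 94, "gpt": 89, "gemini": 92, "llama": 88}
    print(average_rubric(scores))  # 91 (the exact mean is 90.75)
```

If some models matter more to your team, a weighted mean would be a small change, but nothing in the sandbox output above suggests RUQA does that.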
Promote to company
Once 5 or more teammates have run your personal harness in their own work and rated it ≥4/5, RUQA suggests promoting it. Promotion locks the prompt at its current version and writes a changelog entry.
// Company: bug triage v3 (promoted from ey)
// Scope: company · Reviewed by: 6 · Avg rating: 4.5 / 5
// Changelog: 2026-04-22 promoted from personal · 2026-04-30 v3.1 typo fix
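The promotion rule can be read as a small predicate plus a lock step. This sketch assumes that "rated it ≥4/5" means the average rating across at least five teammates (excluding the owner) is 4 or higher, which matches the 4.5 / 5 average on the promoted card. `Harness`, `should_suggest_promotion`, and `promote` are hypothetical names for illustration, not RUQA's API.

```python
from dataclasses import dataclass, field
from datetime import date

MIN_TEAMMATES = 5      # "5+ teammates"
MIN_AVG_RATING = 4.0   # ">=4/5", assumed here to mean the average rating


@dataclass
class Harness:
    # Hypothetical record shape, for illustration only.
    name: str
    owner: str
    scope: str = "personal"   # "personal" or "company"
    version: str = "v1"
    locked: bool = False
    ratings: dict[str, int] = field(default_factory=dict)  # teammate -> 1..5
    changelog: list[str] = field(default_factory=list)


def should_suggest_promotion(h: Harness) -> bool:
    # Assumption: the owner's own rating does not count toward promotion.
    others = {user: r for user, r in h.ratings.items() if user != h.owner}
    if len(others) < MIN_TEAMMATES:
        return False
    return sum(others.values()) / len(others) >= MIN_AVG_RATING


def promote(h: Harness, on: date) -> None:
    # Promotion locks the prompt at its current version and logs the change.
    if h.scope == "company":
        raise ValueError(f"{h.name} is already a company harness")
    h.scope = "company"
    h.locked = True
    h.changelog.append(f"{on.isoformat()} promoted from {h.owner}")
```

Keeping the suggestion check separate from `promote` mirrors the flow described above: the system only suggests promotion, and the lock happens when someone acts on it.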
Versioning
Every edit creates a new version. Old versions remain runnable for reproducibility, which matters when a previously shipped outcome was generated by an older prompt.
// v3.1 (current)
// v3.0 (deprecated, still callable)
// v2.4 (locked, used by 47 historical outcomes)
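One way to picture this is an append-only store: an edit adds a version and never overwrites one, the newest version becomes current, and any older version can still be fetched by name. The sketch below also records which version each outcome ran on, so a count like "used by 47 historical outcomes" falls out naturally. `VersionStore` and its methods are hypothetical, not RUQA's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionStore:
    """Hypothetical append-only store: edits add versions, nothing is overwritten."""
    versions: dict[str, str] = field(default_factory=dict)  # version -> prompt body
    current: Optional[str] = None
    pins: dict[str, str] = field(default_factory=dict)      # outcome id -> version used

    def edit(self, version: str, body: str) -> None:
        # Every edit creates a new, immutable version and makes it current.
        if version in self.versions:
            raise ValueError(f"{version} already exists; versions are immutable")
        self.versions[version] = body
        self.current = version

    def get(self, version: Optional[str] = None) -> str:
        # Default to the current version, but any older version stays callable.
        key = version or self.current
        if key is None or key not in self.versions:
            raise KeyError(f"unknown version: {key}")
        return self.versions[key]

    def run(self, outcome_id: str, version: Optional[str] = None) -> str:
        # Record which version produced the outcome, for later reproduction.
        body = self.get(version)
        self.pins[outcome_id] = version or self.current
        return body

    def outcomes_using(self, version: str) -> int:
        return sum(1 for v in self.pins.values() if v == version)
```

Because outcomes pin a version name rather than copying the body, reproducing an old outcome only requires that its version is never deleted.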
Things that surprise people
- Personal harnesses can use placeholders like {{repo}}, {{outcome_title}}, {{user_role}} that RUQA fills in at run time.
- Don't put secrets in harness bodies — they're shared once promoted. Use env-var placeholders ({{env.API_KEY}}) for tokens.
- Tag harnesses with categories (review, decision, ops, comms). The auto-recommender in the outcome creation flow surfaces the top 3 harnesses.
- If a company harness drifts (its average rating stays below 3.5 for 14 days), it's automatically flagged for review.
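Placeholder filling can be sketched as a single regex substitution over the harness body. This version assumes the `{{name}}` syntax shown above, takes run-time values from a context mapping, resolves `{{env.NAME}}` from environment variables so secrets never live in the body, and raises on unknown placeholders rather than leaving them blank (that last choice is an assumption). `render` is a hypothetical helper, not RUQA's implementation.

```python
import os
import re
from typing import Mapping

# Matches {{repo}}, {{ outcome_title }}, {{env.API_KEY}}, etc.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render(body: str, context: Mapping[str, str],
           env: Mapping[str, str] = os.environ) -> str:
    """Fill {{name}} from `context` and {{env.NAME}} from `env` at run time."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key.startswith("env."):
            # Secrets are read at run time, so they are never stored in the body.
            return env[key[len("env."):]]
        if key not in context:
            raise KeyError(f"no value for placeholder '{key}'")
        return context[key]

    return PLACEHOLDER.sub(substitute, body)


if __name__ == "__main__":
    body = "Triage the latest issues in {{repo}} for a {{user_role}}."
    print(render(body, {"repo": "ruqa", "user_role": "reviewer"}))
```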
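The drift rule can be checked with a sliding window over daily ratings. This sketch assumes RUQA keeps one average rating per day and flags a harness only when each of the last 14 daily averages is below 3.5; the real rule may use a rolling average instead. `is_drifting` is a hypothetical name.

```python
from typing import Sequence

DRIFT_THRESHOLD = 3.5
DRIFT_WINDOW_DAYS = 14


def is_drifting(daily_avg_ratings: Sequence[float],
                threshold: float = DRIFT_THRESHOLD,
                window: int = DRIFT_WINDOW_DAYS) -> bool:
    """True if the average rating stayed below `threshold` on each of the last `window` days.

    `daily_avg_ratings` is ordered oldest to newest, one value per day (assumed).
    """
    if len(daily_avg_ratings) < window:
        return False  # not enough history to call it drift
    return all(r < threshold for r in daily_avg_ratings[-window:])
```

A single good day inside the window resets the check under this reading, which avoids flagging harnesses that only dip briefly.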
Sandbox is where every harness is born
Read how the 4-LLM sandbox + rubric scoring works.