
From personal prompt to company tool

A harness is a curated, validated prompt. RUQA's harness lifecycle takes a prompt from one person's experiment to a versioned, governed tool the whole company uses.

Lifecycle

Four stages

Personal harness

Write a prompt and run it in the sandbox until the output meets your needs, then save it as a personal harness. Only you can see and run it.

// Personal: ruqa bug triage v3
// Owner: ey · Scope: personal
// Last sandbox score: 91 / 100

Sandbox validation

Test the harness across Claude, GPT, Gemini, and Llama with at least 5 representative inputs. RUQA computes a rubric score from five criteria: format, coherence, accuracy, brevity, and actionability.

// Test cases: 8
// Avg rubric: 91 / 100 (claude=94, gpt=89, gemini=92, llama=88)
// p95 latency: 2.4s · est cost / run: $0.014
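The example numbers above are consistent with a plain mean across the four models, rounded to the nearest integer. The sketch below reproduces that arithmetic. How RUQA actually aggregates and rounds is not documented here, so `average_rubric` and its rounding are assumptions for illustration only.

```python
from statistics import mean

# Per-model rubric scores copied from the example above.
model_scores = {"claude": 94, "gpt": 89, "gemini": 92, "llama": 88}


def average_rubric(scores: dict[str, float]) -> int:
    """Mean rubric score across models, rounded to the nearest integer.

    Assumption: an unweighted mean; RUQA may weight models differently.
    """
    if not scores:
        raise ValueError("at least one model score is required")
    return round(mean(scores.values()))


print(average_rubric(model_scores))  # (94 + 89 + 92 + 88) / 4 = 90.75 -> 91
```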

Promote to company

When 5+ teammates have run your personal harness in their own work and rated it ≥4/5, the system suggests promotion. Promotion locks the prompt at a version and writes a changelog entry.

// Company: bug triage v3 (promoted from ey)
// Scope: company · Reviewed by: 6 · Avg rating: 4.5 / 5
// Changelog: 2026-04-22 promoted from personal · 2026-04-30 v3.1 typo fix
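The promotion rule can be read as a simple threshold check. The sketch below assumes one interpretation: at least 5 distinct teammates, not counting the owner, whose most recent rating is 4 or higher. The `Rating` type and `should_suggest_promotion` function are hypothetical names, not part of RUQA's API.

```python
from dataclasses import dataclass

MIN_RATERS = 5  # "5+ teammates" from the text
MIN_RATING = 4  # "rated it >= 4/5" from the text


@dataclass(frozen=True)
class Rating:
    teammate: str  # hypothetical identifier for the person who ran the harness
    stars: int     # 1 to 5


def should_suggest_promotion(ratings: list[Rating], owner: str) -> bool:
    """True when enough distinct teammates (excluding the owner) rated it highly."""
    latest: dict[str, int] = {}
    for rating in ratings:
        if rating.teammate != owner:
            # Assumption: a later rating from the same person replaces the earlier one.
            latest[rating.teammate] = rating.stars
    qualifying = [t for t, stars in latest.items() if stars >= MIN_RATING]
    return len(qualifying) >= MIN_RATERS


ratings = [Rating(name, 5) for name in ("ana", "bo", "cy", "di")] + [Rating("el", 4)]
print(should_suggest_promotion(ratings, owner="ey"))  # True
```

Counting distinct people rather than total runs keeps one enthusiastic teammate from triggering the suggestion alone.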

Versioning

Every edit creates a new version. Old versions remain runnable for reproducibility, which matters when a previously shipped outcome was generated by an older prompt.

// v3.1 (current)
// v3.0 (deprecated, still callable)
// v2.4 (locked, used by 47 historical outcomes)
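One way to picture this is an append-only version store: publishing never overwrites an existing version, and a caller can pin an old label to reproduce an earlier outcome. This is a minimal sketch under that assumption. The `Harness` class, its methods, and the prompt bodies are hypothetical and do not describe RUQA's storage.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Harness:
    name: str
    versions: dict[str, str] = field(default_factory=dict)  # label -> prompt body
    current: Optional[str] = None

    def publish(self, label: str, body: str) -> None:
        """Add a new version and make it current; existing versions are immutable."""
        if label in self.versions:
            raise ValueError(f"{label} already exists; versions are immutable")
        self.versions[label] = body
        self.current = label

    def resolve(self, label: Optional[str] = None) -> str:
        """Return the prompt body for a pinned version, or the current one."""
        key = label if label is not None else self.current
        if key is None or key not in self.versions:
            raise KeyError(f"unknown version: {key}")
        return self.versions[key]


harness = Harness("bug triage")
harness.publish("v3.0", "Triage the bug in {{repo}}.")
harness.publish("v3.1", "Triage the bug in {{repo}} and suggest an owner.")
print(harness.resolve())        # body of v3.1 (current)
print(harness.resolve("v3.0"))  # body of v3.0, still callable
```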

Tips

Things that surprise people

Personal harnesses can use placeholders like {{repo}}, {{outcome_title}}, {{user_role}} that RUQA fills in at run time.

Don't put secrets in harness bodies, because the body is shared with everyone once the harness is promoted. Use env-var placeholders ({{env.API_KEY}}) for tokens.
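The two placeholder tips above can be illustrated with a small renderer: plain names come from run-time values, and names prefixed with `env.` are looked up in the environment so the stored body never contains the token itself. This is a sketch of the general technique, not RUQA's substitution engine. The `render` function, its regex, and the sample values are assumptions.

```python
import os
import re
from typing import Mapping, Optional

# Matches {{name}} and {{env.NAME}}, allowing optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render(body: str, values: Mapping[str, str],
           env: Optional[Mapping[str, str]] = None) -> str:
    """Fill placeholders at run time; raises KeyError for any unknown name."""
    env_source = os.environ if env is None else env

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name.startswith("env."):
            return env_source[name[len("env."):]]
        return values[name]

    return PLACEHOLDER.sub(substitute, body)


body = "Triage open bugs in {{repo}} for a {{user_role}}."
print(render(body, {"repo": "ruqa/api", "user_role": "maintainer"}))
# Triage open bugs in ruqa/api for a maintainer.
```

Failing loudly on a missing value is a deliberate choice here: a prompt sent with an unfilled `{{repo}}` is harder to notice than an error.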

Tag harnesses with categories (review, decision, ops, comms). The auto-recommender in the outcome creation flow surfaces the top 3 harnesses.

If a company harness drifts (its average rating stays below 3.5 for 14 days), it is auto-flagged for review.
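The drift rule can be sketched as a check over a trailing window. The version below assumes one reading of the rule: there is one average rating per day, and the harness is flagged only when every one of the last 14 daily averages is below 3.5. The function name and input shape are hypothetical.

```python
DRIFT_THRESHOLD = 3.5   # from the text
DRIFT_WINDOW_DAYS = 14  # from the text


def is_drifting(daily_avg_ratings: list[float]) -> bool:
    """daily_avg_ratings holds one average rating per day, oldest first.

    Assumption: drift means every day in the trailing window is below the threshold.
    """
    if len(daily_avg_ratings) < DRIFT_WINDOW_DAYS:
        return False  # not enough history to decide
    recent = daily_avg_ratings[-DRIFT_WINDOW_DAYS:]
    return all(rating < DRIFT_THRESHOLD for rating in recent)


history = [4.5] * 10 + [3.2] * 14
print(is_drifting(history))  # True
```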

Sandbox is where every harness is born

Read how the 4-LLM sandbox and rubric scoring work.

Sandbox docs