Built with Claude Opus 4.7 · Anthropic hackathon 2026

Auditable Design

Decision infrastructure for product teams. Preserves the chain of reasoning from raw user feedback to defensible design direction.

Feedback enters the product process as evidence.
It leaves as opinion.
Auditable Design keeps the reasoning intact.

Why this exists

1000+ complaints → 5 themes → roadmap item → backlog ticket → ?

Six weeks later, nobody can defend why that decision was made. The reasoning disappears.

Generic LLMs: summarize the corpus
Research platforms: organize studies (15 users, weeks)
Analytics tools: show behaviour, not why
Auditable Design: creates the missing layer, a traceable argument for what should change next

The pipeline

Eight steps. One brief at the end.

Each layer has a named purpose, typed inputs, typed outputs, and an append-only provenance trail.
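One way to picture a layer's contract is as a record that is written once and never edited. The sketch below is illustrative only: the class names, field names, and IDs are assumptions, not the project's actual schema. It shows an append-only trail and a backwards walk from any output to the entries that produced it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ProvenanceEntry:
    """One immutable record of a layer run (hypothetical schema)."""
    layer: str
    purpose: str
    input_ids: Tuple[str, ...]
    output_ids: Tuple[str, ...]
    recorded_at: str


class ProvenanceTrail:
    """Append-only: entries can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._entries: List[ProvenanceEntry] = []

    def append(self, layer: str, purpose: str,
               input_ids: Iterable[str], output_ids: Iterable[str]) -> ProvenanceEntry:
        entry = ProvenanceEntry(
            layer=layer,
            purpose=purpose,
            input_ids=tuple(input_ids),
            output_ids=tuple(output_ids),
            recorded_at=datetime.now(timezone.utc).isoformat(),
        )
        self._entries.append(entry)
        return entry

    def entries(self) -> Tuple[ProvenanceEntry, ...]:
        # Return a tuple so callers cannot mutate the internal list.
        return tuple(self._entries)

    def trace(self, output_id: str) -> List[ProvenanceEntry]:
        """Walk backwards from an output to every entry that contributed to it."""
        chain: List[ProvenanceEntry] = []
        frontier = {output_id}
        for entry in reversed(self._entries):
            if frontier & set(entry.output_ids):
                chain.append(entry)
                frontier |= set(entry.input_ids)
        return chain


# Hypothetical IDs, for illustration only.
trail = ProvenanceTrail()
trail.append("clustering", "group reviews into themes", ["review-1", "review-2"], ["cluster-11"])
trail.append("audit", "score defects per lens", ["cluster-11"], ["defect-a"])
lineage = [entry.layer for entry in trail.trace("defect-a")]
```

Freezing each entry and exposing only `append` keeps the trail honest by construction: a later layer can add context but cannot rewrite what an earlier layer recorded.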

The cluster

Representative user quotes

Informing reviews

Review IDs aggregated by the clustering step, each traceable to a specific user complaint.

Measured pain

What actually hurts — named defects, total severity

Six design lenses read the same cluster independently — Don Norman on usability, WCAG on accessibility, Kahneman on decision psychology, Osterwalder on business alignment, Cooper on interaction design, Garrett on UX architecture. The pipeline merges their overlapping findings into one ranked list. Each defect gets one severity score on a five-point scale: 0 absent, 3 minor, 5 moderate, 7 serious, 9 critical.
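The merge step can be sketched as follows. The page does not say how overlapping findings are reconciled, so this sketch assumes two things: findings overlap when they name the same defect, and the merged severity is the highest score any lens assigned. The names `Finding`, `RankedDefect`, `merge_findings`, and `total_severity` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# The five-point scale quoted in the text.
SEVERITY_LABELS = {0: "absent", 3: "minor", 5: "moderate", 7: "serious", 9: "critical"}


@dataclass(frozen=True)
class Finding:
    lens: str
    defect: str
    severity: int


@dataclass(frozen=True)
class RankedDefect:
    defect: str
    severity: int
    lenses: Tuple[str, ...]


def merge_findings(findings: List[Finding]) -> List[RankedDefect]:
    """Group findings by defect name and rank them, most severe first."""
    by_defect: Dict[str, List[Finding]] = defaultdict(list)
    for finding in findings:
        if finding.severity not in SEVERITY_LABELS:
            raise ValueError(f"severity {finding.severity} is not on the scale")
        by_defect[finding.defect].append(finding)

    merged = [
        RankedDefect(
            defect=name,
            # Assumption: keep the highest severity any lens reported.
            severity=max(f.severity for f in group),
            lenses=tuple(sorted({f.lens for f in group})),
        )
        for name, group in by_defect.items()
    ]
    # Sort by severity descending, then by name for a stable order.
    return sorted(merged, key=lambda d: (-d.severity, d.defect))


def total_severity(ranked: List[RankedDefect]) -> int:
    """Sum of merged severities (assumed definition of 'total severity')."""
    return sum(d.severity for d in ranked)
```

Keeping the list of contributing lenses on each merged defect preserves the audit trail: a reader can still see which perspectives agreed on a defect after the scores have been collapsed into one.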

Grounded verification

Claude Opus 4.7 checks every hypothesis against the real product

Review text says what users feel. Real product pixels say what they see. Opus 4.7 reads the Duolingo screenshots and scores each heuristic hypothesis as confirmed, partial, or refuted — with cited UI details — so the designer knows which review-inferred defects hold up on the actual product.
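A grounded check can be modelled as a verdict plus the UI details that justify it. The sketch below is a hypothetical data shape, not the project's actual schema. It treats a verdict without a citation as invalid, which is an assumption drawn from the phrase "with cited UI details".

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Tuple


class Verdict(Enum):
    CONFIRMED = "confirmed"
    PARTIAL = "partial"
    REFUTED = "refuted"


@dataclass(frozen=True)
class GroundedCheck:
    heuristic: str
    verdict: Verdict
    cited_ui_details: Tuple[str, ...]

    def __post_init__(self) -> None:
        # Assumption: every verdict must point at something visible on screen.
        if not self.cited_ui_details:
            raise ValueError("a verdict must cite at least one UI detail")


def summarize(checks: Iterable[GroundedCheck]) -> str:
    """Tally verdicts in declaration order: confirmed, partial, refuted."""
    counts = Counter(check.verdict for check in checks)
    return " · ".join(f"{counts[v]} {v.value}" for v in Verdict)
```

Making the citation mandatory at construction time means an unsupported verdict cannot enter the trail in the first place, rather than being caught in a later review.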

Energy screen (energy management surface): non-blocking surface · 22h regeneration label visible.
Out of energy, between lessons (home): Super pre-checked · Recharge at 450 gems · Quit muted.
Out of energy, mid-lesson: "Translate this sentence" visible above · "LOSE XP" as punitive non-paid exit. The surface this hero cluster is about.

Duolingo is used as a public, globally recognised freemium example, not as a target. Every finding is anchored in a literal quote from a publicly available Google Play review, and every screenshot is what a paying user sees.

The method is product-agnostic.

What the product evidence shows

Opus 4.7 confirmed six of the seven baseline heuristics on the product and partially confirmed the seventh; none were refuted (6 confirmed · 1 partial · 0 refuted). The review-inferred pain holds up under pixel inspection.

Beyond confirmation, the grounding step surfaced a defect the baseline heuristic list did not name: a pricing inconsistency across surfaces — Super is 500 gems on the energy screen and 450 gems on the paywall modal, same product, two price points in the same session. Not a single-surface heuristic; it requires comparing values across screenshots. Flagged as a candidate for the next clustering cycle.
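A cross-surface check of this kind is easy to state in code once the values have been read off each screenshot. The sketch below assumes the prices are already extracted into records. The names `PriceObservation` and `find_price_inconsistencies` are hypothetical, and the extraction step itself is out of scope.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class PriceObservation:
    surface: str  # which screenshot the value was read from
    item: str     # what is being priced
    gems: int     # the price shown on that surface


def find_price_inconsistencies(
    observations: Iterable[PriceObservation],
) -> Dict[str, Dict[str, int]]:
    """Return items whose price differs between surfaces, with per-surface values."""
    prices: Dict[str, Dict[str, int]] = defaultdict(dict)
    for obs in observations:
        prices[obs.item][obs.surface] = obs.gems
    return {
        item: by_surface
        for item, by_surface in prices.items()
        if len(set(by_surface.values())) > 1
    }


# Values as quoted in the text above.
observed = [
    PriceObservation("energy screen", "Super", 500),
    PriceObservation("paywall modal", "Super", 450),
]
flagged = find_price_inconsistencies(observed)
```

No single-screenshot heuristic can catch this, because each surface is internally consistent; the defect only appears once values from several surfaces sit in the same table.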

The validated direction

One direction generated · re-audited · converged

The pipeline first proposes a before/after direction grounded in the measured defects, then re-audits that direction against the same heuristic list to measure how much pain actually drops. Where residual severity remains, a refinement loop adjusts the direction further under a scored verifier, accepting changes that reduce pain without regression and rejecting those that do not. The full iteration trail, including rejected attempts, lives in the shipping brief below.
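The acceptance rule in that loop can be sketched as below. This is one reading of "reduce pain without regression", under stated assumptions: total severity must fall, and no individual heuristic may score worse than before. The `verify` callable stands in for the scored verifier, and all names here are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Scores = Dict[str, int]  # heuristic name -> severity


def is_improvement(current: Scores, candidate: Scores) -> bool:
    """Accept only if total pain drops and no single heuristic gets worse."""
    if candidate.keys() != current.keys():
        raise ValueError("candidate must be scored on the same heuristics")
    no_regression = all(candidate[h] <= current[h] for h in current)
    reduces_pain = sum(candidate.values()) < sum(current.values())
    return no_regression and reduces_pain


def refine(
    baseline: Scores,
    proposals: Iterable[str],
    verify: Callable[[str], Scores],
) -> Tuple[Scores, List[dict]]:
    """Try each proposal in turn; keep a trail of accepted and rejected attempts."""
    current = dict(baseline)
    trail: List[dict] = []
    for proposal in proposals:
        scores = verify(proposal)
        accepted = is_improvement(current, scores)
        # Rejected attempts are recorded too, so the trail stays complete.
        trail.append({
            "proposal": proposal,
            "accepted": accepted,
            "residual_severity": sum(scores.values()),
        })
        if accepted:
            current = dict(scores)
        if sum(current.values()) == 0:
            break  # nothing left to reduce
    return current, trail
```

Checking each heuristic separately, rather than only the total, prevents a change from buying a lower overall score by making one defect worse.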

Before — current product
9:41
9087
You ran out of energy!
SUPER
Unlimited
Try free
Recharge
450
Try 1 week for free
Quit lesson
Blame copy · Super pre-selected · price hidden behind "free trial" · "Quit lesson" framed as exit.
After — generated brief
9:43
Nice work!
+10 XP · 47-day streak
You're out of energy
We saved your spot. Pick what's next.
  • Get Super
    $6.99/month · cancel anytime
  • Watch an ad
    Refills your energy
  • Come back tomorrow
    Refills overnight
Neutral copy · streak protected by default · three equal paths · pricing transparent · no pre-selection.

The shipping artifact

One markdown. Ten sections. Full audit trail.

The designer opens a single document and begins. Every finding traces back to evidence. Every iteration — accepted or rejected — is preserved. The agent does not commit, does not merge, does not ship; the work begins here and is owned by the human team.

design_brief_cluster11_opus47.md
1. Executive summary
2. User pain signal
3. Measured pain spaces: 7 heuristics · grounded evidence
4. Priority reasoning: 5-dimensional weighted score
5. Validated direction: before / after · per-heuristic delta
6. Out-of-baseline observations: defect not in baseline
7. Audit trail: 2 iterations · 0 rejected
8. Signal quality indicators: components, not rollup
9. Handoff notes: guarantees vs non-guarantees
10. Provenance: sha256 of every input file
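The provenance section can be reproduced with the standard library alone. The sketch below is an assumption about how such a manifest might be built, not the project's actual code. The function names are hypothetical; `hashlib.sha256` is the only hashing call used.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, Union

PathLike = Union[str, Path]


def sha256_of_file(path: PathLike, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large inputs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_manifest(paths: Iterable[PathLike]) -> Dict[str, str]:
    """Map each input path to its digest, in sorted order for stable output."""
    return {str(p): sha256_of_file(p) for p in sorted(Path(p) for p in paths)}
```

Recording a digest per input lets anyone check later that the brief was produced from exactly the files it claims, since any change to an input changes its hash.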
The point is not to automate taste.
The point is to make product reasoning inspectable.

Not louder recommendations. Better accountability.