Product Designer

Product Designer · AI Tools

2026

Five AI skills I built because generic feedback was making my work worse

AI assistants were giving me UX feedback that sounded credible but wasn't grounded in anything. Same vague observations, same aesthetic-first suggestions, same heuristics applied without context. So I built five Claude skills that fix that, each one covering a different stage of the design process, all available on GitHub.

5

Skills built and shipped

100%

Open source, free to use

0%

Unlabeled assumptions in any output

7

Design principles grounding the framework

It sounds right. It usually isn't grounded in anything.

I started noticing a pattern in AI-assisted UX work. Ask for a critique of a screen and you get: "consider improving visual hierarchy," "the typography could be cleaner," "users might find this confusing." All defensible. None of it specific enough to act on. None of it citing which principle was violated, what the actual contrast ratio is, or what the alternative should look like.

Worse, the feedback often worked backwards from aesthetics. It would suggest a visual change first and then construct a rationale for why that change was about usability. That's not how good design feedback works. Good design feedback starts with whether the user can complete the task.

The other problem was confident hallucination. AI tools would state accessibility requirements incorrectly, cite heuristics with the wrong attribution, or flag issues using made-up success criteria. Because the output sounded authoritative, it was easy to act on without verification. That's a reliability problem in a discipline where precision matters.

The core constraint I set

Every skill had to declare when it was stating a fact, when it was making an estimate, and when it was drawing an inference. Not as a disclaimer at the bottom. As a label on every individual claim. If the output couldn't be labeled with a confidence level, it shouldn't be in the output.

One rule that governs all five skills

Before building any individual skill, I needed a hierarchy that applied consistently across all of them. The problem with generic AI UX feedback is partly a vocabulary problem, but it's more fundamentally a priority problem: it treats aesthetics as a first-class concern when it should be the last thing considered.


01 Function

02 Clarity

03 Accessibility

04 Feedback

05 Efficiency

06 Consistency

07 Aesthetics


This ordering isn't original. It's grounded in established UX theory, usability research, and accessibility standards. What made it useful as a skill constraint was making it explicit and non-negotiable. Every skill must evaluate in this order. Aesthetic feedback cannot precede functional feedback.
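As a minimal sketch of what "evaluate in this order" could look like if it were enforced in code, the snippet below sorts findings by their rank in the hierarchy so that an aesthetic note can never precede a functional one. The `Finding` type and the `order_findings` helper are hypothetical names used for illustration; the published skills are structured prompts and may express this rule differently.

```python
from dataclasses import dataclass

# The seven levels of the hierarchy, highest priority first.
HIERARCHY = [
    "Function",
    "Clarity",
    "Accessibility",
    "Feedback",
    "Efficiency",
    "Consistency",
    "Aesthetics",
]


@dataclass(frozen=True)
class Finding:
    level: str  # hypothetical field: must be one of HIERARCHY
    note: str


def order_findings(findings):
    """Return findings sorted by hierarchy rank, keeping input order within a level."""
    rank = {name: i for i, name in enumerate(HIERARCHY)}
    for f in findings:
        if f.level not in rank:
            raise ValueError(f"Unknown hierarchy level: {f.level!r}")
    # sorted() is stable, so findings at the same level keep their original order.
    return sorted(findings, key=lambda f: rank[f.level])
```

Rejecting unknown levels, rather than pushing them to the end, keeps a finding from slipping through without a place in the hierarchy.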

The labeling system works in three tiers: DATA for verifiable measurements (contrast ratios, target sizes in pixels, specific WCAG criteria), ESTIMATED for reasonable estimates from available information (likely user behaviour, probable technical constraints), and INFERENCE for interpretive conclusions, which must be explicitly acknowledged as interpretation.
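One way to picture the three tiers is as a small data type plus a parser that refuses anything unlabeled. This is a sketch under assumptions: the `[LABEL] text` line format, the `Claim` type, and `parse_claim` are all hypothetical and are not taken from the skills themselves.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    DATA = "DATA"            # verifiable measurement or cited criterion
    ESTIMATED = "ESTIMATED"  # reasonable estimate from available information
    INFERENCE = "INFERENCE"  # interpretive conclusion, flagged as such


@dataclass(frozen=True)
class Claim:
    label: Label
    text: str

    def render(self) -> str:
        return f"[{self.label.value}] {self.text}"


def parse_claim(line: str) -> Claim:
    """Parse '[LABEL] text'; raise ValueError if the label is missing or unknown."""
    line = line.strip()
    if not line.startswith("[") or "]" not in line:
        raise ValueError(f"Unlabeled claim: {line!r}")
    tag, _, text = line[1:].partition("]")
    try:
        label = Label[tag.strip()]
    except KeyError:
        raise ValueError(f"Unknown label {tag!r} in: {line!r}") from None
    if not text.strip():
        raise ValueError(f"Label without a claim: {line!r}")
    return Claim(label, text.strip())
```

For example, `parse_claim("[DATA] Contrast ratio is 3.5:1")` yields a `Claim` labeled `Label.DATA`, while a bare sentence such as "users might find this confusing" is rejected.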

What each one does and why it exists

UX Audit

Takes a screen or component and returns a structured accessibility and usability audit. Every finding includes the exact hex values for contrast pairs, touch target measurements against WCAG 2.5.5, the specific criterion that is or isn't being met, and a precise correction. The problem it replaces: "this button has low contrast" with no numbers. The output it produces: "foreground #8A8A8A on background #FFFFFF has a contrast ratio of 3.5:1. WCAG AA requires 4.5:1 for normal text. Change foreground to #757575 or darker."

Output labels

  • Contrast ratio with hex pairs

  • WCAG level (AA/AAA)

  • Touch target px vs. 44x44 minimum

  • Specific fix with exact values
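The contrast figures in an audit like this can be reproduced with the relative luminance and contrast ratio formulas from WCAG 2.1. The sketch below is one straightforward implementation of those formulas; the function names and the `meets_target_size` helper are illustrative assumptions, not the skill's own code.

```python
def _channel_to_linear(c8: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, using the WCAG 2.1 threshold."""
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a 6-digit hex color such as '#8A8A8A'."""
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError(f"Expected a 6-digit hex color, got {hex_color!r}")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * _channel_to_linear(r)
            + 0.7152 * _channel_to_linear(g)
            + 0.0722 * _channel_to_linear(b))


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), between 1 and 21."""
    l1, l2 = relative_luminance(foreground), relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(foreground: str, background: str) -> bool:
    """WCAG AA requires at least 4.5:1 for normal-size text."""
    return contrast_ratio(foreground, background) >= 4.5


def meets_target_size(width_px: float, height_px: float, minimum: float = 44) -> bool:
    """Check a touch target against the 44x44 minimum used in the audit."""
    return width_px >= minimum and height_px >= minimum
```

Run against the example above, `contrast_ratio("#8A8A8A", "#FFFFFF")` comes out just under 3.5:1 and fails AA for normal text, while `#757575` on white clears the 4.5:1 threshold.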

Design Critique

Works like a code review for design: what's wrong, exactly where, which established principle it violates, and how to fix it. Grounded in Nielsen's heuristics, Fitts' Law, Gestalt principles, Von Restorff effect, and WCAG 2.1. Every critique item follows a WHAT / PRINCIPLE / FIX structure.

Prevents the most common critique failure: giving feedback that can't be acted on because it doesn't name the principle being violated. "This is visually noisy" is not a critique. "The secondary CTA has equal visual weight to the primary CTA, violating figure-ground distinction (Gestalt). Reduce secondary button weight to ghost/outline style" is.

Output structure per issue

  • WHAT: exact element

  • PRINCIPLE: named heuristic

  • FIX: specific change
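The WHAT / PRINCIPLE / FIX structure can be sketched as a record that refuses to exist without all three parts. Everything here is a hypothetical illustration: the `CritiqueItem` type, the `NAMED_PRINCIPLES` tuple (drawn from the principles listed above), and the simple substring check are assumptions, not the skill's actual rules.

```python
from dataclasses import dataclass

# Principle families named in the text; the list is an assumption and easy to extend.
NAMED_PRINCIPLES = ("Nielsen", "Fitts", "Gestalt", "Von Restorff", "WCAG")


@dataclass(frozen=True)
class CritiqueItem:
    what: str       # the exact element
    principle: str  # the named heuristic or standard
    fix: str        # the specific change

    def __post_init__(self):
        for field_name in ("what", "principle", "fix"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name.upper()} must not be empty")
        named = any(p.lower() in self.principle.lower() for p in NAMED_PRINCIPLES)
        if not named:
            raise ValueError(f"PRINCIPLE does not name a known principle: {self.principle!r}")

    def render(self) -> str:
        return f"WHAT: {self.what}\nPRINCIPLE: {self.principle}\nFIX: {self.fix}"
```

Under this sketch, "This is visually noisy" cannot be constructed as a critique item at all, because it has no principle and no fix to fill the remaining fields.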

UX Design

Handles the strategic layer that most AI tools skip. Before any visual tool is opened, there's a set of decisions that should be made: what is the flow, what are the states, what does the error handling look like, what is the microcopy. This skill works through those decisions in the strict function-first hierarchy.

Covers onboarding flows, authentication, microcopy review, information architecture, and developer handoff specifications. The output is a structured design direction, not a visual mockup.

Coverage areas

  • Flow structure

  • State definitions

  • Error handling

  • Microcopy

  • Dev handoff spec

Discovery

Generates a structured discovery document in HTML, including competitive benchmarks and gap analysis. The document is organized into five sections: project metadata and scope, classified findings, benchmarking table, gap analysis, and actionable recommendations.

The critical design decision was requiring every claim to carry a label. A finding from a published study is DATA. A pattern inferred from product teardowns is ESTIMATED. An interpretive conclusion based on qualitative signals is INFERENCE. This prevents the common problem of research documents that blend facts and assumptions without distinction.

Document sections

  • Project metadata

  • Classified findings

  • Benchmark table

  • Gap analysis

  • Recommendations
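To illustrate how labeled findings might end up in the HTML document, here is a small rendering sketch. The `render_findings` function, the `(label, text)` input shape, and the CSS class names are all hypothetical; the actual markup the skill generates is not shown in this write-up.

```python
from html import escape

LABELS = ("DATA", "ESTIMATED", "INFERENCE")


def render_findings(findings):
    """Render (label, text) pairs as an HTML list; refuse unlabeled findings."""
    items = []
    for label, text in findings:
        if label not in LABELS:
            raise ValueError(f"Finding needs one of {LABELS}, got {label!r}")
        # Class names are illustrative; escape() keeps finding text from breaking the markup.
        items.append(
            f'  <li class="finding finding--{label.lower()}">'
            f"<strong>{label}</strong> {escape(text)}</li>"
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Putting the label in both the visible text and a class name means the distinction between fact and assumption survives whether the document is read or restyled.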

Frontend Spec

AI-generated frontend code routinely ignores design decisions: it produces technically correct implementations that miss the states, the transitions, and the token usage that make the component actually match the design. This skill detects context first (product interface vs. marketing), respects existing design tokens, and defines all component states before any styling begins.

Output covers: default, hover, focus, active, disabled, loading, error, and success states. Each state has exact token references, not hardcoded values.

States covered per component

  • Default

  • Hover

  • Focus

  • Success

  • Active

  • Disabled

  • Loading

  • Error
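The "all states, tokens only" rule lends itself to a mechanical check. The sketch below assumes a component spec shaped as a dictionary of states to property/value pairs, and assumes tokens are written as CSS custom property references such as `var(--color-primary)`. Both the shape and the token syntax are assumptions for illustration; a real design system may use a different token format.

```python
import re

REQUIRED_STATES = (
    "default", "hover", "focus", "active",
    "disabled", "loading", "error", "success",
)

# Assumed token syntax: a single CSS custom property reference, e.g. var(--color-primary).
TOKEN_REF = re.compile(r"^var\(--[a-z0-9-]+\)$")


def validate_component_spec(spec):
    """Return a list of problems; an empty list means the spec passes."""
    problems = []
    for state in REQUIRED_STATES:
        if state not in spec:
            problems.append(f"missing state: {state}")
    for state, props in spec.items():
        for prop, value in props.items():
            if not TOKEN_REF.match(value):
                problems.append(f"{state}.{prop} uses hardcoded value {value!r}")
    return problems
```

A spec that defines only default and hover, or that sets a background to a raw hex value, would come back with one problem per gap rather than passing silently.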

Four choices that defined the project

Open source from day one

Keeping the skills private would have made them a personal productivity tool. Open-sourcing them makes them a contribution. Anyone who has the same frustration with generic AI feedback can fork, modify, or build on top of them. That also creates an implicit accountability layer: if the framework has gaps, they'll get found.

Named principles, not general advice

Every critique must name the principle being applied: not "this is unclear" but "this violates Hick's Law because the user is being presented with 9 options of similar weight." Named principles are falsifiable. They can be agreed or disagreed with. Unnamed principles are just opinions dressed up as expertise.

Confidence labeling as a hard requirement

In research and analytics work, it's standard practice to label confidence levels. In design critique, almost nobody does it. Every claim in these skills must be labeled DATA, ESTIMATED, or INFERENCE. A critique with unlabeled assumptions is just an opinion. The label forces the skill to stay within what can actually be verified.

One skill per design stage

A single general-purpose "design assistant" produces averaged, context-free output. Separating by stage means each skill can be opinionated about what matters at that specific moment: a discovery document and a dev handoff spec have completely different success criteria, and should be treated as different problems.

The output of the project

Five skills, all available on GitHub under an open-source license. Each skill is a structured prompt with a built-in framework: the hierarchy, the labeling rules, the principle grounding, and the output format. Anyone can use them in Claude chat mode without any setup beyond loading the skill.

I use all five in my daily work. The UX Audit runs on every screen before I present a critique. The Discovery skill replaced a document template I had been maintaining manually. The Frontend Spec has significantly reduced the back-and-forth between my design files and what engineers actually build.

The broader point is that the skills are a design argument, not just a productivity tool. They argue for a specific approach to AI-assisted design work: grounded in named principles, transparent about confidence levels, and structured around function before aesthetics. That argument is embedded in every output they produce.

Open Source. Free to use.

All 5 skills are available on GitHub. Download, modify, or build on top of them.

What I specifically did

  • Identified the failure modes in generic AI design feedback: no principle grounding, aesthetics-first evaluation, unlabeled confidence levels, and observations too vague to act on

  • Defined the function-first hierarchy as the non-negotiable constraint for all five skills, drawing from Nielsen's heuristics, accessibility standards, and UX research

  • Designed and wrote the DATA/ESTIMATED/INFERENCE labeling system and embedded it as a structural requirement in every skill output

  • Built and iterated all five skills: UX Audit, Design Critique, UX Design, Discovery, and Frontend Spec

  • Open-sourced the full repository on GitHub with documentation for each skill
