Product Designer
Product Designer · AI Tools
2026
Five AI skills I built because generic feedback was making my work worse
AI assistants were giving me UX feedback that sounded credible but wasn't grounded in anything. Same vague observations, same aesthetic-first suggestions, same heuristics applied without context. So I built five Claude skills that fix that, each one covering a different stage of the design process, all available on GitHub.
5
Skills built and shipped
100%
Open source, free to use
0%
Unlabeled assumptions in any output
7
Design principles grounding the framework
It sounds right. It usually isn't grounded in anything.
I started noticing a pattern in AI-assisted UX work. Ask for a critique of a screen and you get: "consider improving visual hierarchy," "the typography could be cleaner," "users might find this confusing." All defensible. None of it specific enough to act on. None of it citing which principle was violated, what the actual contrast ratio is, or what the alternative should look like.
Worse, the feedback often worked backwards from aesthetics. It would suggest a visual change first and then construct a rationale for why that change was about usability. That's not how good design feedback works. Good design feedback starts with whether the user can complete the task.
The other problem was confident hallucination. AI tools would state accessibility requirements incorrectly, cite heuristics with the wrong attribution, or flag issues using made-up success criteria. Because the output sounded authoritative, it was easy to act on without verification. That's a reliability problem in a discipline where precision matters.
The core constraint I set
Every skill had to declare when it was stating a fact, when it was making an estimate, and when it was drawing an inference. Not as a disclaimer at the bottom. As a label on every individual claim. If a claim couldn't be labeled with a confidence level, it shouldn't be in the output.
One rule that governs all five skills
Before building any individual skill, I needed a hierarchy that applied consistently across all of them. The problem with generic AI UX feedback is partly a vocabulary problem, but it's more fundamentally a priority problem: it treats aesthetics as a first-class concern when it should be the last thing considered.
01
Function
02
Clarity
03
Accessibility
04
Feedback
05
Efficiency
06
Consistency
07
Aesthetics
This ordering isn't original. It's grounded in established UX theory, usability research, and accessibility standards. What made it useful as a skill constraint was making it explicit and non-negotiable. Every skill must evaluate in this order. Aesthetic feedback cannot precede functional feedback.
The labeling system works in three tiers: DATA for verifiable measurements (contrast ratios, target sizes in pixels, specific WCAG criteria), ESTIMATED for reasonable inferences from available information (likely user behaviour, probable technical constraints), and INFERENCE for interpretive conclusions that require explicit acknowledgment as such.
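The hierarchy and the three labels can be read as a small data model. The sketch below is an illustration in Python, not how the skills themselves are implemented; the names `Priority`, `Label`, `Finding`, `order_findings`, and `render` are hypothetical. It shows one way to guarantee that aesthetic feedback never precedes functional feedback: sort findings by hierarchy level before rendering them.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Priority(IntEnum):
    """The seven-level hierarchy, in evaluation order (hypothetical encoding)."""
    FUNCTION = 1
    CLARITY = 2
    ACCESSIBILITY = 3
    FEEDBACK = 4
    EFFICIENCY = 5
    CONSISTENCY = 6
    AESTHETICS = 7


class Label(Enum):
    """The three confidence tiers attached to every claim."""
    DATA = "DATA"
    ESTIMATED = "ESTIMATED"
    INFERENCE = "INFERENCE"


@dataclass(frozen=True)
class Finding:
    priority: Priority
    label: Label
    text: str


def order_findings(findings):
    """Sort findings so functional issues come first and aesthetics last."""
    return sorted(findings, key=lambda f: f.priority)


def render(finding):
    """Format one finding with its confidence label and hierarchy level."""
    return f"[{finding.label.value}] {finding.priority.name.title()}: {finding.text}"


if __name__ == "__main__":
    findings = [
        Finding(Priority.AESTHETICS, Label.INFERENCE, "Card shadows feel heavy."),
        Finding(Priority.FUNCTION, Label.DATA, "Submit stays disabled with no explanation."),
    ]
    for finding in order_findings(findings):
        print(render(finding))
```

Because `Label` is a required field, a finding cannot be constructed without one, which mirrors the rule that unlabeled claims stay out of the output.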
What each one does and why it exists
UX Audit
Takes a screen or component and returns a structured accessibility and usability audit. Every finding includes the exact hex values for contrast pairs, touch target measurements against WCAG Success Criterion 2.5.5 (Target Size), the specific criterion that is or isn't being met, and a precise correction. The problem it replaces: "this button has low contrast" with no numbers. The output it produces: "foreground #8A8A8A on background #FFFFFF has a contrast ratio of 3.5:1. WCAG AA requires 4.5:1 for normal text. Change foreground to #757575 or darker."
Output labels
Contrast ratio with hex pairs
WCAG level (AA/AAA)
Touch target px vs. 44x44 minimum
Specific fix with exact values
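The contrast figures in an audit finding can be reproduced from the hex pair alone. Below is a minimal Python sketch of the WCAG 2.1 relative luminance and contrast ratio formulas, plus a simple check against the 44x44 minimum. The function names are my own for illustration and are not taken from the skill.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.1 definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#RRGGBB' color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError(f"expected #RRGGBB, got {hex_color!r}")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """AA requires 4.5:1 for normal text and 3:1 for large text."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


def meets_target_size(width_px: float, height_px: float, minimum_px: float = 44) -> bool:
    """True when both dimensions reach the minimum target size."""
    return width_px >= minimum_px and height_px >= minimum_px


if __name__ == "__main__":
    for fg in ("#8A8A8A", "#757575"):
        ratio = contrast_ratio(fg, "#FFFFFF")
        verdict = "passes" if passes_aa(fg, "#FFFFFF") else "fails"
        print(f"[DATA] {fg} on #FFFFFF: {ratio:.2f}:1, {verdict} AA for normal text")
```

A number computed this way can carry the DATA label because anyone can re-run the calculation and check it.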
Design Critique
Works like a code review for design: what's wrong, exactly where, which established principle it violates, and how to fix it. Grounded in Nielsen's heuristics, Fitts' Law, Gestalt principles, the Von Restorff effect, and WCAG 2.1. Every critique item follows a WHAT / PRINCIPLE / FIX structure.
Prevents the most common critique failure: giving feedback that can't be acted on because it doesn't name the principle being violated. "This is visually noisy" is not a critique. "The secondary CTA has equal visual weight to the primary CTA, violating figure-ground distinction (Gestalt). Reduce secondary button weight to ghost/outline style" is.
Output structure per issue
WHAT: exact element
PRINCIPLE: named heuristic
FIX: specific change
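The WHAT / PRINCIPLE / FIX structure can be treated as a record in which all three fields are mandatory. The Python sketch below is an assumption about how that rule could be enforced in code; `CritiqueItem` is a hypothetical name, and the skill itself may enforce the structure differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CritiqueItem:
    what: str        # the exact element the issue applies to
    principle: str   # the named heuristic or principle being violated
    fix: str         # the specific change to make

    def __post_init__(self):
        # Reject items that leave any of the three fields blank.
        for name in ("what", "principle", "fix"):
            if not getattr(self, name).strip():
                raise ValueError(f"critique item is missing its {name.upper()} field")

    def render(self) -> str:
        return f"WHAT: {self.what}\nPRINCIPLE: {self.principle}\nFIX: {self.fix}"


if __name__ == "__main__":
    item = CritiqueItem(
        what="Secondary CTA has equal visual weight to the primary CTA",
        principle="Figure-ground distinction (Gestalt)",
        fix="Reduce secondary button weight to ghost/outline style",
    )
    print(item.render())
```

Under this sketch, "This is visually noisy" with no principle attached raises an error instead of reaching the reader.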
UX Design
Handles the strategic layer that most AI tools skip. Before any visual tool is opened, there's a set of decisions that should be made: what is the flow, what are the states, what does the error handling look like, what is the microcopy. This skill works through those decisions in the strict function-first hierarchy.
Covers onboarding flows, authentication, microcopy review, information architecture, and developer handoff specifications. The output is a structured design direction, not a visual mockup.
Coverage areas
Flow structure
State definitions
Error handling
Microcopy
Dev handoff spec
Discovery
Generates a structured discovery document in HTML, including competitive benchmarks and gap analysis. The document is organized into five sections: project metadata and scope, classified findings, benchmark table, gap analysis, and actionable recommendations.
The critical design decision was requiring every claim to carry a label. A finding from a published study is DATA. A pattern inferred from product teardowns is ESTIMATED. An interpretive conclusion based on qualitative signals is INFERENCE. This prevents the common problem of research documents that blend facts and assumptions without distinction.
Document sections
Project metadata
Classified findings
Benchmark table
Gap analysis
Recommendations
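As a rough illustration of the classified findings section, the Python sketch below renders labeled claims as an HTML list and refuses any claim whose label is not one of the three tiers. The markup, the class names, and the `render_findings` function are assumptions made for this example; the actual document the skill generates may look different.

```python
from html import escape

LABELS = ("DATA", "ESTIMATED", "INFERENCE")


def render_findings(findings):
    """Render (label, text) pairs as an HTML list, rejecting unlabeled claims."""
    items = []
    for label, text in findings:
        if label not in LABELS:
            raise ValueError(f"unlabeled or unknown claim label: {label!r}")
        items.append(
            f'  <li><strong class="label label-{label.lower()}">{label}</strong> '
            f"{escape(text)}</li>"
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(render_findings([
        ("DATA", "Finding taken from a published study."),
        ("ESTIMATED", "Pattern inferred from product teardowns."),
        ("INFERENCE", "Interpretive conclusion based on qualitative signals."),
    ]))
```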
Frontend Spec
AI-generated frontend code routinely ignores design decisions: it produces technically correct implementations that miss the states, the transitions, and the token usage that make the component actually match the design. This skill detects context first (product interface vs. marketing), respects existing design tokens, and defines all component states before any styling begins.
Output covers: default, hover, focus, active, disabled, loading, error, and success states. Each state has exact token references, not hardcoded values.
States covered per component
Default
Hover
Focus
Success
Active
Disabled
Loading
Error
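A check for the "all states, tokens only" rule could look like the Python sketch below. It is a hedged illustration: the token naming scheme (`color.action.primary`), the regular expression used to spot hardcoded values, and the `check_component_spec` function are all assumptions, not part of the skill.

```python
import re

REQUIRED_STATES = (
    "default", "hover", "focus", "active",
    "disabled", "loading", "error", "success",
)

# Treat hex colors and raw px/rem/em lengths as hardcoded values (an assumption).
HARDCODED = re.compile(r"#[0-9a-fA-F]{3,8}\b|\b\d+(\.\d+)?(px|rem|em)\b")


def check_component_spec(spec: dict) -> list:
    """Return a list of problems: missing states and hardcoded values."""
    problems = []
    for state in REQUIRED_STATES:
        if state not in spec:
            problems.append(f"missing state: {state}")
    for state, props in spec.items():
        for prop, value in props.items():
            if HARDCODED.search(value):
                problems.append(f"{state}.{prop} uses hardcoded value {value!r}")
    return problems


if __name__ == "__main__":
    button = {
        "default": {"background": "color.action.primary"},
        "hover": {"background": "#0055CC"},  # hardcoded, gets flagged
    }
    for problem in check_component_spec(button):
        print(problem)
```

Running the check before any styling is written keeps state coverage a precondition instead of a cleanup task.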
Four choices that defined the project
Open source from day one
Keeping the skills private would have made them a personal productivity tool. Open-sourcing them makes them a contribution. Anyone who has the same frustration with generic AI feedback can fork, modify, or build on top of them. That also creates an implicit accountability layer: if the framework has gaps, they'll get found.
Named principles, not general advice
Every critique must name the principle being applied: not "this is unclear" but "this violates Hick's Law because the user is being presented with 9 options of similar weight." Named principles are falsifiable. They can be agreed or disagreed with. Unnamed principles are just opinions dressed up as expertise.
Confidence labeling as a hard requirement
In research and analytics work, it's standard practice to label confidence levels. In design critique, almost nobody does it. Every claim in these skills must be labeled DATA, ESTIMATED, or INFERENCE. A critique with unlabeled assumptions is just an opinion. The label forces the skill to stay within what can actually be verified.
One skill per design stage
A single general-purpose "design assistant" produces averaged, context-free output. Separating by stage means each skill can be opinionated about what matters at that specific moment: a discovery document and a dev handoff spec have completely different success criteria, and should be treated as different problems.
The output of the project
Five skills, all available on GitHub under an open-source license. Each skill is a structured prompt with a built-in framework: the hierarchy, the labeling rules, the principle grounding, and the output format. Anyone can use them in Claude chat mode without any setup beyond loading the skill.
I use all five in my daily work. The UX Audit runs on every screen before I present a critique. The Discovery skill replaced a document template I had been maintaining manually. The Frontend Spec has significantly reduced the back-and-forth between my design files and what engineers actually build.
The broader point is that the skills are a design argument, not just a productivity tool. They argue for a specific approach to AI-assisted design work: grounded in named principles, transparent about confidence levels, and structured around function before aesthetics. That argument is embedded in every output they produce.
Open Source. Free to use.
All 5 skills are available on GitHub. Download, modify, or build on top of them.
What I specifically did
Identified the failure modes in generic AI design feedback: no principle grounding, aesthetics-first evaluation, unlabeled confidence levels, and feedback too vague to act on
Defined the function-first hierarchy as the non-negotiable constraint for all five skills, drawing from Nielsen's heuristics, accessibility standards, and UX research
Designed and wrote the DATA/ESTIMATED/INFERENCE labeling system and embedded it as a structural requirement in every skill output
Built and iterated all five skills: UX Audit, Design Critique, UX Design, Discovery, and Frontend Spec
Open-sourced the full repository on GitHub with documentation for each skill
