We Ran Axe-Core On AI-Generated UI Code. The Findings Surprised Us.


We asked an AI coding assistant for five UI components that show up in almost every SaaS app: a login form, a pricing card, a confirmation modal, a top navigation bar, and a dashboard stats card. Same prompts you would probably type yourself on a Friday afternoon. Then we piped each result into axe-core through the same jsdom-based scanner we use inside our own build pipeline.

Here is the top-line number: 3 WCAG violations across 5 components, all of the same rule. That is a lot better than we expected. It is also less reassuring than it sounds. The interesting story is in what axe-core could not see.

The test setup, in boring detail

We used Claude Sonnet via the claude CLI as the code generator. No system prompt, no style preamble, just the same sort of one-line request a developer would paste into any AI coder — Cursor, v0, Bolt, Lovable, Claude Code — and expect back a component. The exact prompts and outputs live in our audit workspace; for reference, the login prompt was “Create a login form component as a single HTML snippet. Email input, password input, a remember me checkbox, a primary Sign in button, and a forgot password link. Use Tailwind classes.” The other four were the same shape.

For the audit itself we used jsdom for the DOM, axe-core 4.11 for the rules, WCAG 2.1 AA and WCAG 2.2 AA tags enabled, plus the best-practice tag so landmark rules would actually fire. We disabled the color-contrast rule because jsdom cannot resolve Tailwind’s JIT classes into real computed colors, so any result it gave us would be noise. We verified contrast by hand against the Tailwind palette instead — more on that in a moment.

Each component was wrapped in a minimal <!doctype html> page with no <main>, because that is what happens the instant a developer drops an AI-generated snippet into a blank page.tsx. If we padded the test harness with landmarks, we would be auditing our harness, not the component.
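The setup above can be sketched in a few lines. This is our illustration of the harness, not the exact scanner from our pipeline — jsdom and axe-core are the real npm packages, and the tags and disabled rule are the ones described, but the function names are ours.

```javascript
// Wrap a raw component snippet the way a developer would paste it into a
// blank page: doctype, <body>, and deliberately no landmarks.
function wrapSnippet(snippet) {
  return `<!doctype html><html lang="en"><head><title>audit</title></head><body>${snippet}</body></html>`;
}

async function audit(snippet) {
  // Lazy requires so the sketch stands alone; both are real npm packages.
  const { JSDOM } = require('jsdom');
  const axe = require('axe-core');
  // runScripts: 'outside-only' lets us eval axe's source inside the window.
  const dom = new JSDOM(wrapSnippet(snippet), { runScripts: 'outside-only' });
  dom.window.eval(axe.source);
  return dom.window.axe.run(dom.window.document, {
    runOnly: {
      type: 'tag',
      values: ['wcag2a', 'wcag2aa', 'wcag21aa', 'wcag22aa', 'best-practice'],
    },
    // jsdom cannot resolve Tailwind classes into computed colors, so this
    // rule would only produce noise; contrast is checked by hand instead.
    rules: { 'color-contrast': { enabled: false } },
  });
}
```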

What axe-core flagged

Three violations. All of them the same rule: region, impact moderate. Eight nodes in the pricing card, four in the login form, one in the dashboard card. The modal and the navbar came back clean.

region fires when content lives outside of a landmark element — no <main>, no <nav>, no <section aria-label>. It is the rule screen reader users feel most directly, because their jump-to-landmark shortcut is how they navigate a page. When content lives in the void, they have to arrow through every element to find it.

The same two checks showed up as incomplete on every component except the modal: landmark-one-main and page-has-heading-one. Incomplete results are not violations — axe is telling you it cannot decide from static analysis alone and needs a human to confirm. In practice, on a component snippet, these resolve to “your host page had better supply the <main> and the <h1>, because this component is not going to.” That is fine if the developer knows. The failure mode is that most developers do not read the incomplete list.
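For developers who do read the incomplete list, the fix lives in the host page, not the component. A hypothetical sketch — the function and title are ours — of a page that supplies the single main landmark and the h1, and in doing so also clears the region rule for anything placed inside it:

```javascript
// Minimal host page that resolves both incomplete checks: one <main>
// landmark (landmark-one-main) and one <h1> (page-has-heading-one).
// The pasted component should then start its own headings at <h2>.
function hostPage(title, componentHtml) {
  return `<!doctype html>
<html lang="en">
<head><title>${title}</title></head>
<body>
  <main>
    <h1>${title}</h1>
    ${componentHtml}
  </main>
</body>
</html>`;
}
```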

What the AI actually got right

This part we did not expect to write.

The login form has proper <label for="email"> / id="email" pairing, autocomplete="email" and autocomplete="current-password" hints, required attributes, and visible focus:ring-2 styles on every input and the submit button. The password field is a real type="password" input, which sounds like a non-achievement until you remember how many tutorials ship a type="text" field because it was “easier to test.”

The modal is the bigger surprise. It came back with role="dialog", aria-modal="true", aria-labelledby="modal-title", aria-describedby="modal-description", a close button with an explicit aria-label="Close", and an aria-hidden="true" on the decorative X icon. That is the full ARIA triad, correctly wired, on a first-shot prompt. A year ago this same prompt in the same tool would have given us a <div> with a close <span> and nothing else.

The navbar uses <nav>, wraps its links in <ul><li>, and uses real <a> elements for every destination including the “Sign up” call-to-action — not a <button> with a click handler pretending to be navigation. Semantic baseline, respected.

Color contrast, checked manually against the Tailwind palette: text-gray-700 on white is 10.4:1, the text-gray-500 used for secondary text is 5.6:1, the bg-indigo-600 button with white text is 6.1:1. All comfortably past WCAG AA’s 4.5:1 threshold. Nothing failed. Including this because it is true, and skipping it would make this post dishonest.
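The hand check is just the WCAG 2.x relative-luminance formula. A sketch, assuming the Tailwind v3 hex values (gray-700 is #374151 there); a different Tailwind version shifts the exact ratios slightly:

```javascript
// Linearize one sRGB channel per the WCAG 2.x definition.
function channel(c) {
  const s = c / 255;
  return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

// Relative luminance of a #rrggbb color.
function luminance(hex) {
  const n = parseInt(hex.slice(1), 16);
  return 0.2126 * channel((n >> 16) & 255)
       + 0.7152 * channel((n >> 8) & 255)
       + 0.0722 * channel(n & 255);
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), from 1:1 to 21:1.
function contrast(fg, bg) {
  const [hi, lo] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

// gray-700 (v3 hex) on white lands comfortably past the 4.5:1 AA threshold.
contrast('#374151', '#ffffff');
```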

The gap that axe-core cannot see

Here is where we have to slow down. A zero-violation automated scan on four of five components does not mean those components are accessible. It means they passed the checks that a static HTML parser is capable of running. The rest of WCAG lives in behavior.

The modal is the clearest example. It has every ARIA attribute a screen reader needs to announce it as a dialog. It has zero JavaScript. Open that modal in a real browser and press Tab: focus walks right off the “Delete” button and into whatever link is underneath the backdrop. A keyboard user has no way to know they have left the dialog, and no way to escape it with the Escape key, because nothing is listening. Axe-core cannot detect this. It audits a tree, not a runtime.
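For concreteness, here is a minimal sketch of the wiring the generated modal lacks. The function and parameter names are ours, not from the generated code: close on Escape, and keep Tab cycling inside the dialog instead of walking off into the page.

```javascript
// Returns a keydown handler for a dialog element: Escape closes it,
// Tab and Shift+Tab wrap at the edges so focus never leaves the dialog.
function createModalKeydownHandler(dialog, close) {
  return function onKeydown(event) {
    if (event.key === 'Escape') {
      close();
      return;
    }
    if (event.key !== 'Tab') return;
    const focusables = dialog.querySelectorAll(
      'a[href], button:not([disabled]), input, select, textarea, [tabindex]:not([tabindex="-1"])'
    );
    if (focusables.length === 0) return;
    const first = focusables[0];
    const last = focusables[focusables.length - 1];
    // Wrap focus at either edge instead of letting it escape the dialog.
    if (event.shiftKey && document.activeElement === first) {
      event.preventDefault();
      last.focus();
    } else if (!event.shiftKey && document.activeElement === last) {
      event.preventDefault();
      first.focus();
    }
  };
}

// Wiring sketch: dialog.addEventListener('keydown', createModalKeydownHandler(dialog, closeModal));
```

A production dialog would also move focus into the dialog on open and restore it on close — which is exactly the behavior libraries like Radix ship for free, and the AI output did not.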

The dashboard card is another one. Semantically, it is a <div> containing a <p> for the label and a <p> for the value. Visually it reads as “Monthly Revenue: $48,210.” To a screen reader it reads as two disconnected paragraphs. A proper card would use a heading (<h3> or the DL pattern with <dt> and <dd>) so the metric and its label are announced as a unit. Axe does not flag this because two <p> tags are valid HTML. They are just not meaningful HTML for this context.
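The definition-list fix can be sketched as a template. The markup and Tailwind classes here are our illustration, not the generated output:

```javascript
// Hypothetical rewrite of the stat card: <dt>/<dd> tie the label and
// value together, so a screen reader announces them as one unit.
function statCard(label, value) {
  return `<div class="rounded-lg border p-6">
  <dl>
    <dt class="text-sm text-gray-500">${label}</dt>
    <dd class="text-3xl font-semibold text-gray-900">${value}</dd>
  </dl>
</div>`;
}
```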

The pricing card has the opposite flavor of the same problem. Its green checkmark SVGs are decorative — the feature name next to them is the actual content — but they have no aria-hidden="true" and no role="img" with a label. Axe did not flag them either, because axe is conservative about SVG. A verbose screen reader will still read “graphic” before every list item. Small paper cut, repeated five times per card.

Why this pattern shows up

AI coders have been trained on an enormous amount of HTML that mostly looks like the tutorials and component libraries humans have already published. So they are very good at reproducing surface correctness: label associations, ARIA attribute names, Tailwind contrast-safe colors, semantic list elements. The patterns that are visible in a static snapshot of code got learned thoroughly.

The patterns that live in runtime behavior did not. Nobody writes a blog post about the exact event listener that closes a modal on Escape. It is buried in a hook, or a library, or it just works because everyone uses Radix. AI output optimizes for the version you can paste into a file. It does not optimize for the version you can actually use with a keyboard.

This is not a tooling critique. It is a lifecycle observation. The accessibility gap in AI-generated UI is no longer “it forgot the label.” It is “it remembered every attribute and forgot every interaction.”

What to actually do about it

Put axe-core in CI, not as a gate but as a signal. Even a modest run catches the regressions that AI coders still make — missing landmarks, missing alt text on real images, buttons without accessible names in the parts of the app that nobody thinks to regenerate. We published a walkthrough for wiring this up as a GitHub Action in five minutes if you want a template.

Then do three manual checks the first time you accept any AI-generated component. Press Tab through every interactive element and confirm the focus indicator is visible and the order matches the visual order. Open any dialog or menu and press Escape, then try to Tab out of it — if either fails, you have work to do. Turn on VoiceOver or NVDA for sixty seconds and listen to the component. Most of what axe cannot catch becomes obvious within the first ten announcements.

We are going to run this same audit monthly against different AI coders and different prompts. Partly because the tools are moving fast and what we found today will not be true in six months. Partly because AI-generated UI is going to eat a huge share of the frontend we all end up shipping, and someone should be tracking how the accessibility baseline moves — up or down — as that happens.

If you want the full component files, the axe-core output, and the exact prompts we used, they are in the audit workspace on blog.a11yfix.dev. Next month: we rerun this on a more realistic prompt — a full signup page, not a single component — and see what breaks when the AI has to remember context across sections.