Core Web Vitals in 2026
Google's Core Web Vitals are three numbers (LCP, INP, CLS) that quietly affect search rankings and explicitly affect user experience. They were designed for content sites. AI apps challenge their assumptions: there's no “largest contentful paint” when the most meaningful content is a streamed AI response that arrives 4 seconds in.
The good news: most AI apps land in a hybrid pattern (static marketing + interactive AI surface) where the marketing pages can hit excellent CWV and the AI surface optimizes for different metrics. The mistake is treating both surfaces the same.
The Tension Between AI UX and CWV
A pure AI product surface has three uncomfortable interactions with CWV:
- The “largest content” arrives after a network round-trip and 4–15 seconds of model generation — not within 2.5s.
- The page may be interactive long before the AI output finishes — INP can be great while “perceived speed” is bad.
- AI-generated content frequently shifts layout as new tokens arrive — CLS can spike artificially.
The fix is not to game the metrics. It's to optimize for the actual user experience and let the metrics catch up.
LCP — Largest Contentful Paint
Target: under 2.5 seconds. Means the largest visible element in the viewport, typically a hero image or a block of text, renders quickly.
For AI apps, the largest element is usually the chat shell or the input box — not the AI output. Optimize for the shell: preload the font, inline critical CSS, defer non-critical JS, use a CDN. Hit the LCP target with the shell alone. The AI output arrives later; that's fine.
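As a rough illustration of the shell-first approach, the sketch below builds a `<head>` fragment that preloads one font, inlines critical CSS, and defers the main bundle. The `ShellAssets` shape, the function name, and the example paths are all hypothetical, and the inputs are assumed to be trusted build-time values rather than user input.

```typescript
// Hypothetical build-time inputs; none of these names come from a real framework.
interface ShellAssets {
  fontUrl: string;      // e.g. "/fonts/ui.woff2" (assumed path)
  criticalCss: string;  // small, trusted CSS needed to paint the chat shell
  appScriptUrl: string; // non-critical application bundle
}

// Builds the <head> fragment for the shell. Inputs are inserted verbatim,
// so they must be trusted values produced by the build, never user input.
function renderShellHead(assets: ShellAssets): string {
  return [
    // Font preloads need the crossorigin attribute, even for same-origin fonts.
    `<link rel="preload" href="${assets.fontUrl}" as="font" type="font/woff2" crossorigin>`,
    // Inlined critical CSS avoids a render-blocking stylesheet request.
    `<style>${assets.criticalCss}</style>`,
    // defer keeps the bundle from blocking HTML parsing.
    `<script src="${assets.appScriptUrl}" defer></script>`,
  ].join("\n");
}
```

The AI components themselves are left out of this fragment on purpose: the goal is only to get the shell painted, with everything else loaded afterwards.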
INP — Interaction to Next Paint
Target: under 200ms. Means the page responds quickly to user input.
This is the most important CWV for AI apps because it's about responsiveness, not load time. The trap: heavy JS bundles that block the main thread when the user clicks the AI button. Three fixes:
- Dynamic-import the AI logic. Don't load it until the user is about to use it.
- Use Web Workers for any client-side processing (token counting, parsing).
- Avoid blocking animations during AI generation.
Common causes of main-thread blocking during streaming:
- Heavy syntax highlighting on streamed code blocks.
- Re-rendering the whole conversation on each token.
- Synchronous markdown rendering on every chunk.
- Auto-scroll that fights user scroll.
- Token-counting JS that runs on every keypress.
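One way to avoid per-token rendering work is to buffer incoming tokens and render them in batches. The sketch below is a minimal version of that idea; `createTokenBatcher` and the `Schedule` type are illustrative names, and the default scheduler is a plain timer so the code also runs outside a browser.

```typescript
// A scheduler receives a flush callback and decides when to run it.
type Schedule = (flush: () => void) => void;

// Collects streamed tokens and hands them to `render` at most once per scheduled flush.
function createTokenBatcher(
  render: (text: string) => void,
  // Default: roughly one frame at 60 Hz. In a browser you would likely pass
  // (flush) => requestAnimationFrame(flush) instead.
  schedule: Schedule = (flush) => {
    setTimeout(flush, 16);
  },
) {
  let pending = "";
  let scheduled = false;

  return {
    push(token: string): void {
      pending += token;
      if (scheduled) return; // a flush is already queued; just keep buffering
      scheduled = true;
      schedule(() => {
        scheduled = false;
        const chunk = pending;
        pending = "";
        render(chunk); // one render call for every token buffered so far
      });
    },
  };
}
```

With this shape, the markdown parse or syntax highlight runs once per flush rather than once per token, which keeps long tasks off the main thread while the user is still typing or scrolling.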
CLS — Cumulative Layout Shift
Target: under 0.1. Means the page doesn't shift unexpectedly as it loads.
For AI apps, the streaming response inherently shifts layout as new tokens arrive. The fix is not to stop the shift but to make it predictable. Reserve space for AI output before it arrives. Pin the input box. Use min-height on streaming containers so they don't collapse and re-expand.
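The min-height idea can be sketched as a small helper that only ever raises a container's floor. `StreamBox` and `growMinHeight` are hypothetical names; the structural type is intended to be satisfiable by an `HTMLElement`, and the reserved pixel value is an assumption you would tune per layout.

```typescript
// Minimal structural type so the helper works with DOM elements and with test doubles.
interface StreamBox {
  scrollHeight: number;
  style: { minHeight: string };
}

// Raises the container's min-height to the tallest height seen so far,
// starting from a reserved floor. The value never decreases, so the
// container cannot collapse and re-expand between chunks.
function growMinHeight(box: StreamBox, reservedPx: number): number {
  const current = parseFloat(box.style.minHeight) || 0;
  const next = Math.max(current, reservedPx, box.scrollHeight);
  box.style.minHeight = `${next}px`;
  return next;
}
```

Calling it once before the request starts reserves the initial space; calling it again after each rendered batch locks in whatever height the response has reached.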
AI-Specific Performance Metrics To Add
Beyond CWV, track these for AI apps:
- Time to first token (TTFT) — how long until the AI starts streaming. Target: under 800ms.
- Tokens per second — streaming speed once it starts. Target: above 30 tps.
- End-to-end latency — click to final output. Target depends on use case; for chat, under 10s.
- Bounce on first prompt — users who type one prompt and leave.
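The first three metrics above can be derived from timestamps you already have on the client. The following is a minimal sketch, assuming you record when the prompt was submitted and when each token arrived; the type and function names are illustrative, and measuring the rate over the streaming window only is one reasonable definition among several.

```typescript
interface StreamTiming {
  requestStartMs: number; // when the user submitted the prompt
  tokenTimesMs: number[]; // arrival time of each token, in ascending order
}

interface StreamMetrics {
  ttftMs: number;          // time to first token
  tokensPerSecond: number; // streaming rate after the first token
  endToEndMs: number;      // submit to final token
}

// Returns null when no tokens arrived (for example, a failed request).
function summarizeStream(t: StreamTiming): StreamMetrics | null {
  if (t.tokenTimesMs.length === 0) return null;
  const first = t.tokenTimesMs[0];
  const last = t.tokenTimesMs[t.tokenTimesMs.length - 1];
  const streamingMs = last - first;
  // The rate covers the streaming window only, so a slow TTFT does not dilute it.
  const tokensPerSecond =
    streamingMs > 0 ? ((t.tokenTimesMs.length - 1) / streamingMs) * 1000 : 0;
  return {
    ttftMs: first - t.requestStartMs,
    tokensPerSecond,
    endToEndMs: last - t.requestStartMs,
  };
}
```

In a browser, `performance.now()` is a suitable clock for these timestamps. The resulting object can be sent to whatever analytics or RUM pipeline you already use.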
High-Impact Improvements
- Static-export the marketing surface. Free CWV wins.
- Dynamic-import heavy AI components. Reduces initial JS.
- Preconnect to your AI endpoint. Shaves 100–300ms off first request.
- Stream from edge. Cuts TTFT significantly.
- Reserve layout space for streaming content.
- Use semantic markdown rendering that batches updates, not per-token re-renders.
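For the preconnect item in the list above, one option in a client-rendered app is to add the hint when the user shows intent, for example on input focus. This is a sketch under assumptions: `DocLike`, `LinkLike`, and `preconnectOnce` are hypothetical names, the endpoint URL in the usage comment is a placeholder, and whether the `crossorigin` attribute is needed depends on how your request is made, so verify it against the connection reuse you actually observe.

```typescript
// Minimal structural types, intended to be satisfiable by the browser's
// `document` as well as by a test double.
interface LinkLike {
  rel: string;
  href: string;
  crossOrigin: string | null;
}
interface DocLike {
  createElement(tag: "link"): LinkLike;
  head: { appendChild(node: unknown): unknown };
}

const warmedOrigins = new Set<string>();

// Adds <link rel="preconnect"> for the endpoint's origin, at most once per origin.
// Returns true if a hint was added, false if the origin was already warmed.
function preconnectOnce(doc: DocLike, endpoint: string, anonymous = true): boolean {
  const origin = new URL(endpoint).origin;
  if (warmedOrigins.has(origin)) return false;
  warmedOrigins.add(origin);

  const link = doc.createElement("link");
  link.rel = "preconnect";
  link.href = origin;
  // Assumption: the later request is made without credentials (CORS anonymous mode).
  if (anonymous) link.crossOrigin = "anonymous";
  doc.head.appendChild(link);
  return true;
}

// Usage in a browser (placeholder URL):
//   input.addEventListener("focus", () =>
//     preconnectOnce(document, "https://api.example.com/v1/chat"));
```

If the endpoint is known at build time, a static `<link rel="preconnect">` in the page head achieves the same thing with no JavaScript.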
Core Web Vitals on AI apps measure the shell, not the AI. Optimize the shell to land in the green zone, then track the AI-specific metrics (TTFT, tokens per second) for the experience that actually matters.
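To make "green zone" concrete: Google's published thresholds are assessed at the 75th percentile of real-user samples, with a "good" bound and a "poor" bound for each metric. The sketch below applies those bounds to a list of samples. The function and type names are illustrative, and the nearest-rank percentile is a simplification of what production RUM tools compute.

```typescript
type Metric = "LCP" | "INP" | "CLS";
type Rating = "good" | "needs-improvement" | "poor";

// [good upper bound, poor lower bound]. LCP and INP in milliseconds, CLS unitless.
const THRESHOLDS: Record<Metric, [number, number]> = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
};

// Nearest-rank 75th percentile: the smallest sample with at least
// 75% of all samples at or below it.
function percentile75(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(0.75 * sorted.length);
  return sorted[rank - 1];
}

// Rates a metric from real-user samples using the p75 value.
function rate(metric: Metric, samples: number[]): Rating {
  const [good, poor] = THRESHOLDS[metric];
  const p75 = percentile75(samples);
  if (p75 <= good) return "good";
  return p75 <= poor ? "needs-improvement" : "poor";
}
```

Because the rating uses the 75th percentile, a single slow outlier does not move a page out of the green zone, but a consistently slow quarter of visits does.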
For more on the architecture, see Next.js static export and streaming AI responses.
FAQ
Does CWV affect SEO for AI apps? Yes, for indexable pages. For authenticated app surfaces the effect is less direct, but performance still affects user retention.
Which tool should we use for monitoring? Vercel Speed Insights, PageSpeed Insights, or any RUM platform. The key is real-user data, not lab-only.
Should we lazy-load the AI shell itself? No. The user came to use AI. Lazy-load the heavy bits inside it.
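Lazy-loading the heavy bits usually comes down to a dynamic `import()` that runs once and is reused. A minimal sketch of that pattern follows; `lazyOnce` is a hypothetical helper, and the module path in the usage comment is a placeholder rather than a real package.

```typescript
// Wraps an async loader so the underlying work starts on first use
// and every later call reuses the same promise.
function lazyOnce<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) {
      cached = loader().catch((err) => {
        cached = undefined; // allow a retry after a failed load
        throw err;
      });
    }
    return cached;
  };
}

// Usage (placeholder module path):
//   const loadHighlighter = lazyOnce(() => import("./syntax-highlighter"));
//   // Later, when the first code block appears in a response:
//   const { highlight } = await loadHighlighter();
```

The shell renders immediately, and the highlighter, markdown parser, or similar heavy module is fetched only when a response actually needs it.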