Streaming AI Responses on the Web: A Practical Guide


Why Streaming Matters For AI UX

A frontier-model response takes 4–15 seconds end-to-end. Without streaming, the user stares at a spinner the whole time. With streaming, the first words arrive in 200–800ms and the response unfolds smoothly. The subjective “feel” difference is enormous. Same model, same total time — entirely different product.

Streaming is not optional in 2026. Any AI feature longer than 1.5 seconds end-to-end must stream. The UX cost of not streaming is too high.

The Three Streaming Options On The Web

Server-Sent Events (SSE) — one-way server-to-client over HTTP. The default.
WebSocket — bi-directional, persistent connection.
Raw fetch with chunked transfer — the simplest, plain HTTP streaming.

Most AI use cases need exactly the first or third. WebSockets are over-engineered for one-shot AI responses; they earn their complexity only when the conversation is truly bi-directional with frequent user interrupts.

SSE: The Default Choice

SSE is HTTP with a streaming response body and a specific text format. The server sends events like:

data: {"text": "Hello"}

data: {"text": " world"}

data: [DONE]

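The client uses EventSource or a fetch with a stream reader. EventSource reconnects automatically in the browser, which matters when networks flake, but it only issues GET requests and cannot set custom headers or a request body. A fetch-based reader can POST the prompt, but it has to handle reconnection itself. The trade-offs: SSE is one-way only, and over HTTP/1.1 browsers allow roughly 6 concurrent connections per origin (over HTTP/2 the limit is negotiated and defaults to 100 streams). For AI, both of those are typically fine.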
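A minimal sketch of the fetch-with-stream-reader path, assuming the data: {"text": ...} format and [DONE] sentinel shown above. The names createSseParser and streamCompletion, the url argument, and the { prompt } request body are hypothetical, not part of any particular API. The parser buffers input because a network chunk can end in the middle of an event.

```typescript
// Incremental parser for the `data:` lines of an SSE stream.
// Returns the payload of every event completed by the latest chunk.
// Simplification: only LF and CRLF line endings are handled.
function createSseParser(): (chunk: string) => string[] {
  let buffer = "";
  return (chunk: string): string[] => {
    buffer = (buffer + chunk).replace(/\r\n/g, "\n");
    const payloads: string[] = [];
    let boundary = buffer.indexOf("\n\n"); // a blank line ends an event
    while (boundary !== -1) {
      const rawEvent = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);
      const dataLines = rawEvent
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""));
      if (dataLines.length > 0) payloads.push(dataLines.join("\n"));
      boundary = buffer.indexOf("\n\n");
    }
    return payloads;
  };
}

// Hypothetical consumer: POSTs a prompt and forwards each text fragment.
async function streamCompletion(
  url: string,
  prompt: string,
  onText: (text: string) => void,
  signal?: AbortSignal,
): Promise<void> {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "text/event-stream" },
    body: JSON.stringify({ prompt }),
    signal,
  });
  if (!response.ok || !response.body) {
    throw new Error(`Stream request failed with status ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const parse = createSseParser();
  while (true) {
    const { done, value } = await reader.read();
    if (done) return;
    for (const payload of parse(decoder.decode(value, { stream: true }))) {
      if (payload === "[DONE]") {
        await reader.cancel(); // sentinel seen: stop reading
        return;
      }
      onText((JSON.parse(payload) as { text: string }).text);
    }
  }
}
```

Unlike EventSource, this path does not reconnect on its own, so any retry logic would have to be added around streamCompletion.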

SSE Server-Side Pattern

• Set Content-Type to text/event-stream.
• Disable proxy buffering (X-Accel-Buffering: no on nginx).
• Flush after each chunk.
• Send a final [DONE] sentinel so the client knows to close.
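The checklist above can be sketched as follows. This is an illustration under assumptions, not a production handler: StreamingResponse is a hypothetical minimal interface (Node's http.ServerResponse happens to fit it), sendSseStream and formatSseEvent are made-up names, and the Cache-Control header is a common addition that the list does not mention. Node writes each chunk out as it is passed to write(); other stacks, or compression middleware, may need an explicit flush call.

```typescript
// Hypothetical minimal response shape; Node's http.ServerResponse satisfies it.
interface StreamingResponse {
  writeHead(status: number, headers: Record<string, string>): unknown;
  write(chunk: string): unknown;
  end(): unknown;
}

const SSE_HEADERS: Record<string, string> = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache", // assumption: keep intermediaries from caching
  "X-Accel-Buffering": "no", // asks nginx not to buffer this response
};

// One SSE event: a `data:` line followed by a blank line.
// The payload must not contain raw newlines; JSON.stringify escapes them.
function formatSseEvent(payload: string): string {
  return `data: ${payload}\n\n`;
}

// Streams each token as its own event, then the [DONE] sentinel.
async function sendSseStream(
  res: StreamingResponse,
  tokens: AsyncIterable<string>,
): Promise<void> {
  res.writeHead(200, SSE_HEADERS);
  try {
    for await (const text of tokens) {
      res.write(formatSseEvent(JSON.stringify({ text })));
    }
    res.write(formatSseEvent("[DONE]"));
  } finally {
    res.end(); // always close, even if the token source throws
  }
}
```

With Node's http module this would be called from the request handler, passing the response object and whatever async iterable the model client exposes.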

WebSocket: When You Actually Need It

Use WebSocket when:

The user can interrupt the AI mid-stream and the server needs that input immediately.
You have multi-modal streams (audio + text simultaneously).
You're building a multi-user collaborative AI room.

Don't use WebSocket because it's “more modern.” The extra connection lifecycle, reconnect logic, and infrastructure cost rarely earn their keep for one-shot AI completions.

Raw fetch With Chunked Transfer

The simplest streaming: a plain fetch where the server returns Content-Type: text/plain and the body is just text streamed in chunks. The client reads from response.body.getReader().

When this works: you don't need to distinguish events. The model is outputting plain text. No structured intermediate states (no “thinking, searching, writing” phases). It's the path of least resistance and we use it more often than SSE for raw text generation.
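A rough sketch of the client side, assuming a plain-text endpoint that accepts a POSTed prompt. The names createChunkDecoder and streamPlainText, the url argument, and the request body shape are hypothetical. One detail is easy to miss: a chunk boundary can fall inside a multi-byte UTF-8 character, so the decoder is used in streaming mode.

```typescript
// Returns a function that turns raw byte chunks into text.
// `stream: true` keeps an incomplete multi-byte character buffered
// until the rest of it arrives; calling with no argument flushes.
function createChunkDecoder(): (bytes?: Uint8Array) => string {
  const decoder = new TextDecoder("utf-8");
  return (bytes?: Uint8Array): string =>
    bytes ? decoder.decode(bytes, { stream: true }) : decoder.decode();
}

// Hypothetical consumer: reads the body chunk by chunk and
// reports each decoded fragment as soon as it is available.
async function streamPlainText(
  url: string,
  prompt: string,
  onText: (text: string) => void,
): Promise<string> {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  if (!response.ok || !response.body) {
    throw new Error(`Stream request failed with status ${response.status}`);
  }
  const reader = response.body.getReader();
  const decode = createChunkDecoder();
  let full = "";
  while (true) {
    const { done, value } = await reader.read();
    const text = decode(value); // value is undefined when done, which flushes
    if (text) {
      full += text;
      onText(text);
    }
    if (done) return full;
  }
}
```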

UX Of Streaming Done Right

Streaming is a UX pattern, not just a transport. Get these details right:

Cursor. A visible cursor showing “still generating” matters. Even a blinking pipe character beats no signal.
Smooth typing speed. Don't let tokens arrive in bursts. Buffer briefly and release at a steady pace; it reads more naturally than jittery batches.
Stop button. Always offer a way to interrupt mid-stream. Required for trust.
Status text. If the AI is doing multi-step work (retrieval, then generation), show what phase it's in.
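One way the "buffer briefly and release smoothly" idea could look in code. This is a sketch under assumptions: createTokenSmoother and startRenderLoop are hypothetical names, and the defaults (2 characters per tick, catching up over 30 ticks, a 16 ms interval) are illustrative values, not tuned recommendations. Incoming text is queued, and each tick releases a small slice, speeding up only when the backlog grows.

```typescript
// Queues bursty incoming text and releases it in small, steady slices.
function createTokenSmoother(baseChars = 2, catchUpTicks = 30) {
  let pending: string[] = []; // one entry per code point, so emoji stay intact
  return {
    // Called whenever the network delivers new text.
    push(text: string): void {
      pending = pending.concat(Array.from(text));
    },
    // Called once per timer tick; returns the text to append to the UI.
    tick(): string {
      const count = Math.max(baseChars, Math.ceil(pending.length / catchUpTicks));
      return pending.splice(0, count).join("");
    },
    // Releases everything at once, e.g. when the stream ends or is stopped.
    flush(): string {
      return pending.splice(0, pending.length).join("");
    },
  };
}

// Hypothetical render loop: drains the smoother on a fixed interval.
// Returns a function that stops the loop.
function startRenderLoop(
  smoother: { tick(): string },
  append: (text: string) => void,
  intervalMs = 16,
): () => void {
  const timer = setInterval(() => {
    const text = smoother.tick();
    if (text) append(text);
  }, intervalMs);
  return () => clearInterval(timer);
}
```

For the stop button, passing an AbortController's signal to fetch and calling abort() on click cancels the in-flight request. Calling flush() at that point shows whatever text had already arrived.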

Common Pitfalls

Proxy buffering. CDNs and proxies buffer responses by default. Disable for streaming routes.
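Content-Length header. Setting it kills streaming: the length of a generated response isn't known up front, and a response that declares Content-Length can't also use chunked transfer. Don't.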
Reverse-proxy timeouts. Some platforms cap stream duration at 30s. Plan around it.
Mobile flakiness. Long streams on mobile networks fail. Build retry/resume.
Token-counting client-side. Don't. The server should bill, the client shouldn't need to know.

The streaming AI response is the modern progress indicator. A still spinner tells the user nothing is happening. A streamed token tells them the system is alive and working. Choose to stream.

FAQ

Can we stream JSON responses? Use SSE with JSON-per-event, not chunked transfer of a partial JSON document.

Does streaming break SEO? Streaming is for interactive use only. SEO content should be statically rendered.

What about resume-after-disconnect? Possible with the SSE Last-Event-ID header, but rarely worth the complexity for short AI streams.
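For the cases where resume is worth it, the mechanism can be sketched like this. The server tags each event with an id: field, and a reconnecting EventSource sends the last id it saw in the Last-Event-ID request header (a fetch-based client would have to set that header itself). The StoredEvent type, the function names, and the idea of keeping a per-stream event log on the server are assumptions for illustration.

```typescript
// Hypothetical record of an event the server has already sent.
interface StoredEvent {
  id: number;
  data: string;
}

// An SSE event carrying an id, so the client can report where it stopped.
function formatEventWithId(event: StoredEvent): string {
  return `id: ${event.id}\ndata: ${event.data}\n\n`;
}

// Picks the events a reconnecting client still needs.
// A missing or unparseable header is treated as a fresh connection.
function eventsToReplay(
  log: StoredEvent[],
  lastEventIdHeader: string | undefined,
): StoredEvent[] {
  const lastId = lastEventIdHeader === undefined ? NaN : parseInt(lastEventIdHeader, 10);
  if (isNaN(lastId)) return log;
  return log.filter((event) => event.id > lastId);
}
```

The cost hinted at in the answer above is the log itself: the server has to retain sent events for each stream long enough for a client to come back.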
