stream=True.
What it looks like in code
Instead of a singleresponse object you get an iterator. Each iteration yields a chunk, and the useful text lives in chunk.choices[0].delta.content. Append it to your output as it arrives.
data: {...} lines, each carrying its own fragment. The end of the stream is the line data: [DONE].
Switching models
As with ordinary requests, switching models means changing only themodel field — the streaming code stays the same. For example, gpt-5, gemini-3.5-flash or deepseek-v3:
Streaming over the Anthropic protocol
If you work through the native Anthropic protocol (Claude Code, Anthropic SDK), streaming is supported there too — viaclient.messages.stream(...). Note the address here is without /v1:
Gotchas
delta.content is None on the final chunk
delta.content is None on the final chunk
The last fragment of the stream usually carries metadata (such as
finish_reason) rather than text — so delta.content there is None (or undefined in Node.js). Always check the value before printing or concatenating it, otherwise you’ll hit a TypeError on concatenation.Getting token usage while streaming
Getting token usage while streaming
By default the final usage stats (A separate chunk with a
usage) are not sent in the stream. To get them, add the stream_options parameter to your request:usage field then arrives at the very end of the stream. That chunk’s choices list is empty — account for it when parsing.The response arrives all at once instead of piece by piece
The response arrives all at once instead of piece by piece
If the text shows up in one block rather than typing out gradually, there’s probably a proxy or load balancer between you and the API that buffers SSE. Make sure response buffering is disabled (e.g.
proxy_buffering off; in nginx) and that your HTTP client isn’t accumulating the stream itself. With curl, add the -N (--no-buffer) flag to be safe.What’s next
- Quickstart — sign up, get a key, make your first request
- Claude models — picking a model and what it can do
- Telegram bot — a real project where streaming shines
- Errors explained — what to do when a request fails