When you make a normal request, the client waits until the model has written its last word before showing anything. On a long answer that means several seconds of silence. Streaming sends the text in pieces as it is being generated: the user watches the answer type itself out, and the wait before the first visible text shrinks to roughly the time the model needs to produce its first fragment. There is nothing magic about it: instead of one JSON response, the server opens a Server-Sent Events (SSE) stream and pushes small fragments (chunks) over it. You turn this on with a single flag: stream=True.
Streaming runs on the same OpenAI-compatible endpoint as ordinary requests. Only the stream flag and the way you read the response change. The base_url stays the same: https://www.ruapi.ai/v1.

What it looks like in code

Instead of a single response object you get an iterator. Each iteration yields a chunk, and the useful text lives in chunk.choices[0].delta.content. Append it to your output as it arrives.
from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR_KEY",
    base_url="https://www.ruapi.ai/v1",
)

stream = client.chat.completions.create(
    model="claude-opus-4-8",  # exact names are on the pricing page of the main site
    messages=[{"role": "user", "content": "Tell me a short story about a cat."}],
    stream=True,
)

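for chunk in stream:
    if not chunk.choices:  # some chunks (e.g. a usage-only chunk) have no choices
        continue
    delta = chunk.choices[0].delta.content
    if delta:  # the final chunk can carry content=None, so skip it
        print(delta, end="", flush=True)
print()  # finish the output with a newline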
With curl the response arrives as a stream of data: {...} lines, each carrying its own fragment. The end of the stream is the line data: [DONE].
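If you read the raw stream yourself instead of using the SDK, the data: lines described above can be parsed with a few lines of Python. This is a minimal sketch, not a full SSE client: the helper name iter_sse_text is hypothetical, the sample lines are hand-written for illustration (not captured from a real response), and it assumes each data: line carries one complete JSON object.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from raw SSE lines of a chat-completions stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        event = json.loads(payload)
        choices = event.get("choices") or []
        if not choices:
            continue  # e.g. a usage-only chunk
        text = (choices[0].get("delta") or {}).get("content")
        if text:  # skip None and empty strings
            yield text


# Hand-written sample lines (an assumption about the shape, for illustration only).
sample = [
    'data: {"choices": [{"delta": {"role": "assistant", "content": ""}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Once upon"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": " a time"}}]}',
    "",
    'data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}',
    "",
    "data: [DONE]",
]
story = "".join(iter_sse_text(sample))
print(story)
```

In a real client the lines would come from the HTTP response body, read line by line as they arrive, rather than from a list.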

Switching models

As with ordinary requests, switching models means changing only the model field — the streaming code stays the same. For example, gpt-5, gemini-3.5-flash or deepseek-v3:
model="gpt-5"            # instead of claude-opus-4-8
The full list of available names is on the Pricing page of the main site.

Streaming over the Anthropic protocol

If you work through the native Anthropic protocol (Claude Code, Anthropic SDK), streaming is supported there too, via client.messages.stream(...). Note that the base URL here has no /v1 suffix:
import anthropic

client = anthropic.Anthropic(
    api_key="sk-YOUR_KEY",
    base_url="https://www.ruapi.ai",  # for Anthropic — without /v1
)

with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a short story about a cat."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
print()
More on the two protocols and base URLs is in the API reference.

Gotchas

The last fragment of the stream usually carries metadata (such as finish_reason) rather than text, so delta.content there is None (or undefined in Node.js). Always check the value before printing or concatenating it: in Python, concatenating a string with None raises a TypeError, and in JavaScript the literal text "undefined" is silently appended to your output.
By default the final usage stats (usage) are not sent in the stream. To get them, add the stream_options parameter to your request:
stream = client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": "Hi!"}],
    stream=True,
    stream_options={"include_usage": True},
)
A separate chunk with a usage field then arrives at the very end of the stream. That chunk’s choices list is empty — account for it when parsing.
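One way to account for both cases (content=None and an empty choices list) is a small collector loop. The sketch below is only an illustration: collect is a hypothetical helper, and the chunks are simulated with SimpleNamespace objects that mimic the attribute layout described above, with made-up token counts. With the real SDK you would pass the stream returned by client.chat.completions.create(...) instead.

```python
from types import SimpleNamespace as NS


def collect(stream):
    """Return (full_text, usage) from an iterable of chat-completion chunks."""
    parts = []
    usage = None
    for chunk in stream:
        # The usage-only chunk arrives last and has an empty choices list.
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:  # None on the metadata-only chunk
            parts.append(content)
    return "".join(parts), usage


# Simulated chunks; the token counts are invented for the example.
fake_stream = [
    NS(choices=[NS(delta=NS(content="Hi"), finish_reason=None)], usage=None),
    NS(choices=[NS(delta=NS(content=" there"), finish_reason=None)], usage=None),
    NS(choices=[NS(delta=NS(content=None), finish_reason="stop")], usage=None),
    NS(choices=[], usage=NS(prompt_tokens=3, completion_tokens=2, total_tokens=5)),
]
text, usage = collect(fake_stream)
print(text, usage.total_tokens)
```

Checking chunk.choices before indexing keeps the loop from raising IndexError on the usage chunk.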
If the text shows up in one block rather than typing out gradually, there’s probably a proxy or load balancer between you and the API that buffers SSE. Make sure response buffering is disabled (e.g. proxy_buffering off; in nginx) and that your HTTP client isn’t accumulating the stream itself. With curl, add the -N (--no-buffer) flag to be safe.
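To tell buffering apart from a slow model, it can help to record when each chunk arrives. The following is a rough diagnostic sketch under stated assumptions: arrival_offsets and looks_buffered are hypothetical helpers, and the 0.05-second threshold is an arbitrary choice that will misjudge very short answers. The clock is injectable so the example can run with fake timestamps.

```python
import time
from typing import Callable, Iterable, List


def arrival_offsets(stream: Iterable, clock: Callable[[], float] = time.monotonic) -> List[float]:
    """Seconds elapsed between the start of iteration and each chunk's arrival."""
    start = clock()
    return [clock() - start for _ in stream]


def looks_buffered(offsets: List[float], min_spread: float = 0.05) -> bool:
    """Heuristic: several chunks that all arrive almost at the same moment."""
    return len(offsets) > 1 and (offsets[-1] - offsets[0]) < min_spread


# Fake clocks for illustration: the first reading is the start time.
burst = iter([0.0, 2.0, 2.001, 2.002])   # everything lands at once after 2 s
steady = iter([0.0, 0.3, 0.6, 0.9])      # chunks trickle in

burst_offsets = arrival_offsets(["a", "b", "c"], clock=lambda: next(burst))
steady_offsets = arrival_offsets(["a", "b", "c"], clock=lambda: next(steady))
print(looks_buffered(burst_offsets), looks_buffered(steady_offsets))
```

With a real stream, call arrival_offsets(stream) with the default clock. If the offsets cluster at the very end, look for a buffering layer between you and the API.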

What’s next