Vision-capable models take images alongside text in a single request. You can hand the model a picture and ask it to describe the scene, read text from it (OCR), make sense of a UI screenshot, a diagram or a chart, or pull data out of a photographed table. Under the hood this uses the same OpenAI-compatible endpoint as a normal chat. The only thing that changes is the content field: instead of a string you pass an array of parts — text blocks {"type": "text"} and image blocks {"type": "image_url"}.
Use a model that supports vision: Claude (Sonnet / Opus), GPT and Gemini accept images. DeepSeek is text-only — see DeepSeek models. Check the capability badges on the Pricing page of the main site before you send a request.

Image from a public URL

In the simplest case, the image is already online at a direct URL, and you pass that URL as is.
Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR_KEY",
    base_url="https://www.ruapi.ai/v1",
)

response = client.chat.completions.create(
    model="claude-opus-4-8",  # a vision model; exact names on the pricing page
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image? Describe it briefly."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/3/3a/Cat03.jpg"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)

Local file via base64

If the image isn’t publicly reachable (a screenshot, a photo on disk), encode it to base64 and pass it as a data URI in the form data:image/jpeg;base64,....
Python (OpenAI SDK)
import base64
from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR_KEY",
    base_url="https://www.ruapi.ai/v1",
)

# Encode a local file to base64
with open("screenshot.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

data_uri = f"data:image/jpeg;base64,{b64}"

response = client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Read all the text in this screenshot."},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
For PNG, change the prefix to data:image/png;base64, (keeping the trailing comma); for WebP, use data:image/webp;base64, in the same way. The MIME type must match the file’s real format.
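Because the MIME type has to match the real format, one option is to derive it from the file's leading bytes rather than from its extension. The sketch below is a minimal illustration, not part of the API: sniff_image_mime and to_data_uri are hypothetical helper names, and only the four commonly accepted formats are recognized.

```python
import base64


def sniff_image_mime(data: bytes) -> str:
    """Guess the MIME type from well-known file signatures (magic bytes)."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    # WebP is a RIFF container: "RIFF" + 4-byte size + "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unrecognized format; convert to JPEG, PNG, WebP or GIF first")


def to_data_uri(data: bytes) -> str:
    """Build a data URI whose prefix matches the detected format."""
    mime = sniff_image_mime(data)
    b64 = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{b64}"
```

With this in place, the data_uri in the example above could be built as to_data_uri(open("screenshot.jpg", "rb").read()), so a PNG saved under a .jpg name would still get the right prefix.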

Gotchas

An image counts as input tokens, and the higher the resolution the more expensive the request. If you don’t need the fine detail, downscale the image before sending (for example to 1024–1568 px on the long side).
A base64 string is about a third larger than the file itself and travels entirely inside the request body. For heavy images a public URL is safer than a multi-megabyte data URI — otherwise you can hit the request size limit.
Supported formats are typically JPEG, PNG, WebP and GIF. Convert less common formats (HEIC, TIFF, SVG) to one of these beforehand.
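The first two points can be checked with simple arithmetic before sending anything. The following is a rough sketch under stated assumptions: fit_long_side and base64_length are hypothetical helper names, and the 1568 px default is only the upper end of the range suggested above, not a documented limit. The resize itself would need an image library such as Pillow and is not shown.

```python
def fit_long_side(width: int, height: int, max_side: int = 1568) -> tuple[int, int]:
    """Return dimensions scaled so the long side is at most max_side.

    Images that already fit are returned unchanged (never upscaled).
    """
    long_side = max(width, height)
    if long_side <= max_side:
        return width, height
    scale = max_side / long_side
    return max(1, round(width * scale)), max(1, round(height * scale))


def base64_length(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes of raw data.

    Every 3 input bytes become 4 output characters, which is where the
    "about a third larger" figure comes from.
    """
    return 4 * ((n_bytes + 2) // 3)


# A 4000x3000 photo scaled to fit 1568 px on the long side
print(fit_long_side(4000, 3000))
# A 3 MB file as it would appear inside the request body
print(base64_length(3_000_000))
```

Comparing base64_length(os.path.getsize(path)) against whatever request size limit applies to your account is a cheap way to decide between a data URI and a public URL.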

What’s next