Some models think through a problem before giving a final answer. This makes them stronger at math, code, and logic-heavy tasks. Models with the reasoning capability (Deepshi’s own models and most frontier models) can return their thinking alongside the answer.
See Text models for which models support reasoning.
Reading the reasoning trace
On a reasoning model, the assistant message can carry extra fields next to content:
- message.reasoning: the reasoning text.
- message.reasoning_details: a structured array of reasoning segments.
from openai import OpenAI

client = OpenAI(base_url="https://api.deepshi.ai/v1", api_key="YOUR_DEEPSHI_API_KEY")

resp = client.chat.completions.create(
    model="deepshi-3.0",
    messages=[{"role": "user", "content": "What is 15% of 240?"}],
    max_tokens=2048,
)

msg = resp.choices[0].message
print(getattr(msg, "reasoning", None))  # the thinking trace (may be None)
print(msg.content)  # the final answer
If you only want the final answer, read message.content and ignore the reasoning fields.
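Because the reasoning fields are optional, code that handles both reasoning and non-reasoning models may want to read them defensively. The helper below is a minimal sketch of that idea; split_message is a hypothetical name, not part of the SDK, and it only assumes the reasoning and content fields described above.

```python
from typing import Any, Optional, Tuple


def split_message(message: Any) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer) from an assistant message.

    Hypothetical helper: accepts either an SDK message object or a
    plain dict parsed from the JSON response.
    """
    if isinstance(message, dict):
        reasoning = message.get("reasoning")
        content = message.get("content")
    else:
        # The reasoning field may be missing on non-reasoning models.
        reasoning = getattr(message, "reasoning", None)
        content = getattr(message, "content", None)
    # Normalize a missing or null answer to an empty string.
    return reasoning, content or ""
```

With the response from the example above, this would be called as split_message(resp.choices[0].message).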
Give reasoning models enough tokens
Reasoning and the visible answer share the same generation budget. If you set max_tokens too low, a reasoning model can spend its whole budget thinking and return empty content with finish_reason: "length". The request still returns a normal HTTP 200, not an error, so the only signal is the finish_reason.
{
  "choices": [
    {
      "index": 0,
      "finish_reason": "length",
      "message": { "role": "assistant", "content": "" }
    }
  ]
}
To avoid it, give reasoning models a generous max_tokens so there’s room for both the thinking and the answer.
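One way to handle the truncated case in code is to detect an empty answer with finish_reason "length" and retry with a larger budget. The sketch below is one possible approach, not an official recommendation; answer_with_budget, start_tokens, and max_budget are hypothetical names, and the doubling strategy and default limits are arbitrary choices.

```python
from typing import Any, Callable


def answer_with_budget(
    create: Callable[..., Any],
    *,
    start_tokens: int = 2048,
    max_budget: int = 16384,
    **request: Any,
) -> Any:
    """Call create(), doubling max_tokens while the reply is truncated and empty.

    `create` is expected to behave like client.chat.completions.create.
    The last response is returned even if it is still truncated once
    max_budget has been reached.
    """
    budget = start_tokens
    while True:
        resp = create(max_tokens=budget, **request)
        choice = resp.choices[0]
        # Truncated case: the budget ran out before any answer text was produced.
        truncated = choice.finish_reason == "length" and not choice.message.content
        if not truncated or budget >= max_budget:
            return resp
        budget = min(budget * 2, max_budget)
```

It could be called as answer_with_budget(client.chat.completions.create, model="deepshi-3.0", messages=messages). Every retry is a full request, so each attempt's tokens are billed.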
Streaming
Reasoning works with streaming too. Set "stream": true and read choices[].delta. See the Streaming guide.
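As a rough illustration of reading choices[].delta, the sketch below accumulates the streamed text into two strings. It assumes that reasoning arrives in a delta.reasoning field mirroring message.reasoning; that field name is an assumption here, so check the Streaming guide for the actual shape. collect_stream is a hypothetical helper.

```python
from typing import Any, Iterable, Tuple


def collect_stream(chunks: Iterable[Any]) -> Tuple[str, str]:
    """Accumulate (reasoning, answer) text from streamed chat chunks."""
    reasoning_parts = []
    answer_parts = []
    for chunk in chunks:
        if not chunk.choices:
            # Some chunks may carry no choices at all; skip them.
            continue
        delta = chunk.choices[0].delta
        # ASSUMPTION: reasoning text is streamed as delta.reasoning.
        reasoning_parts.append(getattr(delta, "reasoning", None) or "")
        answer_parts.append(getattr(delta, "content", None) or "")
    return "".join(reasoning_parts), "".join(answer_parts)
```

With the OpenAI SDK, the iterable would be the object returned by client.chat.completions.create(..., stream=True).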
Best practices
- Allow ample max_tokens on reasoning models (room for thinking and the answer).
- Use a reasoning model for math, code, and multi-step problems; a non-reasoning model is faster and cheaper for simple tasks.
- Both reasoning and answer tokens count toward usage.completion_tokens and your cost.
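Since thinking tokens are billed like answer tokens, a quick estimate of output cost can be taken from usage.completion_tokens alone. The function below is a minimal sketch; completion_cost is a hypothetical helper, and the per-token price must come from your own pricing information.

```python
def completion_cost(completion_tokens: int, usd_per_million_tokens: float) -> float:
    """Estimate output cost from completion tokens.

    completion_tokens already includes both reasoning and answer tokens,
    so no separate term for thinking is needed.
    """
    return completion_tokens * usd_per_million_tokens / 1_000_000
```

For a response object, pass resp.usage.completion_tokens as the first argument together with the output price that applies to the model you called.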