POST https://api.deepshi.ai/v1/chat/completions
Basic request
Send a list ofmessages. Each message has a role (system, user, or assistant) and content.
Common parameters
| Parameter | Type | Description |
|---|---|---|
model | string | The model id to use, e.g. deepshi-2.0 or deepshi-3.0. See Models. |
messages | array | The conversation so far. Required. |
temperature | number | Sampling randomness, typically 0–2. Lower is more deterministic. |
top_p | number | Nucleus sampling cutoff. Use instead of temperature, not both. |
max_tokens | integer | Maximum tokens to generate in the response. |
stop | string or array | Sequences that stop generation. |
stream | boolean | Stream tokens as Server-Sent Events. See Streaming. |
seed | integer | Best-effort deterministic sampling for repeatable output. |
tools | array | Function/tool definitions the model may call. See Tool calling. |
response_format | object | Set to { "type": "json_object" } to force valid JSON output (model-dependent). |
Supported parameters vary by model. Unsupported fields are safely ignored
rather than rejected. Each model lists its
supported_sampling_parameters in
the models catalog.Multi-turn conversations
The API is stateless, so it doesn’t remember previous calls. To continue a conversation, send the full message history each time and append the model’s previous reply as anassistant message:
Reasoning models
Models like Deepshi 3.0 reason step by step before answering. They accept the same request shape; allow moremax_tokens and expect higher latency on hard problems. See Reasoning models and Text models for which models support reasoning.
Streaming
Set"stream": true to receive tokens incrementally as Server-Sent Events. See the Streaming guide for a full example.
Next steps
Tool calling
Let the model call your functions.
Reasoning
Use models that think before they answer.