GPT 5.4
OpenAI GPT 5.4 — multimodal reasoning model exposed via the OpenAI Responses API at /v1/responses. Supports structured input arrays, adjustable reasoning effort (minimal → xhigh), web search, and function calling.
Model
| Model Name | Context Window | Reasoning |
|---|---|---|
| gpt-5-4 | 256K tokens | Yes (controlled via reasoning.effort) |
Pricing
Per-token billing:
| Type | Credits / 1M tokens | Price / 1M tokens |
|---|---|---|
| Input | 300 credits | $1.50 |
| Output | 1800 credits | $9.00 |
Endpoint
POST https://api.aivideoapi.ai/v1/responses
OpenAI Responses API–compatible. Use openai SDK's responses.create with baseURL set to https://api.aivideoapi.ai/v1.
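If you would rather not depend on the SDK, the endpoint can also be called over plain HTTP. A minimal sketch using only the Python standard library (the API key is a placeholder, and the network call itself is left commented out):

```python
import json
import urllib.request

API_URL = "https://api.aivideoapi.ai/v1/responses"

def create_response(api_key: str, payload: dict) -> dict:
    """POST a Responses-API payload and return the parsed JSON body."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = {
    "model": "gpt-5-4",
    "input": "Summarize quantum entanglement in one sentence",
    "reasoning": {"effort": "low"},
}
# result = create_response("sk-your-api-key", payload)  # live call, shown for shape only
```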
Create Response
curl -X POST https://api.aivideoapi.ai/v1/responses \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5-4",
"input": [
{
"role": "user",
"content": [
{ "type": "input_text", "text": "Summarize quantum entanglement in one sentence" }
]
}
],
"reasoning": { "effort": "low" }
}'
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Must be gpt-5-4 |
| input | string \| array | Yes | Plain string or array of message objects |
| stream | boolean | No | Enable streaming (default: false) |
| reasoning.effort | string | No | minimal / low / medium / high / xhigh (default: low) |
| tools | array | No | web_search or function (mutually exclusive) |
| tool_choice | string | No | Use with function tools; auto is recommended |
Input Message
input can be either a plain string or an array of message objects.
String form (simplest)
{ "model": "gpt-5-4", "input": "Summarize quantum entanglement in one sentence" }
Array form (recommended)
{
"model": "gpt-5-4",
"input": [
{
"role": "user",
"content": [
{ "type": "input_text", "text": "Summarize quantum entanglement in one sentence" }
]
}
],
"reasoning": { "effort": "low" }
}
Message fields
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | user / assistant / system / developer / tool |
| content | array | Yes | Array of content blocks |
Content block types
| Type | Field | Description |
|---|---|---|
| input_text | text: string | Plain text |
| input_image | image_url: string | Publicly accessible image URL |
| input_file | file_url: string | Publicly accessible file URL (PDF, document, etc.) |
Multimodal mix example
{
"role": "user",
"content": [
{ "type": "input_text", "text": "What is in this image?" },
{ "type": "input_image", "image_url": "https://example.com/photo.jpg" },
{ "type": "input_file", "file_url": "https://example.com/doc.pdf" }
]
}
Note: Unlike OpenAI Chat Completions, the Responses protocol uses explicit content types (input_text / input_image / input_file) instead of reusing the Chat-style image_url wrapper.
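A small helper can make this shape harder to get wrong. This is a hypothetical convenience (user_message is not part of any SDK), sketched for illustration; the URLs are placeholders:

```python
def user_message(text=None, image_url=None, file_url=None):
    """Build a user message in the Responses input format,
    mixing any of the three content block types documented above."""
    content = []
    if text is not None:
        content.append({"type": "input_text", "text": text})
    if image_url is not None:
        content.append({"type": "input_image", "image_url": image_url})
    if file_url is not None:
        content.append({"type": "input_file", "file_url": file_url})
    return {"role": "user", "content": content}

msg = user_message(
    text="What is in this image?",
    image_url="https://example.com/photo.jpg",
    file_url="https://example.com/doc.pdf",
)
```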
Response (Non-Streaming)
{
"id": "resp_0e90a829f82901960169e10f8f12d081999a38d26a2d3f2f8e",
"object": "response",
"model": "gpt-5.4",
"status": "completed",
"created_at": 1776357263,
"completed_at": 1776357264,
"output": [
{
"type": "message",
"role": "assistant",
"id": "msg_0e90a829f82901960169e10f8fab94819994201d9d5f8b201c",
"status": "completed",
"content": [
{
"type": "output_text",
"text": "Quantum entanglement: two or more particles share correlated states such that measuring one instantly determines the outcome of measuring the others, no matter the distance.",
"annotations": []
}
]
}
],
"usage": {
"input_tokens": 0,
"input_tokens_details": { "cached_tokens": 0 },
"output_tokens": 45,
"output_tokens_details": { "reasoning_tokens": 0 },
"total_tokens": 45
},
"reasoning": { "effort": "low" },
"text": { "format": { "type": "text" }, "verbosity": "medium" },
"tool_choice": "auto",
"tools": [],
"parallel_tool_calls": true,
"temperature": 1,
"top_p": 0.98,
"frequency_penalty": 0,
"presence_penalty": 0,
"service_tier": "default",
"truncation": "disabled",
"store": false,
"background": false,
"prompt_cache_key": "3e936979-eeac-46b9-b8de-1023adf31fb1",
"prompt_cache_retention": "24h",
"credits_consumed": 0.08
}
| Field | Type | Description |
|---|---|---|
| id | string | Unique response ID (resp_ prefix) |
| object | string | Always response |
| model | string | Actual upstream model version |
| status | string | completed / incomplete / failed |
| created_at / completed_at | int | Creation / completion Unix timestamps |
| output[].type | string | Output block type: message for text replies; may include reasoning blocks when thinking is enabled |
| output[].content[].type | string | output_text (text), output_image, etc. |
| output[].content[].text | string | Text content |
| usage.input_tokens | int | Input tokens |
| usage.input_tokens_details.cached_tokens | int | Tokens served from the prompt cache (an upstream cache discount may apply) |
| usage.output_tokens | int | Output tokens (includes reasoning) |
| usage.output_tokens_details.reasoning_tokens | int | Reasoning portion of the output |
| usage.total_tokens | int | input + output |
| reasoning.effort | string | Echo of the requested reasoning effort |
| text.format.type / text.verbosity | string | Output text format and verbosity |
| prompt_cache_key / prompt_cache_retention | string | Prompt cache key and retention window |
| service_tier | string | Service tier (default / flex / etc.) |
| credits_consumed | number | Credits actually charged for this call (computed at this platform's rate, independent of upstream's internal value) |
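To pull just the assistant text out of a parsed response body, something like the following works (a sketch; it skips any non-message items, such as reasoning blocks):

```python
def extract_text(response: dict) -> str:
    """Concatenate all output_text blocks from message items in output[]."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # e.g. a reasoning block
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block["text"])
    return "".join(parts)
```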
Billing:
credits_consumed = input_tokens × 300/1M + output_tokens × 1800/1M. Note: output_tokens in the OpenAI Responses protocol already includes reasoning_tokens (this is OpenAI's official billing model), so reasoning is billed as output. To limit reasoning costs, set reasoning.effort: "minimal" or "low". The example above has input=0, output=45 → 0 + 45 × 1800 / 1_000_000 = 0.081, rounded to credits_consumed = 0.08.
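The formula above is easy to reproduce client-side for cost estimates (a sketch; the authoritative charge is whatever the platform reports in credits_consumed):

```python
def estimate_credits(input_tokens: int, output_tokens: int) -> float:
    """Apply the published rates: 300 credits/1M input, 1800 credits/1M output.
    Reasoning tokens are already counted inside output_tokens."""
    raw = input_tokens * 300 / 1_000_000 + output_tokens * 1800 / 1_000_000
    return round(raw, 2)

estimate_credits(0, 45)  # 0.081 → 0.08, matching the example response
```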
Response (Streaming)
When stream: true, returns SSE. Events fall into three groups: lifecycle (response/output_item/content_part added and done), incremental (output_text.delta), and terminal (completed).
Full event sequence
event: response.created
data: {"type":"response.created","sequence_number":0,"response":{"id":"resp_xxx","object":"response","status":"in_progress","created_at":1776357621,"model":"gpt-5.4","output":[],"usage":null,...}}
event: response.in_progress
data: {"type":"response.in_progress","sequence_number":1,"response":{...}}
event: response.output_item.added
data: {"type":"response.output_item.added","sequence_number":2,"output_index":0,"item":{"id":"msg_xxx","type":"message","status":"in_progress","content":[],"phase":"final_answer","role":"assistant"}}
event: response.content_part.added
data: {"type":"response.content_part.added","sequence_number":3,"content_index":0,"item_id":"msg_xxx","output_index":0,"part":{"type":"output_text","annotations":[],"logprobs":[],"text":""}}
event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":4,"content_index":0,"delta":"Quantum","item_id":"msg_xxx","output_index":0,"obfuscation":"Lh5gIRXJduvt83D","logprobs":[]}
event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":5,"content_index":0,"delta":" entanglement","item_id":"msg_xxx","output_index":0,"obfuscation":"6aRgLGNpXdPvddV","logprobs":[]}
... (more delta events) ...
event: response.output_text.done
data: {"type":"response.output_text.done","sequence_number":56,"content_index":0,"item_id":"msg_xxx","output_index":0,"text":"Quantum entanglement is the phenomenon where two or more quantum systems form an inseparable whole..."}
event: response.content_part.done
data: {"type":"response.content_part.done","sequence_number":57,"content_index":0,"item_id":"msg_xxx","output_index":0,"part":{"type":"output_text","annotations":[],"logprobs":[],"text":"Quantum entanglement..."}}
event: response.output_item.done
data: {"type":"response.output_item.done","sequence_number":58,"output_index":0,"item":{"id":"msg_xxx","type":"message","status":"completed","content":[{"type":"output_text","annotations":[],"logprobs":[],"text":"Quantum entanglement..."}],"phase":"final_answer","role":"assistant"}}
event: response.completed
data: {"type":"response.completed","sequence_number":59,"response":{"id":"resp_xxx","object":"response","status":"completed","created_at":1776357621,"completed_at":1776357625,"model":"gpt-5.4","output":[{"role":"assistant","id":"msg_xxx","type":"message","status":"completed","content":[{"annotations":[],"text":"Quantum entanglement...","type":"output_text"}]}],"usage":{"input_tokens":0,"input_tokens_details":{"cached_tokens":0},"output_tokens":57,"output_tokens_details":{"reasoning_tokens":0},"total_tokens":57},...},"credits_consumed":0.10}
⚠️ Note: This upstream does NOT emit a data: [DONE] terminator. Treat the response.completed event as the end-of-stream signal.
Event types
| Event | Description |
|---|---|
| response.created | Response object created, status: "in_progress" |
| response.in_progress | In progress (same object as created) |
| response.output_item.added | New output item appended (e.g. a message block) |
| response.content_part.added | New content part within an output item (e.g. output_text) |
| response.output_text.delta | Text increment — append delta to your buffer |
| response.output_text.done | Text complete; text holds the full content |
| response.content_part.done | Content part finalized |
| response.output_item.done | Output item finalized |
| response.completed | End of stream — response.usage has final token counts; top-level credits_consumed is the actual platform charge |
Client assembly
For text-only consumers, subscribe to response.output_text.delta and concatenate delta. To grab the final string in one shot, read text on response.output_text.done. For billing/usage, read response.usage and the top-level credits_consumed from response.completed.
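That assembly logic can be sketched as a small fold over the decoded data: payloads (SSE transport and JSON decoding are assumed to happen before this function is called):

```python
def assemble(events):
    """Fold a stream of decoded event payloads into (final_text, usage).
    Stops at response.completed, since no [DONE] terminator is sent."""
    buffer = []
    usage = None
    for data in events:
        etype = data.get("type")
        if etype == "response.output_text.delta":
            buffer.append(data["delta"])  # incremental text
        elif etype == "response.completed":
            usage = data["response"].get("usage")  # final token counts
            break
    return "".join(buffer), usage
```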
Extra fields
| Field | Description |
|---|---|
| sequence_number | Monotonically increasing index — useful for ordering and dedup |
| obfuscation | Upstream-injected random string to prevent replay; safe to ignore |
| output_index / content_index | Indices when multiple output items / content parts coexist |
| item_id | Links the event to a specific output item (msg_xxx) |
Examples
Web Search
curl -X POST https://api.aivideoapi.ai/v1/responses \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5-4",
"input": [
{ "role": "user", "content": [
{ "type": "input_text", "text": "Top AI news this week" }
]}
],
"tools": [{ "type": "web_search" }],
"reasoning": { "effort": "high" }
}'
Function Calling
curl -X POST https://api.aivideoapi.ai/v1/responses \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5-4",
"input": [
{ "role": "user", "content": [
{ "type": "input_text", "text": "How hot is it in San Francisco?" }
]}
],
"tools": [
{
"type": "function",
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": { "type": "string" },
"unit": { "type": "string", "enum": ["celsius", "fahrenheit"] }
},
"required": ["location", "unit"]
}
}
],
"tool_choice": "auto"
}'
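When the model elects to call the tool, the output array should carry a function_call item with a name and a JSON-encoded arguments string (assuming the upstream follows OpenAI's standard Responses shape for tool calls). A dispatch sketch, with a stubbed get_current_weather for illustration:

```python
import json

def handle_function_calls(response: dict, handlers: dict) -> list:
    """Run each function_call output item through a matching handler."""
    results = []
    for item in response.get("output", []):
        if item.get("type") == "function_call":
            args = json.loads(item["arguments"])  # arguments arrive as a JSON string
            results.append(handlers[item["name"]](**args))
    return results

def get_current_weather(location: str, unit: str) -> str:
    # Stub for illustration: a real handler would query a weather service.
    return f"22 degrees {unit} in {location}"
```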
Restriction: web_search and function cannot be used in the same request.
Error Codes
When a request fails, the API returns a JSON error response:
{
"error": {
"code": "insufficient_credits",
"message": "Your credit balance is too low. Please top up.",
"type": "billing_error"
}
}
Error Reference
| HTTP Status | Code | Type | Description |
|---|---|---|---|
| 400 | invalid_request | invalid_request_error | Missing or invalid parameters |
| 401 | invalid_api_key | authentication_error | API key is invalid, disabled, or deleted |
| 402 | insufficient_credits | billing_error | Credit balance too low, please top up |
| 403 | ip_not_allowed | permission_error | Request IP not in the key's allowlist |
| 404 | model_not_found | invalid_request_error | Model does not exist or is inactive |
| 404 | task_not_found | invalid_request_error | Task ID does not exist |
| 429 | rate_limit_exceeded | rate_limit_error | Too many requests, please slow down |
| 429 | spend_limit_exceeded | billing_error | Key spend limit reached (hourly/daily/total) |
| 500 | internal_error | api_error | Unexpected server error |
| 503 | upstream_error | upstream_error | Upstream AI provider returned an error |
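Only some of these statuses are worth retrying. A classification sketch based on the table (note that spend_limit_exceeded shares HTTP 429 with rate limiting but will not clear on its own):

```python
RETRYABLE_STATUSES = {429, 500, 503}

def should_retry(status: int, body: dict) -> bool:
    """True for transient failures; billing/auth errors that a retry
    cannot fix return False."""
    code = body.get("error", {}).get("code")
    if code == "spend_limit_exceeded":
        return False  # retrying won't help until the key's limit is raised
    return status in RETRYABLE_STATUSES
```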
Common Scenarios
invalid_request (400)
Returned when required fields are missing or invalid.
{
"error": {
"code": "invalid_request",
"message": "'model' is required.",
"type": "invalid_request_error"
}
}
insufficient_credits (402)
Your balance is too low. Check your balance with GET /v1/credits and top up in Dashboard > Billing.
invalid_api_key (401)
Possible causes:
- The key does not start with sk-
- The key has been disabled or deleted
- The user account has been banned
upstream_error (503)
The upstream AI provider returned an error. This may happen when:
- The input contains sensitive or prohibited content
- The provider is temporarily unavailable
- The request parameters are not supported by the provider
Credits are automatically refunded when a task fails due to upstream errors.