GPT 5.4
OpenAI GPT 5.4 — multimodal reasoning model exposed via the OpenAI Responses API at /v1/responses. Supports structured input arrays, adjustable reasoning effort (minimal → xhigh), web search, and function calling.
Model
| Model Name | Context Window | Reasoning |
|---|---|---|
| gpt-5-4 | 256K tokens | Yes (controlled via reasoning.effort) |
Pricing
Per-token billing:
| Type | Credits / 1M tokens | Price / 1M tokens |
|---|---|---|
| Input | 300 credits | $1.50 |
| Output | 1800 credits | $9.00 |
Endpoint
POST https://api.aivideoapi.ai/v1/responses
OpenAI Responses API–compatible. Use openai SDK's responses.create with baseURL set to https://api.aivideoapi.ai/v1.
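If you would rather not depend on the SDK, the endpoint can also be called over plain HTTP. A minimal sketch using only the Python standard library (the API key is a placeholder, and the network call itself is left commented out):

```python
import json
import urllib.request

API_URL = "https://api.aivideoapi.ai/v1/responses"

def create_response(api_key: str, payload: dict) -> dict:
    """POST a Responses-API payload and return the parsed JSON body."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = {
    "model": "gpt-5-4",
    "input": "Summarize quantum entanglement in one sentence",
    "reasoning": {"effort": "low"},
}
# result = create_response("sk-your-api-key", payload)  # live call, shown for shape only
```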
Create Response
curl -X POST https://api.aivideoapi.ai/v1/responses \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5-4",
"input": [
{
"role": "user",
"content": [
{ "type": "input_text", "text": "Summarize quantum entanglement in one sentence" }
]
}
],
"reasoning": { "effort": "low" }
}'
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Must be gpt-5-4 |
| input | string \| array | Yes | Plain string or array of message objects |
| stream | boolean | No | Enable streaming (default: false) |
| reasoning.effort | string | No | minimal / low / medium / high / xhigh (default: low) |
| tools | array | No | web_search or function (mutually exclusive) |
| tool_choice | string | No | Use with function tools; auto is recommended |
Input Message
input can be either a plain string or an array of message objects.
String form (simplest)
{ "model": "gpt-5-4", "input": "Summarize quantum entanglement in one sentence" }
Array form (recommended)
{
"model": "gpt-5-4",
"input": [
{
"role": "user",
"content": [
{ "type": "input_text", "text": "Summarize quantum entanglement in one sentence" }
]
}
],
"reasoning": { "effort": "low" }
}
Message fields
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | user / assistant / system / developer / tool |
| content | array | Yes | Array of content blocks |
Content block types
| Type | Field | Description |
|---|---|---|
| input_text | text: string | Plain text |
| input_image | image_url: string | Publicly accessible image URL |
| input_file | file_url: string | Publicly accessible file URL (PDF, document, etc.) |
Multimodal mix example
{
"role": "user",
"content": [
{ "type": "input_text", "text": "What is in this image?" },
{ "type": "input_image", "image_url": "https://example.com/photo.jpg" },
{ "type": "input_file", "file_url": "https://example.com/doc.pdf" }
]
}
Note: Unlike OpenAI Chat Completions, the Responses protocol uses explicit content types (input_text / input_image / input_file) instead of reusing the Chat-style image_url wrapper.
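A small helper can make this shape harder to get wrong. This is a hypothetical convenience (user_message is not part of any SDK), sketched for illustration; the URLs are placeholders:

```python
def user_message(text=None, image_url=None, file_url=None):
    """Build a user message in the Responses input format,
    mixing any of the three content block types documented above."""
    content = []
    if text is not None:
        content.append({"type": "input_text", "text": text})
    if image_url is not None:
        content.append({"type": "input_image", "image_url": image_url})
    if file_url is not None:
        content.append({"type": "input_file", "file_url": file_url})
    return {"role": "user", "content": content}

msg = user_message(
    text="What is in this image?",
    image_url="https://example.com/photo.jpg",
    file_url="https://example.com/doc.pdf",
)
```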
Response (Non-Streaming)
{
"id": "resp_0e90a829f82901960169e10f8f12d081999a38d26a2d3f2f8e",
"object": "response",
"model": "gpt-5.4",
"status": "completed",
"created_at": 1776357263,
"completed_at": 1776357264,
"output": [
{
"type": "message",
"role": "assistant",
"id": "msg_0e90a829f82901960169e10f8fab94819994201d9d5f8b201c",
"status": "completed",
"content": [
{
"type": "output_text",
"text": "Quantum entanglement: two or more particles share correlated states such that measuring one instantly determines the outcome of measuring the others, no matter the distance.",
"annotations": []
}
]
}
],
"usage": {
"input_tokens": 0,
"input_tokens_details": { "cached_tokens": 0 },
"output_tokens": 45,
"output_tokens_details": { "reasoning_tokens": 0 },
"total_tokens": 45
},
"reasoning": { "effort": "low" },
"text": { "format": { "type": "text" }, "verbosity": "medium" },
"tool_choice": "auto",
"tools": [],
"parallel_tool_calls": true,
"temperature": 1,
"top_p": 0.98,
"frequency_penalty": 0,
"presence_penalty": 0,
"service_tier": "default",
"truncation": "disabled",
"store": false,
"background": false,
"prompt_cache_key": "3e936979-eeac-46b9-b8de-1023adf31fb1",
"prompt_cache_retention": "24h",
"credits_consumed": 0.08
}
| Field | Type | Description |
|---|---|---|
| id | string | Unique response ID (resp_ prefix) |
| object | string | Always response |
| model | string | Actual upstream model version |
| status | string | completed / incomplete / failed |
| created_at / completed_at | int | Creation / completion Unix timestamps |
| output[].type | string | Output block type: message for text replies; may include reasoning blocks when thinking is enabled |
| output[].content[].type | string | output_text (text), output_image, etc. |
| output[].content[].text | string | Text content |
| usage.input_tokens | int | Input tokens |
| usage.input_tokens_details.cached_tokens | int | Tokens served from the prompt cache (an upstream cache discount may apply) |
| usage.output_tokens | int | Output tokens (includes reasoning) |
| usage.output_tokens_details.reasoning_tokens | int | Reasoning portion of the output |
| usage.total_tokens | int | input + output |
| reasoning.effort | string | Echo of the requested reasoning effort |
| text.format.type / text.verbosity | string | Output text format and verbosity |
| prompt_cache_key / prompt_cache_retention | string | Prompt cache key and retention window |
| service_tier | string | Service tier (default / flex / etc.) |
| credits_consumed | number | Credits actually charged for this call (computed at this platform's rate, independent of upstream's internal value) |
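To pull just the assistant text out of a parsed response body, something like the following works (a sketch; it skips any non-message items, such as reasoning blocks):

```python
def extract_text(response: dict) -> str:
    """Concatenate all output_text blocks from message items in output[]."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # e.g. a reasoning block
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block["text"])
    return "".join(parts)
```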
Billing:
credits_consumed = input_tokens × 300/1M + output_tokens × 1800/1M. Note: output_tokens in the OpenAI Responses protocol already includes reasoning_tokens (this is OpenAI's official billing model), so reasoning is billed as output. To limit reasoning costs, set reasoning.effort: "minimal" or "low". The example above has input=0, output=45 → 0 + 45 × 1800 / 1_000_000 = 0.081, rounded to credits_consumed = 0.08.
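The formula above is easy to reproduce client-side for cost estimates (a sketch; the authoritative charge is whatever the platform reports in credits_consumed):

```python
def estimate_credits(input_tokens: int, output_tokens: int) -> float:
    """Apply the published rates: 300 credits/1M input, 1800 credits/1M output.
    Reasoning tokens are already counted inside output_tokens."""
    raw = input_tokens * 300 / 1_000_000 + output_tokens * 1800 / 1_000_000
    return round(raw, 2)

estimate_credits(0, 45)  # 0.081 → 0.08, matching the example response
```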
Response (Streaming)
When stream: true, returns SSE. Events fall into three groups: lifecycle (response/output_item/content_part added and done), incremental (output_text.delta), and terminal (completed).
Full event sequence
event: response.created
data: {"type":"response.created","sequence_number":0,"response":{"id":"resp_xxx","object":"response","status":"in_progress","created_at":1776357621,"model":"gpt-5.4","output":[],"usage":null,...}}
event: response.in_progress
data: {"type":"response.in_progress","sequence_number":1,"response":{...}}
event: response.output_item.added
data: {"type":"response.output_item.added","sequence_number":2,"output_index":0,"item":{"id":"msg_xxx","type":"message","status":"in_progress","content":[],"phase":"final_answer","role":"assistant"}}
event: response.content_part.added
data: {"type":"response.content_part.added","sequence_number":3,"content_index":0,"item_id":"msg_xxx","output_index":0,"part":{"type":"output_text","annotations":[],"logprobs":[],"text":""}}
event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":4,"content_index":0,"delta":"Quantum","item_id":"msg_xxx","output_index":0,"obfuscation":"Lh5gIRXJduvt83D","logprobs":[]}
event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":5,"content_index":0,"delta":" entanglement","item_id":"msg_xxx","output_index":0,"obfuscation":"6aRgLGNpXdPvddV","logprobs":[]}
... (more delta events) ...
event: response.output_text.done
data: {"type":"response.output_text.done","sequence_number":56,"content_index":0,"item_id":"msg_xxx","output_index":0,"text":"Quantum entanglement is the phenomenon where two or more quantum systems form an inseparable whole..."}
event: response.content_part.done
data: {"type":"response.content_part.done","sequence_number":57,"content_index":0,"item_id":"msg_xxx","output_index":0,"part":{"type":"output_text","annotations":[],"logprobs":[],"text":"Quantum entanglement..."}}
event: response.output_item.done
data: {"type":"response.output_item.done","sequence_number":58,"output_index":0,"item":{"id":"msg_xxx","type":"message","status":"completed","content":[{"type":"output_text","annotations":[],"logprobs":[],"text":"Quantum entanglement..."}],"phase":"final_answer","role":"assistant"}}
event: response.completed
data: {"type":"response.completed","sequence_number":59,"response":{"id":"resp_xxx","object":"response","status":"completed","created_at":1776357621,"completed_at":1776357625,"model":"gpt-5.4","output":[{"role":"assistant","id":"msg_xxx","type":"message","status":"completed","content":[{"annotations":[],"text":"Quantum entanglement...","type":"output_text"}]}],"usage":{"input_tokens":0,"input_tokens_details":{"cached_tokens":0},"output_tokens":57,"output_tokens_details":{"reasoning_tokens":0},"total_tokens":57},...},"credits_consumed":0.10}
⚠️ Note: This upstream does NOT emit a data: [DONE] terminator. Treat the response.completed event as the end-of-stream signal.
Event types
| Event | Description |
|---|---|
| response.created | Response object created, status: "in_progress" |
| response.in_progress | In progress (same object as created) |
| response.output_item.added | New output item appended (e.g. a message block) |
| response.content_part.added | New content part within an output item (e.g. output_text) |
| response.output_text.delta | Text increment — append delta to your buffer |
| response.output_text.done | Text complete; text holds the full content |
| response.content_part.done | Content part finalized |
| response.output_item.done | Output item finalized |
| response.completed | End of stream — response.usage has final token counts; top-level credits_consumed is the actual platform charge |
Client assembly
For text-only consumers, subscribe to response.output_text.delta and concatenate delta. To grab the final string in one shot, read text on response.output_text.done. For billing/usage, read response.usage and the top-level credits_consumed from response.completed.
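That assembly logic can be sketched as a small fold over the decoded data: payloads (SSE transport and JSON decoding are assumed to happen before this function is called):

```python
def assemble(events):
    """Fold a stream of decoded event payloads into (final_text, usage).
    Stops at response.completed, since no [DONE] terminator is sent."""
    buffer = []
    usage = None
    for data in events:
        etype = data.get("type")
        if etype == "response.output_text.delta":
            buffer.append(data["delta"])  # incremental text
        elif etype == "response.completed":
            usage = data["response"].get("usage")  # final token counts
            break
    return "".join(buffer), usage
```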
Extra fields
| Field | Description |
|---|---|
| sequence_number | Monotonically increasing index — useful for ordering and dedup |
| obfuscation | Upstream-injected random string to prevent replay; safe to ignore |
| output_index / content_index | Indices when multiple output items / content parts coexist |
| item_id | Links the event to a specific output item (msg_xxx) |
Examples
Web Search
curl -X POST https://api.aivideoapi.ai/v1/responses \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5-4",
"input": [
{ "role": "user", "content": [
{ "type": "input_text", "text": "Top AI news this week" }
]}
],
"tools": [{ "type": "web_search" }],
"reasoning": { "effort": "high" }
}'
Function Calling
curl -X POST https://api.aivideoapi.ai/v1/responses \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5-4",
"input": [
{ "role": "user", "content": [
{ "type": "input_text", "text": "How hot is it in San Francisco?" }
]}
],
"tools": [
{
"type": "function",
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": { "type": "string" },
"unit": { "type": "string", "enum": ["celsius", "fahrenheit"] }
},
"required": ["location", "unit"]
}
}
],
"tool_choice": "auto"
}'
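When the model elects to call the tool, the output array should carry a function_call item with a name and a JSON-encoded arguments string (assuming the upstream follows OpenAI's standard Responses shape for tool calls). A dispatch sketch, with a stubbed get_current_weather for illustration:

```python
import json

def handle_function_calls(response: dict, handlers: dict) -> list:
    """Run each function_call output item through a matching handler."""
    results = []
    for item in response.get("output", []):
        if item.get("type") == "function_call":
            args = json.loads(item["arguments"])  # arguments arrive as a JSON string
            results.append(handlers[item["name"]](**args))
    return results

def get_current_weather(location: str, unit: str) -> str:
    # Stub for illustration: a real handler would query a weather service.
    return f"22 degrees {unit} in {location}"
```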
Restriction: web_search and function cannot be used in the same request.
Error Codes
When a request fails, the API returns a JSON error response:
{
"error": {
"code": "insufficient_credits",
"message": "Your credit balance is too low. Please top up.",
"type": "billing_error"
}
}
Error Reference
| HTTP Status | Code | Type | Description |
|---|---|---|---|
| 400 | invalid_request | invalid_request_error | Missing or invalid parameters |
| 401 | invalid_api_key | authentication_error | API key is invalid, disabled, or deleted |
| 402 | insufficient_credits | billing_error | Credit balance too low, please top up |
| 403 | ip_not_allowed | permission_error | Request IP not in the key's allowlist |
| 404 | model_not_found | invalid_request_error | Model does not exist or is inactive |
| 404 | task_not_found | invalid_request_error | Task ID does not exist |
| 429 | rate_limit_exceeded | rate_limit_error | Too many requests, please slow down |
| 429 | spend_limit_exceeded | billing_error | Key spend limit reached (hourly/daily/total) |
| 500 | internal_error | api_error | Unexpected server error |
| 503 | upstream_error | upstream_error | Upstream AI provider returned an error |
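Only some of these statuses are worth retrying. A classification sketch based on the table (note that spend_limit_exceeded shares HTTP 429 with rate limiting but will not clear on its own):

```python
RETRYABLE_STATUSES = {429, 500, 503}

def should_retry(status: int, body: dict) -> bool:
    """True for transient failures; billing/auth errors that a retry
    cannot fix return False."""
    code = body.get("error", {}).get("code")
    if code == "spend_limit_exceeded":
        return False  # retrying won't help until the key's limit is raised
    return status in RETRYABLE_STATUSES
```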
Common Scenarios
invalid_request (400)
Returned when required fields are missing or invalid.
{
"error": {
"code": "invalid_request",
"message": "'model' is required.",
"type": "invalid_request_error"
}
}
insufficient_credits (402)
Your balance is too low. Check your balance with GET /v1/credits and top up in Dashboard > Billing.
invalid_api_key (401)
Possible causes:
- The key does not start with sk-
- The key has been disabled or deleted
- The user account has been banned
upstream_error (503)
The upstream AI provider returned an error. This may happen when:
- The input contains sensitive or prohibited content
- The provider is temporarily unavailable
- The request parameters are not supported by the provider
Credits are automatically refunded when a task fails due to upstream errors.