Native xAI video generation
Grok Imagine is xAI's first-party video model, designed for prompt fidelity and surreal compositions that other generation APIs struggle with.
Generate Grok Imagine clips with text or image inputs.
Grok Imagine supports prompt-based generation and multi-image references with @image tokens. Choose 480p or 720p resolution, a duration of 6 to 30 seconds, and fun, normal, or spicy motion intensity. Requests run through the same task, polling, callback, and credit-balance flow used by the rest of your model stack.
Model capabilities
Use Grok Imagine from the same platform surface as the rest of your video and image stack: API keys, credits, logs, webhooks, and docs stay consistent across providers.
Turn prompts into 6-30 second clips with aspect ratio and motion intensity controls.
Provide up to 7 images and reference them inline with @image1, @image2 tokens in the prompt.
Choose fun, normal, or spicy in text-to-video to dial creative range and motion intensity against the brief.
Use text-to-video from a prompt alone, or image-to-video with one or more external image URLs.
Select 6-30 seconds and 480p or 720p based on cost and output needs.
Poll the task endpoint or use callbacks to deliver the hosted video URL.
curl -X POST https://api.aivideoapi.ai/v1/videos/generations \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-text-to-video",
    "input": {
      "prompt": "A couple of doors open in a surreal hallway, each revealing a tiny living room.",
      "aspect_ratio": "2:3",
      "duration": 6,
      "resolution": "480p"
    }
  }'
Why choose Grok Imagine
Generate 6 to 30 second clips in a single task, without stitching multiple jobs. That is three to five times the duration ceiling of the shortest tiers of Veo 3.1 Fast, Kling 3.0, and Sora 2.
Attach up to seven image URLs and refer to them inside the prompt as @image1, @image2, etc. Useful for multi-character scenes and product compositions.
Switch between fun, normal, and spicy modes to dial creative range and motion intensity without rewriting the prompt.
Grok Imagine uses the same API key, credit balance, async task lifecycle, webhooks, and logs as every other video and image model on AI Video API.
Cost scales linearly with duration: 2.4 credits per second at 480p and 4.5 credits per second at 720p, with no surprise multipliers or hidden fees.
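To make the linear pricing concrete, here is a minimal Python sketch of a cost estimator. The per-second rates and the $0.005 credit price are the figures quoted on this page; the function name `estimate_cost` is hypothetical, and the sketch assumes no rounding is applied by the platform, which may not match actual billing.

```python
from decimal import Decimal

# Per-second rates and credit price as quoted on this page.
CREDITS_PER_SECOND = {"480p": Decimal("2.4"), "720p": Decimal("4.5")}
USD_PER_CREDIT = Decimal("0.005")


def estimate_cost(duration_s: int, resolution: str) -> tuple[Decimal, Decimal]:
    """Return (credits, usd) for one clip, assuming strictly linear pricing."""
    if resolution not in CREDITS_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if not 6 <= duration_s <= 30:
        raise ValueError("duration must be between 6 and 30 seconds")
    credits = CREDITS_PER_SECOND[resolution] * duration_s
    return credits, credits * USD_PER_CREDIT


if __name__ == "__main__":
    for duration, resolution in [(6, "480p"), (10, "720p"), (30, "720p")]:
        credits, usd = estimate_cost(duration, resolution)
        print(f"{duration}s at {resolution}: {credits} credits (${usd})")
```

Decimal arithmetic is used here so that 6 x 2.4 comes out as exactly 14.4 rather than a binary floating-point approximation.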
Frequently asked questions
Grok Imagine is xAI's text-to-video and image-to-video model. The AI Video API exposes it through one HTTP endpoint (POST /v1/videos/generations) with credits, webhooks, and logs shared across all supported video models.
Each Grok Imagine task supports 6 to 30 seconds of video output. The exact duration is set with the input.duration parameter (integer, 6-30, default 6).
Image references are supported. Pass image_urls (1-7 public HTTP/HTTPS image URLs) in the input object. You can reference each image inside the prompt with @image1, @image2, etc. JPEG, PNG, and WEBP files up to 10MB per image are supported.
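The image-reference rules above can be checked before a request is sent. The following Python sketch builds a request body and verifies that every @imageN token in the prompt has a matching URL. Several things here are assumptions rather than documented behavior: the model id `grok-imagine-image-to-video` (modeled on the text-to-video id in the curl example), the helper name `build_image_request`, and the 1-based mapping from @image1 to the first entry of image_urls.

```python
import re
from urllib.parse import urlparse

MAX_IMAGES = 7  # limit stated on this page

# Assumption: the image-to-video model id mirrors the text-to-video id
# shown in the curl example; check the model list for the real value.
IMAGE_TO_VIDEO_MODEL = "grok-imagine-image-to-video"


def build_image_request(prompt: str, image_urls: list[str]) -> dict:
    """Build a request body whose @imageN tokens all point at a supplied URL."""
    if not 1 <= len(image_urls) <= MAX_IMAGES:
        raise ValueError(f"expected 1-{MAX_IMAGES} image URLs, got {len(image_urls)}")
    for url in image_urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an HTTP/HTTPS URL: {url!r}")
    # Assumed ordering: @image1 refers to image_urls[0], @image2 to
    # image_urls[1], and so on.
    for number in re.findall(r"@image(\d+)", prompt):
        if not 1 <= int(number) <= len(image_urls):
            raise ValueError(f"@image{number} has no matching entry in image_urls")
    return {
        "model": IMAGE_TO_VIDEO_MODEL,
        "input": {"prompt": prompt, "image_urls": image_urls},
    }
```

This only checks the URL scheme and token numbering locally. File type, the 10MB size limit, and public reachability would still be enforced by the platform.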
Pricing is per second by resolution: 480p is 2.4 credits/s and 720p is 4.5 credits/s. A 6-second 480p clip costs 6 × 2.4 = 14.4 credits (about 15), and a 10-second 720p clip costs 10 × 4.5 = 45 credits. 1 credit = $0.005 USD.
Mode controls motion intensity and creative range. fun is playful, normal is balanced, and spicy is the most dynamic. Text-to-video supports all three; image-to-video supports fun and normal only.
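As a rough illustration of these constraints, the Python sketch below checks mode, duration, and resolution against the limits stated on this page before a task is created. The function name `check_options` and the task-type labels "text-to-video" and "image-to-video" are hypothetical names chosen for this sketch, not API values.

```python
# Mode support per task type, as described on this page.
ALLOWED_MODES = {
    "text-to-video": {"fun", "normal", "spicy"},
    "image-to-video": {"fun", "normal"},
}
ALLOWED_RESOLUTIONS = {"480p", "720p"}


def check_options(task_type: str, mode: str, duration: int, resolution: str) -> list[str]:
    """Return a list of problems; an empty list means the options look valid."""
    if task_type not in ALLOWED_MODES:
        return [f"unknown task type: {task_type!r}"]
    problems = []
    if mode not in ALLOWED_MODES[task_type]:
        allowed = ", ".join(sorted(ALLOWED_MODES[task_type]))
        problems.append(f"mode {mode!r} is not available for {task_type} (allowed: {allowed})")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int) or not 6 <= duration <= 30:
        problems.append("duration must be an integer from 6 to 30")
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append("resolution must be 480p or 720p")
    return problems
```

Returning a list instead of raising on the first error lets a caller report every invalid option at once.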
Callbacks are supported. Pass a callback_url when creating the task and the platform will POST the completed result (or error) to that URL. Status can also be polled via GET /v1/tasks/{taskId}.
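For cases where a callback URL is not practical, a polling loop along the following lines may be enough. This is a sketch under stated assumptions: the task JSON is assumed to carry a top-level "status" field that ends in "completed" or "failed", and the task endpoint is assumed to live on the same host and accept the same Bearer token as the curl example. Neither detail is confirmed on this page, and `fetch_task` and `wait_for_task` are hypothetical helper names.

```python
import json
import time
import urllib.request

API_BASE = "https://api.aivideoapi.ai"  # host taken from the curl example


def fetch_task(task_id: str, api_key: str) -> dict:
    """GET /v1/tasks/{taskId} and decode the JSON body."""
    request = urllib.request.Request(
        f"{API_BASE}/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_task(task_id, fetch, interval_s=5.0, max_attempts=60, sleep=time.sleep):
    """Poll until the task reaches a terminal status or attempts run out.

    Assumption: the task JSON has a top-level "status" field that ends up as
    "completed" or "failed". Adjust both names to the documented schema.
    """
    for attempt in range(max_attempts):
        task = fetch(task_id)
        if task.get("status") in ("completed", "failed"):
            return task
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"task {task_id} still pending after {max_attempts} polls")
```

A call might look like `wait_for_task(task_id, lambda tid: fetch_task(tid, "sk-your-api-key"))`. Passing the fetch function in keeps the loop independent of the HTTP client, so it can be exercised without network access.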
Pricing and usage
Low-cost iteration
Higher-fidelity output
Minimum-duration draft
Balanced production clip
Create one API key, use one credit balance, and switch between video and image models without provider-specific plumbing.