xAI video generation

Grok Imagine API

Generate Grok Imagine clips with text or image inputs.

Grok Imagine supports prompt-based generation and multi-image references with @image tokens. Choose 480p or 720p resolution, a duration of 6 to 30 seconds, and fun, normal, or spicy motion intensity. Tasks, polling, callbacks, and credits work the same way as for the rest of your model stack.

Modes: Fun/Normal/Spicy
Duration: 6-30s
Resolution: 480p/720p
xAI
grok-imagine-text-to-video (async task)

Model capabilities

Built for production API workflows, not one-off demos.

Use Grok Imagine from the same platform surface as the rest of your video and image stack: API keys, credits, logs, webhooks, and docs stay consistent across providers.

Text-to-video generation

Turn prompts into 6-30 second clips with aspect ratio and motion intensity controls.

Multi-image references

Provide up to 7 images and reference them inline in the prompt with tokens such as @image1 and @image2.

Mode-based motion control

Choose fun, normal, or spicy in text-to-video to match creative range and motion intensity to the brief.

API workflow

Submit tasks, track progress, and return generated assets.

01

Choose input type

Use text-to-video, or use image-to-video with one to seven external image URLs.

02

Pick duration and resolution

Select 6-30 seconds and 480p or 720p based on cost and output needs.

03

Receive task output

Poll the task endpoint, or supply a callback URL, to receive the hosted video URL.

POST /v1/videos/generations
curl -X POST https://api.aivideoapi.ai/v1/videos/generations \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-text-to-video",
    "input": {
      "prompt": "A couple of doors open in a surreal hallway, each revealing a tiny living room.",
      "aspect_ratio": "2:3",
      "duration": 6,
      "resolution": "480p"
    }
  }'
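
The same request can also be sketched in Python with only the standard library. This is a minimal sketch rather than an official client: the helper name build_generation_request is hypothetical, and its range checks simply mirror the limits stated on this page (6 to 30 seconds, 480p or 720p). It builds the request without sending it.

```python
import json
import urllib.request

API_URL = "https://api.aivideoapi.ai/v1/videos/generations"
ALLOWED_RESOLUTIONS = {"480p", "720p"}


def build_generation_request(api_key, prompt, duration=6, resolution="480p",
                             aspect_ratio="2:3"):
    """Build (but do not send) a text-to-video request.

    Hypothetical helper: the checks mirror the limits documented on this page.
    """
    if not isinstance(duration, int) or not 6 <= duration <= 30:
        raise ValueError("duration must be an integer from 6 to 30 seconds")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError("resolution must be '480p' or '720p'")

    payload = {
        "model": "grok-imagine-text-to-video",
        "input": {
            "prompt": prompt,
            "aspect_ratio": aspect_ratio,
            "duration": duration,
            "resolution": resolution,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_generation_request(
    "sk-your-api-key",
    "A couple of doors open in a surreal hallway, each revealing a tiny living room.",
)
# To submit the task, pass the request to urllib.request.urlopen(request).
```

Validating before submission keeps obviously invalid tasks from reaching the API at all.
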
What teams build
Surreal short scenes
Image-driven motion
Multi-character clips
Creative iterations
Animated reference shots
Social-format video

Why choose Grok Imagine

The Grok Imagine API benefits that matter to developers.

Native xAI video generation

Grok Imagine is xAI's first-party video model, designed for prompt fidelity and surreal compositions that other generation APIs struggle with.

Longer clips than most APIs

Generate 6 to 30 second clips in a single task, without stitching multiple jobs. The 30-second ceiling is three to five times that of the shortest tiers of Veo 3.1 Fast, Kling 3.0, and Sora 2.

Inline multi-image references

Attach up to seven image URLs and refer to them inside the prompt as @image1, @image2, etc. Useful for multi-character scenes and product compositions.

Three motion intensity modes

Switch between fun, normal, and spicy modes to dial creative range and motion intensity without rewriting the prompt.

Unified billing and operations

Grok Imagine uses the same API key, credit balance, async task lifecycle, webhooks, and logs as every other video and image model on AI Video API.

Predictable per-second pricing

Cost scales linearly with duration: 2.4 credits per second at 480p and 4.5 credits per second at 720p, with no surprise multipliers or hidden fees.

Frequently asked questions

Answers about the Grok Imagine API.

What is the Grok Imagine API?

Grok Imagine is xAI's text-to-video and image-to-video model. The AI Video API exposes it through one HTTP endpoint (POST /v1/videos/generations) with credits, webhooks, and logs shared across all supported video models.

How long can a Grok Imagine video be?

Each Grok Imagine task supports 6 to 30 seconds of video output. The exact duration is set with the input.duration parameter (integer, 6-30, default 6).

Does Grok Imagine support image-to-video?

Yes. Pass image_urls (1-7 public HTTP/HTTPS image URLs) in the input object. You can reference each image inside the prompt with @image1, @image2, etc. JPEG, PNG, and WebP files up to 10 MB each are supported.
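
The image reference rules above can be checked client-side before a task is submitted. The following is a rough sketch: build_image_input is a hypothetical helper, and it assumes @imageN is 1-based and maps to the Nth entry of image_urls, which this page implies but does not state outright. It only builds the input object, since the image-to-video model ID is not given here.

```python
import re
from urllib.parse import urlparse

MAX_IMAGES = 7  # limit stated on this page


def build_image_input(prompt, image_urls):
    """Return an `input` object with image references (hypothetical helper)."""
    if not 1 <= len(image_urls) <= MAX_IMAGES:
        raise ValueError("image_urls must contain 1 to 7 URLs")
    for url in image_urls:
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"not an HTTP/HTTPS URL: {url}")
    # Assumption: @imageN is 1-based and refers to image_urls[N - 1].
    for index in re.findall(r"@image(\d+)", prompt):
        if not 1 <= int(index) <= len(image_urls):
            raise ValueError(f"@image{index} has no matching entry in image_urls")
    return {"prompt": prompt, "image_urls": list(image_urls)}


image_input = build_image_input(
    "@image1 walks through the doorway while @image2 waves from the sofa.",
    ["https://example.com/hero.png", "https://example.com/friend.jpg"],
)
```

File format and size (JPEG, PNG, or WebP, up to 10 MB) are not checked here because that would require downloading each image.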

How much does the Grok Imagine API cost?

Pricing is per second by resolution: 2.4 credits/s at 480p and 4.5 credits/s at 720p. A 6-second 480p clip works out to 14.4 credits (listed as about 15), and a 10-second 720p clip costs 45 credits. One credit is $0.005 USD.
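
As a worked example of the per-second pricing, the estimate below multiplies duration by the rate for the chosen resolution and converts credits to dollars. It is a sketch using only the rates quoted on this page; estimate_cost is a hypothetical name, and it does not model rounding, because the page does not say how fractional credits are billed.

```python
from decimal import Decimal

# Rates and conversion as quoted on this page.
CREDITS_PER_SECOND = {"480p": Decimal("2.4"), "720p": Decimal("4.5")}
USD_PER_CREDIT = Decimal("0.005")


def estimate_cost(duration_s, resolution):
    """Return (credits, usd) for one clip, before any platform-side rounding."""
    credits = CREDITS_PER_SECOND[resolution] * duration_s
    return credits, credits * USD_PER_CREDIT


draft_credits, draft_usd = estimate_cost(6, "480p")    # 14.4 credits, $0.072
clip_credits, clip_usd = estimate_cost(10, "720p")     # 45 credits, $0.225
longest_credits, longest_usd = estimate_cost(30, "720p")  # 135 credits, $0.675
```

Decimal is used instead of float so that values such as 2.4 × 6 stay exact.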

What are the fun, normal, and spicy modes?

Mode controls motion intensity and creative range. fun is playful, normal is balanced, and spicy is the most dynamic. Text-to-video supports all three; image-to-video supports fun and normal only.
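
Because mode support differs by input type, a small lookup can reject unsupported combinations before a task is created. This is a sketch only: apply_mode is a hypothetical helper, and the request field name "mode" is an assumption, since this page names the values but not the parameter.

```python
# Mode support per input type, as described on this page.
SUPPORTED_MODES = {
    "text-to-video": {"fun", "normal", "spicy"},
    "image-to-video": {"fun", "normal"},
}


def apply_mode(input_obj, input_type, mode):
    """Return a copy of input_obj with the mode set, rejecting unsupported pairs."""
    if input_type not in SUPPORTED_MODES:
        raise ValueError(f"unknown input type: {input_type}")
    if mode not in SUPPORTED_MODES[input_type]:
        raise ValueError(f"{input_type} does not support mode '{mode}'")
    # Assumption: the request field is named "mode"; confirm in the API reference.
    return {**input_obj, "mode": mode}


text_input = apply_mode({"prompt": "A paper boat races down a gutter."},
                        "text-to-video", "spicy")
```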

Can I receive a webhook when generation finishes?

Yes. Pass a callback_url when creating the task and the platform will POST the completed result (or error) to that URL. Status is also pollable via GET /v1/tasks/{taskId}.
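
When a callback URL is not practical, polling can be sketched as a loop with a timeout. Several details below are assumptions, not documented behavior: that the task endpoint shares the host used for task creation, that the response is a JSON object with a "status" field, and that "completed" and "failed" are its terminal values. The names fetch_task and wait_for_task are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: GET /v1/tasks/{taskId} lives on the same host as task creation.
TASK_URL = "https://api.aivideoapi.ai/v1/tasks/{task_id}"
# Assumption: these are the terminal values of a "status" field in the response.
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_task(task_id, api_key):
    """Fetch the current task state as a dict."""
    req = urllib.request.Request(
        TASK_URL.format(task_id=task_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_task(task_id, fetch, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Call fetch(task_id) until the task is terminal or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout_s}s")
        sleep(interval_s)


# Example (performs real HTTP calls, so it is left commented out):
# result = wait_for_task("your-task-id", lambda tid: fetch_task(tid, "sk-your-api-key"))
```

Passing fetch and sleep as arguments keeps the loop testable without network access.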

Pricing and usage

Clear model options with shared credits.

480p: 2.4 credits/s (low-cost iteration)
720p: 4.5 credits/s (higher-fidelity output)
6s @ 480p: ~15 credits (minimum-duration draft)
10s @ 720p: ~45 credits (balanced production clip)

Start building with Grok Imagine in AI Video API.

Create one API key, use one credit balance, and switch between video and image models without provider-specific plumbing.
