Grok Imagine Video
AI Video Generation Model
xAI's video generation model based on the Aurora architecture, supporting text-to-video, image-to-video, and video editing with native audio-visual synthesis at up to 720p
Details
grok-imagine-video
Starting from $0.05 per video
Prices shown are in USD
Provider Performance
Replicate, the only provider, has a median (p50) generation time of 379 ms and a 100.0% success rate.
Aggregated from real API requests over the last 30 days.
Provider Rankings
| # | Provider | p50 Gen Time | p95 Gen Time | Success Rate | TTFB (p50) |
|---|---|---|---|---|---|
| 1 | replicate | 379ms | 1,759ms | 100.0% | — |
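The table reports p50 and p95 generation times plus a success rate aggregated from real requests. As a rough illustration of how such figures can be derived from request logs, here is a minimal sketch assuming nearest-rank percentiles computed over successful requests only; the RequestLog shape is hypothetical and Lumenfall's actual aggregation method is not documented here.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestLog:
    """One API request record (hypothetical shape, not Lumenfall's schema)."""
    gen_time_ms: float
    succeeded: bool


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(logs: list[RequestLog]) -> dict[str, float]:
    """Assumption: p50/p95 over successful requests, success rate over all requests."""
    times = [log.gen_time_ms for log in logs if log.succeeded]
    return {
        "p50_ms": percentile(times, 50),
        "p95_ms": percentile(times, 95),
        "success_rate_pct": 100 * sum(log.succeeded for log in logs) / len(logs),
    }
```

With made-up samples of 120, 200, 250, 900 and 1000 ms plus one failed request, this yields a p50 of 250 ms, a p95 of 1000 ms and a success rate of about 83.3%.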
Providers & Pricing (1)
Grok Imagine Video is available exclusively through Replicate, starting at $0.05/video.
replicate/grok-imagine-video
grok-video API: async video generation
Lumenfall provides an OpenAI-compatible API for generating 720p videos, performing image-to-video transformations, and executing video edits using Grok Imagine Video.
Base URL: https://api.lumenfall.ai/v1
Model ID: grok-imagine-video
Code Examples
Image to Video
/v1/videos/generations
cURL
```bash
# Step 1: Submit image-to-video request
VIDEO_ID=$(curl -s -X POST \
  https://api.lumenfall.ai/v1/videos \
  -H "Authorization: Bearer $LUMENFALL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "prompt": "The camera slowly pushes in as the scene comes to life",
    "image_url": "https://example.com/start-frame.jpg",
    "duration": 10,
    "aspect_ratio": "16:9"
  }' | jq -r '.id')

# Stop if the submission did not return a job ID
if [ -z "$VIDEO_ID" ] || [ "$VIDEO_ID" = "null" ]; then
  echo "Submission failed" >&2
  exit 1
fi
echo "Video ID: $VIDEO_ID"

# Step 2: Poll for completion every 5 seconds
while true; do
  RESULT=$(curl -s \
    "https://api.lumenfall.ai/v1/videos/$VIDEO_ID" \
    -H "Authorization: Bearer $LUMENFALL_API_KEY")
  STATUS=$(echo "$RESULT" | jq -r '.status')
  echo "Status: $STATUS"
  if [ "$STATUS" = "completed" ]; then
    echo "$RESULT" | jq -r '.output.url'
    break
  elif [ "$STATUS" = "failed" ]; then
    echo "$RESULT" | jq -r '.error.message'
    break
  fi
  sleep 5
done
```
JavaScript
```javascript
// Run as an ES module (e.g. a .mjs file) so top-level await is allowed.
const BASE_URL = 'https://api.lumenfall.ai/v1';
const API_KEY = 'YOUR_API_KEY';

// Step 1: Submit image-to-video request
const submitRes = await fetch(`${BASE_URL}/videos`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'grok-imagine-video',
    prompt: 'The camera slowly pushes in as the scene comes to life', // example prompt (required)
    image_url: 'https://example.com/start-frame.jpg',
    duration: 10, // seconds, as a number
    aspect_ratio: '16:9'
  })
});
if (!submitRes.ok) {
  throw new Error(`Submit failed: HTTP ${submitRes.status}`);
}
const { id: videoId } = await submitRes.json();
console.log('Video ID:', videoId);

// Step 2: Poll for completion every 5 seconds
while (true) {
  const pollRes = await fetch(`${BASE_URL}/videos/${videoId}`, {
    headers: { 'Authorization': `Bearer ${API_KEY}` }
  });
  const result = await pollRes.json();
  if (result.status === 'completed') {
    console.log('Video URL:', result.output.url);
    break;
  } else if (result.status === 'failed') {
    console.error('Error:', result.error.message);
    break;
  }
  await new Promise(r => setTimeout(r, 5000));
}
```
Python
```python
import time

import requests

BASE_URL = "https://api.lumenfall.ai/v1"
API_KEY = "YOUR_API_KEY"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Step 1: Submit image-to-video request
response = requests.post(
    f"{BASE_URL}/videos",
    headers=HEADERS,
    json={
        "model": "grok-imagine-video",
        "prompt": "The camera slowly pushes in as the scene comes to life",  # example prompt (required)
        "image_url": "https://example.com/start-frame.jpg",
        "duration": 10,  # seconds, as a number
        "aspect_ratio": "16:9"
    },
    timeout=30,
)
response.raise_for_status()  # surface HTTP errors before reading the body
video_id = response.json()["id"]
print(f"Video ID: {video_id}")

# Step 2: Poll for completion every 5 seconds
while True:
    poll = requests.get(f"{BASE_URL}/videos/{video_id}", headers=HEADERS, timeout=30)
    poll.raise_for_status()
    result = poll.json()
    if result["status"] == "completed":
        print(f"Video URL: {result['output']['url']}")
        break
    elif result["status"] == "failed":
        print(f"Error: {result['error']['message']}")
        break
    time.sleep(5)
```
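The polling loops above wait indefinitely if a job never reaches "completed" or "failed". A minimal sketch of a reusable poller with an overall timeout follows; it assumes only the status values and the output.url / error.message fields used in the examples, treats any other status as "still running", and takes the HTTP call as an injected function (wait_for_video is a hypothetical helper, not part of any Lumenfall SDK).

```python
import time
from typing import Any, Callable


def wait_for_video(
    get_status: Callable[[], dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job finishes and return its output URL.

    get_status: zero-argument callable returning the parsed JSON of
    GET /v1/videos/{id}, for example:
        lambda: requests.get(f"{BASE_URL}/videos/{video_id}",
                             headers=HEADERS, timeout=30).json()
    """
    waited = 0.0
    while waited <= timeout_s:
        result = get_status()
        status = result.get("status")
        if status == "completed":
            return result["output"]["url"]
        if status == "failed":
            message = (result.get("error") or {}).get("message", "generation failed")
            raise RuntimeError(message)
        # Any other status is assumed to mean the job is still running.
        sleep(interval_s)
        waited += interval_s
    raise TimeoutError(f"video not ready after {timeout_s:.0f} s")
```

Injecting get_status and sleep keeps the loop independent of the HTTP client and lets it be exercised without network access.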
Parameter Reference
Core Parameters
Modes: T2V = text-to-video, I2V = image-to-video, V2V = video-to-video (video editing).
| Parameter | Type | Description | Modes |
|---|---|---|---|
| prompt | string | Required. Text prompt for video generation | T2V, I2V, V2V |
| duration | number | Video duration in seconds | T2V, I2V, V2V |
Size & Layout
| Parameter | Type | Description | Modes |
|---|---|---|---|
| size | string | Video dimensions as WxH pixels (e.g. "1920x1080") or an aspect ratio (e.g. "16:9"). Supported values: auto, 1365x768, 768x1365, 1254x836, 836x1254, 887x1182, 1024x1024, 1183x887. WxH determines both shape and scale (aspect_ratio and resolution are ignored when size is provided). W:H format is equivalent to aspect_ratio. | T2V, I2V, V2V |
| aspect_ratio | string | Aspect ratio of the output video (e.g. "16:9", "1:1"). Supported values: auto, 9:16, 2:3, 3:4, 1:1, 4:3, 3:2, 16:9. Controls shape independently of scale; use with resolution to control both. If size is also provided, size takes precedence. Any ratio is accepted and mapped to the nearest supported value. | T2V, I2V, V2V |
| resolution | string | Output resolution tier (e.g. "1K", "4K"). Supported values: auto, 1K. Controls scale independently of shape; higher tiers produce larger videos and cost more. If size is also provided, size takes precedence for scale. Any tier is accepted and mapped to the nearest supported value. | T2V, I2V, V2V |
Flexible
| Output | size | or aspect_ratio + resolution | Notes |
|---|---|---|---|
| Auto | "auto" | n/a | Model chooses optimal dimensions |

1K (7 sizes)
| Output | size | or aspect_ratio + resolution |
|---|---|---|
| 1183 × 887 | "1183x887" | "4:3" + "1K" |
| 1024 × 1024 | "1024x1024" | "1:1" + "1K" |
| 887 × 1182 | "887x1182" | "3:4" + "1K" |
| 836 × 1254 | "836x1254" | "2:3" + "1K" |
| 1254 × 836 | "1254x836" | "3:2" + "1K" |
| 768 × 1365 | "768x1365" | "9:16" + "1K" |
| 1365 × 768 | "1365x768" | "16:9" + "1K" |
How these parameters work
- size: exact pixel dimensions, e.g. "1920x1080"
- aspect_ratio: shape only, at the default scale, e.g. "16:9"
- resolution: scale tier that preserves shape, e.g. "1K"

Priority when combined: size is the most specific and always wins. aspect_ratio and resolution control shape and scale independently.
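The precedence rules can be sketched as a small resolver over the 1K size table for this model. This is an illustration under stated assumptions, not the gateway's code: resolve_size is a hypothetical helper, an exact WxH is returned unchanged (snapping it to a supported size is not modeled), and an unsupported aspect ratio raises KeyError because nearest-value matching is a separate step.

```python
from typing import Optional

# 1K outputs for grok-imagine-video, copied from the size table above.
SIZES_1K = {
    "16:9": "1365x768",
    "9:16": "768x1365",
    "3:2": "1254x836",
    "2:3": "836x1254",
    "4:3": "1183x887",
    "3:4": "887x1182",
    "1:1": "1024x1024",
}


def resolve_size(
    size: Optional[str] = None,
    aspect_ratio: Optional[str] = None,
    resolution: Optional[str] = None,
) -> str:
    """Apply the documented precedence: size first, then aspect_ratio + resolution."""
    if size is not None and ":" not in size:
        # Exact WxH (or "auto") wins; aspect_ratio and resolution are ignored.
        return size
    if size is not None:
        aspect_ratio = size  # W:H given in size behaves like aspect_ratio
    if aspect_ratio in (None, "auto"):
        return "auto"
    # Only the 1K tier is listed for this model, so every resolution value
    # (None, "auto", "4K", ...) resolves to the 1K table here.
    return SIZES_1K[aspect_ratio]
```

For example, resolve_size(size="1920x1080", aspect_ratio="1:1") keeps "1920x1080", while resolve_size(aspect_ratio="9:16", resolution="1K") gives "768x1365".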
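How matching works
Any requested value is mapped to the closest one the model supports. For example, if you request 7:1 on a model with 4:1 and 8:1, you get 8:1. Resolution is expressed in K tiers (0.5K 1K 2K 4K) or megapixel tiers (0.25 1). If the exact tier isn't available, you get the nearest one.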
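A minimal sketch of nearest-value matching follows. The distance metrics are assumptions, since the page does not document them: aspect ratios are compared in log space (so 2:1 and 1:2 sit equally far from 1:1), and K tiers by absolute difference. The helper names are hypothetical.

```python
import math


def parse_ratio(ratio: str) -> float:
    """Turn "W:H" into W/H as a float."""
    w, h = ratio.split(":")
    return float(w) / float(h)


def nearest_ratio(requested: str, supported: list[str]) -> str:
    """Closest supported aspect ratio, compared in log space (assumed metric)."""
    target = math.log(parse_ratio(requested))
    return min(supported, key=lambda r: abs(math.log(parse_ratio(r)) - target))


def nearest_tier(requested: str, supported: list[str]) -> str:
    """Closest K tier by absolute difference, e.g. "4K" on a 1K-only model gives "1K"."""
    def k_value(tier: str) -> float:
        return float(tier.upper().rstrip("K"))

    return min(supported, key=lambda t: abs(k_value(t) - k_value(requested)))
```

With these metrics, nearest_ratio("7:1", ["4:1", "8:1"]) returns "8:1", matching the example in the text, and a "21:9" request against this model's seven ratios maps to "16:9".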
Media Inputs
| Parameter | Type | Description | Modes |
|---|---|---|---|
| input_reference | array | Required for I2V. Input image(s) to animate into video | T2V, I2V, V2V |
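The reference marks prompt as always required and input_reference as required for I2V. A minimal client-side check might look like the sketch below. Note that the code examples above send image_url, while this sketch follows the parameter table and uses input_reference; treating each array entry as an image URL string is an assumption, and build_request is a hypothetical helper.

```python
from typing import Any, Optional


def build_request(
    prompt: str,
    mode: str = "T2V",
    input_reference: Optional[list[str]] = None,
    duration: Optional[float] = None,
    **extra: Any,
) -> dict[str, Any]:
    """Assemble a request body and check the fields the reference marks as required."""
    if not prompt:
        raise ValueError("prompt is required")
    if mode == "I2V" and not input_reference:
        raise ValueError("input_reference is required for I2V")
    body: dict[str, Any] = {"model": "grok-imagine-video", "prompt": prompt}
    if input_reference:
        # Assumption: each entry is an image URL string.
        body["input_reference"] = list(input_reference)
    if duration is not None:
        body["duration"] = duration  # seconds, typed as a number in the reference
    body.update(extra)  # e.g. aspect_ratio="16:9"; other keys pass through unchanged
    return body
```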
Output & Format
| Parameter | Type | Description | Modes |
|---|---|---|---|
| n | integer | Number of videos to generate. Default: 1. The gateway generates multiple videos in parallel even if the provider only supports 1. | T2V, I2V, V2V |
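Emulating n for a provider that returns one video per request amounts to fanning out n single-video submissions concurrently. A minimal sketch using a thread pool, assuming a hypothetical submit_one callable that sends one request and returns its job ID; how Lumenfall actually schedules these calls is not documented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def generate_n(
    submit_one: Callable[[dict[str, Any]], str],
    body: dict[str, Any],
) -> list[str]:
    """Emulate n by submitting n single-video jobs concurrently; return their job IDs."""
    n = int(body.get("n", 1))  # default 1, as in the reference table
    if n < 1:
        raise ValueError("n must be at least 1")
    # The provider only ever sees a one-video request, so n is stripped.
    single = {k: v for k, v in body.items() if k != "n"}
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(submit_one, dict(single)) for _ in range(n)]
        return [future.result() for future in futures]
```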
Parameter Normalization
How we handle parameters across different providers
Not every provider speaks the same language. When you send a parameter, we handle it in one of four ways depending on what the model supports:
| Behavior | What happens | Example |
|---|---|---|
| passthrough | Sent as-is to the provider | style, quality |
| renamed | Same value, mapped to the field name the provider expects | prompt |
| converted | Transformed to the provider's native format | size |
| emulated | Works even if the provider has no concept of it | n, response_format |
Parameters we don't recognize pass straight through to the upstream API, so provider-specific options still work.
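The four behaviors can be sketched as a single normalization pass over the request parameters. The per-provider rules below are hypothetical (an imaginary provider that calls the prompt field "text" and wants width/height integers instead of a WxH string); only the categories themselves come from the table.

```python
from typing import Any, Callable

# Hypothetical rules for one imaginary provider; real mappings are not documented.
RENAMED = {"prompt": "text"}
CONVERTED: dict[str, Callable[[Any], dict[str, Any]]] = {
    # "1365x768" -> {"width": 1365, "height": 768} (handles WxH strings only)
    "size": lambda v: dict(zip(("width", "height"), map(int, v.split("x")))),
}
EMULATED = {"n", "response_format"}  # handled by the gateway, never forwarded


def normalize(params: dict[str, Any]) -> dict[str, Any]:
    """Map gateway parameters to an upstream request body."""
    out: dict[str, Any] = {}
    for key, value in params.items():
        if key in EMULATED:
            continue
        if key in RENAMED:
            out[RENAMED[key]] = value
        elif key in CONVERTED:
            out.update(CONVERTED[key](value))
        else:
            out[key] = value  # passthrough, including keys the gateway doesn't recognize
    return out
```

Letting unknown keys fall through to the final branch is what keeps provider-specific options working without the gateway having to know about them.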
Grok Imagine Video FAQ
How much does Grok Imagine Video cost?
Grok Imagine Video starts at $0.05 per video through Lumenfall. Pricing varies by provider. Lumenfall does not add any markup to provider pricing.
How do I use Grok Imagine Video via API?
You can use Grok Imagine Video through Lumenfall's OpenAI-compatible API. Send requests to the unified endpoint with model ID "grok-imagine-video". Code examples are available in Python, JavaScript, and cURL.
Which providers offer Grok Imagine Video?
Grok Imagine Video is available through Replicate on Lumenfall. Lumenfall automatically routes requests to the best available provider.
Overview
Grok Imagine Video is a video generation model developed by xAI that produces video with integrated, synchronized audio. Built on the proprietary Aurora architecture, it supports text-to-video, image-to-video, and direct video editing. Its distinguishing feature is native audio-visual synthesis: the model generates audio together with the frames, so sound stays aligned with on-screen motion instead of being added as a separate soundtrack after generation.
Strengths
- Native Audio-Visual Synthesis: The model generates synchronized spatial and temporal audio concurrently with the video frames, ensuring that sound effects match the actions on screen.
- Multi-Modal Flexibility: It handles three distinct workflows (text prompts, image-to-video animation, and video-to-video editing) within a single unified framework.
- Temporal Consistency: The Aurora architecture maintains stable object identity and physical logic across sequences, reducing the “morphing” artifacts common in earlier-generation diffusion models.
- Resolution and Quality: Capable of outputting video at up to 720p resolution, balancing computational efficiency with visual detail suitable for social media and web content.
Limitations
- Resolution Ceiling: While 720p is effective for many applications, it falls short of the 1080p or 4K standards required for cinematic production or high-end commercial use.
- Action Duration: Like most current generative video models, it is optimized for short-form clips; maintaining narrative or structural coherence over extended durations (e.g., several minutes) remains a challenge.
- Inference Latency: The computational demands of native audio-visual generation may result in longer wait times compared to visual-only models.
Technical Background
Grok Imagine Video is built on xAI’s Aurora architecture, a framework designed for high-dimensional temporal data. Unlike models that layer audio on top of finished video, Aurora uses a joint embedding space that treats audio and video as co-dependent signals during generation. This lets a single network (transformer-based or a diffusion variant) model the relationship between physics, motion, and sound, rather than splitting that work across separate audio and video models.
Best For
This model is ideal for short-form social media content, rapid prototyping of advertisements, and animating still photographs with realistic environmental sound. It is a strong choice for developers who need “all-in-one” asset generation, where visual motion and audio cues must stay aligned without manual editing. Grok Imagine Video is available for testing and integration through Lumenfall’s unified API and playground, so it can be slotted into automated media pipelines.
Try Grok Imagine Video in the Playground
Generate videos with custom prompts, no API key needed.