# Grok Imagine Video

> xAI's video generation model based on the Aurora architecture, supporting text-to-video, image-to-video, and video editing with native audio-visual synthesis at up to 720p.

## Quick Reference

- Model ID: grok-imagine-video
- Creator: xAI
- Status: active
- Family: grok-imagine
- Base URL: https://api.lumenfall.ai/v1

## Specifications

- Max Video Duration: 15 seconds
- Input Modalities: text, image, video
- Output Modalities: video, audio
- Supported Modes: Text to Video, Image to Video, Video to Video

## Model Identifiers

- Primary Slug: grok-imagine-video
- Aliases: grok-video

## Dates

- Released: February 2026

## Tags

video-generation, text-to-video, image-to-video, video-editing, audio-generation

## Available Providers

### Replicate

- Config Key: replicate/grok-imagine-video
- Provider Model ID: xai/grok-imagine-video
- Pricing: $0.050/second
- Source: https://replicate.com/xai/grok-imagine-video

## Performance Metrics

Provider performance over the last 30 days.

### replicate

- Median Generation Time (p50): 343ms
- 95th Percentile Generation Time (p95): 827ms
- Average Generation Time: 395ms
- Success Rate: 100.0%
- Total Requests: 322

## Image Gallery

1 image available for this model. Browse all at https://lumenfall.ai/models/xai/grok-imagine-video/gallery

### Arena Video Results

- Prompt: "Cinematic wide shot of Central Park, New York, at golden hour. A capybara walks confidently and s..."

## Example Prompt

The following prompt was used to generate an example video in our playground:

> Cinematic wide shot of Central Park, New York, at golden hour. A capybara walks confidently and stylishly down a park path like it owns the scene, calm and charismatic. Lush greenery, warm sunlight, soft shadows, and iconic New York apartment buildings in the background. A large sign on a building behind the park clearly reads “Grok Imagine.” Smooth forward camera motion, cinematic composition, shallow depth of field, realistic motion, premium film aesthetic, 4k, highly detailed, natural and believable.

## Code Examples

### Text to Video (/v1/videos/generations) — Async

#### cURL

```bash
# Step 1: Submit video generation request
VIDEO_ID=$(curl -s -X POST \
  https://api.lumenfall.ai/v1/videos \
  -H "Authorization: Bearer $LUMENFALL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "prompt": "",
    "size": "1024x1024"
  }' | jq -r '.id')
echo "Video ID: $VIDEO_ID"

# Step 2: Poll for completion
while true; do
  RESULT=$(curl -s \
    https://api.lumenfall.ai/v1/videos/$VIDEO_ID \
    -H "Authorization: Bearer $LUMENFALL_API_KEY")
  STATUS=$(echo "$RESULT" | jq -r '.status')
  echo "Status: $STATUS"
  if [ "$STATUS" = "completed" ]; then
    echo "$RESULT" | jq -r '.output.url'
    break
  elif [ "$STATUS" = "failed" ]; then
    echo "$RESULT" | jq -r '.error.message'
    break
  fi
  sleep 5
done
```

#### JavaScript

```javascript
const BASE_URL = 'https://api.lumenfall.ai/v1';
const API_KEY = 'YOUR_API_KEY';

// Step 1: Submit video generation request
const submitRes = await fetch(`${BASE_URL}/videos`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'grok-imagine-video',
    prompt: '',
    size: '1024x1024'
  })
});
const { id: videoId } = await submitRes.json();
console.log('Video ID:', videoId);

// Step 2: Poll for completion
while (true) {
  const pollRes = await fetch(`${BASE_URL}/videos/${videoId}`, {
    headers: { 'Authorization': `Bearer ${API_KEY}` }
  });
  const result = await pollRes.json();
  if (result.status === 'completed') {
    console.log('Video URL:', result.output.url);
    break;
  } else if (result.status === 'failed') {
    console.error('Error:', result.error.message);
    break;
  }
  await new Promise(r => setTimeout(r, 5000));
}
```

#### Python

```python
import requests
import time

BASE_URL = "https://api.lumenfall.ai/v1"
API_KEY = "YOUR_API_KEY"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Step 1: Submit video generation request
response = requests.post(
    f"{BASE_URL}/videos",
    headers=HEADERS,
    json={
        "model": "grok-imagine-video",
        "prompt": "",
        "size": "1024x1024"
    }
)
video_id = response.json()["id"]
print(f"Video ID: {video_id}")

# Step 2: Poll for completion
while True:
    result = requests.get(
        f"{BASE_URL}/videos/{video_id}",
        headers=HEADERS
    ).json()
    if result["status"] == "completed":
        print(f"Video URL: {result['output']['url']}")
        break
    elif result["status"] == "failed":
        print(f"Error: {result['error']['message']}")
        break
    time.sleep(5)
```

### Image to Video (/v1/videos/generations) — Async

#### cURL

```bash
# Step 1: Submit image-to-video request
VIDEO_ID=$(curl -s -X POST \
  https://api.lumenfall.ai/v1/videos \
  -H "Authorization: Bearer $LUMENFALL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "prompt": "",
    "image_url": "https://example.com/start-frame.jpg",
    "duration": "10",
    "aspect_ratio": "16:9"
  }' | jq -r '.id')
echo "Video ID: $VIDEO_ID"

# Step 2: Poll for completion
while true; do
  RESULT=$(curl -s \
    https://api.lumenfall.ai/v1/videos/$VIDEO_ID \
    -H "Authorization: Bearer $LUMENFALL_API_KEY")
  STATUS=$(echo "$RESULT" | jq -r '.status')
  echo "Status: $STATUS"
  if [ "$STATUS" = "completed" ]; then
    echo "$RESULT" | jq -r '.output.url'
    break
  elif [ "$STATUS" = "failed" ]; then
    echo "$RESULT" | jq -r '.error.message'
    break
  fi
  sleep 5
done
```

#### JavaScript

```javascript
const BASE_URL = 'https://api.lumenfall.ai/v1';
const API_KEY = 'YOUR_API_KEY';

// Step 1: Submit image-to-video request
const submitRes = await fetch(`${BASE_URL}/videos`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'grok-imagine-video',
    prompt: '',
    image_url: 'https://example.com/start-frame.jpg',
    duration: '10',
    aspect_ratio: '16:9'
  })
});
const { id: videoId } = await submitRes.json();
console.log('Video ID:', videoId);

// Step 2: Poll for completion
while (true) {
  const pollRes = await fetch(`${BASE_URL}/videos/${videoId}`, {
    headers: { 'Authorization': `Bearer ${API_KEY}` }
  });
  const result = await pollRes.json();
  if (result.status === 'completed') {
    console.log('Video URL:', result.output.url);
    break;
  } else if (result.status === 'failed') {
    console.error('Error:', result.error.message);
    break;
  }
  await new Promise(r => setTimeout(r, 5000));
}
```

#### Python

```python
import requests
import time

BASE_URL = "https://api.lumenfall.ai/v1"
API_KEY = "YOUR_API_KEY"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Step 1: Submit image-to-video request
response = requests.post(
    f"{BASE_URL}/videos",
    headers=HEADERS,
    json={
        "model": "grok-imagine-video",
        "prompt": "",
        "image_url": "https://example.com/start-frame.jpg",
        "duration": "10",
        "aspect_ratio": "16:9"
    }
)
video_id = response.json()["id"]
print(f"Video ID: {video_id}")

# Step 2: Poll for completion
while True:
    result = requests.get(
        f"{BASE_URL}/videos/{video_id}",
        headers=HEADERS
    ).json()
    if result["status"] == "completed":
        print(f"Video URL: {result['output']['url']}")
        break
    elif result["status"] == "failed":
        print(f"Error: {result['error']['message']}")
        break
    time.sleep(5)
```

### Video to Video (/v1/videos/generations) — Async

#### cURL

```bash
# Step 1: Submit video-to-video request
VIDEO_ID=$(curl -s -X POST \
  https://api.lumenfall.ai/v1/videos \
  -H "Authorization: Bearer $LUMENFALL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "prompt": "Apply cinematic color grading to @Video1",
    "video_url": "https://example.com/source.mp4",
    "keep_audio": true,
    "aspect_ratio": "16:9"
  }' | jq -r '.id')
echo "Video ID: $VIDEO_ID"

# Step 2: Poll for completion
while true; do
  RESULT=$(curl -s \
    https://api.lumenfall.ai/v1/videos/$VIDEO_ID \
    -H "Authorization: Bearer $LUMENFALL_API_KEY")
  STATUS=$(echo "$RESULT" | jq -r '.status')
  echo "Status: $STATUS"
  if [ "$STATUS" = "completed" ]; then
    echo "$RESULT" | jq -r '.output.url'
    break
  elif [ "$STATUS" = "failed" ]; then
    echo "$RESULT" | jq -r '.error.message'
    break
  fi
  sleep 5
done
```

#### JavaScript

```javascript
const BASE_URL = 'https://api.lumenfall.ai/v1';
const API_KEY = 'YOUR_API_KEY';

// Step 1: Submit video-to-video request
const submitRes = await fetch(`${BASE_URL}/videos`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'grok-imagine-video',
    prompt: 'Apply cinematic color grading to @Video1',
    video_url: 'https://example.com/source.mp4',
    keep_audio: true,
    aspect_ratio: '16:9'
  })
});
const { id: videoId } = await submitRes.json();
console.log('Video ID:', videoId);

// Step 2: Poll for completion
while (true) {
  const pollRes = await fetch(`${BASE_URL}/videos/${videoId}`, {
    headers: { 'Authorization': `Bearer ${API_KEY}` }
  });
  const result = await pollRes.json();
  if (result.status === 'completed') {
    console.log('Video URL:', result.output.url);
    break;
  } else if (result.status === 'failed') {
    console.error('Error:', result.error.message);
    break;
  }
  await new Promise(r => setTimeout(r, 5000));
}
```

#### Python

```python
import requests
import time

BASE_URL = "https://api.lumenfall.ai/v1"
API_KEY = "YOUR_API_KEY"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Step 1: Submit video-to-video request
response = requests.post(
    f"{BASE_URL}/videos",
    headers=HEADERS,
    json={
        "model": "grok-imagine-video",
        "prompt": "Apply cinematic color grading to @Video1",
        "video_url": "https://example.com/source.mp4",
        "keep_audio": True,
        "aspect_ratio": "16:9"
    }
)
video_id = response.json()["id"]
print(f"Video ID: {video_id}")

# Step 2: Poll for completion
while True:
    result = requests.get(
        f"{BASE_URL}/videos/{video_id}",
        headers=HEADERS
    ).json()
    if result["status"] == "completed":
        print(f"Video URL: {result['output']['url']}")
        break
    elif result["status"] == "failed":
        print(f"Error: {result['error']['message']}")
        break
    time.sleep(5)
```

## About

Grok Imagine Video is a high-fidelity video generation model developed by xAI, designed to produce video content with integrated, synchronized audio. Built on the proprietary Aurora architecture, the model supports multiple input modalities, including text-to-video, image-to-video, and direct video editing. It is distinctive for its native audio-visual synthesis: the model generates temporal audio that aligns with the visual motion rather than adding a separate soundtrack post-generation.

## Strengths

* **Native Audio-Visual Synthesis:** The model generates synchronized spatial and temporal audio concurrently with the video frames, ensuring that sound effects match the actions on screen.
* **Multi-Modal Flexibility:** It handles three distinct workflows (text prompts, image-to-video animation, and video-to-video editing) within a single unified framework.
* **Temporal Consistency:** The Aurora architecture maintains stable object identity and physical logic across sequences, reducing the "morphing" artifacts common in earlier-generation diffusion models.
* **Resolution and Quality:** Capable of outputting video at up to 720p resolution, balancing computational efficiency with visual detail suitable for social media and web content.

## Limitations

* **Resolution Ceiling:** While 720p is effective for many applications, it falls short of the 1080p or 4K standards required for cinematic production or high-end commercial use.
* **Action Duration:** Like most current generative video models, it is optimized for short-form clips; maintaining narrative or structural coherence over extended durations (e.g., several minutes) remains a challenge.
* **Inference Latency:** The computational demands of native audio-visual generation may result in longer wait times compared to visual-only models.
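The duration and resolution ceilings described above can be checked client-side before a request is submitted, so invalid jobs fail fast without an API round trip. The sketch below is illustrative only: `validate_request` is a hypothetical helper, not part of any Lumenfall SDK, and it assumes the 15-second cap and 720p ceiling from the Specifications section.

```python
# Hypothetical client-side pre-check against the documented limits
# (15-second max duration, 720p output ceiling). Illustrative sketch,
# not an official Lumenfall SDK helper.
MAX_DURATION_SECONDS = 15
MAX_HEIGHT_PIXELS = 720


def validate_request(duration_seconds: float, height_pixels: int) -> list[str]:
    """Return a list of problems with a prospective generation request.

    An empty list means the request fits within the documented limits.
    """
    problems = []
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(
            f"duration {duration_seconds}s exceeds the {MAX_DURATION_SECONDS}s cap"
        )
    if height_pixels > MAX_HEIGHT_PIXELS:
        problems.append(
            f"height {height_pixels}p exceeds the {MAX_HEIGHT_PIXELS}p ceiling"
        )
    return problems
```

Running the check before the POST means quota is never spent on a request the model cannot fulfill.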
## Technical Background

Grok Imagine Video is built on xAI’s Aurora architecture, a specialized framework designed for high-dimensional temporal data. Unlike models that layer audio on top of finished video, Aurora uses a joint embedding space that treats audio and video as co-dependent signals during generation. This approach allows the model to "understand" the relationship between physics, motion, and sound within a single transformer-based or diffusion-variant bottleneck.

## Best For

This model is ideal for creating short-form social media content, rapid prototyping for advertisements, and animating still photography with realistic environmental sound. It is a strong choice for developers who need "all-in-one" asset generation where visual motion and audio cues must be perfectly aligned without manual editing. Grok Imagine Video is available for testing and integration through Lumenfall's unified API and playground, allowing for seamless incorporation into automated media pipelines.

## Frequently Asked Questions

### How much does Grok Imagine Video cost?

Grok Imagine Video starts at $0.05 per second of generated video through Lumenfall. Pricing varies by provider; Lumenfall does not add any markup to provider pricing.

### How do I use Grok Imagine Video via API?

You can use Grok Imagine Video through Lumenfall's OpenAI-compatible API. Send requests to the unified endpoint with the model ID "grok-imagine-video". Code examples are available in Python, JavaScript, and cURL.

### Which providers offer Grok Imagine Video?

Grok Imagine Video is available through Replicate on Lumenfall. Lumenfall automatically routes requests to the best available provider.
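The polling loops in the code examples above run forever if a job never reaches a terminal state. A bounded poller is a small improvement for production pipelines; the sketch below assumes the `status`, `output.url`, and `error.message` response fields shown in the examples, and `fetch_status` stands in for any callable that performs one GET and returns the parsed JSON (e.g. `lambda: requests.get(f"{BASE_URL}/videos/{video_id}", headers=HEADERS).json()`).

```python
import time


def wait_for_video(fetch_status, poll_interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_status() until the job completes, fails, or times out.

    fetch_status: callable returning the parsed JSON status document for
    one poll, matching the response shape used in the examples above.
    Returns the output URL on success; raises on failure or timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch_status()
        if result["status"] == "completed":
            return result["output"]["url"]
        if result["status"] == "failed":
            raise RuntimeError(result["error"]["message"])
        sleep(poll_interval)
    raise TimeoutError("video generation did not finish before the timeout")
```

Injecting `sleep` as a parameter keeps the helper unit-testable without real waiting, and the `timeout` bound ensures a stuck job surfaces as an exception instead of an infinite loop.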
## Links

- Model Page: https://lumenfall.ai/models/xai/grok-imagine-video
- About: https://lumenfall.ai/models/xai/grok-imagine-video/about
- Providers, Pricing & Performance: https://lumenfall.ai/models/xai/grok-imagine-video/providers
- API Reference: https://lumenfall.ai/models/xai/grok-imagine-video/api
- Benchmarks: https://lumenfall.ai/models/xai/grok-imagine-video/benchmarks
- Use Cases: https://lumenfall.ai/models/xai/grok-imagine-video/use-cases
- Gallery: https://lumenfall.ai/models/xai/grok-imagine-video/gallery
- Playground: https://lumenfall.ai/playground?model=grok-imagine-video
- API Documentation: https://docs.lumenfall.ai