# Kling V3

> Kuaishou's cinematic video generation model supporting text-to-video and image-to-video with multi-shot control, native audio with voice control, negative prompts, and CFG scale at 720p

## Quick Reference

- Model ID: kling-v3
- Creator: Kuaishou
- Status: active
- Family: kling
- Base URL: https://api.lumenfall.ai/v1

## Specifications

- Max Video Duration: 15 seconds
- Input Modalities: text, image
- Output Modalities: video, audio
- Supported Modes: Text to Video, Image to Video

## API Parameters

The compiled parameter schema for this model is available via the API: `GET /v1/models/kling-v3?schema=true`.

### Core Parameters

- `prompt` (string) — REQUIRED: Text prompt for video generation. Modes: Text to Video, Image to Video
- `negative_prompt` (string): Negative prompt to steer generation away from undesired content. Modes: Text to Video, Image to Video
- `duration` (number): Video duration in seconds. Modes: Text to Video, Image to Video
- `mode` (string): 'standard' generates 720p; 'pro' generates 1080p. Default: pro. Values: pro, standard. Modes: Text to Video, Image to Video. Only available via Replicate

### Size & Layout

- `size` (string): Video dimensions as WxH pixels (e.g. "1365x768") or aspect ratio (e.g. "16:9"). Values: 1365x768, 768x1365, 1024x1024. Modes: Text to Video, Image to Video
- `aspect_ratio` (string): Aspect ratio of the output video (e.g. "16:9", "1:1"). Values: 9:16, 1:1, 16:9. Modes: Text to Video, Image to Video
- `resolution` (string): Output resolution tier (e.g. "1K", "4K"). Values: 1K. Modes: Text to Video, Image to Video

### Character Elements

- `elements` (array): Elements (characters/objects) to include in the video. Each element can be either an image set (frontal + reference images) or a video. Reference them in the prompt as @Element1, @Element2, etc. Modes: Image to Video. Only available via fal.ai

### Multi-Shot Control

- `multi_prompt` (array): List of prompts for multi-shot video generation. If provided, overrides the single prompt and divides the video into multiple shots with the specified prompts and durations. Modes: Text to Video, Image to Video
- `shot_type` (string): The type of multi-shot video generation. Default: customize. Values: customize, intelligent. Modes: Text to Video, Image to Video. Only available via fal.ai

### Audio

- `generate_audio` (boolean): Whether to generate audio alongside the video. Modes: Text to Video, Image to Video

### Output & Format

- `n` (integer): Number of videos to generate. Default: 1. Modes: Text to Video, Image to Video

### Additional Parameters

- `input_reference` (array): Input image(s) to animate into video. Modes: Image to Video
- `cfg_scale` (number): Classifier-free guidance scale. Modes: Text to Video, Image to Video
- `end_image` (string): End-frame image URL for video interpolation. Modes: Image to Video

## Model Identifiers

- Primary Slug: kling-v3
- Aliases: kling-3, kling-video-v3

## Dates

- Released: December 2025

## Tags

video-generation, text-to-video, image-to-video, audio-generation

## Available Providers

### fal.ai

- Config Key: fal/kling-v3
- Provider Model ID: fal-ai/kling-video/v3/standard/text-to-video
- Pricing: $0.084/second, $0.126/second, $0.154/second
- Source: https://fal.ai/models/fal-ai/kling-video/v3/standard/text-to-video

### fal.ai

- Config Key: fal/kling-v3-i2v
- Provider Model ID: fal-ai/kling-video/v3/standard/image-to-video
- Pricing: $0.084/second, $0.126/second, $0.154/second
- Source: https://fal.ai/models/fal-ai/kling-video/v3/standard/image-to-video

### Replicate

- Config Key: replicate/kling-v3
- Provider Model ID: kwaivgi/kling-v3-video
- Pricing: $0.168/second, $0.252/second
- Source: https://replicate.com/kwaivgi/kling-v3-video

## Performance Metrics

Provider performance over the last 30 days.
### fal

- Median Generation Time (p50): 89ms
- 95th Percentile Generation Time (p95): 602ms
- Average Generation Time: 147ms
- Success Rate: 100.0%
- Total Requests: 52

## Image Gallery

1 image available for this model. Browse all at https://lumenfall.ai/models/kuaishou/kling-v3/gallery

## Example Prompt

The following prompt was used to generate an example video in our playground:

> Cinematic drone shot of a sleek, futuristic electric car gliding smoothly along a winding coastal highway at sunset. Golden hour light reflects off the car’s metallic surface and the ocean waves below. On a grassy roadside turnout in the lower corner of the frame, a small capybara sits peacefully, watching the car pass by. High-quality 4K, photorealistic, fluid camera motion, ambient synthwave soundtrack, "KLING V3" subtly etched on the roadside milestone.

## Code Examples

### Text to Video (/v1/videos/generations) — Async

#### cURL

```bash
# Step 1: Submit video generation request
VIDEO_ID=$(curl -s -X POST \
  https://api.lumenfall.ai/v1/videos \
  -H "Authorization: Bearer $LUMENFALL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kling-v3",
    "prompt": "A cinematic drone shot of a coastal highway at sunset",
    "size": "1024x1024"
  }' | jq -r '.id')
echo "Video ID: $VIDEO_ID"

# Step 2: Poll for completion
while true; do
  RESULT=$(curl -s \
    https://api.lumenfall.ai/v1/videos/$VIDEO_ID \
    -H "Authorization: Bearer $LUMENFALL_API_KEY")
  STATUS=$(echo "$RESULT" | jq -r '.status')
  echo "Status: $STATUS"
  if [ "$STATUS" = "completed" ]; then
    echo "$RESULT" | jq -r '.output.url'
    break
  elif [ "$STATUS" = "failed" ]; then
    echo "$RESULT" | jq -r '.error.message'
    break
  fi
  sleep 5
done
```

#### JavaScript

```javascript
const BASE_URL = 'https://api.lumenfall.ai/v1';
const API_KEY = 'YOUR_API_KEY';

// Step 1: Submit video generation request
const submitRes = await fetch(`${BASE_URL}/videos`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'kling-v3',
    prompt: 'A cinematic drone shot of a coastal highway at sunset',
    size: '1024x1024'
  })
});
const { id: videoId } = await submitRes.json();
console.log('Video ID:', videoId);

// Step 2: Poll for completion
while (true) {
  const pollRes = await fetch(`${BASE_URL}/videos/${videoId}`, {
    headers: { 'Authorization': `Bearer ${API_KEY}` }
  });
  const result = await pollRes.json();
  if (result.status === 'completed') {
    console.log('Video URL:', result.output.url);
    break;
  } else if (result.status === 'failed') {
    console.error('Error:', result.error.message);
    break;
  }
  await new Promise(r => setTimeout(r, 5000));
}
```

#### Python

```python
import requests
import time

BASE_URL = "https://api.lumenfall.ai/v1"
API_KEY = "YOUR_API_KEY"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Step 1: Submit video generation request
response = requests.post(
    f"{BASE_URL}/videos",
    headers=HEADERS,
    json={
        "model": "kling-v3",
        "prompt": "A cinematic drone shot of a coastal highway at sunset",
        "size": "1024x1024"
    }
)
video_id = response.json()["id"]
print(f"Video ID: {video_id}")

# Step 2: Poll for completion
while True:
    result = requests.get(
        f"{BASE_URL}/videos/{video_id}",
        headers=HEADERS
    ).json()
    if result["status"] == "completed":
        print(f"Video URL: {result['output']['url']}")
        break
    elif result["status"] == "failed":
        print(f"Error: {result['error']['message']}")
        break
    time.sleep(5)
```

### Image to Video (/v1/videos/generations) — Async

#### cURL

```bash
# Step 1: Submit image-to-video request
VIDEO_ID=$(curl -s -X POST \
  https://api.lumenfall.ai/v1/videos \
  -H "Authorization: Bearer $LUMENFALL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kling-v3",
    "prompt": "The scene slowly comes to life with gentle camera motion",
    "image_url": "https://example.com/start-frame.jpg",
    "duration": 10,
    "aspect_ratio": "16:9"
  }' | jq -r '.id')
echo "Video ID: $VIDEO_ID"

# Step 2: Poll for completion
while true; do
  RESULT=$(curl -s \
    https://api.lumenfall.ai/v1/videos/$VIDEO_ID \
    -H "Authorization: Bearer $LUMENFALL_API_KEY")
  STATUS=$(echo "$RESULT" | jq -r '.status')
  echo "Status: $STATUS"
  if [ "$STATUS" = "completed" ]; then
    echo "$RESULT" | jq -r '.output.url'
    break
  elif [ "$STATUS" = "failed" ]; then
    echo "$RESULT" | jq -r '.error.message'
    break
  fi
  sleep 5
done
```

#### JavaScript

```javascript
const BASE_URL = 'https://api.lumenfall.ai/v1';
const API_KEY = 'YOUR_API_KEY';

// Step 1: Submit image-to-video request
const submitRes = await fetch(`${BASE_URL}/videos`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'kling-v3',
    prompt: 'The scene slowly comes to life with gentle camera motion',
    image_url: 'https://example.com/start-frame.jpg',
    duration: 10,
    aspect_ratio: '16:9'
  })
});
const { id: videoId } = await submitRes.json();
console.log('Video ID:', videoId);

// Step 2: Poll for completion
while (true) {
  const pollRes = await fetch(`${BASE_URL}/videos/${videoId}`, {
    headers: { 'Authorization': `Bearer ${API_KEY}` }
  });
  const result = await pollRes.json();
  if (result.status === 'completed') {
    console.log('Video URL:', result.output.url);
    break;
  } else if (result.status === 'failed') {
    console.error('Error:', result.error.message);
    break;
  }
  await new Promise(r => setTimeout(r, 5000));
}
```

#### Python

```python
import requests
import time

BASE_URL = "https://api.lumenfall.ai/v1"
API_KEY = "YOUR_API_KEY"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Step 1: Submit image-to-video request
response = requests.post(
    f"{BASE_URL}/videos",
    headers=HEADERS,
    json={
        "model": "kling-v3",
        "prompt": "The scene slowly comes to life with gentle camera motion",
        "image_url": "https://example.com/start-frame.jpg",
        "duration": 10,
        "aspect_ratio": "16:9"
    }
)
video_id = response.json()["id"]
print(f"Video ID: {video_id}")

# Step 2: Poll for completion
while True:
    result = requests.get(
        f"{BASE_URL}/videos/{video_id}",
        headers=HEADERS
    ).json()
    if result["status"] == "completed":
        print(f"Video URL: {result['output']['url']}")
        break
    elif result["status"] == "failed":
        print(f"Error: {result['error']['message']}")
        break
    time.sleep(5)
```

## Overview

Kling V3 is a cinematic video generation model developed by Kuaishou, designed to produce high-fidelity video sequences from either text prompts or static images. It represents a significant iteration in the Kling family, introducing native audio generation and precise control over cinematic parameters such as multi-shot coordination and voice-synchronized output. The model is distinctive for its ability to output video at 720p resolution while maintaining temporal consistency across complex motions.

## Strengths

* **Integrated Audio Synthesis:** Unlike models that require post-production dubbing, Kling V3 generates native audio with direct voice control, ensuring sound effects and speech are synchronized with the visual action.
* **Multi-Shot Control:** The model excels at maintaining character and environmental consistency across multiple shots within a single generation, reducing the visual "drift" common in long-form AI video.
* **Fine-Grained Steering:** Developers can use negative prompts and adjustable Classifier-Free Guidance (CFG) scales to tightly constrain the output, allowing better adherence to specific brand guidelines or aesthetic requirements.
* **Dynamic Motion Handling:** It demonstrates high proficiency in rendering complex human movements and fluid physics, making it suitable for realistic storytelling rather than just static "living portraits."

## Limitations

* **Resolution Constraints:** While the model produces high-quality cinematic content, it is currently capped at 720p native resolution, which may require upscaling for 4K professional broadcast workflows.
* **Inference Latency:** Because video and audio are synthesized simultaneously, generation times may be higher than for models that focus exclusively on visual frames.
* **Niche Stylization:** While excellent for realistic and cinematic styles, it may struggle with highly abstract or non-Euclidean artistic prompts where spatial logic is intentionally broken.

## Technical Background

Kling V3 is built on a diffusion transformer architecture optimized for spatio-temporal modeling. It uses a joint training approach in which video and audio data are processed in the same latent space, allowing the model to learn the fundamental relationships between visual motion and acoustic signals. This version places a heavy emphasis on CFG scaling and negative-prompt integration to improve prompt adherence over its predecessors.

## Best For

Kling V3 is ideal for creators developing marketing assets, cinematic trailers, and social media content that requires "one-shot" generation of both visuals and sound. It is particularly effective for character-driven narratives where lip-syncing or specific voice parameters are necessary. You can experiment with Kling V3's Text-to-Video and Image-to-Video modes through Lumenfall's unified API and interactive playground to integrate high-end video synthesis into your existing applications.

## Frequently Asked Questions

### How much does Kling V3 cost?

Kling V3 starts at $0.084 per second of generated video through Lumenfall. Pricing varies by provider. Lumenfall does not add any markup to provider pricing.

### How do I use Kling V3 via API?

You can use Kling V3 through Lumenfall's OpenAI-compatible API. Send requests to the unified endpoint with the model ID "kling-v3". Code examples are available in Python, JavaScript, and cURL.

### Which providers offer Kling V3?

Kling V3 is available through fal.ai and Replicate on Lumenfall. Lumenfall automatically routes requests to the best available provider.
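Beyond the single-prompt examples above, the multi-shot and steering parameters can be combined in one request body. The sketch below assembles such a payload in Python; note that the shot-object fields `prompt` and `duration` inside `multi_prompt` are assumptions inferred from the parameter summary, not a confirmed schema, so verify them against `GET /v1/models/kling-v3?schema=true` before use.

```python
import json

def build_multishot_payload(shots, negative_prompt=None,
                            cfg_scale=None, generate_audio=True):
    """Assemble a request body for multi-shot generation.

    `shots` is a list of (prompt, duration_seconds) pairs. The
    per-shot object shape is an assumption for illustration.
    """
    payload = {
        "model": "kling-v3",
        # multi_prompt overrides the single `prompt` field and splits
        # the video into one shot per entry.
        "multi_prompt": [{"prompt": p, "duration": d} for p, d in shots],
        "generate_audio": generate_audio,
    }
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if cfg_scale is not None:
        payload["cfg_scale"] = cfg_scale
    return payload

payload = build_multishot_payload(
    shots=[
        ("Wide drone shot of a coastal highway at sunset", 5),
        ("Close-up of waves crashing on the rocks below", 5),
    ],
    negative_prompt="blurry, low quality, watermark",
    cfg_scale=0.5,
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be passed as the `json=` argument to the `requests.post` calls shown in the code examples above.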
## Links

- Model Page: https://lumenfall.ai/models/kuaishou/kling-v3
- About: https://lumenfall.ai/models/kuaishou/kling-v3/about
- Providers, Pricing & Performance: https://lumenfall.ai/models/kuaishou/kling-v3/providers
- API Reference: https://lumenfall.ai/models/kuaishou/kling-v3/api
- Benchmarks: https://lumenfall.ai/models/kuaishou/kling-v3/benchmarks
- Use Cases: https://lumenfall.ai/models/kuaishou/kling-v3/use-cases
- Gallery: https://lumenfall.ai/models/kuaishou/kling-v3/gallery
- Playground: https://lumenfall.ai/playground?model=kling-v3
- API Documentation: https://docs.lumenfall.ai
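Since all providers above bill per second of generated video, clip cost is simply rate × duration. A quick sketch using the fal.ai rates quoted under Available Providers (the tier labels below are illustrative assumptions; the listing gives three rates without naming tiers):

```python
# fal.ai rates in USD per second, from the Available Providers section.
# Tier labels are placeholders, not provider terminology.
RATES = {"tier_1": 0.084, "tier_2": 0.126, "tier_3": 0.154}

def estimate_cost(duration_s: float, tier: str = "tier_1") -> float:
    """Estimated cost of a single clip at the given per-second rate."""
    return round(RATES[tier] * duration_s, 3)

# A maximum-length 15-second clip at the lowest listed rate:
print(estimate_cost(15))            # 1.26
# A 10-second clip at the highest listed rate:
print(estimate_cost(10, "tier_3"))  # 1.54
```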