How To...
/
Luma Video Models Guide: Ray3.14, Veo, Sora & Kling Compared

Quick search…

/alt

How To...

Luma Video Models Field Guide

Written by

Davicho Barona

Published

Feb 27, 2026

How To...

Luma Video Models Field Guide

Written by

Davicho Barona

Published

Feb 27, 2026

Luma Video Generation Models — Field Guide

A comprehensive reference guide for video generation AI models available in Luma, their strengths, limitations, and practical application strategies.


Ray3.14 (Luma)

Overview: Luma's latest and fastest video model. This is your default workhorse for video generation. Native 1080p and HDR support make it the go-to for professional video work.


Strengths:

  • DEFAULT recommended model — fastest generation of the Luma lineup

  • Native 1080p resolution support (also 720p, 540p, Draft)

  • HDR output with expanded dynamic range for dramatic lighting scenarios. 

  • EXR export option for professional color grading workflows (available in 540p and 720p)
    *HDR and HDR+EXR are available only for text2video and image2video; Modify and HDR/EXR is not supported

  • Supports start frame AND end frame keyframes (precise interpolation between two images)

  • 6 aspect ratios: 9:16 (portrait), 3:4, 1:1 (square), 4:3, 16:9 (landscape), 21:9 (ultrawide)

  • Seamless loop support for product showcases and repeating content

  • 5s or 10s duration for Text-to-video; image-to-video is 5s; Modify supports up to 18s; extensions can reach ~30s.

  • Strong cinematic quality — filmic lighting, steady camera work, believable motion

  • Excellent prompt adherence when following Luma's prompting best practices

  • Excels in Modify V2V (video-to-video) transformations 


Weaknesses:

  • No character reference support (must use Ray3 for that)

  • No native audio — need separate audio tools

  • Prompt-sensitive: certain words like "vibrant," "whimsical," "hyper-realistic" may  degrade output quality (test)

  • Can struggle with very complex multi-subject action scenes compared to Sora

  • Requires understanding of Luma's specific prompting methodology for best results


Specialties:

Cinematic footage, product videos, smooth keyframe interpolation, HDR content, professional workflows requiring EXR export, looping content, ultrawide cinematic sequences


Best Practices:

DO:

  • Use mid-action verbs: "running" not "begins to run"

  • “Positive only” model. Negative prompting is counterproductive.

  • Include secondary consequences: wind in hair, fabric movement, reflections, dust kicked up, water ripples

  • Default to cinematic style unless otherwise specified

  • With keyframes: describe only what CHANGES, don't redescribe static elements

  • Keep prompts about 100 words long,  action-focused, and present tense

  • Describe temporal movement — what happens as the scene progresses

  • Be specific about camera movement: "camera dollies forward," "slow pan right," "aerial descending shot"

DON'T:

  • Avoid using: "vibrant," "whimsical," "hyper-realistic" – these tend to degrade quality

  • Avoid vague descriptors like "beautiful," "amazing," "stunning"

  • Don't use temporal phrases like "begins to" or "starts to"

  • Don't over-describe static elements, focus on describing the elements that need to move


Prompt Template:

Create a video of [SUBJECT] [MID-ACTION VERB] in [SETTING], [SECONDARY MOTION/CONSEQUENCE], [CAMERA MOVEMENT if any], [LIGHTING/MOOD].

Examples:

  • "A golden retriever running through a wheat field, ears flapping in the wind, dust particles catching golden hour sunlight, camera tracking alongside."

  • "Espresso pouring into a white ceramic cup, steam rising, liquid swirling, macro close-up, warm morning light."

  • "Dancer spinning on a rooftop at sunset, dress billowing outward, hair flowing, city lights glowing in background, camera orbiting slowly."


Tips & Gotchas:

  • Start here for 90% of video work — it's fast, high-quality, and versatile

  • Use 720p for iteration, 1080p for finals (saves time and credits)

  • Loop mode is excellent for product showcase content that needs to repeat seamlessly

  • HDR mode is outstanding for dramatic lighting scenarios: sunsets, neon signs, fire, and stage lighting

  • The EXR export is a game-changer for professional color grading pipelines

  • 21:9 ultrawide is perfect for cinematic letterbox aesthetic


Ray3 (Luma)

Overview: Luma's character-reference-capable model. Use when you need a specific character maintained across shots without needing perfect keyframes of that character.


Strengths:

  • CHARACTER REFERENCE support – upload a character reference image to maintain character identity across your video generations, supported in T2V, I2V, V2V and Reference Mode.

  • Same keyframe support as Ray3.14 – start frame, end frame (5s frame/120th frame)

  • 6 aspect ratios (9:16, 3:4, 1:1, 4:3, 16:9, 21:9)

  • 5s or 10s duration

  • Seamless loop support

  • HDR and EXR support

  • Cinematic quality with more expressive and intense movement

  • Enhanced prompt mode available for better adherence


Weaknesses:

  • Slower than Ray3.14 — noticeably longer generation times

  • Native 1080p

  • No native audio

  • Same prompt sensitivity as Ray3.14 (avoid "vibrant," "whimsical," etc.)


Specialties:

Character-consistent multi-shot sequences, narrative projects, storyboards with recurring characters, and any work requiring the same character across multiple video generation modes.


Best Practices:

Same rules as Ray3.14 (mid-action verbs, secondary consequences, no banned words)

Character Reference Specific:

  • When using character reference, the prompt should describe the scene/action, and the reference image handles identity

  • You don’t have to re-describe the character's physical appearance if using a reference — let the reference image do that work

  • With keyframes AND character reference: describe action/setting changes, not character features


Prompt Template:

Create a video of [CHARACTER DESCRIPTION if no ref] [MID-ACTION VERB] in [SETTING], [SECONDARY MOTION], [CAMERA if any], [LIGHTING/MOOD].

Examples:

  • With character ref: "Character walking through a foggy forest, leaves crunching underfoot, mist swirling around legs, low-angle tracking shot."

  • Without ref: "Young woman in red jacket hiking up a mountain trail, backpack bouncing, hair tied back, breathing visible in cold air, golden hour side lighting."


Tips & Gotchas:

  • Only reach for this when you specifically need character reference — Ray 3.14 is faster and higher-res for everything else

  • Ray is a “positive only” model. Negative prompting is counterproductive.

  • Character reference works best with clear, well-lit reference images showing the character's face and defining features

  • Can combine character reference WITH keyframes for maximum control over both identity and composition

  • Great for narrative sequences where the same character appears in multiple locations/scenarios

  • The enhanced prompt mode can help with complex scenes but increases generation time


Veo 3 (Google)

Overview: Google's video model with native audio generation. Generates synchronized dialogue, sound effects, and ambient audio alongside video.


Strengths:

  • NATIVE AUDIO generation — dialogue, SFX, ambient sound, all synchronized with video

    • Include dialogue in quotes for automatic lip-sync: e.g., 'A man says, "Hello world"

  • Text-based "inpainting" for scene modifications.

  • Uses reference images for character/background consistency.

  • Significantly SLOWER than Sora for longer videos. 

  • 720p and 1080p resolution (1080p only available for 16:9 aspect ratio)

  • 4s, 6s, or 8s duration options

  • High adherence to professional camera, lens, and lighting terminology.

  • Good for standalone videos that need integrated audio from the start

  • Cinematic quality

  • High temporal adherence means more physical realism.

  • Long context window, it can handle long, detailed prompts


Weaknesses:

  • Only 2 aspect ratios (16:9, 9:16) — very limited framing options compared to Luma `models

  • No keyframe support (no start/end frame control for precise visuals)

  • No character reference

  • No loop support

  • Less cinematic quality than Ray models — visuals can feel "AI-ish" with occasional artifacts

  • Poor for multi-video projects requiring visual consistency across shots

  • Audio quality can be hit-or-miss; dialogue can sound robotic or uncanny

  • 1080p locked to 16:9 only


Specialties:

Audio-integrated videos, talking head content, ambient scene videos with natural sound, quick social content with sound, dialogue-driven shorts


Best Practices:

Audio Integration:

  • Put dialogue in quotes: 'A woman says, "Welcome to our channel"'

  • Describe sound effects explicitly: "thunder rumbling," "waves crashing," "footsteps echoing"

  • Be specific about ambient sound: "busy coffee shop chatter," "forest birds chirping," "city traffic humming"

Visual:

  • Keep prompts clean and concise (1-2 sentences)

  • Focus on what's visually happening AND what you want to hear

  • Simple is better — complex scenes can degrade quality


Prompt Template:

[Cinematography] + [Subject] + [Action] + [Context] + [Style] + [Ambiance] + [Audio].

Examples:

  • “Tracking shot following the explorer as she steps into the clearing and runs her hand over the intricate carvings on a crumbling stone wall. Emotion: Wonder and reverence."

  • “Wide, high-angle crane shot, revealing the lone explorer standing small in the center of the vast, forgotten temple complex, half-swallowed by the jungle. SFX: A swelling, gentle orchestral score begins to play.”

  • “Reverse shot of the explorer's freckled face, her expression filled with awe as she gazes upon ancient, moss-covered ruins in the background. SFX: The rustle of dense leaves, distant exotic bird calls.”


Tips & Gotchas:

  • Best when you need audio baked in from the start and don't want to sync separately

  • For higher visual quality without audio, use Ray 3.14 + separate audio tools

  • 1080p only works at 16:9 — portrait is locked to 720p (significant limitation)

  • Multi-shot consistency —  While each generation is independent,  use "First Frame" anchoring or the "Scene Extension" tool to carry character details across shots.

  • The dialogue feature is powerful, but the voice quality is inconsistent (can sound robotic).

  • Good for quick social content where audio matters more than cinematic perfection


Veo 3.1 (Google)

Overview: Updated version of Veo 3 with keyframe support. Same native audio capabilities with added visual control through start and end frame keyframes.


Strengths:

  • Everything Veo 3 has PLUS keyframe support (start frame and end frame)

  • Native audio generation with dialogue, SFX, ambient sound

  • 1080p and 720p at 4s, 6s, or 8s durations.

  • Optional 4K upscalingspecifically reconstructs textures and skin pores rather than just stretching pixels.

  • Can bridge between two keyframe images with audio — unique capability

  • Better visual control than Veo 3 for specific compositional needs


Weaknesses:

  • Same limited aspect ratios as Veo 3 (16:9, 9:16 only)

  • Up to three (3) reference images.

  • No loop support—unofficial workflow possible

  • 1080p available for all durations—4s, 6s, 8s

  • Same visual quality limitations as Veo 3 — less cinematic than Ray models

  • Audio quality is still variable and can be uncanny

  • Falls back to Veo 3 if generation fails


Specialties:

Keyframe-guided videos with native audio, image-to-video with sound, scene transitions with audio, and bridging two specific visual moments with synchronized sound


Best Practices:

Same as Veo 3 — dialogue in quotes, describe sounds explicitly

With Keyframes:

  • Describe the transition/action between frames AND the audio you want

  • Focus on the journey from start frame to end frame

  • Don't redescribe static elements visible in the keyframes


Prompt Template:

[Cinematography] + [Subject] + [Action] + [Context] + [Style] + [Ambiance] + [Audio].

Examples:

  • “Tracking shot following the explorer as she steps into the clearing and runs her hand over the intricate carvings on a crumbling stone wall. Emotion: Wonder and reverence."

  • “Wide, high-angle crane shot, revealing the lone explorer standing small in the center of the vast, forgotten temple complex, half-swallowed by the jungle. SFX: A swelling, gentle orchestral score begins to play.”

  • “Reverse shot of the explorer's freckled face, her expression filled with awe as she gazes upon ancient, moss-covered ruins in the background. SFX: The rustle of dense leaves, distant exotic bird calls.”


Tips & Gotchas:

  • Use this over Veo 3 when you have keyframe images and need audio

  • Use Veo 3.1 Fast model ( ~70% cheaper and 2x faster) for testing. Once locked, re-run the same seed and prompt on Standard with “4k” enabled for the final render.

  • The keyframe + audio combo is unique — no other model does this natively

  • Falls back gracefully if it fails, so it's worth trying

  • Still inherits Veo 3's visual quality limitations — not as cinematic as Ray models


Sora 2 (OpenAI)

Overview: OpenAI's video model. Strong at high-energy, multi-subject motion scenes with believable physics and crowd dynamics. Automatic audio generation included.


Strengths:

  • Excellent multi-subject motion: crowds, sports, group dynamics, action scenes with multiple moving elements

  • Strong character and world persistence within a single clip

  • Faster than Veo 3.1 (at high-quality)

  • Believable background activity and physics — things move naturally in the periphery

  • Automatic audio generation (ambient, SFX, even auto-generated dialogue)

  • Can suppress audio selectively ("no dialogue," "no music," "no audio")

  • Duration options vary by UI and wrapper:: 4s, 8s, or 15s. Pro users up to 25s 

  • LONGEST duration of models listed.

  • Sora 2 (base, $0.10/ second) is cheaper than  Veo 3.1 Fast  $0.15/second).

  • Good at historical recreations and documentary-style content

  • High scene realism and physics accuracy

  • Handles complex action better than most other models


Weaknesses:

  • Only 2 aspect ratios: 16:9, 9:16: 1792×1024 (landscape) and 1024×1792 (portrait)

  • Only 720p resolution — 1080p option only for Pro users.

  • No keyframe interpolation (start frame only, no end frame)

  • Strict content moderation — no copyrighted material, living celebrities, political figures

  • Sora doesn’t guarantee continuity across separate fresh generations; use additional features (Remix/Re-cut/Storyboard), or use a reference frame.

  • Not ideal for long cinematic storytelling sequences that require shot-to-shot consistency

  • Expensive in both time and compute resources.

  • Sora 2 Pro high-res ($0.50/sec) can exceed Veo 3.1 Standard ($0.40/sec).


Specialties:

High-energy social content, crowd/group dynamics, action scenes with multiple subjects, sports footage, historical recreations, meme-style videos, longer single clips (20s), documentary-style content


Best Practices:

General:

  • Sora 2 prompting guide says both shorter prompts (more creative freedom) and longer detailed prompts (more control/consistency) are valid, depending on goals

  • Be specific about shot type, subject, action, setting, and lighting for best results.

  • Sora 2 calls it “briefing a cinematographer.”

  • Let the model handle background activity naturally

Audio:

  • For dialogue: 'A woman says, "Welcome home"'

  • For SFX: describe explicitly — "waves crash on the shore," "engine revs loudly"

  • To suppress audio, OpenAI’s official guide uses “Diegetic only,” “No score,” “Natural ambience only”

Content:

  • Can reference historical figures (Lincoln, MLK, Freddie Mercury) but NOT living celebrities

  • No copyrighted characters or IP


Prompt Template:

[SHOT TYPE/FRAMING] of [SUBJECTS] [ACTION] in [SETTING], (optional: [SECONDARY MOTION / ENV EFFECTS]), [CAMERA MOVEMENT], [LIGHTING/PALETTE]. (optional: Audio: [DIALOGUE / BACKGROUND SOUND])

Examples:

  • "Wide handheld street-level shot of four basketball players in a gritty city court during golden hour, fast crossovers and a drive to the rim; camera tracks alongside the ball-handler, slight shake, shallow depth of field. Lighting: warm low sun, long shadows, dust in the air. Background sound: sneakers squeak on asphalt, ball thumps, crowd cheers, and claps."

  • "Wide shot transitioning to a medium tracking shot through a busy open-air market in Morocco; vendors gesture and call out while shoppers pass colorful fabric stalls. Camera: steady handheld gimbal feel, slow weave forward at walking pace. Lighting: bright midday sun with patchy shade under awnings; saturated color palette. Background sound: overlapping voices, bargaining calls, footsteps, cloth flapping in the wind. "

  • "Wide shot transitioning to a medium tracking shot through a busy open-air market in Morocco; vendors gesture and call out while shoppers pass colorful fabric stalls. Camera: steady handheld gimbal feel, slow weave forward at walking pace. Lighting: bright midday sun with patchy shade under awnings; saturated color palette. Background sound: overlapping voices, bargaining calls, footsteps, cloth flapping in the wind. "


Tips & Gotchas:

  • Best for short-form social content with lots of movement and energy

  • The 20s option is powerful but VERY slow — use strategically

  • Use as fallback when Veo or Ray aren't delivering what you need

  • The auto-audio is a nice bonus but less controllable than Veo's quoted dialogue

  • Excellent for memes and viral content where energy matters more than cinematic polish

  • Good at historical/period recreations if you need figures from the past


Kling 2.6 (Kuaishou)

Overview: Kuaishou's video model with native audio and lip-sync capabilities. Good motion quality with integrated audio generation.


Strengths:

  • Native audio generation: speech, SFX, ambient sound

  • LIP-SYNC capability — put dialogue in quotes for synchronized speech

  • Supports English and Chinese (other languages auto-translate)

  • Both text-to-video and image-to-video

  • Start and end frame reference images.

  • 3 aspect ratios (16:9, 9:16, 1:1 square)

  • 720p and 1080p resolution

  • 5s or 10s duration

  • CFG scale control (0-1) for prompt adherence tuning.

  • Good motion quality and consistency

  • Square aspect ratio option (1:1) is great for Instagram


Weaknesses:

  • Only 3 aspect ratios (no ultrawide, no 3:4, no 4:3)

  • No loop support

  • No HDR/EXR

  • Less cinematic quality than Ray models

  • Language limitations (best in English and Chinese)

  • No dedicated character reference support


Specialties:

Talking head content with lip-sync, audio-integrated scenes, social media content (especially Instagram with 1:1), bilingual English/Chinese content


Best Practices:

Audio/Dialogue:

  • Put dialogue in quotes for lip-sync: 'Character says "Hello!"'

  • Describe sounds explicitly: "coffee shop chatter, espresso machine hissing"

CFG Scale:

  • Use CFG scale to control how strictly the model follows your prompt

  • 0.5 = balanced default (recommended starting point)

  • Higher values (0.7-1.0) = stricter adherence to prompt

  • Lower values (0.2-0.4) = more creative freedom

General:

  • Clean concise prompts work best

  • Be specific about action and setting


Prompt Template:

[SETTING + LIGHTING]. [SUBJECT] [ACTION/MOTION] (optional: [CAMERA]). Audio: [AMBIENCE + SFX]. [CHARACTER] says "[DIALOGUE]" (optional: [VOICE TRAITS]). Style: [VISUAL AESTHETIC].

Examples:

  • “Medium close-up, creator facing camera in a home studio, subtle hand gestures, gentle camera push-in. Audio: low-volume lo-fi beat under clear voice, faint room tone. She says, “I’ll show you three AI tricks you can use today.” Warm key light, soft background bokeh, natural skin texture.”

  • Nighttime neon street after rain, slow tracking shot from behind as the subject walks, then turns to camera. Audio: rain on pavement, distant traffic, footsteps. She says, “Okay… this is where it starts.” Moody noir lighting, high contrast, shallow depth of field.

  • 360-degree rotating shot of a sleek smartphone on a minimal pedestal, slow floating motion. Audio: soft studio room tone, subtle whoosh as it rotates, tiny click as the screen lights. No dialogue. Crisp commercial lighting, clean reflections, product photography look.


Tips & Gotchas:

  • CFG scale is useful for fine-tuning adherence vs. creativity

  • Add weight to key elements by using emphasis indicators (++) for critical elements
    Example: "++sleek red convertible++ driving along coastal highway"

  • Lip-sync quality is still a standout feature compared to other models

  • Square (1:1) aspect ratio option is excellent for Instagram feed posts

  • If you need wider framing or more cinematic work, use Ray models instead

  • Works well for influencer-style content with direct-to-camera dialogue

  • Bilingual capability is useful for international campaigns


Ray3 & Ray3.14 Video Modify (V2V)

Overview: Transform existing videos with prompt-guided frame and/or video modifications. This is video-to-video transformation, not generation from scratch.


What It Does:

Takes an existing video and modifies it based on your prompt — can change style, lighting, environment, weather, time of day, and more while preserving the underlying motion and composition.


Strength Control:

Three modes with three intensity levels each:

  • Adhere (1-3): Preserves original video closely, subtle changes

    • Level 1: Minimal changes

    • Level 3: Moderate changes while staying faithful

  • Flex (1-3): Balanced transformation

    • Level 1: Moderate changes

    • Level 3: Significant changes

  • Reimagine (1-3): Creative freedom, major transformations

    • Level 1: Significant changes

    • Level 3: Dramatic reinterpretation


Keyframe Support:

  • Can use start and/or end frame keyframes to guide the modification

  • Helps maintain specific visual elements or compositions

  • Useful for ensuring the output matches a desired aesthetic


Best Practices:

CRITICAL RULES:

  • Describe desired END STATE, not commands

  • NO temporal language — don't say "changes to" or "transforms into"

  • POSITIVE descriptions only — say "clear blue sky" NOT "no clouds"

  • Be specific about the transformation you want

Good Examples:

  • "Cyberpunk neon city at night, rain-slicked streets, purple and blue lighting"

  • "Watercolor painting style, soft pastel colors, impressionistic"

  • "Golden hour lighting, warm orange glow, long shadows"

Bad Examples:

  • ❌ "Change the sky to blue" (command, not description)

  • ❌ "Remove the clouds" (negative description)

  • ❌ "The scene transforms into a forest" (temporal language)


Use Cases:

  • Style transfer: Turn live-action into animation, or realistic into illustrated

  • Re-lighting: Change time of day, add dramatic lighting

  • Environment changes: Urban to nature, summer to winter, day to night

  • Weather modifications: Add rain, fog, snow, sunshine

  • Artistic stylization: Oil painting, watercolor, comic book, cinematic grading


Tips & Gotchas:

  • Start with lower strength levels (Adhere 1-2, Flex 1) and increase if needed

  • Higher strength = more deviation from original — can lose important details

  • Works best when the original video has good motion and composition already

  • Use keyframes to anchor specific visual elements you want to preserve

  • Great for repurposing existing footage into different styles or moods

  • Can be used iteratively — modify a video, then modify the result again


Comparison Summary


Speed Ranking (Fastest to Slowest - Heavily dependent on Quality):

  1. Ray3.14 — Fastest, especially at lower resolutions

  2. Ray3 — Slightly slower than 3.14

  3. Veo 3 / Veo 3.1 — Fast for longer durations

  4. Sora 2 — Noticeably slow, especially at 12s and 20s

  5. Kling 2.6 — Moderate speed


Audio Capabilities:

Ray 3.14 - Audio generation available in supported UIs.

Veo 3.1 - Native audio generation. Slightly robotic
Kling 2.6 (Kuaishou) - Native audio. Strong lip-sync capability—multilingual voice support.
Sora (OpenAI) - Native Audio. Varies by UI


Resolution Options:

  • Ray 3.14 — Draft Mode, 540p, 720p, 1080p (native), 4K via upscaling

  • Ray 3 — 720p, 1080p

  • Veo 3.1 — 720p, 1080p, 4K

  • Kling 2.6 — Up to 1080p

  • Sora (Sora Editor surface) — 720p (higher resolutions vary by UI


Duration Options:

  • Ray 3.14 — 4 s, 8 s, 12 s (up to ~18 s orchestration)

  • Ray 3 — Short-form durations similar to Ray 3.14 (varies by platform)

  • Veo 3.1 — Up to 8 s

  • Kling (e.g., 2.1 / 2.6 class) — ~5 s and options to extend toward ~10 s is possible

  • Sora 2 — Up to 12 s 


Aspect Ratio Flexibility:

  • Ray 3.14 / Ray 3 — 16:9, 9:16, 1:1, 4:3, 3:4, 21:9

  • Veo 3.1 — 16:9, 9:16 (with some reference-image limitations reported)

  • Kling 2.x / 2.6 — 16:9, 9:16, 1:1

  • Sora 2 — 16:9, 9:16, 1:1


Special Features:

  • Ray 3 — Modify workflows, motion & character controls, part of Luma API ecosystem.

  • Veo 3 / 3.1 — Integrated audio, cinematic camera semantics, longer sequence support & scene continuity.

  • Kling 2.x (e.g., 2.6) — Native audio, strong lip sync, 3D VAE architecture, spatiotemporal motion quality, improved realism.

  • Sora 2 — Narrative/storytelling optimization, social-ready app features, visual coherence focus.


Cost/Resource Considerations:

  • Most efficient: Ray3.14 (fast + high quality)

  • Time-intensive: Sora 2 (especially 12s and 20s), Ray3 (slower than 3.14)

  • Resource-heavy: 1080p renders, HDR/EXR exports, longer durations

  • Budget-friendly iteration: Use 720p or lower for testing, 1080p for finals


Decision Flowchart


1️⃣ General video, need high consistency and temporal stability?

Ray3.14

  • Fast, high-quality, most aspect ratios

  • Native 1080p, use 720p for iteration. 

  • HDR for dramatic lighting

  • Loop mode for product showcases


2️⃣ Need character consistency across shots?

Ray3

  • Character reference maintains identity

  • Slower than 3.14 but worth it for multi-shot narratives

  • Native 1080p, 720p for iteration

  • More intense motion


3️⃣ Need native audio/dialogue?

Choose based on your specific need:

a) Have keyframe images + need audio?Veo 3.1

  • Start and end frame keyframes with synchronized audio

  • Best for bridging specific visual moments with sound

b) Text-to-video with audio, no keyframes?Veo 3

  • Fast, dialogue with lip-sync

  • Limited aspect ratios (16:9, 9:16 only)

c) Need lip-sync quality + social content?Kling 2.6

  • Strong lip-sync, 1:1 square for Instagram

  • CFG scale for fine-tuning

  • Bilingual (English/Chinese)


4️⃣ Need high-energy multi-subject action?

Sora 2

  • Best for crowds, sports, group dynamics

  • Believable background activity

  • Automatic audio included

  • Slower generation, 720p max


5️⃣ Need longest possible single clip (20s)?

Sora 2

  • Only model that goes to 20s

  • Be prepared for slow generation

  • Good for longer narrative moments


6️⃣ Need to modify existing video?

Modify Video • Ray3

  • Style transfer, re-lighting, environment changes

  • Adhere/Flex/Reimagine strength control

  • Start with lower strength levels


7️⃣ Other model failing or not delivering?

→ Try Sora 2 or Veo 3 as fallback

  • Different architectures may handle your prompt better

  • Sora for complex action, Veo for audio-integrated content


8️⃣ Need specific features?

  • Ultrawide (21:9): Ray3.14 or Ray3

  • HDR/EXR export: Ray3.14 720p & 540p only 

  • Seamless loops: Ray3.14 or Ray3

  • Character reference: Ray3 only

  • 1:1 square: Kling 2.6 (or Ray models)

  • End frame keyframes: Ray3.14, Ray 3, Veo 3.1


Recommendations for Creative Professionals


Default Workflow:

  1. Start with Ray3.14 for 90% of projects

  2. Iterate at 720p to save time and resources

  3. Use keyframes when you have specific compositions in mind

  4. Add audio separately for maximum control (unless you specifically need integrated audio)

  5. Upscale to 1080p for final delivery


When to Break the Rules:

  • Character-driven narratives → Ray3

  • Audio-first content → Veo 3.1, Kling 2.6

  • High-energy social → Sora 2

  • Longer single takes → Sora 2 (20s)

  • Style transformations → Modify Video


Prompting Discipline:

  • Luma models (Ray3.14, Ray3): Follow their rules religiously — mid-action verbs, secondary motion, NO forbidden words

  • Veo models: Quotes for dialogue, explicit sound descriptions

  • Sora: Clean and concise, let it handle complexity

  • Kling: Use CFG scale strategically, lip-sync in quotes

  • Modify Video: Describe end state, no commands, positive language only


Quality Control:

  • Always review at full resolution before delivery

  • Test audio sync if using native audio models

  • Check for artifacts in high-motion scenes

  • Verify aspect ratio matches platform requirements

  • Iterate strategically — don't waste credits on random attempts


Platform-Specific Recommendations:

  • Instagram Feed (1:1): Kling 2.6 or Ray models

  • Instagram Reels (9:16): Ray 3.14, Veo 3, Kling 2.6

  • TikTok (9:16): Ray3.14, Sora 2 for high-energy

  • YouTube (16:9): Ray3.14 (1080p, HDR if applicable)

  • Cinematic/Film (21:9): Ray3.14, Ray3

  • Professional Grading: Ray3.14 (EXR export)


Remember: These tools are powerful but not magic. Understanding their strengths, limitations, and prompting requirements will dramatically improve your results. Start with the default (Ray3.14), learn its quirks, then expand to other models as your specific needs require.