Avoiding Common Mistakes

March 9, 2026

Using our multimodal agent means you’re working with something powerful.

But power comes with a predictable set of failure modes.

The good news: most frustrating outcomes trace back to a short list of common mistakes.

This article covers the biggest ones across:

  • task planning
  • working with the agent
  • uploading + analyzing assets
  • image generation
  • video generation
  • consistency
  • iteration and batching

And for every mistake, you’ll see:

  • what not to do
  • why it breaks
  • what to do instead (with copy/paste templates)

Don’t start generating before you define the goal

❌ What not to do

Make something cinematic.

Or:

Generate a trailer.

Why it fails

The agent has no idea what “good” means to you.
So it guesses, and you get random outputs.

✅ Do this instead

Start with a clear end result.

Prompt template

I want to end up with:
[describe the final deliverable]
This is for:
[where it will be used]
The most important qualities are:
[list qualities]
Before generating anything, summarize your understanding and ask any questions you need.
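
If you reuse this template often, it can help to fill it in programmatically so that no field is ever left blank. The sketch below is only an illustration: `build_brief` and its parameter names are hypothetical helpers, not part of the agent, and the resulting text is simply pasted into the chat as usual.

```python
from typing import List


def build_brief(deliverable: str, used_for: str, qualities: List[str]) -> str:
    """Fill the goal-first template; reject empty fields so vague briefs fail early."""
    if not deliverable.strip() or not used_for.strip() or not qualities:
        raise ValueError("deliverable, used_for, and qualities are all required")
    return "\n".join([
        "I want to end up with:",
        deliverable,
        "This is for:",
        used_for,
        "The most important qualities are:",
        ", ".join(qualities),
        "Before generating anything, summarize your understanding "
        "and ask any questions you need.",
    ])


# Example values are made up for illustration.
brief = build_brief(
    deliverable="a 20-second product teaser",
    used_for="a social media launch post",
    qualities=["warm lighting", "calm pacing", "readable on mobile"],
)
print(brief)
```

Raising an error on an empty field is a deliberate choice here: a brief with a missing goal is exactly the vague request this section warns against.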

Don’t treat the agent like a vending machine

❌ What not to do

Give me 10 options.

With no context, no references, no constraints.

Why it fails

You’ll get 10 different interpretations of your vague idea, not 10 useful options.

✅ Do this instead

Treat it like a teammate you’re briefing.

Prompt template

You are my [creative assistant].
Before creating anything:
- ask questions if unclear
- confirm the goal
- suggest the best approach
Then generate [10] options that are meaningfully different.

Don’t ask for “perfect” on the first try

❌ What not to do

Make it perfect.
Make it final.

Why it fails

It sets unrealistic expectations and encourages you to restart instead of refine.

✅ Do this instead

Ask for exploration first, refinement second.

Prompt template

Let’s start with exploration.
Generate [10] options that explore different directions.
Then we will pick 1 or more and refine.

Don’t generate one output at a time

❌ What not to do

Generate one image → stare at it → feel disappointed → restart.

Why it fails

One output is basically a coin flip.
It also gives you no comparison.

✅ Do this instead

Generate small batches.

Prompt template

Generate [10] options.
Make them meaningfully different.
Label them clearly so I can see how you made them and compare.

Don’t give feedback like “I don’t like it”

❌ What not to do

No.
Not good.
Try again.

Why it fails

The agent has no actionable direction.
So it guesses again.

✅ Do this instead

Use focused feedback.

Prompt template

I like:
[what works]
I don’t like:
[what doesn’t]
Please change:
[specific adjustment]
Then show me [2] improved versions.

Asset Upload + File Analysis Mistakes

Don’t upload a file and say “do something with this”

❌ What not to do

Upload a PDF and ask:

What do you think?

Why it fails

You’ll get generic summaries or random guesses about what you want.

✅ Do this instead

Select the file you’ve uploaded, then tell the agent what kind of work you want done.

Prompt template

Analyze this file.
First:
- summarize what it contains
- extract the key points
Then:
- ask me what output I want to create from it

Don’t ask the agent to invent missing info from a file

❌ What not to do

“Fill in the missing parts.”

Why it fails

This is how you get hallucinated details.

✅ Do this instead

Make the file the source of truth.

Prompt template

Use this file as the source of truth.
If something is not stated:
- do not invent it
- ask me a question instead

Don’t skip the “extract constraints” step

❌ What not to do

Upload brand guidelines → jump straight to generation.

Why it fails

Brand work is mostly constraints.
If you don’t extract them, outputs may drift.

✅ Do this instead

Extract rules first.

Prompt template

From this file, extract:
- must-have rules
- must-not-do rules
- tone/style rules
- visual rules
Then rewrite them as a short checklist we can reuse in prompts.

Image Generation Mistakes

Don’t change everything at once when iterating

❌ What not to do

“Make it brighter, different style, new camera, new character, new setting.”

Why it fails

You can’t tell what caused improvement or failure.

✅ Do this instead

Change one variable at a time.

Prompt template

Keep everything the same except:
[one change]
Generate [3] variations.
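
The same idea can be sketched in code for anyone who keeps prompt settings in a script. This is a minimal illustration under assumptions: the field names (`subject`, `lighting`, `style`) and the helper `one_change_prompts` are hypothetical, and the output is plain prompt text, not a call to any real API.

```python
from typing import Dict, List


def one_change_prompts(base: Dict[str, str], field: str, values: List[str]) -> List[str]:
    """Build one prompt per value, varying only `field` and keeping the rest fixed."""
    if field not in base:
        raise KeyError(f"unknown field: {field}")
    prompts = []
    for value in values:
        settings = dict(base)    # copy so the base settings stay untouched
        settings[field] = value  # the single change for this variation
        described = ", ".join(f"{k}: {v}" for k, v in settings.items())
        prompts.append(f"Keep everything the same except {field}. {described}")
    return prompts


# Hypothetical base settings for illustration.
base = {"subject": "a lighthouse keeper", "lighting": "overcast", "style": "watercolor"}
variations = one_change_prompts(base, "lighting", ["golden hour", "moonlight", "fog"])
for prompt in variations:
    print(prompt)
```

Because every variation shares the same subject and style, any difference you see in the results can be attributed to the lighting change alone.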

Don’t rely on “style words” alone

❌ What not to do

“Make it cinematic, high quality, detailed.”

Why it fails

Those words are vague and overused.
They don’t define a real look.

✅ Do this instead

Use visual references OR a style description.

Prompt template

Use this style reference:
[select image(s) / shift+click assets]
Match:
- lighting
- color palette
- texture/material feel
- composition style
Do not copy the content.
Only match the aesthetic.

Don’t ignore consistency until later

❌ What not to do

Generate 20 images and only then realize that the character keeps changing.

Why it fails

Consistency is easiest to lock early.

✅ Do this instead

Create a reference block first.

Prompt template

Before generating more images:
Describe this character/object in detail.
Then list:
- what must stay consistent
- what can vary
We will reuse this in every prompt.
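
One way to make "reuse this in every prompt" concrete is to store the reference block once and prepend it to each request. The sketch below assumes nothing about the agent itself: `ReferenceBlock` and `with_reference` are hypothetical names, and the character details are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReferenceBlock:
    """A reusable description of a character or object."""
    name: str
    description: str
    must_stay: List[str] = field(default_factory=list)
    can_vary: List[str] = field(default_factory=list)

    def render(self) -> str:
        lines = [f"Reference: {self.name}", self.description, "Must stay consistent:"]
        lines += [f"- {item}" for item in self.must_stay]
        lines.append("Can vary:")
        lines += [f"- {item}" for item in self.can_vary]
        return "\n".join(lines)


def with_reference(block: ReferenceBlock, request: str) -> str:
    """Prepend the same reference block to any new request."""
    return f"{block.render()}\n\n{request}"


# Invented example character.
block = ReferenceBlock(
    name="Mara",
    description="A courier in her 30s with short silver hair.",
    must_stay=["silver hair", "red scarf"],
    can_vary=["pose", "background"],
)
prompt = with_reference(block, "Show Mara crossing a rainy street at night.")
print(prompt)
```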

Video Generation Mistakes

Don’t try to generate a full 60–90s video in one go

❌ What not to do

Generate a 1:30 cinematic trailer.

Why it fails

Long videos require structure.
One-shot generation creates drift and chaos.

✅ Do this instead

Break it into moments.

Prompt template

Break this video into [4-6] moments.
For each moment:
- what we see
- what we feel
- what must stay consistent
Then we will generate one moment at a time.
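
If you plan videos in a script or spreadsheet, the moment breakdown maps naturally onto a small data structure. The following is a rough sketch, not a feature of the agent: `Moment` and `moment_prompts` are hypothetical names, and the story beats are invented placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Moment:
    see: str   # what we see
    feel: str  # what we feel


def moment_prompts(moments: List[Moment], consistent: List[str]) -> List[str]:
    """Turn each moment into its own prompt, repeating the shared consistency rules."""
    locked = "; ".join(consistent)
    total = len(moments)
    return [
        f"Moment {i} of {total}. We see: {m.see}. We feel: {m.feel}. "
        f"Must stay consistent: {locked}."
        for i, m in enumerate(moments, start=1)
    ]


# Invented four-moment outline for illustration.
outline = [
    Moment(see="a quiet harbor at dawn", feel="calm"),
    Moment(see="a storm rolling in", feel="unease"),
    Moment(see="the boat fighting the waves", feel="tension"),
    Moment(see="sunlight breaking through", feel="relief"),
]
prompts = moment_prompts(outline, ["same boat", "muted blue palette"])
for p in prompts:
    print(p)
```

Each prompt carries the full list of consistency rules, so a moment can be generated or regenerated on its own without losing the shared look.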

Don’t assume video must start from text

❌ What not to do

Use only text-to-video (T2V) because it’s the default.

Why it fails

Text-only video is often less stable and requires more iteration.

✅ Do this instead

Choose a start strategy based on what you have.

Prompt template

I want to create a video.
Here is what I have to start with:
- [text] (optional)
- [image] (optional)
- [video] (optional)
- [keyframes] (optional)
Recommend the best start strategy:
text-to-video (T2V), image-to-video (I2V), video-to-video (V2V), or keyframes.
Explain why.

Don’t skip keyframes when consistency matters

❌ What not to do

Try to generate multiple scenes without any anchors.

Why it fails

Multi-shot sequences drift in:

  • style
  • characters
  • lighting
  • environment

✅ Do this instead

Keyframe first.

Prompt template

Before generating motion:
Create 1 keyframe per moment.
Each keyframe must:
- match the style references
- keep character consistency
- clearly communicate the story beat

Don’t “fix video” by endlessly regenerating

❌ What not to do

Regenerate 20 times hoping it magically improves.

Why it fails

That’s gambling, not iteration.

✅ Do this instead

Diagnose and adjust.

Prompt template

Analyze what is wrong with this video.
Classify issues into:
- motion issues
- camera issues
- style drift
- character drift
- pacing issues
Then propose [3] fixes in order of impact.

Planning + Workflow Mistakes

Don’t start with tools; start with structure

❌ What not to do

“Should I use [model name] or [model name]?”

Why it fails

Tool selection comes after you know what you’re making.

✅ Do this instead

Define the output first.

Prompt template

My output goal is:
[deliverable]
Now recommend:
- a simple plan
- which tools/models to use at each step
- where to iterate

Don’t try to do everything in one session

❌ What not to do

Trying to create:

  • story
  • visuals
  • keyframes
  • final video
  • voiceover
  • sound design

…all at once.

Why it fails

You lose clarity and quality.

✅ Do this instead

Work in layers.

Prompt template

Let’s do this in layers:
1) structure
2) style
3) keyframes
4) shots
5) refinement
Start with layer 1 only.

Multi-Agent + Multi-Model Mistakes

Don’t assume the agent knows what you want without confirmation

❌ What not to do

You know what I mean.

Why it fails

The agent can’t read your taste unless you define it.

✅ Do this instead

Ask for a summary and confirmation.

Prompt template

Before generating:
Summarize what you think I want.
Then ask:
- what should be locked
- what should be explored

Don’t let the agent “choose everything” without telling it your priorities

❌ What not to do

Pick the best model.

Why it fails

“Best” depends on what you care about:

  • realism
  • style
  • speed
  • consistency
  • motion quality

✅ Do this instead

State your priorities.

Prompt template

When choosing tools/models, prioritize:
1) [e.g., consistency]
2) [e.g., cinematic lighting]
3) [e.g., speed]
Explain tradeoffs briefly.

The Biggest Mistake of All

Don’t restart when you should refine

❌ What not to do

Throw everything away because one thing is wrong.

Why it fails

You lose progress and you don’t learn what worked.

✅ Do this instead

Preserve what’s good and fix what isn’t.

Prompt template

Keep everything that is working.
Only fix:
[the specific issue]
Show me [2] improved versions.

Final Reminder: The mindset that prevents 80% of mistakes

Most frustration comes from expecting AI to behave like a magic button.

A better expectation is:

  • you are collaborating
  • you are directing
  • you are iterating
  • you are building in layers

The best part: once you work this way, you get:

  • more control
  • more consistency
  • faster results
  • and way less randomness