02
Create a video idea
The three-step wizard for one video: topic, scene, review. Step 2 is the longest, so this article walks through each piece of it.
About 5 min
You'll need: onboarding completed
Every new video moves through the same three-step wizard. The screen chrome stays consistent: heading, content, primary CTA at the bottom. Step 2 is where most decisions live; this article breaks it into its constituent UI sections.
TIP
You can jump between steps
At any point, tap a row in the Step 3 review summary to jump back and edit. Your inputs persist; the wizard doesn't reset.

Step 1: What's the video about?

Step 1 captures the topic plus optional script notes. The AI uses both to generate hooks, scripts, and captions later in the flow.
Heading: 'What's the video about?' Sub: 'A topic, a question, or a keyword.' Hairline-underline input + Trending chips + collapsible script notes. Example shown: topic "venue tours"; trending ideas "wedding planning tips", "dress shopping advice", and "venue comparison"; script tips for venue tours: open with a hook ("Wait till you see the bride's suite."), use specific details, call out unexpected features.

What you see

Heading: "What's the video about?" Sub: "A topic, a question, or a keyword."
Topic input: a bare typographic line with a hairline underline that lights up on focus. Placeholder: "e.g. why our espresso tastes different".
Trending chips: appear once Promoat finishes analyzing trends for your audience (eyebrow label "Trending"; spinner while loading). Tap to select; a checkmark badge appears on the active chip.
"Or use what you typed" shortcut: appears below the chips when you've typed something that doesn't match a trend. Tap to override the trend selection with your typed text.
Add script or notes (optional): a chevron toggle. Expanded, it reveals a multiline input with the placeholder "Paste a script or describe how it should sound…" Limit: 2,000 characters.
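The interplay between the typed topic, the Trending chips, and the notes limit can be sketched in a few lines. This is a minimal illustration, not Promoat's implementation: the `TopicStepState` shape, the function names, and the rule that Continue needs a non-empty topic are all assumptions made for the example. Only the 2,000-character limit and the "Or use what you typed" override come from the list above.

```typescript
// Hypothetical model of the Step 1 inputs; field names are illustrative.
interface TopicStepState {
  typedText: string;            // what the user typed in the topic input
  selectedTrend: string | null; // the active Trending chip, if any
  useTypedText: boolean;        // true after tapping "Or use what you typed"
  notes: string;                // optional script notes
}

const NOTES_LIMIT = 2000; // character limit stated for the notes input

// Pick the topic that would be carried into Step 2.
// A selected trend wins unless the user explicitly chose their typed text.
function resolveTopic(state: TopicStepState): string {
  const typed = state.typedText.trim();
  const overrideWithTyped = state.useTypedText && typed !== "";
  if (state.selectedTrend !== null && !overrideWithTyped) {
    return state.selectedTrend;
  }
  return typed;
}

// Assumption: Continue needs a non-empty topic and notes within the limit.
function canContinue(state: TopicStepState): boolean {
  return resolveTopic(state) !== "" && state.notes.length <= NOTES_LIMIT;
}
```

Keeping the override as an explicit flag, rather than clearing the chip, matches the behavior described above: the trend selection stays visible until you choose your own text over it.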

Topic best practices

Narrow but not niche. "Wedding venue tips" beats "weddings" but not "where the bride sits during the vows."
Searchable language. Use words people actually search for on TikTok or Instagram.
Action or value. "How to..." or "Why..." framing outperforms "My thoughts on..."
TIP
Notes are for tone, not exact wording
The AI rewrites whatever you paste. Use script notes to nudge tone or structure ("punchy, no jargon"; "open with a question") rather than to dictate the exact text; generation will rephrase regardless.
1. Type a topic, or tap a Trending chip
2. Optionally tap 'Add script or notes' and paste guidance
3. Tap Continue

Step 2: Your scene

Step 2 is the scene composer. Four UI sections stack vertically: the description input (always shown), reference images (collapsed by default), additional media (collapsed by default), and advanced settings (collapsed by default). A description plus a couple of reference photos is more powerful than either alone.
TIP
References vs. additional media
Reference images shape the look of your scene (what's around you, on you, behind you) and feed into one composed shot. Additional media are separate clips Promoat cuts into the timeline as their own beats: product close-ups, screen recordings, before/afters. Different jobs.

Description (always shown)

The header pairs your portrait (uploaded during onboarding) with the title "Your scene." and the sub "Describe your video." Below: a multiline textarea with a hairline underline plus a horizontal row of audience-tailored suggestion chips.
Header: portrait avatar + 'Your scene.' Description textarea with hairline underline + chip row. Example shown: the description "Walking through the venue, pointing out where the ceremony will be." with the chips "venue tour", "vendor tip", and "dress fitting".

What to write

Good: "Walking through the venue, pointing out where the ceremony and reception will be."
Good: "Holding the new candle in both hands, bringing it close to the camera and smelling it."
Too vague: "A video about my product" or "Me talking."

Suggestion chips

The chips below the textarea are tailored to your audience profile. Tap one to drop the chip's full prompt into the input: chips have short labels but expand to longer starter sentences when tapped. Edit from there. The active chip is filled dark to confirm selection.
TIP
Visual beats abstract
Describe what someone would see, not what your video is "about." "Showing the new candle next to last year's bestseller" reads better than "Comparison of products."

Reference images (optional, collapsed by default)

A collapsible section. Header reads Reference images with an optional tag and a hint that updates as you add photos: "helps the AI match your style" when empty, "3 images added" once filled.
Expanded: 'HOW THIS WORKS' example block at top, 'YOUR TURN - Add your images' caption, then your slots. Example shown: header hint "1 image added"; slot labels "silk dress" and "venue".

The HOW THIS WORKS block

At the top of the expanded section, an inline teaching block headed HOW THIS WORKS shows four labeled thumbnails (you, background, dress, cat), the example prompt "Me in this dress, sitting on a chair, cat on my lap, this building behind me.", a down-arrow, and a single composed output image showing all four ingredients combined into one scene. The block is static: it never uses your photos.

Your slots

Below the example, a small caption YOUR TURN - Add your images introduces the input row. Three named slots reveal one at a time as you fill them. The placeholders are tailored to your audience profile (e.g. for a wedding-planner audience: "your dress" / "your venue" / "your bouquet"). Internally these are holding, setting, and interacting; the labels you see are looser and audience-specific.
Once the three named slots are filled, an "add more" tile appears. You can stack up to 9 reference images total (the portrait you uploaded in onboarding counts as one of the nine). Past three or four, returns diminish: the image model starts cramming rather than composing.

Capabilities

Up to 9 reference images per scene.
Each slot has a small text input below it. Type a label (e.g. "silk dress") so the AI knows what's in the image when reading your description.
Filled slots keep the cool-gradient border, the same visual emphasis used for the next-to-add slot.
Tap the × on any filled slot to remove it.
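The slot rules described in this section (progressive reveal, the "add more" tile, the cap of nine, and the header hint) can be modeled with three small functions. This is a sketch under stated assumptions: the function names are invented for illustration, and the code assumes the header count covers only the images you add here, while the onboarding portrait is counted separately toward the cap.

```typescript
const MAX_REFERENCES = 9; // total stated in the article, portrait included
const NAMED_SLOTS = 3;    // the three audience-tailored named slots

// Hint shown in the section header: a fixed line when empty, a count once filled.
function headerHint(added: number): string {
  if (added === 0) return "helps the AI match your style";
  return `${added} image${added === 1 ? "" : "s"} added`;
}

// Named slots reveal one at a time: the filled ones plus the next empty one.
function visibleNamedSlots(added: number): number {
  return Math.min(added + 1, NAMED_SLOTS);
}

// The "add more" tile appears once the named slots are full and the cap leaves room.
// Assumption: the onboarding portrait uses one of the nine places.
function showAddMore(added: number, portraitCounts: boolean = true): boolean {
  const used = added + (portraitCounts ? 1 : 0);
  return added >= NAMED_SLOTS && used < MAX_REFERENCES;
}
```

Under these assumptions, `showAddMore(3)` is true and `showAddMore(8)` is false, because eight added images plus the portrait reach the cap.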
TIP
You can skip references entirely
References help the AI match your real-world setting. If a generated backdrop is fine, leave the section closed and rely on the description.
1. Tap Reference images to expand
2. Read the example block to learn the pattern
3. Tap the gradient-bordered slot to pick a photo
4. Type a short label below the photo
5. Repeat for the next slots; tap 'add more' for a 4th through 9th

Additional media (cut-ins, optional)

Same collapsible pattern as Reference images. This section is for clips or screenshots Promoat should insert into the finished video as their own beats, separate from you talking. Think product close-ups, app screen recordings, before/after shots, explainer graphics.
Expanded: each cut-in slot has its own placeholder hint and a label input below. Example shown: header hint "will be inserted separately in the video"; a 0:08 clip labeled "product close-up", a 0:15 clip labeled "before / after", and an empty "add" slot.

What it's for

Reference images shape the look of your talking-head shot. Additional media play as separate beats cut into the timeline alongside that shot. Both can stack with the description.

Capabilities

Accepts both images and videos.
Each item gets a small label (e.g. "product close-up", "before / after") that tells the AI when to drop it in.
Slots reveal progressively: fill the first to see the second, and so on.
HEADS-UP
Description, references, and media stack
A strong description + two reference photos + one cut-in clip is more powerful than any one alone. Promoat uses all of it together.
1. Tap Additional media to expand
2. Tap a slot to pick a clip or image from your phone
3. Type a label below the slot

Advanced settings (optional)

A small Advanced settings chevron sits near the bottom of Step 2. Most people leave both controls on Auto, and Promoat picks based on your description. Open it only when you want manual control over scene count or movement style.
Two controls: Scenes (Auto / 1 / 2 / 3 / 4) and Movement (Auto / Just talking / Acting it out). Example shown: the Scenes helper line "Auto picks based on your description." and the Movement helper line "You speak straight to camera. Fast and reliable."

Scenes

How many distinct shots/cuts the video contains. Chip row: Auto (sparkles icon, default) and four numeric chips 1 / 2 / 3 / 4. A small helper line below the chips explains the active choice, e.g. "One continuous shot - fastest, no cuts" for 1, or "3 cuts - montage feel, more visual variety" for 3.

Movement

How you appear on screen. Chip row: Auto plus two text chips:
Just talking: sub "Eye contact with camera." The synthetic likeness from your portrait is animated lip-syncing to the script. Fast, reliable, classic talking-head look. Internally this is the fabric pipeline.
Acting it out: sub "On-camera action." Generates you moving and interacting in the scene (holding the product, walking through the venue) while your voice plays as voiceover. More cinematic. Internally this is the grok pipeline.
TIP
Leave it on Auto unless you have a reason
Auto reads your description and picks the right combination. Manual overrides are useful for specific looks (e.g. one clean shot for a product reveal, four cuts for a fast-paced list).
1. Tap Advanced settings to expand
2. Pick a Scenes count (or leave on Auto)
3. Pick a Movement style (or leave on Auto)
4. Tap Continue at the bottom of the screen

Step 3: Your idea

A one-screen summary with hairline rows and the primary Generate CTA at the bottom.
Heading: 'Your idea.' Sub: 'Tap any line to edit.' Hairline rows for Topic / Scene / References / Cut-ins. Generate CTA with cost badge. Example shown: Topic "venue tours"; Scene "Walking through the venue, pointing out ceremony and reception spaces"; Cost 20 credits.

What you see

Heading: "Your idea." Sub: "Tap any line to edit."
Topic row: your topic. Tap to jump back to Step 1.
Scene row: your scene description (or "No scene details - we'll improvise." if blank). Tap to jump back to Step 2.
References row: count of reference images, if any.
Cut-ins row: count of additional media, if any.
Style summary: concatenates your movement choice ("Just talking" or "Acting in scene · Studio lipsync") with the scene count if you set one manually.
Generate CTA: gradient button with a credit-cost badge. Disabled if your balance is below the cost; a hint appears in red below the button to top up.
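The rows above follow a few simple rules: a fallback line for a blank scene, a style summary that only mentions what you set manually, and a Generate button gated on your balance. A minimal sketch of those rules follows. The type and function names are hypothetical, the wording of the scene-count part is an assumption (the article does not quote it), and the fallback copy is written with a plain hyphen.

```typescript
// Hypothetical types; the chip values mirror the article, the names are illustrative.
type Movement = "auto" | "talking" | "acting";
type SceneCount = "auto" | 1 | 2 | 3 | 4;

interface Idea {
  topic: string;
  scene: string;
  movement: Movement;
  scenes: SceneCount;
}

// Scene row: the description, or the fallback copy when it is blank.
function sceneRow(idea: Idea): string {
  return idea.scene.trim() === "" ? "No scene details - we'll improvise." : idea.scene;
}

// Style summary parts: the movement label, plus the scene count only when set manually.
function styleParts(idea: Idea): string[] {
  const parts: string[] = [];
  if (idea.movement === "talking") parts.push("Just talking");
  if (idea.movement === "acting") parts.push("Acting in scene · Studio lipsync");
  if (idea.scenes !== "auto") {
    // Assumed wording for the count; the real label may differ.
    parts.push(`${idea.scenes} ${idea.scenes === 1 ? "scene" : "scenes"}`);
  }
  return parts;
}

// Generate is enabled only when the balance covers the cost shown on the badge.
function canGenerate(balance: number, cost: number): boolean {
  return balance >= cost;
}
```

Returning the summary as a list of parts avoids guessing how the app joins them on screen.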

Credit cost

Credits are Promoat's generation currency. The cost shown on the button is calculated from your scene composition (image generation runs nano-banana / seedream depending on inputs) plus the downstream video pipeline. Acting it out costs more than Just talking because it runs a per-scene motion model. The exact number for this video is shown on the button before you commit.
HEADS-UP
You need enough credits
If your balance is below the cost shown, the Generate button is disabled and a "top up" hint appears below it. Tap your balance in the header or open Profile / Settings to buy more.
1. Read the summary rows
2. Tap any row to jump back and edit
3. Check the credit cost on the Generate button
4. Tap Generate

What happens after Generate

Generation runs server-side, so it keeps going whether you stay on the progress screen, switch tabs, or close the app. Image generation finishes first (usually within a minute); the video render follows. Failed steps automatically refund the credits charged for that step.
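The per-step refund rule can be illustrated with a small settlement function. This is a sketch only: the `StepResult` shape, the step names, and the credit amounts are invented for the example, since the article does not document per-step costs.

```typescript
interface StepResult {
  name: string;    // hypothetical step name, e.g. "image" or "video"
  charged: number; // credits charged for the step
  ok: boolean;     // false if the step failed
}

// Sum what was charged, refund the failed steps, and report the net cost.
function settle(steps: StepResult[]): { charged: number; refunded: number; net: number } {
  let charged = 0;
  let refunded = 0;
  for (const step of steps) {
    charged += step.charged;
    if (!step.ok) refunded += step.charged; // a failed step returns its own charge
  }
  return { charged, refunded, net: charged - refunded };
}

// Illustrative numbers only: the image step succeeds, the video step fails.
const result = settle([
  { name: "image", charged: 5, ok: true },
  { name: "video", charged: 15, ok: false },
]);
console.log(result);
```

In this made-up case only the credits for the failed video step come back, so the image step is still paid for.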
When the render lands, the new entry shows up in the Watch tab feed as a Ready overlay. From there you can save, share, re-edit, schedule, or post directly; see Feed actions for the full set. To publish to TikTok / Instagram / YouTube Shorts / LinkedIn, link your accounts under Connect social accounts first.
Promoat How-To Wiki · Updated regularly