Product · April 11, 2026 · Seedance Team · 11 min read

Seedance 2.0 Reference: Multi-Modal AI Video from Images, Video & Audio

Seedance 2.0 Reference is the only Seedance variant that accepts up to 9 images, 3 videos, and 3 audio clips as style references. Here's how it changes the game for consistent AI video.

Nine images. Three videos. Three audio clips. One AI video generation. That's the unique pitch for Seedance 2.0 Reference, and it's the only model in the Seedance lineup that treats style like a first-class input rather than a prompt afterthought.

If you've ever tried to wrangle a consistent look across a batch of AI videos, you know the pain. Standard text-to-video models interpret "cinematic," "moody," and "warm film grain" the way a bored intern interprets a brief. Seedance 2.0 Reference skips the interpretation and looks directly at the work you want to emulate.

TL;DR

- Accepts up to 9 reference images, 3 reference videos, and 3 audio clips per generation
- Pricing: $0.3024 per second → roughly 243 to 907 credits ($2.42 to $9.07) per clip
- Output: 720p, 4-15 second duration, native audio sync baked in
- Generation time: 60-180 seconds depending on clip length
- The only Seedance variant built for multi-modal style transfer
- Try it free with 50 credits, no card required at seedance.it.com

What "Multi-Modal Reference" Actually Means

Most AI video models accept one of two things: a prompt, or a prompt plus a single starting image. Seedance 2.0 Reference accepts a whole moodboard's worth of inputs and treats them as a unified style brief.

Upload nine stills from a cyberpunk film and it will match the color grade, the lens feel, and the neon-on-wet-asphalt lighting. Add a short clip that has the exact camera motion you want. Drop in an audio reference for the vibe of the score. The model fuses all of it.

This matters because style is multi-dimensional. A "retro 80s" look is not one thing — it's grain, color temperature, lens flares, film gate weave, and specific compositions. Describing that in words loses information. Showing it with 9 images loses almost none.

Multi-modal AI video from Seedance 2.0 Reference

Ready to match your own visual style? Seedance 2.0 Reference accepts your entire moodboard in a single generation. Try it free with 50 credits.

The Inputs, Explained

Reference images (up to 9). The heart of the model. Use these to lock in color palette, lighting style, subject appearance, composition rules, and texture. More references generally mean tighter style adherence. Nine is the ceiling because beyond that, the fusion starts averaging out your intent.

Reference videos (up to 3). Use these for motion cues — camera moves, pacing, action beats. The model samples motion vectors from the references and blends them into the output. A handheld reference plus a dolly reference will give you something in between.

Reference audio (up to 3). Less about the sound itself and more about the mood it implies. Ambient references guide the visual atmosphere — a stormy audio track will nudge the generation toward stormier visuals even if you didn't specify weather.

Text prompt. Still required. References tell the model how; the prompt tells it what.
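The input limits above can be checked before anything is uploaded. This is a minimal sketch, not Seedance's actual API: the `ReferenceRequest` class, its field names, and the `problems` method are hypothetical, and only the numeric limits and the required prompt come from this article.

```python
from dataclasses import dataclass, field

# Limits quoted in this article; every name below is hypothetical.
MAX_IMAGES = 9
MAX_VIDEOS = 3
MAX_AUDIO = 3


@dataclass
class ReferenceRequest:
    """Hypothetical container for the inputs of one generation."""

    prompt: str
    images: list[str] = field(default_factory=list)
    videos: list[str] = field(default_factory=list)
    audio: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable limit violations (empty list means OK)."""
        found = []
        if not self.prompt.strip():
            found.append("text prompt is required")
        for label, items, limit in (
            ("images", self.images, MAX_IMAGES),
            ("videos", self.videos, MAX_VIDEOS),
            ("audio clips", self.audio, MAX_AUDIO),
        ):
            if len(items) > limit:
                found.append(f"too many {label}: {len(items)} > {limit}")
        return found
```

For example, a request with ten image paths and a valid prompt would report `too many images: 10 > 9`, while nine images, three videos, and three audio clips pass cleanly.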

How It Compares to Standard Seedance 2.0

Standard Seedance 2.0 is a text-to-video and image-to-video workhorse. It accepts one prompt and optionally one image. That's it. The Reference variant is the same underlying model with a drastically expanded input surface.

| Feature | Seedance 2.0 | Seedance 2.0 Reference |
|---|---|---|
| Text prompt | Yes | Yes |
| Image input | 1 image | Up to 9 images |
| Video reference | No | Up to 3 videos |
| Audio reference | No | Up to 3 audio clips |
| Per-second cost | $0.3034 | $0.3024 |
| Best for | Fast generation from a prompt | Style-consistent series, brand videos, lookbooks |

The per-second pricing is nearly identical. What changes is what you can feed the model — and what that unlocks.

Pricing: What You'll Actually Pay

Seedance 2.0 Reference is priced at $0.3024 per second of output. Per-clip cost by duration:

| Duration | Credits | USD |
|---|---|---|
| 4 seconds | 243 | $2.42 |
| 5 seconds | 303 | $3.02 |
| 8 seconds | 484 | $4.84 |
| 10 seconds | 605 | $6.05 |
| 15 seconds | 907 | $9.07 |

The references themselves are free. You pay for output duration, not input count. Nine images cost the same as one.

New accounts get 50 free credits on signup, no subscription, no card. That's enough to run your first reference-mode test. From there, you top up on the tier that fits:

| Credit Pack | Price | Credits |
|---|---|---|
| Starter | $10 | 1,050 |
| Popular | $25 | 2,750 |
| Pro | $50 | 5,750 |
| Max | $100 | 12,000 |

See the full breakdown on pricing.
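To budget a project, it helps to combine the two tables above. The sketch below copies the per-clip credit figures and the pack sizes exactly as listed, rather than deriving them from a per-second rate. The function names are illustrative only. The USD column in the duration table works out to roughly one cent per credit, while the packs sell credits for slightly less, so the effective cost per clip depends on which pack you buy.

```python
# Credits per clip, copied from the duration table in this article.
CLIP_CREDITS = {4: 243, 5: 303, 8: 484, 10: 605, 15: 907}

# (price in USD, credits) per pack, copied from the credit pack table.
PACKS = {
    "Starter": (10, 1_050),
    "Popular": (25, 2_750),
    "Pro": (50, 5_750),
    "Max": (100, 12_000),
}


def clips_per_pack(pack: str, seconds: int) -> int:
    """Whole clips of the given length that one pack's credits cover."""
    _, credits = PACKS[pack]
    return credits // CLIP_CREDITS[seconds]


def usd_per_clip(pack: str, seconds: int) -> float:
    """Effective dollar cost of one clip when paying with this pack."""
    price, credits = PACKS[pack]
    return price / credits * CLIP_CREDITS[seconds]


if __name__ == "__main__":
    for name in PACKS:
        print(name, clips_per_pack(name, 5), f"${usd_per_clip(name, 5):.2f}")
```

With these figures, a Starter pack covers three 5-second clips and a Max pack covers 39.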

Where Reference Mode Earns Its Keep

Brand videos. Upload your product photography, your existing ad creative, your brand colors. The output matches your visual identity without you having to describe it.

Video series with consistent style. Shooting a 10-part sequence? Use the same reference bundle on every generation and the pieces will feel like they came from one shoot.

Lookbooks and fashion. Reference mode is exceptional at matching a photography style — studio lighting, lens compression, wardrobe color — across multiple models and poses.

Storyboard-to-video. Feed the model your concept art and get moving versions in the same style. No more "close enough" compromises.

Music video pre-viz. Audio references plus visual references plus a prompt equals a rough cut that actually reflects the song's vibe.

The Output: What You Get

Every Seedance 2.0 Reference generation produces:

- 720p video at 24 fps
- 4 to 15 seconds of duration (you choose)
- Native audio sync: ambient audio generated to match the scene
- Three aspect ratios: 16:9, 9:16, 1:1
- Delivery in 60 to 180 seconds depending on length

The native audio is worth emphasizing. Even without audio references, you get foley-style ambience baked in. With audio references, you get ambience that matches your mood cue.
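The video specs listed above translate into a small settings check. This sketch assumes durations are chosen in whole seconds, which the article does not state, and the function name is hypothetical. The frame rate, duration range, and aspect ratios are the ones quoted in the list.

```python
FPS = 24                                 # frame rate quoted in this article
MIN_SECONDS, MAX_SECONDS = 4, 15         # allowed clip duration
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}  # supported aspect ratios


def frame_count(seconds: int, aspect_ratio: str = "16:9") -> int:
    """Validate hypothetical output settings and return the clip's frame count."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds, got {seconds}"
        )
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    return seconds * FPS
```

A 5-second clip comes to 120 frames and a 15-second clip to 360, which gives a rough sense of how much motion a single generation has to cover.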

Prompt + Reference: The Hybrid Workflow

The mistake people make is leaning entirely on references and writing a lazy prompt. References handle style. The prompt still has to do the heavy lifting on subject, action, and composition.

A good hybrid prompt looks like this:

A woman in a red trench coat walking through a rain-soaked Tokyo alley
at night, neon signs reflecting in puddles, slow dolly forward, handheld
feel, cinematic

With 6 cyberpunk reference images attached, the model knows what "cinematic" means for this shoot. It doesn't have to guess.
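One way to keep prompts from getting lazy is to assemble them from the parts the prompt is responsible for. The helper below is a hypothetical sketch, not part of any Seedance tooling: it requires a subject and an action, treats setting, camera direction, and a short style hint as optional, and leaves the detailed look to the references.

```python
def build_prompt(
    subject: str,
    action: str,
    setting: str = "",
    camera: str = "",
    style_hint: str = "",
) -> str:
    """Join the 'what' of a shot into one prompt string.

    Subject and action are mandatory because references only carry style.
    """
    if not subject.strip() or not action.strip():
        raise ValueError("subject and action are required")
    parts = [f"{subject.strip()} {action.strip()}"]
    # Optional parts are appended only when they contain text.
    parts += [p.strip() for p in (setting, camera, style_hint) if p.strip()]
    return ", ".join(parts)


prompt = build_prompt(
    subject="A woman in a red trench coat",
    action="walking through a rain-soaked Tokyo alley at night",
    setting="neon signs reflecting in puddles",
    camera="slow dolly forward, handheld feel",
    style_hint="cinematic",
)
```

Here `prompt` reproduces the example prompt above as a single line. Swapping only `subject` and `action` while keeping the same references is a simple way to produce a consistent series.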

How Seedance 2.0 Reference Fits in the ByteDance Lineup

If you've been following ByteDance's AI video releases, you know there's a whole family now: Seedance 1.0 Pro, Seedance 2.0, Seedream for stills, and the OmniHuman avatar line.

Seedance 2.0 Reference is the newest piece and the most specialized. It's not meant to replace standard Seedance 2.0: for one-off clips from a prompt, standard is faster to iterate on, and the per-second price is nearly identical. Reference mode is for when you need your output to match something.

Getting Started in Under Five Minutes

1. Go to seedance.it.com/create/seedance-2-reference
2. Sign in (50 free credits auto-apply)
3. Upload 3-9 reference images that represent your target style
4. Optionally add 1-3 reference videos for motion and 1-3 audio clips for mood
5. Write your prompt describing the subject and action
6. Pick a duration (5 seconds is a good first test)
7. Generate

Your first clip should be ready to download in about 90 seconds.

The Honest Limitations

Reference mode is not magic. If your 9 references contradict each other — half moody horror, half bright comedy — the model will average them and you'll get something bland. Curate tightly.

It also can't generate original dialogue or music. The audio sync is ambient and foley. For lip-synced talking heads you want OmniHuman instead.

And while the model is strong on style transfer, it's not a perfect face-swap tool. Using reference images of a specific person will capture their general look but not their exact likeness. That's a feature, not a bug.

Next Steps

If you want the hands-on tutorial for your first generation, read our style-consistent video guide. If you're deciding between variants, the Reference vs Standard comparison breaks it down. And if you're coming from another platform, the Reference vs Runway piece covers the switch.

Seedance 2.0 Reference is the only model in the Seedance lineup that treats your visual style as input rather than interpretation. Try it with your own references and see what happens.
