Multi-person scene prompts: 5 patterns that actually work
Why bigASP and SDXL merge bodies in 2+ person scenes, and the 5 concrete prompt patterns that keep them apart with anatomically separate, correctly-posed figures.
Multi-person scenes are where most photoreal NSFW models break. You ask for "two people in bed" and you get conjoined twins with three arms, or a single person with two heads, or two people who share a torso. This isn't a bug — it's a known limitation of how SDXL allocates attention. With the right prompt patterns you can dramatically reduce the failure rate.
This guide is specific to bigASP v2.5 and other SDXL-based realism checkpoints. FLUX handles multi-person much better out of the box.
SDXL conditions the entire image on one prompt, and its cross-attention has no built-in way to bind a descriptor to one particular subject. When two subjects overlap spatially in the scene the prompt describes, the model has trouble keeping their token attention separate. By default, it favors merging because:
The 5 patterns below all address one or more of these.
The single most impactful change: put "two adults" (or "three adults") at the start of your prompt, before any other descriptors.
- man and woman intimate scene in bedroom
+ two adults, intimate scene, man and woman, in bedroom
Placed first, "two adults" gets the most weight in the prompt. Placed later (after color, hair, etc.), its influence gets diluted. The same trick works for groups: "three adults", "four adults".
Why "adults" and not "people": "people" is more generic, while "adults" excludes children and sets the SDXL adult-content frame at the same time.
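The count-word rule can be applied mechanically before a prompt is submitted. The sketch below is a minimal illustration, assuming prompts are comma-separated tag lists; the helper name `with_count_first` and the `COUNT_WORDS` table are hypothetical and not part of any bigASP or SDXL tooling.

```python
# Hypothetical helper: force "<n> adults" to be the first tag of a prompt.
COUNT_WORDS = {2: "two", 3: "three", 4: "four"}


def with_count_first(prompt: str, n_people: int) -> str:
    """Return the prompt with '<n> adults' as its first comma-separated tag."""
    count_tag = f"{COUNT_WORDS[n_people]} adults"
    tags = [t.strip() for t in prompt.split(",") if t.strip()]
    # Drop any existing copy of the count tag so it is not repeated later on.
    tags = [t for t in tags if t.lower() != count_tag]
    return ", ".join([count_tag, *tags])


print(with_count_first("man and woman, sitting on couch, living room", 2))
```

A count tag that already appears further down the prompt is moved to the front rather than duplicated.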
Generic "couple in bed" gives the model freedom to merge. Explicit spatial relationships block merging:
- two adults having sex on bed
+ two adults, woman lying on back on bed, man kneeling between her legs, missionary position
- couple intimate on couch
+ two adults, woman straddling man, both facing each other, on living room couch
The model can render "woman on top" or "man behind" because these are common training-tag concepts. It can't render "couple sex" because that's ambiguous.
Spatial templates (use one):
[A] lying on back, [B] on top (missionary)
[A] kneeling behind [B] (rear-entry)
[A] straddling [B] (cowgirl)
[A] facing away from [B] (back-to-back / reverse)
[A] kneeling between [B]'s legs
Each person gets one short clause, not three sentences. The more you describe each, the more the model dilutes attention:
- two adults, beautiful asian woman with long black hair and dark eyes wearing nothing,
muscular caucasian man with brown hair and beard wearing nothing,
having intimate sex on a bed in a hotel room with soft warm lighting
+ two adults, asian woman, long black hair, nude, on bed, missionary position,
caucasian man on top, athletic build, hotel bedroom, evenly lit
Notice the second version uses about 40% fewer words (22 versus 38) and gives the model less to dilute. Less is more for multi-person.
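One way to keep per-person descriptions short is to assemble the prompt from parts and reject any clause that runs long. This is a sketch only: the function name `build_scene_prompt` and the 8-word budget are assumptions made for illustration, not values taken from this guide or from any model documentation.

```python
# Assumed budget per person clause; tune to taste, it is not a documented limit.
MAX_WORDS_PER_CLAUSE = 8


def build_scene_prompt(count_tag: str, person_clauses: list, setting: str) -> str:
    """Join count tag, one short clause per person, and the setting, in that order."""
    for clause in person_clauses:
        n_words = len(clause.split())
        if n_words > MAX_WORDS_PER_CLAUSE:
            raise ValueError(
                f"clause too long ({n_words} words, max {MAX_WORDS_PER_CLAUSE}): {clause!r}"
            )
    return ", ".join([count_tag, *person_clauses, setting])


prompt = build_scene_prompt(
    "two adults",
    ["woman seated on left, long black hair", "man standing on right, athletic build"],
    "hotel lobby, evenly lit",
)
print(prompt)
```

Raising an error instead of silently truncating keeps the decision about what to cut with the person writing the prompt.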
The single most effective negative prompt addition for multi-person scenes:
merged bodies, conjoined bodies, body fusion, fused limbs, body merging,
two heads on one body, multiple heads, siamese twins, fused faces, merged faces,
overlapping bodies, three arms, three legs, extra limbs, extra heads
Add this specifically to multi-person prompts. ximages auto-applies a baseline that includes most of these, but for tough scenes you can layer more in the user negative field.
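Layering extra negatives on top of a baseline tends to produce duplicates. The sketch below merges two tag lists while preserving order and dropping repeats. The `baseline` string is a made-up stand-in, since the actual ximages baseline is not listed here, and `merge_negatives` is a hypothetical helper name.

```python
# A few of the merging negatives from the list above.
MERGE_NEGATIVES = [
    "merged bodies", "conjoined bodies", "fused limbs",
    "multiple heads", "overlapping bodies", "extra limbs", "extra heads",
]


def merge_negatives(base: str, extra: list) -> str:
    """Append extra tags to a comma-separated base, skipping case-insensitive repeats."""
    seen = set()
    merged = []
    for tag in [*base.split(","), *extra]:
        tag = tag.strip()
        key = tag.lower()
        if tag and key not in seen:
            seen.add(key)
            merged.append(tag)
    return ", ".join(merged)


baseline = "deformed hands, extra limbs, merged bodies"  # placeholder baseline
print(merge_negatives(baseline, MERGE_NEGATIVES))
```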
Wider canvas = better multi-person separation. Vertical 9:16 squeezes two bodies into less width, increasing collision likelihood. Use:
If you're getting merging with vertical compositions, swap to horizontal and regenerate with the same prompt. Often that alone fixes it.
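When switching to a horizontal canvas, it helps to pick a pixel size the model handles well rather than an arbitrary one. The sketch below chooses the closest match from a list of landscape resolutions commonly cited for SDXL. That list is an assumption brought in for illustration, not something stated in this guide, so check it against your checkpoint's recommendations.

```python
# Commonly cited SDXL landscape sizes (width, height); an assumption, verify for your checkpoint.
SDXL_LANDSCAPE = [(1024, 1024), (1152, 896), (1216, 832), (1344, 768), (1536, 640)]


def pick_resolution(target_ratio: float) -> tuple:
    """Return the (width, height) whose aspect ratio is closest to target_ratio."""
    return min(SDXL_LANDSCAPE, key=lambda wh: abs(wh[0] / wh[1] - target_ratio))


print(pick_resolution(16 / 9))  # (1344, 768)
print(pick_resolution(3 / 2))   # (1216, 832)
```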
Bad prompt (common writing):
two beautiful asian women and a handsome guy in passionate threesome
on a bed in dimly lit bedroom with sensual atmosphere
What breaks: vague "passionate threesome", no spatial roles, dense descriptors, "sensual" is fluff.
Fixed prompt:
three adults, threesome on bed,
asian woman in center, lying on back, nude,
asian woman on right, sitting astride, nude,
caucasian man on left, kneeling, nude, athletic build,
clear spatial separation between three figures,
evenly lit bedroom, natural skin pores
Negative additions:
merged bodies, fused limbs, three heads on one body, overlapping figures,
extra arms, extra legs, deformed hands
Aspect ratio: 16:9
Run it 4 times. Expect about two of the four to come out clean and anatomically separate. Pick the best.
For any multi-person scene, before clicking Generate:
"two adults" / "three adults" at start
explicit spatial roles (woman on top of man, man kneeling behind)
no fluff words (passionate, intense, sensual)
If a specific scene consistently merges across 4+ attempts:
For single-person scenes, the count-word and spatial-language patterns aren't needed and add noise. Pattern 4 (merging negatives) is safe to leave in but contributes nothing. Patterns 3 (concise descriptors) and 5 (aspect ratio) apply to all scenes.
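The pre-generation checks for multi-person scenes can be approximated with a small lint function. This is a rough sketch under stated assumptions: the word lists are illustrative samples drawn from the examples in this guide, simple substring matching will miss many phrasings, and `lint_multi_person_prompt` is a hypothetical name. It is not meant for single-person prompts.

```python
import re

COUNT_TAGS = ("two adults", "three adults", "four adults")
# Illustrative spatial phrases; extend for your own scenes.
SPATIAL_HINTS = ("on top", "behind", "between", "facing", "on left", "on right", "in center")
FLUFF_WORDS = {"passionate", "intense", "sensual"}


def lint_multi_person_prompt(prompt: str) -> list:
    """Return a list of checklist problems; an empty list means the prompt passes."""
    problems = []
    text = prompt.lower().strip()
    if not text.startswith(COUNT_TAGS):
        problems.append("count tag is not at the start")
    if not any(hint in text for hint in SPATIAL_HINTS):
        problems.append("no explicit spatial relationship")
    used_fluff = sorted(set(re.findall(r"[a-z]+", text)) & FLUFF_WORDS)
    if used_fluff:
        problems.append("fluff words: " + ", ".join(used_fluff))
    return problems


print(lint_multi_person_prompt("passionate couple in kitchen"))
print(lint_multi_person_prompt("two adults, woman on left, man on right, kitchen"))
```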
Related: "Getting started with bigASP v2.5" covers single-person basics first. "12 prompt traps to avoid" covers the broader patterns. "bigASP vs FLUX vs RealVisXL" explains when to switch models for multi-person work.