Multi-person scenes are where most photoreal NSFW models break. Ask for "two people in bed" and you get conjoined twins with three arms, a single person with two heads, or two people who share a torso. This isn't a bug; it's a known limitation of how SDXL ties prompt words to parts of the image. With the right prompt patterns you can substantially reduce the failure rate.

This guide is specific to bigASP v2.5 and other SDXL-based realism checkpoints. FLUX generally handles multi-person scenes better out of the box.

Why bodies merge

SDXL has no built-in way to assign parts of the prompt to parts of the image. Its cross-attention layers let every region of the image attend to every token in the prompt, so when two subjects overlap spatially, the model has trouble keeping each person's tokens tied to one body. By default, it favors merging because:

Training data bias — most adult content is single-person, so the "two distinct people" pattern is rarer in training
Attention dilution — describing two people doubles the descriptive tokens, diluting each one's signal
No native pose conditioning: SDXL has no notion that "two bodies = two skeletons" unless something like an OpenPose ControlNet supplies it

The 5 patterns below all address one or more of these.

Pattern 1: Set count explicitly, early

The single most impactful change: put "two adults" (or "three adults") at the start of your prompt, before any other descriptors.

- man and woman intimate scene in bedroom
+ two adults, intimate scene, man and woman, in bedroom

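Putting "two adults" first gives it the most influence. SDXL's CLIP text encoders use causal attention, so each token's embedding is shaped only by the tokens before it: early words color everything that follows, while late words can't reach back. Long prompts also run past CLIP's 77-token window and get truncated or split into chunks, depending on the front-end. Bury the count after color, hair, and other details, and dilution kicks in. The same trick works for groups: "three adults", "four adults".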

Why "adults" and not "people": "people" is more generic, while "adults" excludes minors and sets the adult-content frame at the same time.
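If you build prompts in a script rather than by hand, Pattern 1 is easy to enforce automatically. The sketch below assumes prompts are comma-separated tag lists; `front_load_count` and `COUNT_PHRASES` are hypothetical names, not part of bigASP or ximages. It removes any existing count phrase wherever it sits and re-inserts it at position 1.

```python
# Hypothetical mapping from head count to the count phrase used in this guide.
COUNT_PHRASES = {2: "two adults", 3: "three adults", 4: "four adults"}


def front_load_count(prompt: str, n_people: int) -> str:
    """Move (or add) the count phrase to the very start of a comma-separated prompt.

    Raises KeyError if n_people has no entry in COUNT_PHRASES.
    """
    count_phrase = COUNT_PHRASES[n_people]
    # Split into tag-style parts, dropping empty fragments from stray commas.
    parts = [p.strip() for p in prompt.split(",") if p.strip()]
    # Remove any existing copy of the count phrase, wherever it appears.
    parts = [p for p in parts if p.lower() != count_phrase]
    return ", ".join([count_phrase] + parts)


print(front_load_count("woman, nude, two adults, on bed", 2))
```

Because the phrase is removed before being re-added, running the helper twice on the same prompt gives the same result.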

Pattern 2: Spatial language — say where each is

Generic "couple in bed" gives the model freedom to merge. Explicit spatial relationships block merging:

- two adults having sex on bed
+ two adults, woman lying on back on bed, man kneeling between her legs, missionary position

- couple intimate on couch
+ two adults, woman straddling man, both facing each other, on living room couch

The model can render "woman on top" or "man behind" because these are common training-tag concepts. "Couple sex" doesn't say who is where, so the model has to fill that gap on its own, and that is where bodies merge.

Spatial templates (use one):

[A] lying on back, [B] on top (missionary)
[A] kneeling behind [B] (rear-entry)
[A] straddling [B] (cowgirl)
[A] facing away from [B] (back-to-back / reverse)
[A] kneeling between [B]'s legs
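To use the templates above from a script, a minimal sketch is a lookup table keyed by position name, with `{a}` and `{b}` standing in for [A] and [B]. The names `SPATIAL_TEMPLATES` and `build_couple_prompt` are hypothetical; the template wording mirrors the list above, and each slot should receive a single short clause per Pattern 3.

```python
# Hypothetical table mirroring the spatial templates listed above.
# {a} and {b} each take one short per-person clause.
SPATIAL_TEMPLATES = {
    "missionary": "{a} lying on back, {b} on top",
    "rear-entry": "{a} kneeling behind {b}",
    "cowgirl": "{a} straddling {b}",
    "reverse": "{a} facing away from {b}",
    "kneeling-between": "{a} kneeling between {b}'s legs",
}


def build_couple_prompt(template: str, a: str, b: str, setting: str) -> str:
    """Count phrase first, then exactly one spatial relationship, then the setting.

    Raises KeyError for an unknown template name.
    """
    spatial = SPATIAL_TEMPLATES[template].format(a=a, b=b)
    return ", ".join(["two adults", spatial, setting])


print(build_couple_prompt("cowgirl", "woman", "man", "on living room couch"))
```

Restricting each prompt to one template enforces the "use one" rule: the model gets a single unambiguous arrangement instead of competing ones.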

Pattern 3: Short descriptors per person (the "one breath" rule)

Each person gets one short clause, not three sentences. The more you describe each, the more the model dilutes attention:

- two adults, beautiful asian woman with long black hair and dark eyes wearing nothing,
  muscular caucasian man with brown hair and beard wearing nothing,
  having intimate sex on a bed in a hotel room with soft warm lighting
  
+ two adults, asian woman, long black hair, nude, on bed, missionary position,
  caucasian man on top, athletic build, hotel bedroom, evenly lit

Notice the second version uses about 40% fewer words (22 vs. 38) and gives the model less to dilute. Less is more for multi-person.
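The percentage is easy to check. This sketch counts whitespace-separated words after treating commas as separators; word count is only a rough proxy, since SDXL's CLIP tokenizer splits some words into several tokens and counts commas as tokens of their own.

```python
def word_count(prompt: str) -> int:
    # Treat commas as separators so "adults," counts as one word.
    return len(prompt.replace(",", " ").split())


verbose = (
    "two adults, beautiful asian woman with long black hair and dark eyes wearing nothing, "
    "muscular caucasian man with brown hair and beard wearing nothing, "
    "having intimate sex on a bed in a hotel room with soft warm lighting"
)
concise = (
    "two adults, asian woman, long black hair, nude, on bed, missionary position, "
    "caucasian man on top, athletic build, hotel bedroom, evenly lit"
)

v, c = word_count(verbose), word_count(concise)
reduction = 1 - c / v
print(v, c, f"{reduction:.0%}")  # prints: 38 22 42%
```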

Pattern 4: Use negative prompts for merging

The single most effective negative prompt addition for multi-person scenes:

merged bodies, conjoined bodies, body fusion, fused limbs, body merging,
two heads on one body, multiple heads, siamese twins, fused faces, merged faces,
overlapping bodies, three arms, three legs, extra limbs, extra heads

Add this specifically to multi-person prompts. ximages auto-applies a baseline that includes most of these, but for tough scenes you can layer more in the user negative field.
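Stacking your own negatives on top of a baseline can duplicate terms, and duplicates use up space in CLIP's 77-token window that could hold other terms. A minimal sketch of a case-insensitive, order-preserving merge follows. `BASELINE_NEGATIVE` is a hypothetical stand-in; ximages' actual baseline list isn't reproduced here.

```python
# Hypothetical stand-in for a platform baseline (not ximages' real list).
BASELINE_NEGATIVE = "merged bodies, fused limbs, extra limbs, deformed hands"

# The multi-person additions from this section.
MULTI_PERSON_NEGATIVE = (
    "merged bodies, conjoined bodies, body fusion, fused limbs, body merging, "
    "two heads on one body, multiple heads, siamese twins, fused faces, merged faces, "
    "overlapping bodies, three arms, three legs, extra limbs, extra heads"
)


def merge_negatives(*prompts: str) -> str:
    """Join comma-separated negative prompts, dropping duplicate terms
    (case-insensitive) and keeping the first-seen order."""
    seen = set()
    terms = []
    for prompt in prompts:
        for term in prompt.split(","):
            term = term.strip()
            if term and term.lower() not in seen:
                seen.add(term.lower())
                terms.append(term)
    return ", ".join(terms)


print(merge_negatives(BASELINE_NEGATIVE, MULTI_PERSON_NEGATIVE))
```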

Pattern 5: Aspect ratio matters

A wider canvas gives bodies more room to stay separate. A tall 9:16 frame squeezes two bodies into less width, increasing collision likelihood. Use:

3:4 or 4:3 — best for couples
16:9 — best for 3+ people
1:1 — works for tight close-ups but tough for full-body
9:16 — works for single-person; struggles with 2+ side-by-side

If you're getting merging with vertical compositions, swap to horizontal and regenerate with the same prompt. Often that alone fixes it.
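When switching aspect ratios you also need concrete pixel dimensions. SDXL was trained at roughly one megapixel across a set of aspect-ratio buckets whose sides are multiples of 64, so a reasonable sketch (an assumption about your pipeline, not an ximages setting) is to solve for the ratio at about 1024×1024 pixels of area and snap both sides to 64. `sdxl_size` is a hypothetical helper.

```python
import math


def sdxl_size(aspect_w: int, aspect_h: int,
              target_pixels: int = 1024 * 1024, step: int = 64) -> tuple[int, int]:
    """Approximate (width, height) for an aspect ratio at ~1 megapixel,
    with both sides snapped to the nearest multiple of `step`."""
    ratio = aspect_w / aspect_h
    height = math.sqrt(target_pixels / ratio)  # solves width * height = target
    width = height * ratio

    def snap(x: float) -> int:
        # Round half up to the nearest multiple of step, never below step.
        return max(step, int(x / step + 0.5) * step)

    return snap(width), snap(height)


for w, h in [(3, 4), (4, 3), (16, 9), (1, 1), (9, 16)]:
    print(f"{w}:{h} ->", sdxl_size(w, h))
```

For 16:9 this gives 1344×768 and for 4:3 it gives 1152×896, both sizes commonly used with SDXL.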

Putting it together: a worked example

Bad prompt (typical first attempt):

two beautiful asian women and a handsome guy in passionate threesome
on a bed in dimly lit bedroom with sensual atmosphere

What breaks: vague "passionate threesome", no spatial roles, dense descriptors, "sensual" is fluff.

Fixed prompt:

three adults, threesome on bed,
asian woman in center, lying on back, nude,
asian woman on right, sitting astride, nude,
caucasian man on left, kneeling, nude, athletic build,
clear spatial separation between three figures,
evenly lit bedroom, natural skin pores

Negative additions:

merged bodies, fused limbs, three heads on one body, overlapping figures,
extra arms, extra legs, deformed hands

Aspect ratio: 16:9

Run it 4 times. Expect roughly two of the four to come out with cleanly separated anatomy. Pick the best.
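If you run bigASP v2.5 yourself through Hugging Face diffusers rather than through ximages (an assumption about your setup), the worked example can be packaged as keyword arguments. The keys `prompt`, `negative_prompt`, `width`, `height`, and `num_images_per_prompt` are real `StableDiffusionXLPipeline` call parameters; `build_request` is a hypothetical helper, and 1344×768 is one common SDXL size for 16:9. The code only builds the arguments and does not load a model.

```python
def build_request(count_phrase, scene, people, extras, negative, size, n_candidates=4):
    """Hypothetical helper: package a multi-person prompt as keyword arguments
    for a diffusers StableDiffusionXLPipeline call."""
    width, height = size
    # Count phrase first (Pattern 1), then the scene, one clause per person, then extras.
    prompt = ", ".join([count_phrase, scene, *people, *extras])
    return {
        "prompt": prompt,
        "negative_prompt": negative,
        "width": width,
        "height": height,
        # Several candidates per run, since multi-person output varies a lot.
        "num_images_per_prompt": n_candidates,
    }


kwargs = build_request(
    count_phrase="three adults",
    scene="threesome on bed",
    people=[
        "asian woman in center, lying on back, nude",
        "asian woman on right, sitting astride, nude",
        "caucasian man on left, kneeling, nude, athletic build",
    ],
    extras=[
        "clear spatial separation between three figures",
        "evenly lit bedroom",
        "natural skin pores",
    ],
    negative=("merged bodies, fused limbs, three heads on one body, overlapping figures, "
              "extra arms, extra legs, deformed hands"),
    size=(1344, 768),  # 16:9
)
print(kwargs["prompt"])
```

With a loaded pipeline, `pipe(**kwargs).images` would return the four candidates to compare.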

Quick checklist

For any multi-person scene, before clicking Generate:

[ ] "two adults" / "three adults" at the start
[ ] Each person gets ONE short descriptor clause
[ ] Explicit spatial relationship (woman on top of man, man kneeling behind)
[ ] No fluff adjectives (passionate, intense, sensual)
[ ] Negative includes merge/fuse terms
[ ] Aspect ratio is 3:4, 4:3, or 16:9 (not 9:16 for 2+ people)
[ ] Generate 4 candidates, not 1 — multi-person has higher variance
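The mechanical parts of the checklist can be checked by a short linter. This sketch covers only the count phrase, fluff words, merge negatives, aspect ratio, and candidate count; judging whether each person has one short clause and a clear spatial relationship still needs a human eye. `lint_multi_person` and its word lists are hypothetical, seeded from the checklist above.

```python
# Word lists seeded from the checklist; extend them to taste.
COUNT_PHRASES = ("two adults", "three adults", "four adults")
FLUFF = {"passionate", "intense", "sensual"}
MERGE_TERMS = ("merged", "fused", "conjoined")
GOOD_ASPECTS = {"3:4", "4:3", "16:9"}


def lint_multi_person(prompt: str, negative: str, aspect: str, n_candidates: int) -> list[str]:
    """Return a list of warnings; an empty list means the mechanical checks pass."""
    warnings = []
    lowered = prompt.lower()
    if not lowered.startswith(COUNT_PHRASES):  # str.startswith accepts a tuple
        warnings.append("start the prompt with a count phrase such as 'two adults'")
    fluff = sorted(FLUFF & set(lowered.replace(",", " ").split()))
    if fluff:
        warnings.append("remove fluff words: " + ", ".join(fluff))
    if not any(term in negative.lower() for term in MERGE_TERMS):
        warnings.append("add merge/fuse terms to the negative prompt")
    if aspect not in GOOD_ASPECTS:
        warnings.append("use 3:4, 4:3, or 16:9 for 2+ people")
    if n_candidates < 4:
        warnings.append("generate at least 4 candidates")
    return warnings


for w in lint_multi_person("passionate couple on bed", "blurry", "9:16", 1):
    print("-", w)
```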

What if it still merges?

If a specific scene consistently merges across 4+ attempts:

Drop one descriptor per person — you may still be over-describing
Try horizontal aspect if you weren't
Switch to FLUX (we run it as a fallback provider) — FLUX handles 3+ people more reliably
Inpaint after: generate one clean person, then inpaint the second
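The first step, dropping one descriptor per person, can be mechanized if you keep each person's descriptors as an ordered list with identity and position first. This is a sketch under that assumption; `drop_one_descriptor` is a hypothetical helper that removes the last (least important) descriptor from each person while always keeping the first `keep` items.

```python
def drop_one_descriptor(people: list[list[str]], keep: int = 2) -> list[list[str]]:
    """Remove the last descriptor from each person's list, never going below `keep`.

    Assumes each list is ordered by importance: identity, position, then details.
    Returns new lists; the input is not modified.
    """
    return [desc[:-1] if len(desc) > keep else list(desc) for desc in people]


people = [
    ["asian woman", "lying on back", "nude", "long black hair"],
    ["caucasian man", "on top", "athletic build"],
]
trimmed = drop_one_descriptor(people)
print(", ".join(", ".join(p) for p in trimmed))
```

Keeping identity and position protected matters here: trimming the spatial clause would undo Pattern 2 and usually make merging worse, not better.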

When NOT to use these patterns

For single-person scenes, the count-word and spatial-language patterns (Patterns 1 and 2) aren't needed and add noise. Pattern 4 (merging negatives) is harmless to leave in but contributes nothing. Pattern 3 (concise descriptors) and Pattern 5 (aspect ratio) apply to all scenes.

Related: Getting started with bigASP v2.5 covers single-person basics first. 12 prompt traps to avoid covers the broader patterns. bigASP vs FLUX vs RealVisXL explains when to switch models for multi-person work.

Multi-person scene prompts: 5 patterns that actually work · ximages