Technical Laboratory
Image-to-Video (I2V) Pipeline Comparison Test
Reference Input
Reference image used as the input for all I2V tests · 480×672 portrait
⭐ Recommended Pipelines
Wan 2.2 + SVI LoRA + LightX2V, 4-Segment Chain ⭐ Preferred pipeline
Preferred content pipeline: image quality + camera control + iterability
⏱️ ~7 minutes (4 segments)
480×672 · 4×81 frames / 20 s · 16 fps · 6 steps (3+3, uni_pc) · CFG 1.5 · Shift 8.0 · SVI + LightX2V LoRA · ~110 s per segment · ~18 GB VRAM
Wan 2.2 standard model + SVI quality-enhancement LoRA + LightX2V acceleration LoRA. The two LoRAs are stacked rather than swapped in as a distilled model, so their strengths stay adjustable. Supports re-running a single segment (--redo N) and appending segments (--append). The CAMERA: prefix is effective for camera control: dolly in / push in work reliably. CLIP Vision is locked to the original image for consistency. Verified as a 4-segment chain: walking → push in → close shot → smirk close-up.
📝 Prompts
Positive (seg1): CAMERA: Steady wide shot, locked tripod. A blonde woman in a sheer black top, leather corset, pinstripe overcoat walks forward on a rain-slicked neon-lit city street at night. Her stiletto heels strike wet pavement with purpose. Coat billows behind her. Wind moves through her hair. Neon reflections shimmer on the ground. She gazes forward with quiet power. 35mm anamorphic lens.
Positive (seg2): CAMERA: Slow steady dolly in, camera glides forward toward the woman. The blonde woman continues walking toward camera on the wet neon street. Her stride is deliberate, confident. Coat sways with each step. Neon bokeh shifts and grows larger in background as camera closes distance. Rain mist catches colored light around her figure. 50mm prime lens.
Positive (seg3): CAMERA: Continuous push in from medium shot to close-up, steady dolly forward. The blonde woman slows her stride and stops. A cold confident smirk forms on her lips. Her piercing eyes lock directly on camera. Wind pushes hair across her cheek. Neon light plays across her features. Background dissolves into warm colored bokeh as lens closes in. Shallow depth of field. 85mm prime lens.
Positive (seg4): CAMERA: Locked static close-up, no camera movement. The blonde woman's face fills the frame. She tilts her chin down slightly, then a slow cold confident smirk spreads across her lips. Her piercing eyes stare directly into camera with quiet menace. Wind barely moves a strand of hair across her cheek. Warm neon light reflects in her eyes. Extreme shallow depth of field, background pure colored bokeh. 85mm prime lens.
Negative: static, frozen, still image, no movement, sluggish, slow motion, blurry, distorted face, extra limbs, watermark, text, morphing, deformation
⚠️ The CAMERA: prefix separates camera movement from the subject description. Stacked LoRAs: LightX2V (high_noise, strength=0.5) + SVI (strength=1.0). Tail-frame chaining + ffmpeg concat. Script: scripts/svi-chain.py (--redo/--append/--seed)
🖥️ gpu-local · RTX 4090
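The "ffmpeg concat" step mentioned in the note above can be sketched roughly as follows. This only illustrates ffmpeg's concat demuxer and is not the contents of scripts/svi-chain.py; the segment file names and the helper names `concat_list_text`, `concat_command`, and `total_seconds` are hypothetical.

```python
def concat_list_text(segment_paths):
    """Return the text of an ffmpeg concat-demuxer list file, one clip per line."""
    lines = []
    for path in segment_paths:
        # The concat list format escapes a single quote as '\''.
        escaped = str(path).replace("'", r"'\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file, output):
    """Build the ffmpeg argument list that joins the listed clips without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(output)]


def total_seconds(segments, frames_per_segment=81, fps=16):
    """Nominal duration of the chained clip, before any frames are dropped."""
    return segments * frames_per_segment / fps


if __name__ == "__main__":
    clips = [f"seg{i}.mp4" for i in range(1, 5)]  # hypothetical file names
    print(concat_list_text(clips), end="")
    print(" ".join(concat_command("segments.txt", "chain.mp4")))
    print(total_seconds(4))  # 4 x 81 frames at 16 fps
```

Stream copy (`-c copy`) only works cleanly when every segment shares the same codec, resolution, and frame rate, which holds here if all segments come from the same workflow.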
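SVI + InfiniteTalk Combined Pipeline ⭐ Talking video
Complete motion-to-speech narrative: SVI generates the motion, InfiniteTalk drives the lip movement
⏱️ ~8 minutes (total)
480×672 · 5 segments / 23 s · 16 fps · SVI 4 segments (motion) + InfiniteTalk 1 segment (speech) · audio_scale=1.5 · Wan 2.1 + Wan 2.2 mixed
Two pipelines chained: the first four segments use Wan 2.2 + SVI LoRA + LightX2V to generate the motion video (walking → push in → close shot → smirk). The last frame then becomes the InfiniteTalk input, which is paired with English audio to generate the talking segment. The audio is overlaid on the final 3 seconds. Key lessons: the AudioSeparation node outputs index 0=Bass / 1=Drums / 2=Other / 3=Vocals, and Vocals (index 3) must be used; audio_scale 1.5 looks most natural, 3.0 is too strong, and 10.0 breaks the image.
📝 Prompts
SVI segments: same prompts as the SVI 4-segment chain pipeline (CAMERA: prefix for camera control)
InfiniteTalk segment, Positive: A woman softly talking with subtle expressions. Close-up portrait shot, piercing blue eyes, neon-lit night scene, shallow depth of field. Cinematic lighting, natural lip movement.
InfiniteTalk segment, Negative: bright tones, overexposed, static, blurred details, subtitles, worst quality, low quality, JPEG artifacts, ugly, deformed, blurry
⚠️ Key InfiniteTalk parameters: audio_scale=1.5, audio_cfg_scale=1.0, fps=16, steps=4, shift=8.0. AudioSeparation Vocals = index 3. The Chinese wav2vec2 model also handles English audio. After generation, the audio must be merged into the video with ffmpeg.
🖥️ gpu-local · RTX 4090 (Wan 2.1 + Wan 2.2 mixed)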
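As a minimal sketch of the two audio details above (picking the Vocals stem and placing the speech over the final 3 seconds), the following builds an ffmpeg mux command. The file names and the helpers `vocals_index`, `audio_delay_ms`, and `mux_command` are hypothetical, and delaying the audio with ffmpeg's `adelay` filter is an assumption about how the overlay could be done, not a record of the command actually used.

```python
# Stem order reported above for the AudioSeparation node outputs.
STEMS = ("Bass", "Drums", "Other", "Vocals")


def vocals_index(stems=STEMS):
    """Index of the stem that should drive the lip movement."""
    return stems.index("Vocals")


def audio_delay_ms(video_seconds, speech_seconds):
    """Delay that makes the speech end exactly when the video ends."""
    if speech_seconds > video_seconds:
        raise ValueError("speech is longer than the video")
    return round((video_seconds - speech_seconds) * 1000)


def mux_command(video, audio, output, delay_ms):
    """ffmpeg arguments: keep the video stream, delay the audio, encode it as AAC."""
    return ["ffmpeg", "-y", "-i", video, "-i", audio,
            # all=1 applies the same delay to every audio channel
            # (needs a reasonably recent ffmpeg build).
            "-filter_complex", f"[1:a]adelay={delay_ms}:all=1[a]",
            "-map", "0:v", "-map", "[a]",
            "-c:v", "copy", "-c:a", "aac", output]


if __name__ == "__main__":
    delay = audio_delay_ms(video_seconds=23, speech_seconds=3)
    print(vocals_index(), delay)
    print(" ".join(mux_command("chain.mp4", "speech.wav", "final.mp4", delay)))
```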

Wan 2.1 VACE: 6-Frame Overlap Chain ⭐⭐

Local
Top recommendation: seamless long video
⏱️ ~6 minutes
480×672 · 3×81 frames / 14.4 s · 16 fps · 4 steps (LightX2V LoRA) · CFG 1.0 · 6-frame overlap · ~19 GB VRAM
WanVideoVACEStartToEndFrame + 6-frame overlap chaining. The last 6 frames of each segment are passed to the next segment as known frames, and VACE extends the generation from the existing motion. After the overlap frames are trimmed, the segments are hard-cut together and the seams are not visible.
📝 Prompts
Positive (seg1, full-body medium shot): [0:00-0:05] Medium full shot from waist up, a blonde woman in a sheer black top, vinyl corset, pinstripe overcoat, choker, stockings and stiletto high heels walks confidently toward camera on a rain-slicked neon-lit city street at night. Face clearly visible, piercing blue eyes looking straight at camera, coat flowing with each stride, wet ground reflecting neon lights, atmospheric fog. Cinematic lighting, film grain, 4k quality.
Positive (seg2, medium close-up): [0:00-0:05] Medium close-up from chest up, subtle wind moves her blonde hair across her face, piercing blue eyes locked on camera, deliberate powerful stride, her overcoat flowing, neon bokeh in background, atmospheric fog. Cinematic lighting, film grain, 4k quality.
Positive (seg3, smirk close-up): [0:00-0:05] Close-up on her face, she tilts her head slightly, the corner of her mouth curls into a cold contemptuous smirk, disdain and superiority in her eyes, neon reflections on features, cinematic shallow depth of field. Wind in hair, atmospheric night scene. Film grain, 4k quality.
Negative (all segments): blurry, distorted, low quality, watermark, text, static, morphing, deformation, slow motion, frozen, still image, no movement, sluggish, face out of frame, faceless
⚠️ Key method: WanVideoVACEStartToEndFrame receives a 6-frame batch → input_frames + input_masks → WanVideoVACEEncode. The first 6 frames of segments 2 and 3 are trimmed before the hard cut.
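The trim-and-stitch step can be illustrated with plain lists standing in for frame batches. This is a sketch of the bookkeeping only (no VACE calls); `stitch_with_overlap` and `stitched_length` are hypothetical helper names. It also shows where the 231 frames / 14.4 s figure comes from.

```python
def stitch_with_overlap(segments, overlap=6):
    """Hard-cut segments together, dropping the first `overlap` frames of every
    segment after the first (those frames repeat the previous segment's tail)."""
    if not segments:
        return []
    stitched = list(segments[0])
    for segment in segments[1:]:
        stitched.extend(segment[overlap:])
    return stitched


def stitched_length(n_segments, frames_per_segment=81, overlap=6):
    """Frame count after trimming the overlap at each of the n-1 joins."""
    return n_segments * frames_per_segment - (n_segments - 1) * overlap


if __name__ == "__main__":
    # Frames are represented by their index on a global timeline, so each
    # segment starts on the last 6 frames of the previous one.
    seg1 = list(range(0, 81))
    seg2 = list(range(75, 156))
    seg3 = list(range(150, 231))
    frames = stitch_with_overlap([seg1, seg2, seg3])
    print(len(frames), len(frames) / 16)  # frame count and seconds at 16 fps
```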

Wan 2.2 Distilled, Multi-Segment Camera Moves with Hard Cuts ⭐⭐

Local
Best camera movement: no accordion effect
⏱️ ~3.5 minutes
480×672 · 3×81 frames / 15 s · 16 fps · 6 steps (uni_pc) · shift=12 · CFG 1.0 · CLIP Vision locked to the original image
Each 81-frame segment gets its own independent camera move, which avoids the accordion effect in which long Wan 2.2 videos drift back toward the first frame. Segments are joined tail frame → first frame with hard cuts, and CLIP Vision always uses the original image to keep the face consistent. shift=12 + uni_pc sampler (tuned parameters taken from a YouTube tutorial).
📝 Prompts
Seg1, low-angle static camera: CAMERA: Static locked tripod at low angle, knee height. Full shot of a blonde woman in a sheer black top, vinyl corset, long pinstripe overcoat, choker, stockings with garter straps, stiletto heels. She stands at the far end of a rain-slicked neon-lit city street at night. Rim lighting from behind outlines her silhouette. Atmospheric fog at ankle height. She begins walking slowly toward the camera with a measured commanding stride. Wet pavement reflects red and blue neon. 35mm anamorphic lens, shallow depth of field, film grain.
Seg2, push in from full body to medium shot: CAMERA: Slow steady push in from medium full shot toward medium shot. Camera moves only forward. A blonde woman in a sheer black top, vinyl corset, long pinstripe overcoat, choker walks toward camera with deliberate measured steps. Her piercing blue eyes are locked onto the lens. Overcoat sways gently. Wet pavement reflects neon beneath her heels. Wind barely moves her blonde hair. Rack-focus shifts to sharp detail on her face. Teal-and-orange color grading. The distance between camera and subject steadily shrinks. Film grain, shallow depth of field.
Seg3, dolly in to close-up + smirk: CAMERA: Slow dolly in from medium shot to extreme close-up on her face. A blonde woman in a vinyl corset and overcoat slows to a complete stop. She holds eye contact with the camera. A confident cold smirk spreads across her lips. Her chin lifts. Eyes narrow with quiet superiority and disdain. Single sodium-vapor backlight catches her cheekbone. Background dissolves into soft neon bokeh. 85mm prime lens, extreme shallow depth of field. She holds the gaze.
Negative (all segments): blurry, distorted, low quality, watermark, text, static, morphing, deformation, slow motion, frozen, still image, no movement, sluggish, face out of frame, faceless, repetitive motion
⚠️ Key: each 81-frame segment is generated independently, and segments are hard-cut so that one segment's tail frame is the next segment's first frame. CLIP Vision always uses the original image (not the tail frame) to lock the facial identity. shift=12 + uni_pc parameters.
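The control flow behind that key point can be sketched as a small loop: the start image changes every segment, while the identity reference never does. `chain_segments` and the `generate_segment` callable (with its `start_image`, `identity_ref`, and `prompt` parameters) are hypothetical stand-ins for the actual workflow, used here only to make the data flow explicit.

```python
def chain_segments(original_image, prompts, generate_segment):
    """Generate one clip per prompt. Each clip starts from the previous clip's
    tail frame, but the identity reference is always the original image."""
    start_image = original_image
    clips = []
    for prompt in prompts:
        frames = generate_segment(
            start_image=start_image,      # tail frame of the previous segment
            identity_ref=original_image,  # never replaced by a tail frame
            prompt=prompt,
        )
        clips.append(frames)
        start_image = frames[-1]
    return clips


if __name__ == "__main__":
    # Stand-in generator that just labels three frames per segment.
    def fake_generate(start_image, identity_ref, prompt):
        print(f"start={start_image} identity={identity_ref}")
        return [f"{prompt}-frame{i}" for i in range(3)]

    chain_segments("original.png", ["seg1", "seg2", "seg3"], fake_generate)
```

Feeding the tail frame into the identity reference as well would let small facial drift compound from segment to segment, which is what pinning it to the original image avoids.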
Wan 2.2 Distilled Chain, 15 s ⭐ Top recommendation
Top recommendation: speed + image quality + long video
⏱️ ~4 minutes (3 segments × ~47 s + stitching)
480×672 · 3×81 frames / 15 s · 16 fps · 4 steps (2+2) × 3 segments · CFG 1.0 · Shift 10.0 · ~4 minutes · ~18 GB VRAM
Wan 2.2 LightX2V distillation with tail-frame chaining and hard cuts (no xfade). CLIP Vision is locked to the original image to keep the face consistent, and the framing is planned so that the face stays in shot.
📝 Prompts
Positive (seg1): [0:00-0:03] Medium full shot from waist up, a blonde woman in a sheer black top, vinyl corset, pinstripe overcoat, choker, stockings and stiletto high heels walks confidently toward camera on a rain-slicked neon-lit city street at night. Face clearly visible, piercing blue eyes, coat flowing, wet ground reflecting neon lights, atmospheric fog.
Positive (seg2): [0:00-0:03] Medium close-up from chest up, camera slowly zooms in, subtle wind moves blonde hair across face, piercing gaze locked on camera, deliberate powerful stride, neon bokeh in background.
Positive (seg3): [0:00-0:03] Close-up on face, she tilts head slightly, corner of mouth curls into cold contemptuous smirk, disdain and superiority in eyes, neon reflections on features, cinematic shallow depth of field.
Negative: blurry, distorted, low quality, watermark, text, static, morphing, deformation, slow motion, frozen, still image, no movement, sluggish, face out of frame, faceless
🖥️ gpu-local · RTX 4090
Wan 2.1 VACE + LightX2V, 4 Steps ✅ Recommended
Best image quality + motion naturalness
⏱️ ~110 seconds (1 min 50 s)
480×672 · 81 frames / 5 s · 16 fps · 4 steps · CFG 1.0 · Shift 5.0 · ~110 s · 19.3 GB VRAM
block_swap=20, VACE control, and the LightX2V cfg+step distillation LoRA. Image quality and motion naturalness are both the best of the runs.
📝 Prompts
Positive: [0:00-0:02] Full body shot, woman walks confidently on rain-slicked neon-lit city street at night, wearing sheer black top and vinyl corset, pinstripe overcoat flowing, stockings and stilettos. [0:02-0:04] Medium shot, camera zooms in, wind in blonde hair, piercing blue eyes locked on camera, deliberate powerful stride, neon bokeh. [0:04-0:05] Close-up face, she tilts head, cold contemptuous smirk, disdain in eyes, neon reflections, shallow depth of field. 4k, cinematic lighting, film grain.
Negative: blurry, distorted, low quality, watermark, text, static, morphing, deformation, slow motion, frozen, still image, no movement, sluggish
🖥️ gpu-local · RTX 4090
Wan 2.1 VACE Chain, 15 s (xfade) ⚠️ Superseded
Superseded by the 6-frame overlap version
⏱️ ~9 minutes (3 segments × ~3 minutes + stitching)
480×672 · 3×81 frames / 15 s · 16 fps · 4 steps × 3 segments · CFG 1.0 · ~9 minutes · 19.3 GB VRAM
Tail-frame chaining + ffmpeg xfade with a 0.5 s transition. The three stitched segments show no obvious seams.
📝 Prompts
Same segment prompts as the Wan 2.2 distilled chain (hard-cut) version above.
🖥️ gpu-local · RTX 4090
Wan 2.2 I2V, 20 Steps ✅ Good image quality
Good image quality, normal motion
⏱️ ~5 minutes (298 s)
480×672 · 81 frames / 5 s · 16 fps · 20 steps (10+10) · CFG 5.0 · Shift 10.0 · ~5 minutes · 18.5 GB VRAM
Two-stage sampling (high_noise + low_noise) with beat-style, timestamped prompts. Native ComfyUI v0.18.1 nodes.
📝 Prompts
Positive: [0:00-0:02] Medium full shot from waist up, a blonde woman in a sheer black top, vinyl corset, pinstripe overcoat, choker, stockings and stiletto high heels walks confidently toward camera on rain-slicked neon-lit city street at night. Face clearly visible, piercing blue eyes, coat flowing, wet ground reflecting neon. [0:02-0:04] Camera slowly zooms in, wind in hair, eyes locked on camera, neon bokeh. [0:04-0:05] Close-up, cold contemptuous smirk, shallow depth of field. 4k, cinematic lighting, film grain.
Negative: blurry, distorted, low quality, watermark, text, static, morphing, deformation, slow motion, frozen, still image, no movement, sluggish
🖥️ gpu-local · RTX 4090 · ComfyUI v0.18.1
⚠️ Usable Pipelines (With Limitations)
Wan 2.2 + LightX2V, 4-Step Distilled ⚠️ Pending validation
Image quality pending validation
⏱️ ~47 seconds
480×672 · 81 frames / 5 s · 16 fps · 4 steps (2+2) · CFG 1.0 · Shift 10.0 · ~47 s · ~18 GB VRAM
The full model is replaced by its distilled version, giving a 6.4× speedup. Extremely fast, but the image quality still needs further validation.
📝 Prompts
Same prompt as the Wan 2.2 20-step version above.
🖥️ gpu-local · RTX 4090
LTX 2.3 22B Local, 15 s ⚠️ Face issues
Poor facial consistency
⏱️ ~60 seconds
544×960 · 361 frames / 15 s · 24 fps · 8 steps · CFG 1.0 · ~60 s · 21.5 GB VRAM
A single 15-second pass with no seams, and extremely fast. However, the face deforms, so it is unsuitable where strong character consistency is required.
📝 Prompts
Positive: A blonde woman wearing a sheer black mesh top, vinyl leather corset, pinstripe overcoat draped over her shoulders, black choker, stockings with garters and stiletto high heels walks confidently toward the camera on a rain-slicked neon-lit city street at night. She starts in a medium full shot from the waist up, her face clearly visible with piercing blue eyes looking straight into the lens, her coat flowing with each powerful stride as she moves through atmospheric fog. Wet pavement reflects red and blue neon lights around her. The camera performs a smooth continuous dolly-in movement as she approaches, gradually framing her in a medium close-up. She tilts her head slightly, cold contemptuous smirk. Shot on a 85mm lens, f/1.8 aperture, shallow depth of field, smooth gimbal-stabilized dolly movement, cinematic color grading, film grain, 4K quality.
Negative: blurry, distorted, low quality, watermark, text, static, morphing, deformation, slow motion, frozen, still image, comedy, funny face, exaggerated expressions, cartoon
⚠️ LTX models do not understand the [0:00-0:02] timestamp format; prompts must use continuous narrative plus technical camera parameters.
🖥️ gpu-local · RTX 4090
LTX 2.3 Pro on Replicate ⚠️ NSFW filter
The NSFW filter changed the clothing and the character
⏱️ ~76 seconds + $0.15
1080×1920 · 10 s · 25 fps · Cloud · ~$0.15 · ~76 s
Highest resolution (1080p), but the content is sanitized by the safety filter, which modifies the clothing and the character's appearance.
📝 Prompts
Same narrative-style prompt as the LTX 2.3 22B local version above.
☁️ Replicate Cloud
Wan 2.2 Replicate Fast ⚠️ Slow motion
Slow-motion problem
⏱️ ~23 seconds + $0.05
496×784 · 81 frames / 5 s · 16 fps · Cloud · ~$0.05 · ~23 s
PrunaAI-accelerated version; disable_safety_checker works. Motion is on the slow side.
📝 Prompts
Positive: [0:00-0:02] Medium full shot, blonde woman walks confidently toward camera on neon-lit city street at night... [0:02-0:04] Camera zooms in... [0:04-0:05] Close-up, cold smirk...
Negative: blurry, distorted, low quality, watermark, text, static, morphing, deformation, slow motion, frozen, still image, no movement, sluggish
☁️ Replicate Cloud
❌ Failed Experiments
Phantom + VACE Merge ❌ Not recommended
Dull and stiff, not recommended
⏱️ ~7 minutes
480×672 · 81 frames / 5 s · 16 fps · 12 steps · CFG 2.5 · ~7 minutes · 14.3 GB VRAM
Third-party merge (VACE + Phantom + CausVid) with across-the-board quality degradation: dull, gray image, stiff motion, and no facial expression.
📝 Prompts
Same prompt as the Wan 2.1 VACE + LightX2V version above.
🖥️ gpu-local · RTX 4090
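Full Comparison
Model | Resolution | Duration | Generation time | VRAM / Cost | Platform | Rating
Wan 2.2 + SVI + LightX2V | 480×672 | 4×81 frames / 20 s | ~7 min | ~18 GB | Local 4090 | ⭐⭐⭐ Preferred pipeline
SVI + InfiniteTalk combined | 480×672 | 5 segments / 23 s | ~8 min | ~19 GB | Local 4090 | ⭐⭐⭐ Talking video
Wan 2.1 VACE 6-frame overlap | 480×672 | 231 frames / 14.4 s | ~6 min | ~19 GB | Local 4090 | ⭐⭐ Top recommendation
Wan 2.2 distilled, camera moves with hard cuts | 480×672 | 243 frames / 15 s | ~3.5 min | not listed | Local 4090 | ⭐⭐ Best camera movement
Wan 2.2 distilled chain | 480×672 | 15 s | ~4 min | ~18 GB | Local 4090 | ⭐⭐ Top recommendation
Wan 2.1 VACE + LightX2V | 480×672 | 5 s | ~110 s | 19.3 GB | Local 4090 | ⭐ ✅ Best
Wan 2.1 VACE chain (xfade) | 480×672 | 15 s | ~9 min | 19.3 GB | Local 4090 | ⚠️ Superseded
Wan 2.2 I2V 20-step | 480×672 | 5 s | ~5 min | 18.5 GB | Local 4090 | ✅ Good image quality
Wan 2.2 LightX2V 4-step | 480×672 | 5 s | ~47 s | ~18 GB | Local 4090 | ⚠️ Pending validation
LTX 2.3 22B local | 544×960 | 15 s | ~60 s | 21.5 GB | Local 4090 | ⚠️ Poor face consistency
LTX 2.3 Pro Replicate | 1080×1920 | 10 s | ~76 s | ~$0.15 | Replicate | ⚠️ NSFW filter
Wan 2.2 Replicate Fast | 496×784 | 5 s | ~23 s | ~$0.05 | Replicate | ⚠️ Slow motion
Phantom + VACE Merge | 480×672 | 5 s | ~7 min | 14.3 GB | Local 4090 | ❌ Failed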
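Because the runs produce clips of different lengths, raw generation time is hard to compare directly. A rough normalization is generation seconds per second of output video. The sketch below uses the approximate local-run figures from the table (minutes converted to seconds), so the results are only as precise as those "~" values; the names `RUNS`, `cost_per_output_second`, and `rank_by_cost` are illustrative.

```python
# (pipeline, approx. generation time in seconds, output duration in seconds)
RUNS = [
    ("Wan 2.2 + SVI + LightX2V", 7 * 60, 20),
    ("Wan 2.1 VACE 6-frame overlap", 6 * 60, 14.4),
    ("Wan 2.2 distilled chain", 4 * 60, 15),
    ("Wan 2.2 I2V 20-step", 298, 5),
    ("Wan 2.2 LightX2V 4-step", 47, 5),
    ("LTX 2.3 22B local", 60, 15),
]


def cost_per_output_second(runs):
    """Generation seconds spent per second of finished video."""
    return {name: generation / output for name, generation, output in runs}


def rank_by_cost(runs):
    """Pipelines sorted from cheapest to most expensive per output second."""
    return sorted(cost_per_output_second(runs).items(), key=lambda item: item[1])


if __name__ == "__main__":
    for name, cost in rank_by_cost(RUNS):
        print(f"{name:32s} {cost:6.1f} s per output second")
```

This measures speed only; it says nothing about the face-consistency or seam quality that drives the ratings.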
Learnings: Technical Notes
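VACE Is Key
The VACE control module (All-in-One Video Creation and Editing) noticeably improves motion naturalness and frame consistency. Compared with plain I2V, the VACE pipelines produce smoother motion and a more stable face.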
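LightX2V Distillation Speedup
The LightX2V cfg+step distillation LoRA cuts the step count from 20 to 4, for a speedup of roughly 5-6× with minimal loss of image quality. It is currently the most worthwhile optimization for local inference.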
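block_swap Memory Management
block_swap=20 is the best balance point on an RTX 4090 (24 GB). Setting it too low causes out-of-memory errors; setting it too high greatly increases generation time. VRAM usage is best kept in the 19-21 GB range.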
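Long-Video Chaining
Tail-frame chaining + a 0.5 s ffmpeg xfade was the most reliable long-video approach until the 6-frame overlap version superseded it. 3 × 5 s = 15 s comes out nearly seamless. The key is that each segment's first frame must be the previous segment's tail frame.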
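For the xfade variant, each join needs an `offset` that depends on the running length of everything already merged. A minimal sketch of that arithmetic follows, assuming ffmpeg's `xfade` filter with the plain `fade` transition (the transition type is not stated in these notes); `xfade_filter` and its stream labels are hypothetical.

```python
def xfade_filter(durations, fade=0.5):
    """Build a filter_complex string that chains xfade across all input clips.

    Returns (filter_graph, final_label, total_seconds).
    """
    if len(durations) < 2:
        raise ValueError("need at least two clips")
    parts = []
    previous = "[0:v]"
    elapsed = durations[0]  # length of the merged stream so far
    for i in range(1, len(durations)):
        offset = elapsed - fade  # the fade starts `fade` seconds before the end
        label = f"[v{i}]"
        parts.append(
            f"{previous}[{i}:v]xfade=transition=fade:"
            f"duration={fade}:offset={offset:.4f}{label}"
        )
        previous = label
        elapsed = offset + durations[i]
    return ";".join(parts), previous, elapsed


if __name__ == "__main__":
    clip = 81 / 16  # one 81-frame segment at 16 fps
    graph, label, total = xfade_filter([clip, clip, clip])
    print(graph)
    print(label, total)
```

Each crossfade consumes `fade` seconds, so three 81-frame clips end up slightly shorter than their summed length, whereas hard cuts keep every frame.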
Cloud NSFW Filtering
Replicate and most cloud APIs filter or modify NSFW content. LTX Pro's safety filter directly changes clothing and appearance. Only Wan 2.2's disable_safety_checker flag actually works.
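Use Merged Models with Caution
Third-party merged models (such as the VACE + Phantom + CausVid merge) are highly unstable in quality. A dull, gray image and stiff motion are common symptoms of degradation. Official distilled versions are recommended instead.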
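Two-Pass Sampling Strategy
Wan 2.2's two-stage sampling (10 high_noise steps + 10 low_noise steps) gives better image quality than a single 20-step pass. Note that the shift parameter affects the amount of motion.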
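The step splits used across these runs (10+10, 3+3, 2+2) can be expressed as two consecutive step ranges. The sketch below assumes both passes share one step schedule, with the high_noise model taking the first part and the low_noise model the remainder; the notes give only the step counts, so this wiring and the `split_steps` helper are assumptions.

```python
def split_steps(total_steps, high_noise_steps=None):
    """Split a sampling schedule into (start, end) step ranges for two passes.

    By default the schedule is split in half, matching the 10+10, 3+3, and
    2+2 configurations listed in the cards above.
    """
    if high_noise_steps is None:
        high_noise_steps = total_steps // 2
    if not 0 < high_noise_steps < total_steps:
        raise ValueError("each pass needs at least one step")
    return {
        "high_noise": (0, high_noise_steps),           # first pass
        "low_noise": (high_noise_steps, total_steps),  # second pass
    }


if __name__ == "__main__":
    for total in (20, 6, 4):
        print(total, split_steps(total))
```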
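InfiniteTalk Audio Driving
InfiniteTalk (MeiGen-AI) is an audio-driven talking-video model built on Wan 2.1. audio_scale controls the amplitude of mouth movement (1.5 works best); audio_cfg_scale>1 enables an extra audio-free pass that adds motion but is slower. AudioSeparation must use the Vocals channel (index 3). It can be chained after SVI video to produce a complete motion-to-speech narrative.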
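LTX vs. Wan Facial Consistency
LTX 2.3 is fast and high-resolution, but its facial consistency is clearly worse than the Wan family's. For character-driven projects, Wan remains the first choice.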