By gerogero
Updated: April 4, 2025
This is a beginner’s guide to installing Wan and enabling every available optimization to maximize video generation speed.
Achieving this involves trade-offs in quality, but you can easily disable any of the optimizations if you prefer to prioritize quality over speed.
The included guide and workflows are tailored for GPUs with 24GB or more of VRAM, typically using 21-23GB during generation. It’s possible to use a GPU with less than 24GB, but you’ll need to make adjustments. For example, a 16GB GPU can use the FP8/Q8 models, provided you increase the virtual_vram_gb or block swapping settings in the provided workflows. We’ll get to these later.
If you’re under 16GB, you’ll probably want to use models quantized below Q8, but keep in mind that lower quantization levels reduce output quality. In general, the lower you go, the lower the quality you get.
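For context, block swapping works by keeping part of the model’s transformer blocks in system RAM and streaming them onto the GPU only when they’re needed, trading some speed for VRAM headroom. Here’s a rough sketch of the idea in Python (illustrative only; the actual implementations behind virtual_vram_gb and the block swap settings are more sophisticated):

```python
import torch

def forward_with_block_swap(blocks, x, blocks_to_swap):
    """Toy illustration of block swapping: the first `blocks_to_swap`
    transformer blocks live in CPU RAM and are moved onto the GPU only
    for their own forward pass, then evicted again."""
    for i, block in enumerate(blocks):
        if i < blocks_to_swap:
            block.to("cuda")   # stream this block's weights into VRAM
        x = block(x)
        if i < blocks_to_swap:
            block.to("cpu")    # evict to free VRAM for the next block
    return x
```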
ComfyUI Portable
ComfyUI Manager
CUDA 12.6
Wan 2.1 can be integrated into ComfyUI through two approaches: Native support or Kijai’s Wrapper. Kijai’s Wrapper has additional features that Native does not (FlowEdit, vid2vid, etc.), while Native has several advantages unavailable in Kijai’s version: support for GGUF models, Adaptive Guidance (a method that speeds up generation at some cost in quality), and TorchCompile compatibility not only with the 40XX and 50XX GPU series but also the 30XX series, which speeds up generation by roughly an additional 30%. So if you have less than 24GB of VRAM and/or want the fastest gen speeds, Native is likely the better option.
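If you’re wondering where that TorchCompile speedup comes from: torch.compile JIT-compiles the model’s forward pass into fused kernels the first time it runs, and subsequent steps reuse those kernels. In the workflows this is handled by a TorchCompile node rather than user code; the snippet below is just a minimal standalone illustration of the mechanism (the toy model stands in for Wan’s diffusion transformer):

```python
import torch

# Minimal standalone illustration of torch.compile.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 4096),
)
compiled = torch.compile(model)   # first call triggers kernel compilation
x = torch.randn(8, 4096)
out = compiled(x)                 # later calls reuse the compiled kernels
```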
Once you’ve settled on a method and its associated workflow, proceed to the general installation steps.
Download these modified versions of Kijai’s default workflows. Beyond the optimizations and a few extra features, they use Alibaba’s default settings as a baseline. The workflows output two videos: a raw 16 fps version and an interpolated 32 fps version. You can easily adapt these to use the 720P model/setting. See Generating at 720P.
/ldg/ KJ i2v 480p workflow: ldg_kj_i2v_14b_480p.json
(updated 17th March 2025)
/ldg/ KJ t2v 480p workflow: ldg_kj_t2v_14b_480p.json
(updated 17th March 2025)
Do NOT use Comfy model files with KJ’s workflows! You MUST use these or you will encounter issues!
Download these modified versions of Comfy’s workflows, based on an anon’s from /ldg/. Beyond the optimizations and a few extra features, they use Alibaba’s default settings as a baseline. The workflows output two videos: a raw 16 fps version and an interpolated 32 fps version. You can easily adapt these to use the 720P model/setting. See Generating at 720P.
/ldg/ Comfy i2v 480p workflow: ldg_cc_i2v_14b_480p.json
(updated 17th March 2025)
/ldg/ Comfy t2v 480p workflow: ldg_cc_t2v_14b_480p.json
(updated 17th March 2025)
Do NOT use Kijai’s text encoder files with these models! You MUST use these text encoders, or generation will fail with: Exception during processing !!! mat1 and mat2 shapes cannot be multiplied (77x768 and 4096x5120)
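For what it’s worth, that error is just a matrix-multiply shape mismatch: the numbers suggest a 768-dim (CLIP-style) embedding being fed into a layer expecting the 4096-dim embeddings Wan’s UMT5 text encoder produces — that’s my reading of the shapes, not something the error states. A minimal reproduction:

```python
import torch

# Hypothetical reproduction: a 77x768 embedding from the wrong text encoder
# multiplied against a projection built for 4096-dim inputs.
wrong_embedding = torch.randn(77, 768)
projection = torch.randn(4096, 5120)
wrong_embedding @ projection
# RuntimeError: mat1 and mat2 shapes cannot be multiplied (77x768 and 4096x5120)
```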
Check that pytorch version: 2.7.0.dev20250306+cu126 is shown during startup. You should also see Enabled fp16 accumulation and Using sage attention. There’s a possible bug when you update extensions or restart which reports an incorrect version of pytorch. If that happens, close Comfy and restart. This seems to happen most often if you use the “Restart” button in Comfy after updating extensions, so close it manually and start it up manually instead. It can also happen after updating Comfy. If upon a second restart it still isn’t 2.7.0dev, do step 5 again.
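If you’d rather verify outside of Comfy, you can query the embedded Python directly (the path below is the usual ComfyUI Portable layout and may differ on your setup):

```python
# Run with the portable install's interpreter, e.g.:
#   .\python_embeded\python.exe this_script.py
import torch

print(torch.__version__)    # should print 2.7.0.dev20250306+cu126
print(torch.version.cuda)   # should print 12.6
# The fp16 accumulation switch was added in the 2.7 nightlies; my
# understanding is that ComfyUI flips it on when fp16_fast is enabled,
# but verify against your ComfyUI version.
print(torch.backends.cuda.matmul.allow_fp16_accumulation)
```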
Install the ComfyUI-GGUF extension. If Comfy still complains about missing nodes after installing them and restarting, you might need to install the missing nodes manually. If this happens using KJ’s wrapper, install the wrapper manually from his repo, deleting the old version from custom_nodes beforehand. The same goes for KJNodes if it complains about a missing WanVideoEnhanceAVideoKJ node. Make sure you follow the install instructions for the portable install.
Pytorch must be 2.7.0dev or fp16_fast / fp16 accumulation won’t work. The initial generation time you get is NOT accurate: TeaCache kicks in during the gen, and Adaptive Guidance about midway through if you’re on Comfy Native/Core.
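Roughly speaking, TeaCache gets its speedup by skipping full forward passes on steps where the model’s inputs have barely changed, reusing the previous step’s result instead. A toy sketch of that idea — not the actual implementation; the class name, threshold, and change metric are all invented for this example:

```python
import torch

class TeaCacheLite:
    """Toy sketch: reuse the previous step's residual whenever the
    conditioning input barely changed since the last full pass."""

    def __init__(self, threshold=0.1):
        self.threshold = threshold
        self.prev_embed = None
        self.prev_residual = None

    def step(self, model, x, t_embed):
        if self.prev_embed is not None:
            change = (t_embed - self.prev_embed).abs().mean() / self.prev_embed.abs().mean()
            if change < self.threshold:
                return x + self.prev_residual   # cache hit: skip the model
        out = model(x, t_embed)                 # cache miss: full forward pass
        self.prev_embed, self.prev_residual = t_embed, out - x
        return out
```

This is also why early timings mislead: the cache can only start hitting once consecutive steps look alike.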
When a video finishes generating, you’ll get two files in their own i2v or t2v directories and subdirectories. The raw files are the 16 fps outputs, while the int files are interpolated to 32 fps, which gives you much smoother motion.
It is highly recommended you enable previews during generation. If you followed the guide, you’ll have the extension required. Go to ComfyUI Settings (the cog icon at the bottom left) and search for “Display animated previews when sampling”. Enable it. Then open Comfy Manager and set Preview method to TAESD (slow). The output will become clearer by about step 10, and you’ll get a general sense of the composition and movement. This can and will save you a lot of time, as you can cancel gens early if you don’t like how they look.
NEVER use the 720p i2v model at 480p resolutions and vice versa. If you use the 720p i2v model and set your res to 832×480 for example, the output you get will be much worse than simply using the 480p i2v model. You won’t ever improve quality by genning 480p on the 720p model, so don’t do it. The only model which allows you to mix 480p and 720p resolutions is t2v 14B.
Each model is trained and fine-tuned for specific resolutions. In theory, deviating from these precise resolutions may produce poorer results compared to sticking with the supported ones, especially for i2v.
However, in my experience, non-standard resolutions work fine with i2v as long as the adjustments stay reasonable. Avoid drastic departures from 480p or 720p: anchor one dimension at either 480 (for 480p models) or 720 (for 720p models), and scale the other dimension downward, never upward, to adjust the aspect ratio. Never exceed the maximum of 832 for 480p or 1280 for 720p, as you’ll drastically increase generation time and go outside the resolution bounds set by the model’s developers.
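To make that anchoring rule concrete, here’s a small helper of my own (not from the workflows) that fixes the base dimension and scales the other side to fit an aspect ratio, snapping down to multiples of 16 since latent video models generally want dimensions divisible by 8 or 16:

```python
def fit_resolution(aspect_w, aspect_h, base=480, cap=832, multiple=16):
    """Anchor the short side at `base` (480 or 720) and scale the long
    side to match the aspect ratio, never exceeding `cap` (832 or 1280)."""
    if aspect_w >= aspect_h:                 # landscape: height is anchored
        w, h = min(cap, round(base * aspect_w / aspect_h)), base
    else:                                    # portrait: width is anchored
        w, h = base, min(cap, round(base * aspect_h / aspect_w))
    # snap down to the nearest multiple of 16
    return (w // multiple) * multiple, (h // multiple) * multiple

print(fit_resolution(16, 9))              # (832, 480) on the 480p model
print(fit_resolution(9, 16))              # (480, 832)
print(fit_resolution(16, 9, 720, 1280))   # (1280, 720) on the 720p model
```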
These are the ‘supported’ resolutions as listed in Wan’s official repo:
| Text to Video – 1.3B | Text to Video – 14B | Image to Video – 480p | Image to Video – 720p |
|---|---|---|---|
| 480*832 | 720*1280 | 832*480 | 1280*720 |
| 832*480 | 1280*720 | 480*832 | 720*1280 |
| 624*624 | 960*960 | | |
| 704*544 | 1088*832 | | |
| 544*704 | 832*1088 | | |
| | 480*832 | | |
| | 832*480 | | |
| | 624*624 | | |
| | 704*544 | | |
| | 544*704 | | |
If you want to use the 720p model in i2v or 720p res on t2v, you’ll need to:
Several options in this guide speed up inference time: fp16_fast (fp16 accumulation), TeaCache, Torch Compile, Adaptive Guidance (exclusive to Comfy Native), and Sage Attention. If you wish to disable them for testing, or to increase quality at the expense of time, do the following: