Install and run Stable Diffusion 3.5 in ComfyUI

Stability AI recently released its latest image generation model, Stable Diffusion 3.5 (SD 3.5), a marked improvement over Stable Diffusion 3.0. The new model offers image quality comparable to FLUX and is well worth trying out.

This guide will go through installing and running Stable Diffusion 3.5 Large and Large Turbo in ComfyUI. The Large model is generally recommended for better image quality, while the Turbo model is better for fast generation. The workflow we will be using is a simple text-to-image one, but it can easily be adapted to incorporate advanced techniques such as ControlNets.

The latest version of ComfyUI supports SD 3.5 out of the box, so the only thing you need to do to get started is download the models (you might have to fill in the agreement form on Hugging Face before you can download them):

  • Download Stable Diffusion 3.5 Large and/or Stable Diffusion 3.5 Large Turbo and add them to your “ComfyUI/models/checkpoints” folder.

  • Download the three CLIP models (clip_g.safetensors, clip_l.safetensors, and t5xxl_fp16.safetensors) and place them in “ComfyUI/models/clip”. If you have less than 32 GB of system RAM (not GPU VRAM), you can use t5xxl_fp8_e4m3fn.safetensors instead of t5xxl_fp16.safetensors.
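If you prefer to script the downloads, here is a minimal sketch using the huggingface_hub library. The repository ID, filenames, and subfolder are assumptions based on how the Stability AI repos are typically laid out, so verify them against the model pages; you also need to have accepted the license and be logged in via huggingface-cli login.

```python
# Minimal download sketch using huggingface_hub (repo ID, filenames and
# subfolder below are assumptions; verify against the Hugging Face model pages).
from huggingface_hub import hf_hub_download

MODELS_DIR = "ComfyUI/models"

# Checkpoint -> ComfyUI/models/checkpoints
hf_hub_download(
    repo_id="stabilityai/stable-diffusion-3.5-large",   # or ...-3.5-large-turbo
    filename="sd3.5_large.safetensors",                  # assumed filename
    local_dir=f"{MODELS_DIR}/checkpoints",
)

# Text encoders -> ComfyUI/models/clip
for name in ("clip_g.safetensors", "clip_l.safetensors", "t5xxl_fp16.safetensors"):
    hf_hub_download(
        repo_id="stabilityai/stable-diffusion-3.5-large",
        filename=f"text_encoders/{name}",                # assumed subfolder
        local_dir=f"{MODELS_DIR}/clip",
    )
# Note: with local_dir, files keep their repo subfolder, so you may need to
# move them out of ComfyUI/models/clip/text_encoders/ into the clip folder.
```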

Once you have the models ready, you can drop this image in ComfyUI and start generating right away:

If you want to get the most out of this workflow, there are a few parameters you can experiment with. 

Load Checkpoint

This is where you load either the Large or the Turbo model, depending on which one you are using. Alternatively, you can load any community SD 3.5 model or your own custom one.

EmptySD3LatentImage

The width and height parameters set the dimensions of your image in pixels. For best performance, we recommend using one of the following resolutions (width x height - aspect ratio):

  • 1024 x 1024 - 1:1

  • 1152 x 896 - 5:4

  • 1216 x 832 - 3:2

  • 1344 x 768 - 16:9 

The batch size sets how many images are generated at once. Generating one image at a time is the safest way to avoid out-of-memory errors, but depending on your GPU you may be able to do more.
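For reference, this is roughly what the node looks like in the API-format export of the workflow, written here as a Python dict. The node ID is arbitrary, and the field names should be checked against your own Save (API Format) export.

```python
# Sketch of the EmptySD3LatentImage node from an API-format workflow export
# (the node ID "5" and exact field names are assumptions; compare with your export).
latent_node = {
    "5": {
        "class_type": "EmptySD3LatentImage",
        "inputs": {
            "width": 1024,     # pick one of the recommended resolutions above
            "height": 1024,
            "batch_size": 1,   # increase only if your GPU has VRAM to spare
        },
    }
}
```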

ConditioningSetTimestepRange

These nodes control how much the prompt (top node) and the negative prompt (bottom node) influence the generation process. For maximum influence, set the start to 0.000 and the end to 1.000. If you want to give the model more creative freedom, which can improve image quality, you can increase the start and/or decrease the end.
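As a sketch of how these two nodes sit in the API-format workflow (the node IDs and the links to the two CLIP Text Encode nodes are assumptions):

```python
# Two ConditioningSetTimestepRange nodes, one per prompt. The ["6", 0] / ["7", 0]
# links are placeholders for the positive and negative CLIP Text Encode nodes.
timestep_nodes = {
    "10": {  # prompt (top node)
        "class_type": "ConditioningSetTimestepRange",
        "inputs": {"conditioning": ["6", 0], "start": 0.0, "end": 1.0},
    },
    "11": {  # negative prompt (bottom node)
        "class_type": "ConditioningSetTimestepRange",
        "inputs": {"conditioning": ["7", 0], "start": 0.0, "end": 1.0},
    },
}
# Raising "start" or lowering "end" narrows the window of sampling steps in
# which that prompt is applied, giving the model more freedom.
```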

CLIP Text Encode (Prompt)

These two nodes are your classic positive and negative prompts. 

KSampler

This is where the image is generated. The number of steps affects how long each generation takes and should be set differently depending on whether you are using SD 3.5 Large or SD 3.5 Large Turbo. For the Large model we recommend 20 steps; for the Turbo model, 10 steps, which roughly halves the generation time.
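If you later want to run this workflow outside the ComfyUI front end, you can queue it against a locally running ComfyUI server over its HTTP API. The sketch below assumes the default address 127.0.0.1:8188, a hypothetical sd35_text_to_image_api.json exported with Save (API Format), and a KSampler node ID of "3"; adjust all three to match your setup.

```python
# Queue the workflow on a local ComfyUI server via its /prompt endpoint.
import json
import uuid
import urllib.request

with open("sd35_text_to_image_api.json") as f:    # hypothetical export filename
    workflow = json.load(f)

# Override the step count: 20 for SD 3.5 Large, 10 for the Turbo model.
workflow["3"]["inputs"]["steps"] = 20             # "3" = assumed KSampler node ID

payload = json.dumps({"prompt": workflow, "client_id": str(uuid.uuid4())}).encode()
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # response includes the prompt_id of the queued job
```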

This workflow works on GPUs with 12GB of VRAM or higher. If you don’t have the right hardware or you want to skip the installation process, you can start using it right away inside the ViewComfy playground.

You can also upload your own workflow to ViewComfy cloud and access it via a ViewComfy app, an API or the ComfyUI interface.
