ControlNet

From Stable Diffusion Wiki

A program like Stable Diffusion that can create pictures from text descriptions is already impressive. ControlNet is a newer tool that makes it even more capable.

Think of a neural network, like Stable Diffusion, that can turn text into images. ControlNet is a system designed to fine-tune this process, adapting it to different tasks. It can work with small amounts of data, can be trained quickly, and does not require huge computers to run. Because it can be tailored to a variety of image-related tasks, it makes the underlying model more efficient and useful.

ControlNet Diagram

ControlNet is a neural network architecture designed to control pre-trained large diffusion models, enabling them to support additional input conditions and tasks. This end-to-end learning approach ensures robustness, even with small training datasets. Training a ControlNet is comparable in speed to fine-tuning a diffusion model, and it can be done on personal devices or scaled up if powerful computation clusters are available. This flexibility makes ControlNet an effective tool for augmenting large diffusion models like Stable Diffusion, allowing for conditional inputs and facilitating diverse applications.

It is best described by the scientists who developed it: "We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k)." https://arxiv.org/abs/2302.05543

In other words, in addition to the text prompt and other numeric parameters, the user can introduce an additional model to further specify the desired output. There are many different ways this is done. The important point is that whereas before, users were working with one single model (a) and manually adjusting parameters, with ControlNet users introduce an additional small model (b) that has much greater ability to influence the output.
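The paper's central mechanism is that the small added model (a trainable copy of part of the diffusion model) is attached to the frozen original through zero-initialized "zero convolution" layers, so before any training the combined network behaves exactly like the pretrained model. A minimal sketch of that idea, using plain matrix multiplications in place of real convolution blocks (all names and shapes here are illustrative, not the actual ControlNet code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained block (weights stay fixed), modeled as a linear map.
W_frozen = rng.standard_normal((8, 8))
frozen_block = lambda x: x @ W_frozen

# The trainable copy starts from the same pretrained weights.
W_copy = W_frozen.copy()

# "Zero convolution": a zero-initialized projection on the copy's output.
W_zero = np.zeros((8, 8))

def controlnet_block(x, condition):
    # The trainable copy sees the input plus the extra conditioning signal.
    h = (x + condition) @ W_copy
    # The zero-initialized layer contributes nothing before training,
    # so the frozen model's behavior is preserved at initialization.
    return frozen_block(x) + h @ W_zero

x = rng.standard_normal((4, 8))
cond = rng.standard_normal((4, 8))
# Before training, the output equals the frozen model's output exactly.
assert np.allclose(controlnet_block(x, cond), frozen_block(x))
```

During training, gradients gradually move `W_zero` away from zero, letting the condition steer the output without ever destroying the pretrained weights, which is why training is robust even on small datasets.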

Installation

  1. Open "Extensions" tab.
  2. Open "Install from URL" tab in the tab.
  3. Enter `https://github.com/Mikubill/sd-webui-controlnet.git` into "URL for extension's git repository".
  4. Press "Install" button.
  5. Wait for 5 seconds, and you will see the message "Installed into stable-diffusion-webui\extensions\sd-webui-controlnet. Use Installed tab to restart".
  6. Go to "Installed" tab, click "Check for updates", and then click "Apply and restart UI". (The next time you can also use these buttons to update ControlNet.)
  7. Completely restart A1111 webui including your terminal. (If you do not know what is a "terminal", you can reboot your computer to achieve the same effect.)
  8. Download models. The models can be found here. You need to download the model files ending with ".pth"; the ".yaml" files are already included when ControlNet is initially installed. The files are quite large (~1.5 GB each), so you may want to download only one or two models at a time. You can only use up to 3 ControlNet units at a time anyway.
  9. After you put models in the correct folder, you may need to refresh to see them. The refresh button is to the right of the "Model" dropdown.
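For reference, the steps above result in a directory layout like the following. This shell sketch only mirrors where the files end up (the example `.pth` filename is hypothetical; in practice the extension is installed through the UI as described):

```shell
# Hypothetical layout after installation; ROOT is your webui folder.
ROOT=stable-diffusion-webui
mkdir -p "$ROOT/extensions/sd-webui-controlnet/models"
# Downloaded ".pth" model files go into the models folder, e.g.:
#   cp control_canny.pth "$ROOT/extensions/sd-webui-controlnet/models/"
ls "$ROOT/extensions"
```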

Control Types

Canny

Canny detects the edges of objects in an image. It produces a layout for the output to follow. It works well with single objects or images with very simple backgrounds.
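In practice the preprocessor runs the standard Canny algorithm (e.g. OpenCV's `cv2.Canny`) to turn the input image into a black-and-white edge map. As a self-contained illustration of the kind of map it produces, here is a toy gradient-threshold edge detector in NumPy (a rough stand-in for real Canny, which additionally smooths, thins, and hysteresis-thresholds the edges):

```python
import numpy as np

def edge_map(img, thresh=0.25):
    """Toy edge detector: threshold the gradient magnitude of a grayscale image."""
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal central difference
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical central difference
    mag = np.hypot(gx, gy)
    return (mag > thresh * mag.max()).astype(np.uint8)

# A white square on a black background: edges appear only at the border.
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
edges = edge_map(img)
```

The resulting binary map is what ControlNet conditions on: the diffusion model is steered to place object boundaries where the edge map has them, while the text prompt decides what fills them in.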

Simple text prior to processing with the Canny Control Type.
Canny Example