Bigger isn’t always better, especially when running generative AI models on commodity hardware.
That principle underpins Stability AI's latest release: Stable Diffusion 3 Medium. Stable Diffusion, the company's flagship model, specializes in text-to-image generation. A preview of Stable Diffusion 3 was shared on February 22, and public API access began on April 17.
The new Stable Diffusion 3 Medium is designed as a smaller yet highly capable model that runs efficiently on consumer-grade GPUs. That makes it an appealing option for users and organizations with limited resources seeking effective image generation technology.
Stable Diffusion 3 Medium is available for testing via API and on the Stable Artisan service through Discord. The model weights are also available for non-commercial use on Hugging Face.
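For those starting from the Hugging Face weights, the snippet below is a minimal sketch of loading the model with the Hugging Face diffusers library's StableDiffusion3Pipeline. The repository ID, prompt, and sampling settings are illustrative assumptions, not an official recipe.

```python
# Minimal sketch: generating an image with SD3 Medium via Hugging Face diffusers.
# Assumes the `diffusers` and `torch` packages are installed and that you have
# accepted the model's non-commercial license on Hugging Face.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",  # assumed repo ID
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe(
    prompt="a photograph of a red fox in a snowy forest, golden hour lighting",
    num_inference_steps=28,   # illustrative settings
    guidance_scale=7.0,
).images[0]
image.save("sd3_medium_sample.png")
```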
With the introduction of Stable Diffusion 3 Medium, the initial release is now known as Stable Diffusion 3 (SD3) Large, which has 8 billion parameters. In contrast, SD3 Medium has 2 billion parameters. According to Christian Laforte, co-CEO of Stability AI, "Unlike SD3 Large, SD3 Medium is smaller and will run efficiently on consumer hardware."
To run SD3 Medium, users need only 5GB of GPU VRAM, allowing it to run on a wide range of consumer PCs and high-end laptops. While this is the minimum requirement, Stability AI recommends 16GB of GPU VRAM for optimal performance, which, while still reasonable, may pose a challenge for some laptops.
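For GPUs closer to the 5GB floor than the recommended 16GB, the diffusers library exposes standard memory-saving options such as half-precision weights and CPU offloading. The sketch below is illustrative, not an official low-VRAM configuration from Stability AI.

```python
# Sketch of common memory-saving options for lower-VRAM GPUs.
# Requires the `accelerate` package for CPU offloading; the repo ID is an assumption.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",  # assumed repo ID
    torch_dtype=torch.float16,  # half precision roughly halves memory use
)

# Keep submodules on the CPU and move each to the GPU only when needed,
# trading some speed for a much lower peak VRAM footprint.
pipe.enable_model_cpu_offload()

image = pipe("an isometric illustration of a small coastal town").images[0]
image.save("sd3_medium_low_vram.png")
```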
Despite its smaller size, SD3 Medium boasts impressive features comparable to SD3 Large. Laforte emphasizes that SD3 Medium excels in photorealism, prompt adherence, typography, resource efficiency, and fine-tuning. "SD3 Medium matches the capabilities of the SD3 Large API that users appreciate today," he stated.
Users can expect highly realistic image outputs from SD3, thanks to the 16-channel VAE (Variational Autoencoder), which offers greater detail per megapixel than previous models. SD3 also exhibits remarkable natural language prompt adherence, including spatial awareness in image composition.
The model's fine-tuning capabilities make it highly adaptable and efficient at capturing detail from fine-tuning datasets. Improved typography is another significant SD3 enhancement that carries over to SD3 Medium.
The standout feature of SD3 Medium is its resource efficiency. "The smaller size and modularity of the 2 billion parameter model reduce computational requirements without sacrificing performance," Laforte noted. "This makes SD3 Medium an ideal choice in environments where resource management is critical."