# Stable Diffusion 3.5 Large

## Overview

| Property | Value |
|---|---|
| Model Name | Stable Diffusion 3.5 Large |
| Model ID | stabilityai/stable-diffusion-3.5-large |
| Developer | Stability AI |
| Model Type | Multimodal Diffusion Transformer (MMDiT) |
| Task | Text-to-Image Generation |
| Research Paper | arXiv:2403.03206 |
| License | Stability Community License |
## Description

Stable Diffusion 3.5 Large is the most powerful model in the Stable Diffusion family, featuring superior quality and prompt adherence. It is a Multimodal Diffusion Transformer (MMDiT) text-to-image generative model that generates high-resolution images with fine detail.
## Key Capabilities

- Improved Image Quality: Enhanced visual fidelity and detail
- Enhanced Typography: Superior text rendering in generated images
- Complex Prompt Understanding: Market-leading prompt adherence, rivaling much larger models
- Resource Efficiency: Optimized for performance
- Versatile Styles: Generates diverse image styles including 3D, photography, painting, and line art
- Diverse Outputs: Produces images representing various people and scenes globally
## Pricing

### API Provider Pricing

| Provider | Cost Structure | Notes |
|---|---|---|
| Replicate | $0.065/image (~15 images/$1) | Median cost: $0.00087/run |
| fal.ai | $0.065/megapixel | Commercial use permitted |
| Stability AI | Credits-based | Via Platform API |
### Compute Costs (Replicate)

| Metric | Value |
|---|---|
| Compute Cost | $0.0001/second |
| Output Cost | $0.065/image |
| Median Cost (p50) | $0.00087/run |
| Example Run Time | ~8.8 seconds |
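The compute figures above are internally consistent; a quick arithmetic check (a sketch, not Replicate's official billing formula):

```python
# Per-second compute cost times the example run time should land
# near the reported median (p50) cost per run.
COMPUTE_COST_PER_SECOND = 0.0001  # USD, from the table above
EXAMPLE_RUN_SECONDS = 8.8

def estimated_run_cost(seconds: float, rate: float = COMPUTE_COST_PER_SECOND) -> float:
    """Estimate the compute cost of a single run in USD."""
    return seconds * rate

print(f"~${estimated_run_cost(EXAMPLE_RUN_SECONDS):.5f} per run")
# ≈ $0.00088, close to the reported $0.00087 p50
```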
## Supported Parameters

### Core Parameters

| Parameter | Type | Range/Values | Default | Description |
|---|---|---|---|---|
| prompt | string | - | Required | Text description for image generation |
| negative_prompt | string | - | - | Elements to exclude from generation |
| cfg / guidance_scale | number | 0-20 | 3.5-5.0 | Classifier-free guidance scale |
| num_inference_steps | integer | 1-50 | 28 | Number of denoising steps |
| seed | integer | - | Random | For reproducible outputs |
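Providers differ in how they handle out-of-range values; a minimal defensive sketch (a hypothetical helper, assuming the documented ranges above) that clamps rather than rejects:

```python
def clamp_core_params(cfg: float = 3.5, steps: int = 28) -> dict:
    """Clamp guidance scale and step count to the documented ranges.
    Defensive sketch only; real providers may reject instead of clamping."""
    return {
        "guidance_scale": min(max(cfg, 0.0), 20.0),     # documented range 0-20
        "num_inference_steps": min(max(steps, 1), 50),  # documented range 1-50
    }

print(clamp_core_params(cfg=25, steps=0))
# guidance_scale capped at 20.0, num_inference_steps raised to 1
```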
### Image Dimensions

| Parameter | Type | Options | Default |
|---|---|---|---|
| aspect_ratio | enum | 16:9, 1:1, 21:9, 2:3, 3:2, 4:5, 5:4, 9:16, 9:21 | 1:1 |
| width | integer | Up to 14,142 px | 512 |
| height | integer | Up to 14,142 px | 512 |
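Since the model is designed for roughly 1-megapixel output, a small helper can translate an aspect-ratio string into concrete dimensions. This is a sketch: the multiple-of-64 rounding is a common diffusion-model convention, not a documented SD3.5 constraint:

```python
import math

def dims_for_aspect(aspect: str, target_px: int = 1024 * 1024, multiple: int = 64) -> tuple:
    """Pick a (width, height) near the ~1-megapixel target for an
    aspect-ratio string like "16:9", snapped to a multiple of 64."""
    w_ratio, h_ratio = (int(p) for p in aspect.split(":"))
    scale = math.sqrt(target_px / (w_ratio * h_ratio))
    snap = lambda v: max(multiple, round(v / multiple) * multiple)
    return (snap(w_ratio * scale), snap(h_ratio * scale))

print(dims_for_aspect("1:1"))   # (1024, 1024)
print(dims_for_aspect("16:9"))  # (1344, 768)
```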
### Image-to-Image Parameters

| Parameter | Type | Range | Default | Description |
|---|---|---|---|---|
| image | URI | - | - | Input image for img2img mode |
| prompt_strength / strength | number | 0-1 | 0.83-0.85 | Denoising strength |
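In diffusers-style img2img pipelines, the strength value effectively scales how many of the scheduled denoising steps are actually executed; a rough illustration (exact scheduler behavior may differ by implementation):

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """In diffusers-style img2img, `strength` determines how far into the
    noise schedule the input image is pushed; roughly int(steps * strength)
    denoising steps are actually run."""
    return min(int(num_inference_steps * strength), num_inference_steps)

print(effective_steps(28, 0.85))  # 23: most of the image is regenerated
print(effective_steps(28, 0.3))   # 8: the input image is largely preserved
```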
### Output Parameters

| Parameter | Type | Options | Default |
|---|---|---|---|
| output_format | enum | webp, jpg, png | webp |
| num_images | integer | 1-4 | 1 |
### Advanced Parameters (fal.ai)

| Parameter | Description |
|---|---|
| ControlNet | Integration with scaling (0-2.0) and timing windows |
| LoRA weights | Multiple supported with user-defined scaling |
| IP-Adapter | Mask threshold adjustment (0.01-0.99) |
| Safety checker | Toggle on/off |
| Sync mode | Data URI returns |
## API Endpoints

### fal.ai Endpoints

| Endpoint | Method | Description |
|---|---|---|
| / | POST | Standard generation |
| /turbo | POST | Turbo mode generation |
| /image-to-image | POST | Image-to-image generation |
| /turbo/image-to-image | POST | Turbo image-to-image |
| /inpaint | POST | Inpainting |
| /turbo/inpaint | POST | Turbo inpainting |
| /health | GET | Health check |
### Timeouts (fal.ai)

| Setting | Value |
|---|---|
| Request timeout | 3600 seconds |
| Startup timeout | 600 seconds |
| Max concurrency | 20 |
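The long request timeout suggests queue-style, polled execution. A generic polling sketch that respects such a deadline; `fetch_status` and the status labels here are hypothetical stand-ins, not the fal.ai client API:

```python
import time

def poll_until_done(fetch_status, timeout_s: float = 3600.0, interval_s: float = 1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll a status callable until it reports a terminal state or the
    request timeout (3600 s on fal.ai, per the table above) elapses.
    `fetch_status` is a placeholder for a real provider status call."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        status = fetch_status()
        if status in ("COMPLETED", "FAILED"):  # assumed terminal labels
            return status
        sleep(interval_s)
    raise TimeoutError("request exceeded the provider timeout")
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting.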
## Related Models

| Model | Description |
|---|---|
| Stable Diffusion 3.5 Turbo | Optimized for speed, generates images in 4 steps |
| Stable Diffusion 3.5 Medium | Balances quality with customization for consumer hardware |
| Stable Diffusion 3.0 | Previous generation |
| Stable Diffusion XL | Previous architecture |
## Architecture

### Core Architecture

- Type: Multimodal Diffusion Transformer (MMDiT)
- Key Feature: QK-normalization (Query-Key Normalization) for improved training stability
- Output Resolution: Designed for 1-megapixel output

### Text Encoders

The model uses three fixed, pretrained text encoders:

| Encoder | Context Length |
|---|---|
| OpenCLIP-ViT/G | 77 tokens |
| CLIP-ViT/L | 77 tokens |
| T5-xxl | 77/256 tokens (variable) |
### Model Files

```
stabilityai/stable-diffusion-3.5-large/
├── text_encoders/
│   ├── clip_g.safetensors
│   ├── clip_l.safetensors
│   ├── t5xxl_fp16.safetensors
│   └── t5xxl_fp8_e4m3fn.safetensors
├── sd3_large.safetensors
├── scheduler/
├── transformer/
├── vae/
├── tokenizers (3x)
└── model_index.json
```
## Model Statistics

### Hugging Face

| Metric | Value |
|---|---|
| Downloads (monthly) | 38,883 |
| Community Likes | 3.28k |
| Model Variants (Adapters) | 348 |
| Model Variants (Fine-tunes) | 33 |
| Model Variants (Merges) | 4 |
| Model Variants (Quantizations) | 11 |
| Spaces Using Model | 100+ |

### Replicate

| Metric | Value |
|---|---|
| Run Count | 1.8M executions |
| Hardware | H100 GPU |
| Example Run Time | ~8.8 seconds |
| Status | Online (Official) |
## Infrastructure Requirements

### Hardware

| Provider | Hardware | Notes |
|---|---|---|
| Replicate | H100 GPU | Official model |
| fal.ai | GPU-H100 | Max concurrency: 20 |

### Self-Hosted Requirements

- GPU: NVIDIA GPU with sufficient VRAM
- Precision: bfloat16 recommended
- Quantization: Supports 4-bit NF4 quantization via BitsAndBytes
- Memory Optimization: Model CPU offload available
## Usage Examples

### Basic Usage (Diffusers)

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Load the pipeline in bfloat16, the recommended precision
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")

image = pipe(
    "A capybara holding a sign that reads Hello World",
    num_inference_steps=28,  # default step count
    guidance_scale=3.5,      # classifier-free guidance scale
).images[0]
image.save("capybara.png")
```
### Quantized Usage (4-bit NF4)

```python
import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel, StableDiffusion3Pipeline

model_id = "stabilityai/stable-diffusion-3.5-large"

# Quantize the transformer to 4-bit NF4 to reduce VRAM requirements
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_nf4 = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

# Build the pipeline around the quantized transformer
pipeline = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    transformer=model_nf4,
    torch_dtype=torch.bfloat16,
)
pipeline.enable_model_cpu_offload()  # offload idle components to CPU RAM

image = pipeline(
    "A beautiful sunset over mountains",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("sunset.png")
```
### Replicate API

```bash
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "prompt": "A capybara holding a sign that reads Hello World",
      "cfg": 5,
      "aspect_ratio": "1:1",
      "output_format": "webp"
    }
  }' \
  https://api.replicate.com/v1/models/stability-ai/stable-diffusion-3.5-large/predictions
```
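The same request can be constructed in Python; this sketch only builds the request object without sending it, and assumes Replicate's standard predictions endpoint for official models:

```python
import json
import os
import urllib.request

def build_prediction_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP request for a generation call.
    Actually sending it requires a valid REPLICATE_API_TOKEN."""
    body = json.dumps({
        "input": {
            "prompt": prompt,
            "cfg": 5,
            "aspect_ratio": "1:1",
            "output_format": "webp",
        }
    }).encode()
    return urllib.request.Request(
        "https://api.replicate.com/v1/models/stability-ai/stable-diffusion-3.5-large/predictions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('REPLICATE_API_TOKEN', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_prediction_request("A capybara holding a sign that reads Hello World")
print(req.get_method(), req.full_url)
```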
## Licensing

| Criteria | Details |
|---|---|
| Eligible Users | Individuals and organizations with less than $1M annual revenue |
| Research | Allowed |
| Non-commercial Use | Allowed |
| Commercial Use | Allowed (under revenue threshold) |

### Enterprise License

Organizations exceeding the $1M annual revenue threshold must obtain a separate Enterprise License from Stability AI.
## Intended Uses

### Approved Uses

- Generation of artworks and design
- Creative tools and applications
- Educational purposes
- Research on generative models and their limitations

### Out-of-Scope Uses

- Generating factual or true representations of people or events
- Applications requiring historical or factual accuracy (the model is not trained for this)
## Safety and Mitigations

### Implemented Measures

- Filtered training datasets
- Safety safeguards throughout development
- Integrity evaluation and red-teaming
- Content safety considerations

### Risk Mitigations

| Risk | Mitigation |
|---|---|
| Harmful Content | Filtered datasets plus safeguards; developers should add additional guardrails |
| Misuse | Technical limitations, education, and an Acceptable Use Policy |
| Privacy Violations | Adherence to privacy regulations recommended |
## Deployment Options

### Self-Hosted

| Platform | Description |
|---|---|
| ComfyUI | Node-based UI inference |
| Diffusers | Programmatic Python use |
| GitHub | Official implementation |

### Hosted

| Provider | Type |
|---|---|
| Hugging Face Spaces | Official space |
| Stability AI API | Official endpoint |
| Replicate | Third-party API |
| fal.ai | Third-party API |
## Additional Resources

Last Updated: December 2024

Sources: Hugging Face, Stability AI, Replicate, fal.ai