Stable Diffusion 3.5 Large

Stabilityai · Vision
Context: N/A · Input /1M: Free · Output /1M: Free · Max Output: N/A

Overview

Property | Value
Model Name | Stable Diffusion 3.5 Large
Model ID | stabilityai/stable-diffusion-3.5-large
Developer | Stability AI
Model Type | Multimodal Diffusion Transformer (MMDiT)
Task | Text-to-Image Generation
Research Paper | arXiv:2403.03206
License | Stability Community License

Description

Stable Diffusion 3.5 Large is the most powerful model in the Stable Diffusion family, with a focus on image quality and prompt adherence. It is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that produces high-resolution images with fine detail.

Key Capabilities

  • Improved Image Quality: Enhanced visual fidelity and detail
  • Enhanced Typography: Superior text rendering in generated images
  • Complex Prompt Understanding: Market-leading prompt adherence, rivaling much larger models
  • Resource Efficiency: Optimized for performance
  • Versatile Styles: Generates diverse image styles including 3D, photography, painting, and line art
  • Diverse Outputs: Produces images representing various people and scenes globally

Pricing

API Provider Pricing

Provider | Cost Structure | Notes
Replicate | $0.065/image (~15 images/$1) | Median cost: $0.00087/run
fal.ai | $0.065/megapixel | Commercial use permitted
Stability AI | Credits-based | Via Platform API

Compute Costs (Replicate)

Metric | Value
Compute Cost | $0.0001/second
Output Cost | $0.065/image
Median Cost (p50) | $0.00087/run
Example Run Time | ~8.8 seconds
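
The Replicate figures above can be cross-checked with simple arithmetic: the median cost lines up with compute time alone (8.8 s at $0.0001/s is about $0.00088), while $0.065 is the flat per-image output price. The sketch below assumes the rates are exactly as listed and treats the two charges as separate line items; the source does not say how they combine on an invoice, and the function names are illustrative only.

```python
# Figures copied from the Replicate tables above; they may change over time.
COMPUTE_RATE_USD_PER_SECOND = 0.0001
OUTPUT_PRICE_USD_PER_IMAGE = 0.065


def compute_cost(run_seconds: float, rate: float = COMPUTE_RATE_USD_PER_SECOND) -> float:
    """Compute-time charge for one run, in USD."""
    return run_seconds * rate


def images_per_dollar(price_per_image: float = OUTPUT_PRICE_USD_PER_IMAGE) -> int:
    """Whole images that fit in one dollar at a flat per-image price."""
    return int(1.0 / price_per_image)


if __name__ == "__main__":
    print(f"compute cost for an 8.8 s run: ${compute_cost(8.8):.5f}")
    print(f"images per dollar at $0.065/image: {images_per_dollar()}")
```

The same helpers can be reused for fal.ai's per-megapixel price by passing a different `price_per_image` once the output size in megapixels is known.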

Supported Parameters

Core Parameters

Parameter | Type | Range/Values | Default | Description
prompt | string | - | Required | Text description for image generation
negative_prompt | string | - | - | Elements to exclude from generation
cfg / guidance_scale | number | 0-20 | 3.5-5.0 (varies by provider) | Classifier-free guidance scale
num_inference_steps | integer | 1-50 | 28 | Number of denoising steps
seed | integer | - | Random | Fixed value gives reproducible outputs
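
Because the ranges above are enforced server-side, it can be convenient to check a request before sending it. The following is a minimal sketch that encodes only the limits listed in the table; `validate_request` is a hypothetical helper, not part of any provider SDK, and providers may apply additional rules.

```python
from typing import Any, Dict, List


def validate_request(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems: List[str] = []

    # prompt is the only required field in the table.
    prompt = params.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt is required and must be a non-empty string")

    # guidance_scale (called cfg on some providers): number in 0-20.
    scale = params.get("guidance_scale")
    if scale is not None:
        if not isinstance(scale, (int, float)) or not 0 <= scale <= 20:
            problems.append("guidance_scale must be a number in 0-20")

    # num_inference_steps: integer in 1-50.
    steps = params.get("num_inference_steps")
    if steps is not None:
        if not isinstance(steps, int) or not 1 <= steps <= 50:
            problems.append("num_inference_steps must be an integer in 1-50")

    return problems


if __name__ == "__main__":
    print(validate_request({"prompt": "a lighthouse at dusk", "num_inference_steps": 28}))
    print(validate_request({"num_inference_steps": 80}))
```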

Image Dimensions

Parameter | Type | Options | Default
aspect_ratio | enum | 16:9, 1:1, 21:9, 2:3, 3:2, 4:5, 5:4, 9:16, 9:21 | 1:1
width | integer | Up to 14,142 px | 512
height | integer | Up to 14,142 px | 512
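
When a provider accepts explicit `width` and `height` instead of `aspect_ratio`, the pair has to be derived by hand. The sketch below picks dimensions close to a 1-megapixel budget for a given ratio. Two assumptions are baked in and are not stated in the tables above: the 1024x1024 pixel budget, and snapping each side to a multiple of 16. `dims_for_aspect` is a hypothetical helper name.

```python
import math
from typing import Tuple


def dims_for_aspect(
    aspect_ratio: str,
    target_pixels: int = 1024 * 1024,  # assumption: ~1 megapixel budget
    multiple: int = 16,                # assumption: sides snapped to multiples of 16
) -> Tuple[int, int]:
    """Return (width, height) near target_pixels for a ratio such as '16:9'."""
    w_part, h_part = (int(part) for part in aspect_ratio.split(":"))
    ratio = w_part / h_part

    # Solve width * height = target_pixels with width / height = ratio.
    width = math.sqrt(target_pixels * ratio)
    height = math.sqrt(target_pixels / ratio)

    def snap(value: float) -> int:
        # Round to the nearest allowed multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    for ratio in ("1:1", "16:9", "9:16", "3:2"):
        print(ratio, dims_for_aspect(ratio))
```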

Image-to-Image Parameters

Parameter | Type | Range | Default | Description
image | URI | - | - | Input image for img2img mode
prompt_strength / strength | number | 0-1 | 0.83-0.85 (varies by provider) | Denoising strength
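
Denoising strength controls how much of the schedule is actually run on the input image: lower values keep more of the original. As an illustration, the sketch below follows the convention commonly used by Diffusers-style image-to-image pipelines, where roughly `num_inference_steps * strength` steps are executed. This mapping is an assumption here; hosted providers may implement it differently, and `img2img_steps` is a hypothetical helper.

```python
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Estimate how many denoising steps run for a given strength (0-1)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")

    # Number of schedule steps covered by the requested strength.
    init_timestep = min(num_inference_steps * strength, num_inference_steps)
    # Steps skipped at the start of the schedule (the input image stands in for them).
    t_start = int(max(num_inference_steps - init_timestep, 0))
    return num_inference_steps - t_start


if __name__ == "__main__":
    for strength in (0.5, 0.83, 1.0):
        print(f"strength={strength}: {img2img_steps(28, strength)} of 28 steps")
```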

Output Parameters

Parameter | Type | Options | Default
output_format | enum | webp, jpg, png | webp
num_images | integer | 1-4 | 1

Advanced Parameters (fal.ai)

Parameter | Description
ControlNet | Integration with scaling (0-2.0) and timing windows
LoRA weights | Multiple supported with user-defined scaling
IP-Adapter | Mask threshold adjustment (0.01-0.99)
Safety checker | Toggle on/off
Sync mode | Data URI returns

API Endpoints

fal.ai Endpoints

Endpoint | Method | Description
/ | POST | Standard generation
/turbo | POST | Turbo mode generation
/image-to-image | POST | Image-to-image generation
/turbo/image-to-image | POST | Turbo image-to-image
/inpaint | POST | Inpainting
/turbo/inpaint | POST | Turbo inpainting
/health | GET | Health check
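
The generation endpoints follow a regular pattern: an optional `/turbo` prefix followed by an optional task suffix. A small sketch of that mapping is shown below. It only builds the relative path from the table; the base URL and the `endpoint_path` helper are not part of the source and are left to the caller.

```python
# Task suffixes taken from the endpoint table above.
TASK_PATHS = {
    "text-to-image": "",
    "image-to-image": "/image-to-image",
    "inpaint": "/inpaint",
}


def endpoint_path(task: str = "text-to-image", turbo: bool = False) -> str:
    """Return the relative endpoint path for a task, optionally in turbo mode."""
    if task not in TASK_PATHS:
        raise ValueError(f"unknown task: {task!r}")
    path = ("/turbo" if turbo else "") + TASK_PATHS[task]
    # Standard text-to-image generation lives at the root path.
    return path or "/"


if __name__ == "__main__":
    print(endpoint_path())                        # standard generation
    print(endpoint_path("inpaint", turbo=True))   # turbo inpainting
```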

Timeouts (fal.ai)

Setting | Value
Request timeout | 3600 seconds
Startup timeout | 600 seconds
Max concurrency | 20

Related Models

Model | Description
Stable Diffusion 3.5 Large Turbo | Optimized for speed, generates images in 4 steps
Stable Diffusion 3.5 Medium | Balances quality with customization for consumer hardware
Stable Diffusion 3.0 | Previous generation
Stable Diffusion XL | Previous architecture

Architecture

Core Architecture

  • Type: Multimodal Diffusion Transformer (MMDiT)
  • Key Feature: QK-normalization (Query-Key Normalization) for improved training stability
  • Output Resolution: Designed for 1-megapixel output

Text Encoders

The model uses 3 fixed, pretrained text encoders:

Encoder | Context Length
OpenCLIP-ViT/G | 77 tokens
CLIP-ViT/L | 77 tokens
T5-xxl | 77/256 tokens (variable)

Model Files

stabilityai/stable-diffusion-3.5-large/
├── text_encoders/
│   ├── clip_g.safetensors
│   ├── clip_l.safetensors
│   ├── t5xxl_fp16.safetensors
│   └── t5xxl_fp8_e4m3fn.safetensors
├── sd3_large.safetensors
├── scheduler/
├── transformer/
├── vae/
├── tokenizers (3x)
└── model_index.json

Performance Metrics

Community Statistics (Hugging Face)

Metric | Value
Downloads (monthly) | 38,883
Community Likes | 3.28k
Model Variants (Adapters) | 348
Model Variants (Fine-tunes) | 33
Model Variants (Merges) | 4
Model Variants (Quantizations) | 11
Spaces Using Model | 100+

Runtime Performance (Replicate)

Metric | Value
Run Count | 1.8M executions
Hardware | H100 GPU
Example Run Time | ~8.8 seconds
Status | Online (Official)

Infrastructure Requirements

Hardware

Provider | Hardware | Notes
Replicate | H100 GPU | Official model
fal.ai | GPU-H100 | Max concurrency: 20

Self-Hosted Requirements

  • GPU: NVIDIA GPU with sufficient VRAM
  • Precision: bfloat16 recommended
  • Quantization: Supports 4-bit NF4 quantization via BitsAndBytes
  • Memory Optimization: Model CPU offload available
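
"Sufficient VRAM" can be roughed out from the parameter count and the precision of the weights. The sketch below estimates transformer weight memory only. The 8.1 billion parameter figure comes from Stability AI's release announcement rather than from this page, so treat it as an assumption; the estimate also ignores the text encoders, the VAE, activations, and quantization overhead, all of which add to real usage.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: ~8.1B parameters for the transformer (not stated on this page).
ASSUMED_PARAMS = 8.1e9

if __name__ == "__main__":
    print(f"bfloat16 weights:  ~{weight_memory_gb(ASSUMED_PARAMS, 16):.1f} GB")
    print(f"4-bit NF4 weights: ~{weight_memory_gb(ASSUMED_PARAMS, 4):.1f} GB")
```

Under these assumptions the 4-bit path needs roughly a quarter of the bfloat16 weight memory, which is why it is paired with CPU offload for smaller GPUs.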

Usage Examples

Basic Usage (Diffusers)

import torch
from diffusers import StableDiffusion3Pipeline

# Load the full pipeline in bfloat16 (the recommended precision).
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")

# 28 steps and guidance 3.5 match the defaults listed under Core Parameters.
image = pipe(
    "A capybara holding a sign that reads Hello World",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("capybara.png")

Quantized Usage (4-bit NF4)

import torch
from diffusers import (
    BitsAndBytesConfig,
    SD3Transformer2DModel,
    StableDiffusion3Pipeline,
)

# Requires the bitsandbytes package to be installed.
model_id = "stabilityai/stable-diffusion-3.5-large"

# Store weights in 4-bit NF4; run computation in bfloat16.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Quantize only the transformer, the largest component of the pipeline.
model_nf4 = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16
)

# Build the pipeline around the quantized transformer.
pipeline = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    transformer=model_nf4,
    torch_dtype=torch.bfloat16
)
# Move each component to the GPU only while it is in use.
pipeline.enable_model_cpu_offload()

image = pipeline(
    "A beautiful sunset over mountains",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("sunset.png")

Replicate API

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "prompt": "A capybara holding a sign that reads Hello World",
      "cfg": 5,
      "aspect_ratio": "1:1",
      "output_format": "webp"
    }
  }' \
  https://api.replicate.com/v1/models/stability-ai/stable-diffusion-3.5-large/predictions
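
For callers working in Python rather than the shell, the same request can be assembled with the standard library. This is a sketch only: it builds the request object without sending it, `build_prediction_request` is a hypothetical helper, and the default URL is an assumption based on Replicate's model-scoped predictions route that should be checked against Replicate's current API documentation.

```python
import json
import urllib.request

# Assumption: Replicate's model-scoped predictions URL; verify before use.
PREDICTIONS_URL = (
    "https://api.replicate.com/v1/models/"
    "stability-ai/stable-diffusion-3.5-large/predictions"
)


def build_prediction_request(
    prompt: str,
    token: str,
    cfg: float = 5,
    aspect_ratio: str = "1:1",
    output_format: str = "webp",
    url: str = PREDICTIONS_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "input": {
            "prompt": prompt,
            "cfg": cfg,
            "aspect_ratio": aspect_ratio,
            "output_format": output_format,
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_prediction_request("A capybara holding a sign", token="YOUR_TOKEN")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without a token or network access.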

Licensing

Community License (Free Tier)

Criteria | Details
Eligible Users | Individuals and organizations with less than $1M annual revenue
Research | Allowed
Non-commercial Use | Allowed
Commercial Use | Allowed (under revenue threshold)

Enterprise License

Criteria | Details
Eligible Organizations | Annual revenue greater than $1M
Contact | https://stability.ai/enterprise

Intended Uses

Approved Uses

  • Generation of artworks and design
  • Creative tools and applications
  • Educational purposes
  • Research on generative models and limitations

Out-of-Scope Uses

  • Generating factual/true representations of people or events
  • Historical or factual accuracy requirements (model not trained for this)

Safety and Mitigations

Implemented Measures

  • Filtered training datasets
  • Safety safeguards throughout development
  • Integrity evaluation and red-team testing
  • Content safety considerations

Risk Mitigations

Risk | Mitigation
Harmful Content | Filtered datasets and safeguards; developers should add their own guardrails
Misuse | Technical limitations, education, and the Acceptable Use Policy
Privacy Violations | Adherence to privacy regulations recommended

Deployment Options

Self-Hosted

Platform | Description
ComfyUI | Node-based UI inference
Diffusers | Programmatic Python use
GitHub | Official implementation

Cloud Platforms

Provider | Type
Hugging Face Spaces | Official space
Stability AI API | Official endpoint
Replicate | Third-party API
fal.ai | Third-party API

Contact Information

Issue Type | Contact
Safety Issues | [email protected]
Security Issues | [email protected]
Privacy Issues | [email protected]
Licensing | https://stability.ai/license
Enterprise | https://stability.ai/enterprise

Additional Resources

Resource | Link
Hugging Face Model Card | https://huggingface.co/stabilityai/stable-diffusion-3.5-large
Fine-tuning Guide | Official Notion tutorial
Research Paper | https://arxiv.org/abs/2403.03206
Acceptable Use Policy | https://stability.ai/use-policy
Safety Information | https://stability.ai/safety

Last Updated: December 2024
Sources: Hugging Face, Stability AI, Replicate, fal.ai