
Commit 26a4688

Merge branch 'huggingface:main' into laplace-scheduler

2 parents: 9bd5b09 + d63e6fc

82 files changed: +1058 -225 lines


docker/diffusers-onnxruntime-cpu/Dockerfile (+3 -3)

@@ -28,9 +28,9 @@ ENV PATH="/opt/venv/bin:$PATH"
 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
 RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
     python3 -m uv pip install --no-cache-dir \
-        torch==2.1.2 \
-        torchvision==0.16.2 \
-        torchaudio==2.1.2 \
+        torch \
+        torchvision \
+        torchaudio\
         onnxruntime \
         --extra-index-url https://download.pytorch.org/whl/cpu && \
     python3 -m uv pip install --no-cache-dir \
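With the version pins dropped, the torch packages in this image now float with the CPU wheel index. A hedged sanity check, not part of the commit, that could be run inside the rebuilt image to see which versions were resolved:

```python
# Hypothetical post-build check for the diffusers-onnxruntime-cpu image:
# prints the versions the now-unpinned install resolved to.
import torch
import torchvision
import torchaudio
import onnxruntime

print("torch:", torch.__version__)            # CPU wheel from download.pytorch.org/whl/cpu
print("torchvision:", torchvision.__version__)
print("torchaudio:", torchaudio.__version__)
print("onnxruntime:", onnxruntime.__version__)
```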

docs/source/en/_toctree.yml (+30 -30)

@@ -290,12 +290,12 @@
     title: AuraFlowTransformer2DModel
   - local: api/models/cogvideox_transformer3d
     title: CogVideoXTransformer3DModel
-  - local: api/models/consisid_transformer3d
-    title: ConsisIDTransformer3DModel
   - local: api/models/cogview3plus_transformer2d
     title: CogView3PlusTransformer2DModel
   - local: api/models/cogview4_transformer2d
     title: CogView4Transformer2DModel
+  - local: api/models/consisid_transformer3d
+    title: ConsisIDTransformer3DModel
   - local: api/models/dit_transformer2d
     title: DiTTransformer2DModel
   - local: api/models/easyanimate_transformer3d
@@ -310,12 +310,12 @@
     title: HunyuanVideoTransformer3DModel
   - local: api/models/latte_transformer3d
     title: LatteTransformer3DModel
-  - local: api/models/lumina_nextdit2d
-    title: LuminaNextDiT2DModel
-  - local: api/models/lumina2_transformer2d
-    title: Lumina2Transformer2DModel
   - local: api/models/ltx_video_transformer3d
     title: LTXVideoTransformer3DModel
+  - local: api/models/lumina2_transformer2d
+    title: Lumina2Transformer2DModel
+  - local: api/models/lumina_nextdit2d
+    title: LuminaNextDiT2DModel
   - local: api/models/mochi_transformer3d
     title: MochiTransformer3DModel
   - local: api/models/omnigen_transformer
@@ -324,10 +324,10 @@
     title: PixArtTransformer2DModel
   - local: api/models/prior_transformer
     title: PriorTransformer
-  - local: api/models/sd3_transformer2d
-    title: SD3Transformer2DModel
   - local: api/models/sana_transformer2d
     title: SanaTransformer2DModel
+  - local: api/models/sd3_transformer2d
+    title: SD3Transformer2DModel
   - local: api/models/stable_audio_transformer
     title: StableAudioDiTModel
   - local: api/models/transformer2d
@@ -342,10 +342,10 @@
     title: StableCascadeUNet
   - local: api/models/unet
     title: UNet1DModel
-  - local: api/models/unet2d
-    title: UNet2DModel
   - local: api/models/unet2d-cond
     title: UNet2DConditionModel
+  - local: api/models/unet2d
+    title: UNet2DModel
   - local: api/models/unet3d-cond
     title: UNet3DConditionModel
   - local: api/models/unet-motion
@@ -354,6 +354,10 @@
     title: UViT2DModel
   title: UNets
 - sections:
+  - local: api/models/asymmetricautoencoderkl
+    title: AsymmetricAutoencoderKL
+  - local: api/models/autoencoder_dc
+    title: AutoencoderDC
   - local: api/models/autoencoderkl
     title: AutoencoderKL
   - local: api/models/autoencoderkl_allegro
@@ -370,10 +374,6 @@
     title: AutoencoderKLMochi
   - local: api/models/autoencoder_kl_wan
     title: AutoencoderKLWan
-  - local: api/models/asymmetricautoencoderkl
-    title: AsymmetricAutoencoderKL
-  - local: api/models/autoencoder_dc
-    title: AutoencoderDC
   - local: api/models/consistency_decoder_vae
     title: ConsistencyDecoderVAE
   - local: api/models/autoencoder_oobleck
@@ -521,40 +521,40 @@
 - sections:
   - local: api/pipelines/stable_diffusion/overview
     title: Overview
-  - local: api/pipelines/stable_diffusion/text2img
-    title: Text-to-image
+  - local: api/pipelines/stable_diffusion/depth2img
+    title: Depth-to-image
+  - local: api/pipelines/stable_diffusion/gligen
+    title: GLIGEN (Grounded Language-to-Image Generation)
+  - local: api/pipelines/stable_diffusion/image_variation
+    title: Image variation
   - local: api/pipelines/stable_diffusion/img2img
     title: Image-to-image
   - local: api/pipelines/stable_diffusion/svd
     title: Image-to-video
   - local: api/pipelines/stable_diffusion/inpaint
     title: Inpainting
-  - local: api/pipelines/stable_diffusion/depth2img
-    title: Depth-to-image
-  - local: api/pipelines/stable_diffusion/image_variation
-    title: Image variation
+  - local: api/pipelines/stable_diffusion/k_diffusion
+    title: K-Diffusion
+  - local: api/pipelines/stable_diffusion/latent_upscale
+    title: Latent upscaler
+  - local: api/pipelines/stable_diffusion/ldm3d_diffusion
+    title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
   - local: api/pipelines/stable_diffusion/stable_diffusion_safe
     title: Safe Stable Diffusion
+  - local: api/pipelines/stable_diffusion/sdxl_turbo
+    title: SDXL Turbo
   - local: api/pipelines/stable_diffusion/stable_diffusion_2
     title: Stable Diffusion 2
   - local: api/pipelines/stable_diffusion/stable_diffusion_3
     title: Stable Diffusion 3
   - local: api/pipelines/stable_diffusion/stable_diffusion_xl
     title: Stable Diffusion XL
-  - local: api/pipelines/stable_diffusion/sdxl_turbo
-    title: SDXL Turbo
-  - local: api/pipelines/stable_diffusion/latent_upscale
-    title: Latent upscaler
   - local: api/pipelines/stable_diffusion/upscale
     title: Super-resolution
-  - local: api/pipelines/stable_diffusion/k_diffusion
-    title: K-Diffusion
-  - local: api/pipelines/stable_diffusion/ldm3d_diffusion
-    title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
   - local: api/pipelines/stable_diffusion/adapter
     title: T2I-Adapter
-  - local: api/pipelines/stable_diffusion/gligen
-    title: GLIGEN (Grounded Language-to-Image Generation)
+  - local: api/pipelines/stable_diffusion/text2img
+    title: Text-to-image
   title: Stable Diffusion
 - local: api/pipelines/stable_unclip
   title: Stable unCLIP

docs/source/en/api/loaders/lora.md (+4)

@@ -20,6 +20,7 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
 - [`FluxLoraLoaderMixin`] provides similar functions for [Flux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux).
 - [`CogVideoXLoraLoaderMixin`] provides similar functions for [CogVideoX](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox).
 - [`Mochi1LoraLoaderMixin`] provides similar functions for [Mochi](https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi).
+- [`AuraFlowLoraLoaderMixin`] provides similar functions for [AuraFlow](https://huggingface.co/fal/AuraFlow).
 - [`LTXVideoLoraLoaderMixin`] provides similar functions for [LTX-Video](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video).
 - [`SanaLoraLoaderMixin`] provides similar functions for [Sana](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana).
 - [`HunyuanVideoLoraLoaderMixin`] provides similar functions for [HunyuanVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video).

@@ -56,6 +57,9 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
 ## Mochi1LoraLoaderMixin
 
 [[autodoc]] loaders.lora_pipeline.Mochi1LoraLoaderMixin
+## AuraFlowLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.AuraFlowLoraLoaderMixin
 
 ## LTXVideoLoraLoaderMixin
 
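In practice, the new [`AuraFlowLoraLoaderMixin`] is used through the pipeline's `load_lora_weights()` method, like the other mixins listed above. A minimal hedged sketch; the LoRA repository id and adapter name are hypothetical placeholders:

```python
import torch
from diffusers import AuraFlowPipeline

pipeline = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow", torch_dtype=torch.float16
).to("cuda")

# load_lora_weights() is provided by AuraFlowLoraLoaderMixin;
# "your-username/auraflow-lora" is a hypothetical placeholder repository.
pipeline.load_lora_weights("your-username/auraflow-lora", adapter_name="example")

image = pipeline("a watercolor fox in a snowy forest").images[0]
image.save("auraflow_lora.png")
```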

docs/source/en/api/pipelines/aura_flow.md (+15)

@@ -89,6 +89,21 @@ image = pipeline(prompt).images[0]
 image.save("auraflow.png")
 ```
 
+## Support for `torch.compile()`
+
+AuraFlow can be compiled with `torch.compile()` to reduce inference latency, even across different resolutions. First, install PyTorch nightly following the instructions [here](https://pytorch.org/). The snippet below shows the changes needed to enable compilation:
+
+```diff
++ torch.fx.experimental._config.use_duck_shape = False
++ pipeline.transformer = torch.compile(
+    pipeline.transformer, fullgraph=True, dynamic=True
+ )
+```
+
+This yields speedups ranging from roughly 100% (at low resolutions) to 30% (at 1536x1536 resolution).
+
+Thanks to [AstraliteHeart](https://github.com/huggingface/diffusers/pull/11297/) for helping us rewrite the [`AuraFlowTransformer2DModel`] class so that the above works across different resolutions ([PR](https://github.com/huggingface/diffusers/pull/11297/)).
+
 ## AuraFlowPipeline
 
 [[autodoc]] AuraFlowPipeline
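For context, a hedged end-to-end sketch of how the compiled setup from the diff above might look in a full script (the dtype, prompt, and resolutions are illustrative and not taken from the diff):

```python
import torch
from diffusers import AuraFlowPipeline

# Requires a recent PyTorch nightly, as the new doc section notes.
torch.fx.experimental._config.use_duck_shape = False

pipeline = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow", torch_dtype=torch.bfloat16
).to("cuda")
pipeline.transformer = torch.compile(
    pipeline.transformer, fullgraph=True, dynamic=True
)

# dynamic=True lets the compiled transformer handle different resolutions
# without triggering a recompile for each new shape.
for width, height in [(512, 512), (1024, 1024), (1536, 1536)]:
    image = pipeline("a photo of a red panda", width=width, height=height).images[0]
    image.save(f"auraflow_{width}x{height}.png")
```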

docs/source/en/quantization/bitsandbytes.md (+16 -16)

@@ -49,7 +49,7 @@ For Ada and higher-series GPUs. we recommend changing `torch_dtype` to `torch.bf
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
 
-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
 from transformers import T5EncoderModel
 
 quant_config = TransformersBitsAndBytesConfig(load_in_8bit=True,)
@@ -63,7 +63,7 @@ text_encoder_2_8bit = T5EncoderModel.from_pretrained(
 
 quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True,)
 
-transformer_8bit = FluxTransformer2DModel.from_pretrained(
+transformer_8bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -74,7 +74,7 @@ transformer_8bit = FluxTransformer2DModel.from_pretrained(
 By default, all the other modules such as `torch.nn.LayerNorm` are converted to `torch.float16`. You can change the data type of these modules with the `torch_dtype` parameter.
 
 ```diff
-transformer_8bit = FluxTransformer2DModel.from_pretrained(
+transformer_8bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -133,7 +133,7 @@ For Ada and higher-series GPUs. we recommend changing `torch_dtype` to `torch.bf
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
 
-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
 from transformers import T5EncoderModel
 
 quant_config = TransformersBitsAndBytesConfig(load_in_4bit=True,)
@@ -147,7 +147,7 @@ text_encoder_2_4bit = T5EncoderModel.from_pretrained(
 
 quant_config = DiffusersBitsAndBytesConfig(load_in_4bit=True,)
 
-transformer_4bit = FluxTransformer2DModel.from_pretrained(
+transformer_4bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -158,7 +158,7 @@ transformer_4bit = FluxTransformer2DModel.from_pretrained(
 By default, all the other modules such as `torch.nn.LayerNorm` are converted to `torch.float16`. You can change the data type of these modules with the `torch_dtype` parameter.
 
 ```diff
-transformer_4bit = FluxTransformer2DModel.from_pretrained(
+transformer_4bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -217,11 +217,11 @@ print(model.get_memory_footprint())
 Quantized models can be loaded from the [`~ModelMixin.from_pretrained`] method without needing to specify the `quantization_config` parameters:
 
 ```py
-from diffusers import FluxTransformer2DModel, BitsAndBytesConfig
+from diffusers import AutoModel, BitsAndBytesConfig
 
 quantization_config = BitsAndBytesConfig(load_in_4bit=True)
 
-model_4bit = FluxTransformer2DModel.from_pretrained(
+model_4bit = AutoModel.from_pretrained(
     "hf-internal-testing/flux.1-dev-nf4-pkg", subfolder="transformer"
 )
 ```
@@ -243,13 +243,13 @@ An "outlier" is a hidden state value greater than a certain threshold, and these
 To find the best threshold for your model, we recommend experimenting with the `llm_int8_threshold` parameter in [`BitsAndBytesConfig`]:
 
 ```py
-from diffusers import FluxTransformer2DModel, BitsAndBytesConfig
+from diffusers import AutoModel, BitsAndBytesConfig
 
 quantization_config = BitsAndBytesConfig(
     load_in_8bit=True, llm_int8_threshold=10,
 )
 
-model_8bit = FluxTransformer2DModel.from_pretrained(
+model_8bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quantization_config,
@@ -305,7 +305,7 @@ NF4 is a 4-bit data type from the [QLoRA](https://hf.co/papers/2305.14314) paper
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
 
-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
 from transformers import T5EncoderModel
 
 quant_config = TransformersBitsAndBytesConfig(
@@ -325,7 +325,7 @@ quant_config = DiffusersBitsAndBytesConfig(
     bnb_4bit_quant_type="nf4",
 )
 
-transformer_4bit = FluxTransformer2DModel.from_pretrained(
+transformer_4bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -343,7 +343,7 @@ Nested quantization is a technique that can save additional memory at no additio
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
 
-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
 from transformers import T5EncoderModel
 
 quant_config = TransformersBitsAndBytesConfig(
@@ -363,7 +363,7 @@ quant_config = DiffusersBitsAndBytesConfig(
     bnb_4bit_use_double_quant=True,
 )
 
-transformer_4bit = FluxTransformer2DModel.from_pretrained(
+transformer_4bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -379,7 +379,7 @@ Once quantized, you can dequantize a model to its original precision, but this m
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
 
-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
 from transformers import T5EncoderModel
 
 quant_config = TransformersBitsAndBytesConfig(
@@ -399,7 +399,7 @@ quant_config = DiffusersBitsAndBytesConfig(
     bnb_4bit_use_double_quant=True,
 )
 
-transformer_4bit = FluxTransformer2DModel.from_pretrained(
+transformer_4bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
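Putting the hunks above together, a hedged sketch of how the `AutoModel`-loaded 4-bit components could be assembled into a Flux pipeline; the pipeline-assembly and generation steps are assumptions outside this diff:

```python
import torch
from diffusers import AutoModel, FluxPipeline
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from transformers import T5EncoderModel

model_id = "black-forest-labs/FLUX.1-dev"

# 4-bit text encoder (transformers) and 4-bit transformer (diffusers), as in the doc hunks above.
text_encoder_2_4bit = T5EncoderModel.from_pretrained(
    model_id,
    subfolder="text_encoder_2",
    quantization_config=TransformersBitsAndBytesConfig(load_in_4bit=True),
    torch_dtype=torch.float16,
)
transformer_4bit = AutoModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=DiffusersBitsAndBytesConfig(load_in_4bit=True),
    torch_dtype=torch.float16,
)

# Assumed continuation: plug the quantized components into the pipeline and run it.
pipeline = FluxPipeline.from_pretrained(
    model_id,
    transformer=transformer_4bit,
    text_encoder_2=text_encoder_2_4bit,
    torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()
image = pipeline("a tiny astronaut hatching from an egg on the moon").images[0]
```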

docs/source/en/quantization/torchao.md (+12 -9)

@@ -26,13 +26,13 @@ The example below only quantizes the weights to int8.
 
 ```python
 import torch
-from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig
+from diffusers import FluxPipeline, AutoModel, TorchAoConfig
 
 model_id = "black-forest-labs/FLUX.1-dev"
 dtype = torch.bfloat16
 
 quantization_config = TorchAoConfig("int8wo")
-transformer = FluxTransformer2DModel.from_pretrained(
+transformer = AutoModel.from_pretrained(
     model_id,
     subfolder="transformer",
     quantization_config=quantization_config,
@@ -99,10 +99,10 @@ To serialize a quantized model in a given dtype, first load the model with the d
 
 ```python
 import torch
-from diffusers import FluxTransformer2DModel, TorchAoConfig
+from diffusers import AutoModel, TorchAoConfig
 
 quantization_config = TorchAoConfig("int8wo")
-transformer = FluxTransformer2DModel.from_pretrained(
+transformer = AutoModel.from_pretrained(
     "black-forest-labs/Flux.1-Dev",
     subfolder="transformer",
     quantization_config=quantization_config,
@@ -115,9 +115,9 @@ To load a serialized quantized model, use the [`~ModelMixin.from_pretrained`] me
 
 ```python
 import torch
-from diffusers import FluxPipeline, FluxTransformer2DModel
+from diffusers import FluxPipeline, AutoModel
 
-transformer = FluxTransformer2DModel.from_pretrained("/path/to/flux_int8wo", torch_dtype=torch.bfloat16, use_safetensors=False)
+transformer = AutoModel.from_pretrained("/path/to/flux_int8wo", torch_dtype=torch.bfloat16, use_safetensors=False)
 pipe = FluxPipeline.from_pretrained("black-forest-labs/Flux.1-Dev", transformer=transformer, torch_dtype=torch.bfloat16)
 pipe.to("cuda")
 
@@ -131,10 +131,10 @@ If you are using `torch<=2.6.0`, some quantization methods, such as `uint4wo`, c
 ```python
 import torch
 from accelerate import init_empty_weights
-from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig
+from diffusers import FluxPipeline, AutoModel, TorchAoConfig
 
 # Serialize the model
-transformer = FluxTransformer2DModel.from_pretrained(
+transformer = AutoModel.from_pretrained(
     "black-forest-labs/Flux.1-Dev",
     subfolder="transformer",
     quantization_config=TorchAoConfig("uint4wo"),
@@ -146,10 +146,13 @@ transformer.save_pretrained("/path/to/flux_uint4wo", safe_serialization=False, m
 # Load the model
 state_dict = torch.load("/path/to/flux_uint4wo/diffusion_pytorch_model.bin", weights_only=False, map_location="cpu")
 with init_empty_weights():
-    transformer = FluxTransformer2DModel.from_config("/path/to/flux_uint4wo/config.json")
+    transformer = AutoModel.from_config("/path/to/flux_uint4wo/config.json")
 transformer.load_state_dict(state_dict, strict=True, assign=True)
 ```
 
+> [!TIP]
+> The [`AutoModel`] API is supported for PyTorch >= 2.6 as shown in the examples below.
+
 ## Resources
 
 - [TorchAO Quantization API](https://github.com/pytorch/ao/blob/main/torchao/quantization/README.md)
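The first hunk above stops at the quantized transformer; a hedged sketch of the assumed continuation that plugs it into the pipeline (the prompt and step count are illustrative):

```python
import torch
from diffusers import AutoModel, FluxPipeline, TorchAoConfig

model_id = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16

# int8 weight-only quantization of the transformer, as in the doc hunk above.
transformer = AutoModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=TorchAoConfig("int8wo"),
    torch_dtype=dtype,
)

# Assumed continuation: assemble the pipeline around the quantized transformer and run it.
pipeline = FluxPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=dtype)
pipeline.to("cuda")
image = pipeline("a capybara wearing a top hat", num_inference_steps=30).images[0]
image.save("flux_int8wo.png")
```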
