### Description
Essentially, the vision model is traced differently (this time without an
attention mask), and the input indices of op.Add and op.MatMul can differ
between traces. In addition, fp16 and fp32 produce different tracing patterns
(an extra op.Cast).
1. Add another traced pattern to CLIP attention to cover the case with no
attention_mask.
2. Accept either input index on op.Add and op.MatMul (more general matching;
see the sketch after this list).
3. Handle the different fp16 and fp32 patterns (fp16 inserts an op.Cast after
op.Softmax).
4. Refactor test_fastgelu.py to also cover torch.onnx.export(..., dynamo=True).
5. Add a Gemma3 vision attention (SigLIP) test covering both fp16 and fp32.
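For illustration, here is a minimal sketch of the matching logic behind items
2 and 3 above. The helper names (`find_bias_input`, `skip_optional_cast`) are
hypothetical, not the identifiers used in the actual fusion code:

```python
import onnx

def find_bias_input(graph: onnx.GraphProto, add_node: onnx.NodeProto):
    """Exporters may emit Add(bias, x) or Add(x, bias), so accept the
    constant operand at either input index instead of hard-coding one."""
    initializers = {init.name for init in graph.initializer}
    for idx, name in enumerate(add_node.input):
        if name in initializers:
            return idx  # index of the bias input
    return None         # no constant input: pattern does not match

def skip_optional_cast(softmax_node: onnx.NodeProto, consumer_of: dict):
    """fp16 graphs insert a Cast right after Softmax (Softmax runs in fp32);
    fp32 graphs do not. Match both by skipping the Cast when present.
    `consumer_of` maps each tensor name to its (single) consumer node."""
    consumer = consumer_of.get(softmax_node.output[0])
    if consumer is not None and consumer.op_type == "Cast":
        return consumer
    return softmax_node
```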
### Motivation and Context
These changes are needed to optimize the Gemma3 multi-modal model:
https://huggingface.co/google/gemma-3-4b-it
NOTE: some related follow-ups (upstream optimizations to the onnxscript
optimizer): microsoft/onnxscript#2158, microsoft/onnxscript#2156
Essentially, we are upstreaming https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/fusion_constant_fold.py.
If the initializer is not consumed by any other node, we can transpose it in
advance and drop the runtime Transpose; a sketch of the idea follows.
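A hedged sketch of that constant-folding idea, not the actual
fusion_constant_fold.py implementation (graph-output and opset subtleties are
omitted, and the helper name is illustrative):

```python
import numpy as np
import onnx
from onnx import numpy_helper

def fold_transpose_into_initializer(model: onnx.ModelProto) -> None:
    """If an initializer's only consumer is a Transpose node, apply the
    permutation to the weights offline and delete the Transpose."""
    graph = model.graph
    inits = {init.name: init for init in graph.initializer}
    # Count consumers so we only rewrite initializers with a single use;
    # folding a shared initializer would change its other consumers.
    use_count: dict = {}
    for node in graph.node:
        for name in node.input:
            use_count[name] = use_count.get(name, 0) + 1
    for node in list(graph.node):
        if node.op_type != "Transpose" or node.input[0] not in inits:
            continue
        if use_count[node.input[0]] != 1:
            continue
        init = inits[node.input[0]]
        perm = None
        for attr in node.attribute:
            if attr.name == "perm":
                perm = list(attr.ints)
        array = numpy_helper.to_array(init)
        transposed = np.transpose(array, perm)  # perm=None reverses axes,
        # matching the ONNX Transpose default.
        init.CopyFrom(numpy_helper.from_array(transposed, init.name))
        # Rewire the Transpose's consumers to read the initializer directly.
        for consumer in graph.node:
            for i, name in enumerate(consumer.input):
                if name == node.output[0]:
                    consumer.input[i] = init.name
        graph.node.remove(node)
```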