### Description
Essentially, the vision model is traced differently (this time without an
attention mask), and the input indices of op.Add and op.MatMul can differ
between traces. In addition, fp16 and fp32 produce different tracing patterns
(an extra op.Cast).
1. Add another traced pattern to CLIP attention to cover the case with no
attention_mask.
2. Accept either input index on op.Add and op.MatMul (more general matching;
see the sketch after this list).
3. Handle the different fp16 and fp32 patterns (fp16 inserts an op.Cast after
op.Softmax).
4. Refactor test_fastgelu.py to also cover torch.onnx.export(..., dynamo=True).
5. Add a Gemma3 vision attention (SigLIP) test covering both fp16 and fp32.
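For illustration, here is a minimal sketch of the matching logic behind items
2 and 3 above. The helper names (`find_bias_input`, `skip_optional_cast`) are
hypothetical, not the identifiers used in the actual fusion code:

```python
import onnx

def find_bias_input(graph: onnx.GraphProto, add_node: onnx.NodeProto):
    """Exporters may emit Add(bias, x) or Add(x, bias), so accept the
    constant operand at either input index instead of hard-coding one."""
    initializers = {init.name for init in graph.initializer}
    for idx, name in enumerate(add_node.input):
        if name in initializers:
            return idx  # index of the bias input
    return None         # no constant input: pattern does not match

def skip_optional_cast(softmax_node: onnx.NodeProto, consumer_of: dict):
    """fp16 graphs insert a Cast right after Softmax (Softmax runs in fp32);
    fp32 graphs do not. Match both by skipping the Cast when present.
    `consumer_of` maps each tensor name to its (single) consumer node."""
    consumer = consumer_of.get(softmax_node.output[0])
    if consumer is not None and consumer.op_type == "Cast":
        return consumer
    return softmax_node
```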
### Motivation and Context
These changes are needed to optimize the Gemma3 multi-modal model:
https://huggingface.co/google/gemma-3-4b-it
NOTE: some related follow-ups (upstream optimizations to the onnxscript
optimizer): microsoft/onnxscript#2158, microsoft/onnxscript#2156
Essentially, we are upstreaming https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/fusion_constant_fold.py.
If the initializer is not consumed by any other node, we can transpose it in
advance and drop the runtime Transpose; a sketch of the idea follows.
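A hedged sketch of that constant-folding idea, not the actual
fusion_constant_fold.py implementation (graph-output and opset subtleties are
omitted, and the helper name is illustrative):

```python
import numpy as np
import onnx
from onnx import numpy_helper

def fold_transpose_into_initializer(model: onnx.ModelProto) -> None:
    """If an initializer's only consumer is a Transpose node, apply the
    permutation to the weights offline and delete the Transpose."""
    graph = model.graph
    inits = {init.name: init for init in graph.initializer}
    # Count consumers so we only rewrite initializers with a single use;
    # folding a shared initializer would change its other consumers.
    use_count: dict = {}
    for node in graph.node:
        for name in node.input:
            use_count[name] = use_count.get(name, 0) + 1
    for node in list(graph.node):
        if node.op_type != "Transpose" or node.input[0] not in inits:
            continue
        if use_count[node.input[0]] != 1:
            continue
        init = inits[node.input[0]]
        perm = None
        for attr in node.attribute:
            if attr.name == "perm":
                perm = list(attr.ints)
        array = numpy_helper.to_array(init)
        transposed = np.transpose(array, perm)  # perm=None reverses axes,
        # matching the ONNX Transpose default.
        init.CopyFrom(numpy_helper.from_array(transposed, init.name))
        # Rewire the Transpose's consumers to read the initializer directly.
        for consumer in graph.node:
            for i, name in enumerate(consumer.input):
                if name == node.output[0]:
                    consumer.input[i] = init.name
        graph.node.remove(node)
```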