feature(nyz&dcy): add LLM/VLM reward model #859

PaParaZz1 · 2025-03-10T05:47:34Z

Description

Rule-based math reward model
Model-based math reward model, e.g. Qwen/Qwen2.5-Math-PRM-7B
VLM (vision-language model) reward model
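
For readers unfamiliar with the first item, here is a rough sketch of what a rule-based math reward can look like. It is not the code in `ding/reward_model/math_rule_reward_model.py`; the function names (`extract_boxed_answer`, `normalize_answer`, `math_rule_reward`), the `\boxed{...}` answer convention, and the binary 0/1 reward are all assumptions made for illustration.

```python
import re
from fractions import Fraction
from typing import Optional


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None if absent.

    Hypothetical helper: assumes the final answer is wrapped in \\boxed{}.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    # Walk forward and track brace depth so nested braces are kept intact.
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # unbalanced braces


def normalize_answer(answer: str) -> str:
    """Canonicalize simple numeric answers; leave everything else as a string."""
    s = answer.strip().strip("$").replace(" ", "").replace(",", "")
    # Rewrite \frac{a}{b} with integer parts as a/b so Fraction can parse it.
    s = re.sub(r"\\frac\{(-?\d+)\}\{(\d+)\}", r"\1/\2", s)
    try:
        return str(Fraction(s))
    except (ValueError, ZeroDivisionError):
        return s


def math_rule_reward(response: str, ground_truth: str) -> float:
    """Return 1.0 if the boxed answer matches the ground truth, else 0.0."""
    predicted = extract_boxed_answer(response)
    if predicted is None:
        return 0.0
    matched = normalize_answer(predicted) == normalize_answer(ground_truth)
    return 1.0 if matched else 0.0
```

With this sketch, `math_rule_reward("So the answer is \\boxed{\\frac{1}{2}}.", "0.5")` compares `1/2` against `1/2` after normalization. A response with no boxed answer gets zero reward, which is a design choice of this sketch rather than something stated in the PR.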

Related Issue

#548

TODO

Check List

Merge the latest version of the source branch/repo and resolve all conflicts
Pass style check
Pass all tests

codecov · 2025-03-10T06:20:32Z

Codecov Report

Attention: Patch coverage is 18.73990% with 503 lines in your changes missing coverage. Please review.

Project coverage is 74.82%. Comparing base (8f48cb1) to head (1cd74e5).
Report is 6 commits behind head on main.

Files with missing lines	Patch %	Lines
ding/reward_model/math_rule_reward_model.py	10.00%	333 Missing ⚠️
...eward_model/tests/test_multi_modal_reward_model.py	34.37%	42 Missing ⚠️
ding/reward_model/math_reward_model.py	32.69%	35 Missing ⚠️
ding/reward_model/multi_modal_reward_model.py	31.25%	33 Missing ⚠️
ding/reward_model/tests/test_math_reward_model.py	17.50%	33 Missing ⚠️
.../reward_model/tests/test_math_rule_reward_model.py	35.71%	27 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #859      +/-   ##
==========================================
- Coverage   75.37%   74.82%   -0.56%     
==========================================
  Files         701      707       +6     
  Lines       57026    57955     +929     
==========================================
+ Hits        42982    43362     +380     
- Misses      14044    14593     +549

Flag	Coverage Δ
unittests	`74.82% <18.73%> (-0.54%)`	⬇️

PaParaZz1 · 2025-03-16T04:30:05Z

ding/reward_model/math_reward_model.py

+        self.logger = logger
+        self.tb_logger = tb_logger
+
+        # 初始化tokenizer和model


Please write comments in English.

feature(nyz): add basic math reward model interfaces

04a586b

PaParaZz1 added the enhancement (New feature or request) and algo (Add new algorithm or improve old one) labels Mar 10, 2025

PaParaZz1 mentioned this pull request Mar 10, 2025

Roadmap for DI-engine #548

Open

style(nyz): polish flake8 style

ab5f6e7

(dcy) add math_reward_model and its test file

60d88f9

PaParaZz1 commented Mar 16, 2025

(dcy) add math_rule_reward_model and its test file

68db31f

Berit-chengyi force-pushed the dev-rm-verifier branch 2 times, most recently from d519b12 to 828cd4d Compare March 17, 2025 06:57

polish flake8

7314bff

Berit-chengyi force-pushed the dev-rm-verifier branch from 828cd4d to 7314bff Compare March 17, 2025 08:13

Berit-chengyi force-pushed the dev-rm-verifier branch from ea83f44 to 7314bff Compare April 2, 2025 08:43

(dcy)polish flake8 add multimodal_rewardmodel and test

1cd74e5

Berit-chengyi force-pushed the dev-rm-verifier branch from d8e9868 to 1cd74e5 Compare April 2, 2025 09:49
