Dev branch for the ToolUseAgent #239
base: main
Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment
Pull Request Overview
This PR introduces a new ToolUseAgent and updates benchmarking support by replacing references to bgym.Benchmark and bgym.HighLevelActionSetArgs with their counterparts in agentlab.experiments.benchmark. Key changes include updating import statements and type annotations across study, agent, and benchmark files, as well as adding new benchmark configuration and metadata files.
Reviewed Changes
Copilot reviewed 27 out of 29 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/agentlab/experiments/study.py | Updated benchmark and import references to the new Benchmark class. |
| src/agentlab/experiments/reproducibility_util.py | Replaced bgym.Benchmark with the new Benchmark type. |
| src/agentlab/experiments/loop.py | Adjusted imports and commented out redundant TapeAgent-related code. |
| src/agentlab/experiments/benchmark/*.py | Added new benchmark configuration, utility, and metadata files. |
| src/agentlab/agents/tool_use_agent/agent.py | Introduced ToolUseAgent and updated benchmark type references. |
| Other agent and agent_configs files | Updated type annotations for benchmark and HighLevelActionSetArgs references. |
Files not reviewed (2)
- src/agentlab/experiments/benchmark/metadata/assistantbench.csv: Language not supported
- src/agentlab/experiments/benchmark/metadata/miniwob.csv: Language not supported
Comes in combination with this bgym PR:
ServiceNow/BrowserGym#340
Description by Korbit AI
What change is being made?
Introduce a new `ToolUseAgent` and supporting benchmark data, and replace existing usage of `bgym.Benchmark` and `bgym.HighLevelActionSetArgs` with the newly defined `Benchmark` and `HighLevelActionSetArgs` from `agentlab.experiments.benchmark`.
Why are these changes being made?
These changes are being introduced to expand the functionality of the agent system by adding a `ToolUseAgent` which leverages tool descriptions to perform actions, while also supporting more refined benchmarking capabilities through the new `Benchmark` and `HighLevelActionSetArgs` classes, which allow for more consistent and modular benchmarking configurations. This improves scalability and ease of future adaptations and improvements in the agent's capabilities and testing environments.
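Since the migration described above touches type annotations across many files, a small checker can help spot leftover references to the old names. The following is only a sketch and not part of this PR: the helper `find_old_references` and the `OLD_TO_NEW` mapping are hypothetical, and the fully qualified replacement names assume that both classes are importable directly from `agentlab.experiments.benchmark`, as the description states.

```python
import re

# Old qualified names mentioned in the PR, mapped to their assumed new location.
# (Assumption: both classes are exposed at the package level of
# agentlab.experiments.benchmark.)
OLD_TO_NEW = {
    "bgym.Benchmark": "agentlab.experiments.benchmark.Benchmark",
    "bgym.HighLevelActionSetArgs": "agentlab.experiments.benchmark.HighLevelActionSetArgs",
}

# Match only the two exact attribute names, not longer identifiers.
_PATTERN = re.compile(r"\bbgym\.(Benchmark|HighLevelActionSetArgs)\b")


def find_old_references(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, old_name, suggested_new_name) for each leftover reference."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            old = match.group(0)
            hits.append((lineno, old, OLD_TO_NEW[old]))
    return hits


if __name__ == "__main__":
    sample = (
        "import bgym\n"
        "\n"
        "def run(b: bgym.Benchmark, a: bgym.HighLevelActionSetArgs):\n"
        "    pass\n"
    )
    for lineno, old, new in find_old_references(sample):
        print(f"line {lineno}: {old} -> {new}")
```

The checker only reports candidates; whether a given reference should actually be switched depends on the companion BrowserGym PR and is left to the reviewer.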