Dev branch for the ToolUseAgent #239

Draft pull request: 6 commits to be merged into main

TLSDC (Collaborator) commented Apr 23, 2025

Comes in combination with this bgym PR:
ServiceNow/BrowserGym#340

Description by Korbit AI

What change is being made?

Introduce a new ToolUseAgent and supporting benchmark data, and replace existing usage of bgym.Benchmark and bgym.HighLevelActionSetArgs with the newly defined Benchmark and HighLevelActionSetArgs from agentlab.experiments.benchmark.
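
The replacement described above can be illustrated with a small, self-contained sketch. It is not part of this PR: `migrate_source` is a hypothetical helper, and the only names taken from the description are `bgym.Benchmark`, `bgym.HighLevelActionSetArgs`, and the `agentlab.experiments.benchmark` module. It assumes the old names are always referenced as `bgym.<Name>` and naively prepends the new import at the top of the file.

```python
import re

# Names the PR description says now come from agentlab.experiments.benchmark.
MOVED_NAMES = ("Benchmark", "HighLevelActionSetArgs")
NEW_IMPORT = "from agentlab.experiments.benchmark import " + ", ".join(MOVED_NAMES)

# Match bgym.Benchmark / bgym.HighLevelActionSetArgs as whole identifiers only.
_PATTERN = re.compile(r"\bbgym\.(" + "|".join(MOVED_NAMES) + r")\b")


def migrate_source(source: str) -> str:
    """Hypothetical helper: rewrite bgym.<Name> references to the bare name
    and prepend the new import when at least one reference was rewritten."""
    rewritten, count = _PATTERN.subn(r"\1", source)
    if count == 0 or NEW_IMPORT in rewritten:
        return rewritten
    # Naive placement: a real codemod would respect docstrings and __future__ imports.
    return NEW_IMPORT + "\n" + rewritten


if __name__ == "__main__":
    before = "def run(benchmark: bgym.Benchmark) -> None: ...\n"
    print(migrate_source(before))
```

Source text with no matching references is returned unchanged, so the helper can be run over a whole tree without touching unrelated files.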

Why are these changes being made?

These changes add a ToolUseAgent, which uses tool descriptions to perform actions, and move benchmark configuration into the new Benchmark and HighLevelActionSetArgs classes so that benchmark setups are more consistent and modular. This should make it easier to extend the agent's capabilities and its testing environments later.

korbit-ai bot commented Apr 23, 2025

Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment /korbit-review.

TLSDC requested review from Copilot, recursix and amanjaiswal73892, then removed the request for Copilot (April 23, 2025, 18:48)

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR introduces a new ToolUseAgent and updates benchmarking support by replacing references to bgym.Benchmark and bgym.HighLevelActionSetArgs with their counterparts in agentlab.experiments.benchmark. Key changes include updating import statements and type annotations across study, agent, and benchmark files, as well as adding new benchmark configuration and metadata files.

Reviewed Changes

Copilot reviewed 27 out of 29 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `src/agentlab/experiments/study.py` | Updated benchmark and import references to the new Benchmark class. |
| `src/agentlab/experiments/reproducibility_util.py` | Replaced bgym.Benchmark with the new Benchmark type. |
| `src/agentlab/experiments/loop.py` | Adjusted imports and commented out redundant TapeAgent-related code. |
| `src/agentlab/experiments/benchmark/*.py` | Added new benchmark configuration, utility, and metadata files. |
| `src/agentlab/agents/tool_use_agent/agent.py` | Introduced ToolUseAgent and updated benchmark type references. |
| Other agent and agent_configs files | Updated type annotations for benchmark and HighLevelActionSetArgs references. |

Files not reviewed (2)
  • src/agentlab/experiments/benchmark/metadata/assistantbench.csv: Language not supported
  • src/agentlab/experiments/benchmark/metadata/miniwob.csv: Language not supported
