Dev branch for the ToolUseAgent #239

TLSDC · 2025-04-23T18:47:50Z

Comes in combination with this bgym PR:
ServiceNow/BrowserGym#340

Description by Korbit AI

What change is being made?

Introduce a new ToolUseAgent and supporting benchmark data, and replace existing usage of bgym.Benchmark and bgym.HighLevelActionSetArgs with the newly defined Benchmark and HighLevelActionSetArgs from agentlab.experiments.benchmark.
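To illustrate the kind of reference swap described above, here is a minimal sketch of a source-rewriting helper. The `migrate_source` function and its placement of the import at the top of the file are hypothetical and not part of the PR; the import path is taken from the description above and has not been checked against the merged code.

```python
import re

# Import line assumed from the PR description; the exact export path is not verified.
NEW_IMPORT = "from agentlab.experiments.benchmark import Benchmark, HighLevelActionSetArgs"

# Names the description says are no longer taken from bgym.
MOVED_NAMES = ("Benchmark", "HighLevelActionSetArgs")

# Matches `bgym.Benchmark` or `bgym.HighLevelActionSetArgs` as whole identifiers.
_PATTERN = re.compile(r"\bbgym\.(" + "|".join(MOVED_NAMES) + r")\b")


def migrate_source(source: str) -> str:
    """Rewrite `bgym.<Name>` to bare `<Name>` and prepend the new import once.

    Hypothetical helper: it does not parse Python, so it will also rewrite
    matches inside strings and comments.
    """
    rewritten, count = _PATTERN.subn(r"\1", source)
    if count == 0 or NEW_IMPORT in rewritten:
        return rewritten
    return NEW_IMPORT + "\n" + rewritten


if __name__ == "__main__":
    before = "import bgym\n\ndef run(benchmark: bgym.Benchmark):\n    ...\n"
    print(migrate_source(before))
```

Other `bgym` attributes are left untouched, so a file that still needs `import bgym` for unrelated calls keeps working.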

Why are these changes being made?

The ToolUseAgent extends the agent system with an agent that uses tool descriptions to perform actions. The new Benchmark and HighLevelActionSetArgs classes make benchmarking configurations more consistent and modular, which should make the agents and their test environments easier to extend.

korbit-ai · 2025-04-23T18:47:56Z

Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment /korbit-review.

Your admin can change your review schedule in the Korbit Console

Copilot

Pull Request Overview

This PR introduces a new ToolUseAgent and updates benchmarking support by replacing references to bgym.Benchmark and bgym.HighLevelActionSetArgs with their counterparts in agentlab.experiments.benchmark. Key changes include updating import statements and type annotations across study, agent, and benchmark files, as well as adding new benchmark configuration and metadata files.

Reviewed Changes

Copilot reviewed 27 out of 29 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/agentlab/experiments/study.py | Updated benchmark and import references to the new Benchmark class. |
| src/agentlab/experiments/reproducibility_util.py | Replaced bgym.Benchmark with the new Benchmark type. |
| src/agentlab/experiments/loop.py | Adjusted imports and commented out redundant TapeAgent-related code. |
| src/agentlab/experiments/benchmark/*.py | Added new benchmark configuration, utility, and metadata files. |
| src/agentlab/agents/tool_use_agent/agent.py | Introduced ToolUseAgent and updated benchmark type references. |
| Other agent and agent_configs files | Updated type annotations for benchmark and HighLevelActionSetArgs references. |

Files not reviewed (2)

src/agentlab/experiments/benchmark/metadata/assistantbench.csv: Language not supported
src/agentlab/experiments/benchmark/metadata/miniwob.csv: Language not supported

TLSDC added 5 commits April 23, 2025 14:37

moving the browsergym.experiment.benchmark module to agentlab

9ee2367

added comment for new parameter

c2e2b9c

BaseMessages take into account 'input_text' key too (for xray)

596fcd2

convenient array to base64 function

f9d7b91

tool agent embryo

73ba428
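As a rough illustration of the 'input_text' commit above, the sketch below reads text from message content items that carry either a `text` or an `input_text` key. The `extract_text` helper and the item layout are assumptions made for illustration; this is not the actual BaseMessage code from the PR.

```python
from typing import Union

# Keys checked on each content item, in order; the layout is an assumption.
TEXT_KEYS = ("text", "input_text")


def extract_text(content: Union[str, list]) -> str:
    """Join the text found in a message's content.

    Hypothetical helper: accepts a plain string, or a list of dict items
    where the text lives under either 'text' or 'input_text'.
    """
    if isinstance(content, str):
        return content
    parts = []
    for item in content:
        if not isinstance(item, dict):
            continue  # skip anything that is not a dict item
        for key in TEXT_KEYS:
            if key in item:
                parts.append(str(item[key]))
                break  # take the first matching key only
    return "\n".join(parts)


if __name__ == "__main__":
    print(extract_text([{"text": "hello"}, {"input_text": "world"}]))
```

Items without either key, such as image entries, are skipped rather than raising an error.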

TLSDC requested review from Copilot, recursix and amanjaiswal73892 and removed request for Copilot April 23, 2025 18:48

Copilot AI reviewed Apr 23, 2025

Merge branch 'main' of github.com:ServiceNow/AgentLab into tlsdc/tool_use_agent

c11db49
