Browser tools
1. Conceptual Overview
Browser tools in JiuwenSwarm let the agent drive a real Chrome instance for form filling, clicks, uploads, and other web tasks. Users first set the Chrome path in the web UI and start the browser service, which launches an attachable Chrome instance. When a task needs a browser, the agent connects to this instance and controls it.
1.1 Key Capabilities
The browser tools currently support the following capabilities:
- Open web pages and wait for loading
- Continue operations on already logged-in websites
- Click buttons, input text, select page elements
- Execute multi-step web tasks
- Reuse the same browser session to reduce repeated logins
- Read page titles, URLs, and page content when needed
- Support complex tasks like file uploads, email composition, and web form submissions
1.2 Typical Use Cases
- Web Information Extraction: Extract structured information from news sites, document pages, etc.
- Email Operations: Log in to email to send messages, check inbox, download attachments
- Form Filling: Automatically fill out online registration, application forms
- Online Shopping: Browse products, compare prices, add to cart
- Enterprise System Operations: Complete approvals, queries, etc. in internal systems
2. Quick Start (Frontend Operations Only)
Install Chrome first.
2.1 Step 1: Chrome path and profile
- Open Chrome.
- Visit `chrome://version`.
- Note:
  - Executable path → `CHROME_PATH`
  - Profile path → confirms user data for login debugging

Where:
- Executable path is generally the complete path to `chrome.exe`.
- Profile directory helps confirm the current browser account and data directory, facilitating troubleshooting of login state or authorization issues.
2.2 Step 2: Open the browser service panel
- Open the JiuwenSwarm web UI.
- Go to Settings → Browser service.
- Find the Chrome path field.
2.3 Step 3: Set CHROME_PATH
- Copy the executable path from `chrome://version`.
- Paste into `CHROME_PATH` and Save path.
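Before pasting, it can help to sanity-check the copied value. The following is a minimal sketch, not part of JiuwenSwarm: the helper name `check_chrome_path` is hypothetical, and the rule that a directory is accepted only when it is a macOS `.app` bundle is an assumption made for this illustration.

```python
import os
from pathlib import Path


def check_chrome_path(raw_path: str) -> tuple[bool, str]:
    """Return (ok, message) for a candidate CHROME_PATH value (hypothetical helper)."""
    # Copy/paste often brings along whitespace or surrounding quotes.
    cleaned = raw_path.strip().strip('"').strip("'")
    if not cleaned:
        return False, "path is empty"
    path = Path(cleaned)
    if not path.exists():
        return False, f"path does not exist: {path}"
    if path.is_dir():
        # Assumption: only a macOS .app bundle is an acceptable directory value.
        if path.suffix == ".app":
            return True, f"macOS app bundle: {path}"
        return False, f"path is a directory, not an executable: {path}"
    if not os.access(path, os.X_OK):
        return False, f"file is not executable: {path}"
    return True, f"looks usable: {path}"


if __name__ == "__main__":
    ok, message = check_chrome_path(input("Executable path from chrome://version: "))
    print("OK" if ok else "Problem", "-", message)
```

A failed check usually means the profile path was copied instead of the executable path, or the value was truncated during copy.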
2.4 Step 4: Start the browser service
- Click Start browser service.
- A new Chrome window should open. This is the browser instance that the agent controls in later steps.
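To confirm that the launched Chrome is attachable, one option is to query the `/json/version` HTTP endpoint that Chrome exposes on its remote debugging port. The sketch below assumes the default address `127.0.0.1:9222` from the configuration section; the function names `version_endpoint` and `probe_cdp` are hypothetical and not part of JiuwenSwarm.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def version_endpoint(cdp_url: str) -> str:
    """Build the /json/version URL for a CDP base address."""
    return cdp_url.rstrip("/") + "/json/version"


def probe_cdp(cdp_url: str = "http://127.0.0.1:9222", timeout_s: float = 2.0) -> Optional[dict]:
    """Return Chrome's version info if the debugging endpoint answers, else None."""
    try:
        with urllib.request.urlopen(version_endpoint(cdp_url), timeout=timeout_s) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body: treat as "not attachable".
        return None


if __name__ == "__main__":
    info = probe_cdp()
    if info is None:
        print("No attachable Chrome found on 127.0.0.1:9222")
    else:
        print("Attachable Chrome:", info.get("Browser"))
```

If the probe returns `None` even though a Chrome window opened, the window was probably started without remote debugging or on a different port.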
2.5 Step 5: Log in manually
If your tasks require website login, email authorization, access to enterprise systems (mail, SSO, or intranet), or an existing account state, first complete the necessary manual steps in that Chrome window, such as:
- Log in to Gmail / Outlook / corporate email
- Complete manual authentication via SMS, verification code, QR code scan, etc.
- Allow site access permissions
- Open the target page to be operated
2.6 Step 6: Use from chat
After completing authorization, ask the agent in the conversation to perform browser tasks, such as:
- Open a webpage and read information
- Continue clicking and filling forms after login
- Compose emails, upload attachments, wait for confirmation in email

When a task requires a browser, the agent controls this already authorized Chrome rather than starting a fresh, stateless browser profile.
3. Usage Tips
To make the browser tools more stable, follow these recommendations:
- Prefer using the real Chrome installed locally rather than a temporary browser.
- After starting the browser service, prioritize completing login and authorization in the popped-up Chrome.
- For long-flow tasks, try to maintain the same session to avoid frequently changing browser states.
- For scenarios involving email, enterprise systems, online banking, etc., it is recommended to complete manual login first before letting the agent continue operations.
- If the browser state is abnormal, restart the browser service and test again.
- Ensure `PLAYWRIGHT_CDP_URL` matches the debugging address and port in `config.yaml`.
- Keep `BROWSER_ALLOW_SHORT_TIMEOUT_OVERRIDE=0` to prevent the model from splitting browser tasks into too many short calls.
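The consistency check between `PLAYWRIGHT_CDP_URL` and the debugging address and port can be sketched as a small comparison. The helper name `cdp_url_matches` is hypothetical. The comparison is deliberately literal, so `localhost` and `127.0.0.1` are treated as different hosts.

```python
from urllib.parse import urlsplit


def cdp_url_matches(cdp_url: str, address: str, port: int) -> bool:
    """True if the CDP URL points at the given debugging address and port."""
    parts = urlsplit(cdp_url)
    if parts.scheme not in ("http", "https", "ws", "wss"):
        return False
    # When the URL omits the port, fall back to the scheme default.
    default_port = 443 if parts.scheme in ("https", "wss") else 80
    url_port = parts.port if parts.port is not None else default_port
    # urlsplit lowercases the hostname, so lowercase the configured address too.
    return parts.hostname == address.lower() and url_port == port


if __name__ == "__main__":
    # Values taken from the defaults documented for .env and config.yaml.
    print(cdp_url_matches("http://127.0.0.1:9222", "127.0.0.1", 9222))
```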
4. Practical Cases
4.1 Case 1: Web Information Extraction
Task Description: Extract news titles and summaries from a specific web page
Operation Steps:
- Start the browser service and complete necessary authorization
- Enter in the conversation: "Please extract today's headline news title and summary from https://example.com/news"
- The agent will automatically call browser tools to access the specified web page
- The browser will parse the page content and extract the required information
- The agent will organize the extracted information and return it to the user
4.2 Case 2: Email Sending (with Attachment)
Task Description: Send an email with attachment using Gmail
Operation Steps:
- Start the browser service
- Log in to Gmail account in the popped-up Chrome
- Enter in the conversation: "Please send an email to test@example.com, subject: Test Email, content: This is a test email, attachment: /path/to/file.pdf"
- The agent will call browser tools to open Gmail compose page
- The browser will automatically fill in recipient, subject, content, and upload the attachment
- After sending is complete, the agent will notify the user that the email has been sent
Note: Operation screenshots will be added after future frontend updates. Each case will display 1-2 images, mainly showing the key execution process and final results.
5. Backend Configuration
5.1 Configuration Files Overview
The browser tool configuration involves several core files that work together to configure and run the browser:
- `config/config.yaml`: Mainly configures Chrome startup parameters, such as the Chrome executable path and the remote debugging address and port. These settings are the foundation for browser startup.
- `.env`: Configures the browser runtime, MCP connection, Playwright parameters, timeout settings, and other environment variables that affect the browser's runtime behavior.
- `.env.template`: Environment variable template file containing all available environment variables and their default values, which can be used as a configuration reference.

The relationship between these three files is: `config/config.yaml` provides the basic parameters for browser startup, `.env` provides the runtime environment configuration, and `.env.template` documents the available variables and their defaults. Together, the first two ensure the browser tools work properly.
5.2 Browser Configuration in config.yaml
| Configuration Item | Type | Default Value | Description |
|---|---|---|---|
| `browser.chrome_path` | string/map | - | Chrome executable path, can be a single string or a map by OS |
| `browser.remote_debugging_address` | string | "127.0.0.1" | Chrome remote debugging listening address |
| `browser.remote_debugging_port` | integer | 9222 | Chrome remote debugging port |
| `browser.user_data_dir` | string | "" | Chrome user data directory, use system default when empty |
| `browser.profile_directory` | string | "Default" | Chrome Profile name to use |
Example configuration by OS:

    browser:
      chrome_path:
        windows: "C:\\Users\\YOUR_USER\\AppData\\Local\\Google\\Chrome\\Application\\chrome.exe"
        macos: "/Applications/Google Chrome.app"
        linux: "/usr/bin/google-chrome"
      remote_debugging_address: "127.0.0.1"
      remote_debugging_port: 9222
      user_data_dir: ""
      profile_directory: "Default"
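
Because `browser.chrome_path` may be either a single string or a per-OS map, the loader has to pick the right entry for the current platform. How `browser_start_client.py` actually does this is not shown in this document. The sketch below is one plausible approach, with a hypothetical function name and an assumed mapping from `platform.system()` values to the `windows`, `macos`, and `linux` keys used in the example.

```python
import platform
from typing import Mapping, Optional, Union

# Assumed mapping from platform.system() values to the keys in the YAML example.
_OS_KEYS = {"Windows": "windows", "Darwin": "macos", "Linux": "linux"}


def resolve_chrome_path(
    value: Union[str, Mapping[str, str], None],
    system: Optional[str] = None,
) -> Optional[str]:
    """Pick the Chrome path for the current OS from a string or a per-OS map."""
    if isinstance(value, str):
        # A plain string applies to every OS; an empty string means "not set".
        return value or None
    if not value:
        return None
    key = _OS_KEYS.get(system or platform.system())
    if key is None:
        return None
    return value.get(key) or None


if __name__ == "__main__":
    example = {
        "windows": "C:\\Users\\YOUR_USER\\AppData\\Local\\Google\\Chrome\\Application\\chrome.exe",
        "macos": "/Applications/Google Chrome.app",
        "linux": "/usr/bin/google-chrome",
    }
    print(resolve_chrome_path(example))
```

The optional `system` argument exists only to make the function easy to exercise for a platform other than the one it runs on.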
5.3 Browser Configuration in .env
5.3.1 Browser MCP Wrapper Configuration
| Environment Variable | Default Value | Description |
|---|---|---|
| `BROWSER_RUNTIME_MCP_ENABLED` | 1 | Whether to enable browser MCP wrapper |
| `BROWSER_RUNTIME_MCP_CLIENT_TYPE` | streamable-http | MCP client type |
| `BROWSER_RUNTIME_MCP_SERVER_ID` | playwright_runtime_wrapper | Wrapper identifier information |
| `BROWSER_RUNTIME_MCP_SERVER_NAME` | playwright-runtime-wrapper | Wrapper name |
| `BROWSER_RUNTIME_MCP_SERVER_PATH` | http://127.0.0.1:8940/mcp | Wrapper access address |
| `BROWSER_RUNTIME_MCP_TIMEOUT_S` | 300 | MCP connection or call layer timeout control |
| `BROWSER_RUNTIME_MCP_HOST` | 127.0.0.1 | Host used when automatically starting wrapper locally |
| `BROWSER_RUNTIME_MCP_PORT` | 8940 | Port used when automatically starting wrapper locally |
| `BROWSER_RUNTIME_MCP_PATH` | /mcp | Path used when automatically starting wrapper locally |
| `BROWSER_RUNTIME_MCP_COMMAND` | - | Wrapper startup command override, usually left empty |
| `BROWSER_RUNTIME_MCP_ARGS` | - | Wrapper startup arguments override, usually left empty |
| `BROWSER_RUNTIME_MCP_AUTO_SSE_FALLBACK` | 1 | Whether to allow SSE fallback in certain modes |

5.3.2 Official Playwright MCP Configuration
| Environment Variable | Default Value | Description |
|---|---|---|
| `PLAYWRIGHT_MCP_COMMAND` | npx | Command to start official Playwright MCP |
| `PLAYWRIGHT_MCP_ARGS` | -y @playwright/mcp@latest | Arguments to start official Playwright MCP |
| `PLAYWRIGHT_CDP_URL` | http://127.0.0.1:9222 | CDP address to connect to the started Chrome, should match debugging address and port in config.yaml |

5.3.3 Timeout and Execution Strategy Configuration
| Environment Variable | Default Value | Description |
|---|---|---|
| `PLAYWRIGHT_TOOL_TIMEOUT_S` | 300 | Total watchdog timeout for browser tool execution |
| `BROWSER_TIMEOUT_S` | 300 | Default long timeout for browser tasks; if the model passes a smaller timeout_s, it will be clamped to at least this value |
| `BROWSER_ALLOW_SHORT_TIMEOUT_OVERRIDE` | 0 | Whether to allow the model to shorten task timeouts, recommended to keep as 0 |

5.4 Recommended Minimal Configuration
To use the browser tools properly, ensure at least the following fields are correct:
`config/config.yaml`

    browser:
      chrome_path: "C:\\Users\\YOUR_USER\\AppData\\Local\\Google\\Chrome\\Application\\chrome.exe"
      remote_debugging_address: "127.0.0.1"
      remote_debugging_port: 9222
      user_data_dir: ""
      profile_directory: "Default"

`.env`

    BROWSER_RUNTIME_MCP_ENABLED=1
    BROWSER_RUNTIME_MCP_CLIENT_TYPE=streamable-http
    BROWSER_RUNTIME_MCP_SERVER_PATH=http://127.0.0.1:8940/mcp
    PLAYWRIGHT_MCP_COMMAND=npx
    PLAYWRIGHT_MCP_ARGS=-y @playwright/mcp@latest
    PLAYWRIGHT_CDP_URL=http://127.0.0.1:9222
    PLAYWRIGHT_TOOL_TIMEOUT_S=300
    BROWSER_TIMEOUT_S=300
    BROWSER_ALLOW_SHORT_TIMEOUT_OVERRIDE=0
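
The interaction between `BROWSER_TIMEOUT_S` and `BROWSER_ALLOW_SHORT_TIMEOUT_OVERRIDE` can be illustrated with a short sketch. This is an interpretation of the documented behavior ("a smaller `timeout_s` is clamped to at least this value"), not the actual code in `service.py`. The function name and the handling of missing or non-positive values are assumptions.

```python
import os
from typing import Optional


def effective_timeout(
    requested_s: Optional[float],
    browser_timeout_s: float = 300.0,
    allow_short_override: bool = False,
) -> float:
    """Return the timeout a browser task would run with under the documented rule."""
    if requested_s is None or requested_s <= 0:
        # Assumption: a missing or invalid request falls back to the default.
        return browser_timeout_s
    if allow_short_override:
        # With the override enabled, the model's shorter value is honored.
        return requested_s
    # Otherwise a shorter request is raised to the configured floor.
    return max(requested_s, browser_timeout_s)


if __name__ == "__main__":
    floor = float(os.environ.get("BROWSER_TIMEOUT_S", "300"))
    allow = os.environ.get("BROWSER_ALLOW_SHORT_TIMEOUT_OVERRIDE", "0") == "1"
    print(effective_timeout(30, floor, allow))
```

With the recommended settings, a task that requests 30 seconds still runs with the 300-second floor, while a request for 600 seconds is left unchanged.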
6. Principle and Code Architecture
6.1 Technical Architecture
The core flow of the current browser tools is as follows:
- Web UI: Provides a "Browser service" panel for configuring Chrome executable path and starting/stopping the browser service
- Backend Application: Starts local Chrome with remote debugging capabilities via `browser_start_client.py`
- Browser Runtime: Based on Playwright MCP encapsulation, browser tools communicate with the runtime via MCP client
- Agent Call: When the agent calls tools like `browser_run_task`, the runtime converts natural language tasks into browser operation steps
- Session Management: Browsers are reused by `session_id`, maintaining login state, page context, and authorization status within the same session
Frontend
- `jiuwenswarm/channels/web/frontend/src/components/BrowserPanel/index.tsx`: path, save, start service.

Backend
- `app.py`: `path.get`, `path.set`, `browser.start`, etc.
- `jiuwenswarm/agents/harness/common/tools/browser_start_client.py`: Chrome launch from `config.yaml`.
- `jiuwenswarm/agents/harness/common/tools/browser_tools.py`: MCP client, auto-start wrapper.
- `jiuwenswarm/agents/harness/common/tools/browser-move/src/playwright_runtime_mcp_server.py`: MCP server.
- `.../playwright_runtime/runtime.py`, `service.py`, `agents.py`, `config.py`: runtime orchestration.

In simple terms: UI config → start Chrome → runtime attaches → agent runs tasks
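
The "start Chrome" step amounts to launching the configured executable with Chrome's remote debugging switches. The sketch below shows how the `browser.*` settings could map onto a command line. The `BrowserConfig` class and `build_chrome_command` function are hypothetical, and the real `browser_start_client.py` may build its command differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BrowserConfig:
    """Mirror of the browser.* keys in config/config.yaml (illustrative only)."""
    chrome_path: str
    remote_debugging_address: str = "127.0.0.1"
    remote_debugging_port: int = 9222
    user_data_dir: str = ""
    profile_directory: str = "Default"


def build_chrome_command(cfg: BrowserConfig) -> List[str]:
    """Translate the configuration into a Chrome argument list."""
    cmd = [
        cfg.chrome_path,
        f"--remote-debugging-port={cfg.remote_debugging_port}",
        f"--remote-debugging-address={cfg.remote_debugging_address}",
    ]
    if cfg.user_data_dir:
        # An empty value means "use the system default", so the flag is omitted.
        cmd.append(f"--user-data-dir={cfg.user_data_dir}")
    if cfg.profile_directory:
        cmd.append(f"--profile-directory={cfg.profile_directory}")
    return cmd


if __name__ == "__main__":
    print(build_chrome_command(BrowserConfig(chrome_path="/usr/bin/google-chrome")))
```

The resulting list could be passed to `subprocess.Popen`. Building it as a list rather than a shell string avoids quoting problems with paths that contain spaces.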
6.2 Core Code
The core code of the browser tools is mainly distributed in the following modules:
- Tool Management Module: `jiuwenswarm/agents/harness/common/tools/` is the underlying module that manages all tools in the system. Browser-related tools are mainly implemented under this module.
- Frontend Interface Module: `jiuwenswarm/channels/web/frontend/` is responsible for user interface interaction.
Specific file descriptions:
| Module | File Path | Function Description |
|---|---|---|
| Frontend Browser Service Panel | `jiuwenswarm/channels/web/frontend/src/components/BrowserPanel/index.tsx` | Responsible for reading path, saving path, triggering "Start browser service" |
| Backend Application Entry | `app.py` | Provides frontend call entries like `path.get`, `path.set`, `browser.start` |
| Chrome Startup Script | `tools/browser_start_client.py` | Reads `browser.*` configuration from `config/config.yaml`, starts Chrome with remote debugging capabilities |
| Browser MCP Access | `tools/browser_tools.py` | Browser MCP wrapper access, automatic startup, client patch, configuration building |
| Browser Runtime MCP Server | `tools/browser-move/src/playwright_runtime_mcp_server.py` | Browser runtime MCP server entry |
| Browser Runtime Orchestration Layer | `tools/browser-move/src/playwright_runtime/runtime.py` | Browser runtime orchestration layer |
| Browser Task Execution | `tools/browser-move/src/playwright_runtime/service.py` | Browser task execution, session reuse, timeout guardrails |
| Browser Runtime Configuration | `tools/browser-move/src/playwright_runtime/config.py` | Playwright MCP and browser runtime configuration parsing |
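
Session reuse by `session_id` can be pictured as a small registry that returns the existing session when the same identifier comes in again. The classes below are a hypothetical illustration of that idea only. They are not the data structures used in `service.py`, and a real session would hold a browser context rather than a counter.

```python
import time
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BrowserSession:
    """Stand-in for per-session state such as login status and page context."""
    session_id: str
    created_at: float = field(default_factory=time.monotonic)
    task_count: int = 0


class SessionRegistry:
    """Hand out one BrowserSession per session_id and reuse it afterwards."""

    def __init__(self) -> None:
        self._sessions: Dict[str, BrowserSession] = {}

    def get_or_create(self, session_id: str) -> BrowserSession:
        session = self._sessions.get(session_id)
        if session is None:
            session = BrowserSession(session_id=session_id)
            self._sessions[session_id] = session
        # Count every task routed to this session.
        session.task_count += 1
        return session

    def close(self, session_id: str) -> bool:
        """Drop a session; returns False if it was not registered."""
        return self._sessions.pop(session_id, None) is not None


if __name__ == "__main__":
    registry = SessionRegistry()
    first = registry.get_or_create("mail-flow")
    second = registry.get_or_create("mail-flow")
    print(first is second, second.task_count)
```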
7. Summary
The essence of browser tools is to allow the agent to execute web operations on your already authorized real Chrome; the frontend is responsible for configuration and startup, while the backend is responsible for takeover and automated execution.