agent-studio/backend/openjiuwen_studio/core/executor · openJiuwen/agent-studio - AtomGit

ZYQ5333 fix(mcp-stdio): correct discovery and invocation logic for stdio MCP plugins

File	Last commit	Last updated
agent	code check fix Co-authored-by: @aharonamir1 <amir.aharon@huawei.com> # message auto-generated for no-merge-commit merge: !1066 merge feature/executions into develop Created-by: aharonamir1 Commit-by: @aharonamir1 Merged-by: ZYQ5333 Description: What type of PR is this? /kind bug What does this PR do / why do we need it: --- Summary - Show the version (except 'draft') for each execution - Fix elapsed-time accuracy for running executions by computing elapsed_ms server-side and sending server_time_ms so the client can correct for client-server clock offset - Fix the stale waterfall that showed the previous execution's data during a new run Changes Backend - workflow_runner.py: Re-enabled incremental save_trace_details() per span so running nodes appear in the waterfall as they complete - agent_trace_utils.py: Same incremental write pattern for agent executions - trace_summary_repository.py: get_running_traces_by_space now returns server-computed elapsed_ms and server_time_ms; get_trace_summary_by_trace_id falls back to live TraceDetailDB data for in-progress traces - execution.py: Removed response_model from /get_running_traces so that server_time_ms and elapsed_ms pass through without being stripped by Pydantic Frontend - ExecutionsPage.tsx: Calculates timeOffset = Date.now() - server_time_ms on each poll; clears trace state on tab switch; auto-selects running > active > completed - ExecutionList.tsx: Uses server elapsed_ms for running entries, applies timeOffset to correct clock skew for active entries, deduplicates active/running entries - executionPanelService.ts / types.ts: Updated response types 
for new server_time_ms and elapsed_ms fields Test plan - Start a workflow execution and verify it appears as "Running" in the Workflows tab - Start an agent execution and verify it appears as "Running" in the Agents tab - While running, verify elapsed time is accurate (not 2x or skewed by clock difference) - After completion, verify the finished duration displays correctly (not a negative number) - Switch tabs and verify no stale data from the previous tab is shown - Verify completed executions (from the old WorkflowExecutionDB) still display correctly Which issue(s) this PR fixes: Fixes [#868](https://gitcode.com/openJiuwen/agent-studio/issues/868) Self-checklist: (Please check carefully and mark an x in the [ ] brackets. We will review your completion status.) - [ ] Design: Has the solution corresponding to the PR been reviewed by the Maintainer, and have all review comments been replied to and addressed? - [x] Test: Is the code in the PR fully covered by UT/ST test cases, and have the newly added test cases been uploaded to the repository along with this PR or already uploaded? - [x] Verification: Does the PR description contain a detailed description of the verification results regarding the achievement of the expected goals for the Feature, Refactor, and Bugfix in this PR? - [ ] Interface: Does it involve changes to external interfaces? If so, have the corresponding changes been approved by the interface review organization, and has the API annotation information been correctly refreshed? - [ ] Document: Does it involve modifications to the official website documentation? If so, please submit the materials to the Doc repository in a timely manner. See merge request: openJiuwen/agent-studio!1066	16 days ago
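The clock-offset correction described in !1066 can be sketched as follows. This is a minimal Python illustration of the arithmetic only, not the project's code (the actual logic lives in the TypeScript frontend). The function names `clock_offset_ms` and `corrected_elapsed_ms` are hypothetical, and the sketch assumes that execution start times are stamped by the server clock.

```python
def clock_offset_ms(client_now_ms: int, server_time_ms: int) -> int:
    """Offset between client and server clocks, measured when a poll response arrives.

    Mirrors `timeOffset = Date.now() - server_time_ms` from the PR description.
    A positive value means the client clock runs ahead of the server clock.
    """
    return client_now_ms - server_time_ms


def corrected_elapsed_ms(start_time_ms: int, client_now_ms: int, offset_ms: int) -> int:
    """Elapsed time of an execution whose start was stamped by the server (assumption).

    The client's "now" is first translated onto the server clock, so clock skew
    between the two machines no longer inflates or deflates the result.
    """
    server_now_ms = client_now_ms - offset_ms
    # Clamp at zero so network jitter never produces a negative duration.
    return max(0, server_now_ms - start_time_ms)


# Example: the client clock is 5 s ahead of the server.
offset = clock_offset_ms(client_now_ms=1_005_000, server_time_ms=1_000_000)
elapsed = corrected_elapsed_ms(start_time_ms=990_000, client_now_ms=1_007_000, offset_ms=offset)
naive = 1_007_000 - 990_000  # what an uncorrected client would display
```

In this example the uncorrected value overstates the run by exactly the clock skew, which is the kind of error the PR's test plan checks for.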
component	HTTP Node: fix get url from previous node Co-authored-by: @aharonamir1 <amir.aharon@huawei.com> # message auto-generated for no-merge-commit merge: !1068 merge http-body-params into develop Created-by: aharonamir1 Commit-by: @aharonamir1 Merged-by: ZYQ5333 Description: What type of PR is this? /kind bug Issue [#869](https://gitcode.com/openJiuwen/agent-studio/issues/869) Self-checklist: (Please self-check and mark an x in the [ ] brackets. We will review your completion status; otherwise the PR cannot be merged.) - [ ] Design: Has the solution corresponding to the PR been reviewed by the Maintainer, and have all review comments been answered and the solution revised accordingly? - [ ] Test: Is the code in the PR fully covered by UT/ST test cases, and have the newly added test cases been committed along with this PR or already committed? - [ ] Verification: Does the PR description include a detailed description of the verification results showing that the expected goals of the corresponding Feature, Refactor, or Bugfix were achieved? - [ ] Interface: Does it involve changes to external interfaces? If so, have the changes been approved by the interface review organization, and have the API comments been updated correctly? - [ ] Document: Does it involve changes to the official website documentation? If so, please submit the materials to the Doc repository promptly. See merge request: openJiuwen/agent-studio!1068	16 days ago
evaluation	feat(evaluation): introduce full evaluation system for agents & workflows (suites, tasks, graders, metrics, benchmarks, UI) Co-authored-by: Michael <michael.atamuk@huawei.com> Co-authored-by: adi_amir <adi.amir1@huawei.com> Co-authored-by: nizzan <nizzan.kimhi@huawei.com> Co-authored-by: @aharonamir1 <amir.aharon@huawei.com> # message auto-generated for no-merge-commit merge: !1023 merge evaluation into develop Created-by: michaelhuawei Commit-by: Michael;michaelhuawei;aharonamir1;@aharonamir1;nikita-mee;nizzan;adi_amir Merged-by: ZYQ5333 Description: What type of PR is this? /kind feature /kind refactor --- ## What does this PR do / why do we need it This PR introduces the Evaluation System for Agents and Workflows, a major new module that provides first-class, systematic evaluation capabilities across all OpenJiuwen workflow patterns. It enables teams to measure correctness, reliability, semantic quality, latency, token usage, and regression behavior for any workflow or agent. The system closes three long-standing gaps: 1. No regression detection: previously there was no structured way to verify that workflow changes preserved correctness. 2. No comparative measurement: there were no shared metrics to compare versions of agents/workflows. 3. No sampling support: LLM nondeterminism requires multi-trial evaluation, which did not exist. 
This feature adds a complete backend + frontend evaluation pipeline, including suites, tasks, graders, metrics, benchmark loading, and a full results UI. --- ## Which issue(s) this PR fixes Fixes #<issue-number> --- ## What scenarios were tested, and what were the verification results Functional verification - Created evaluation suites, added tasks, updated tasks, deleted tasks. - Loaded all seven benchmark YAML files; validated correct task creation. - Ran evaluation against workflows and agents with deterministic, model-based, and code-based graders. - Verified pattern detection across all six structural patterns (Routing, Chaining, Parallelization, Orchestrator‑Worker, Evaluator‑Optimizer, Memory Usage). - Confirmed correct grader behavior: deterministic checks, LLM judge calls, code-based execution, weight aggregation. - Confirmed metrics engine correctness: pass/fail, pass@k, pass^k, score distribution, latency stats, token usage, reliability, per-grader breakdown. - Verified custom aggregate metrics execution, including error handling. - Confirmed run lifecycle: RUNNING → COMPLETED/FAILED, immutability of completed runs. - Verified large-suite behavior (50 tasks × 5 trials) and UI rendering of large trace sets. Performance verification - Execution engine calls scale linearly with tasks × trials. - Model-based graders correctly issue LLM judge calls per trial. - No regressions to existing workflow/agent execution performance. Reliability verification - Flakiness metric validated using multi-trial runs. - Pattern detection validated with synthetic traces and real workflows. - Code-based grader error paths tested (exceptions, invalid returns). Frontend verification - Full CRUD for suites and tasks. - Run dialog correctly configures workflow/agent target and trial count. - Results UI renders Overview, Metrics, Graders, and Traces tabs with correct visibility rules. - Zustand store state transitions validated. 
--- ## Self-checklist - [x] Design: Reviewed with maintainers; all comments addressed. - [x] Test: Full UT/ST coverage for harness, graders, metrics, pattern validator, API, and frontend store. - [x] Verification: PR description includes detailed verification results for feature, refactor, and bugfix aspects. - [x] Interface: Adds new external API endpoints under `/evaluation`; no breaking changes to existing interfaces. - [x] Document: Benchmark usage, suite/task schema, and evaluation workflow documented; docs PR prepared separately. --- ## Special notes for reviewers - This module is fully additive: no existing endpoints or execution logic are modified. - Code-based graders and custom metrics use `exec()`; this is an accepted constraint for v1 and will be hardened later. - Large evaluation runs can produce multi-MB result payloads; pagination is planned for a future release. - Pattern detection is heuristic; tasks may override `pattern_type` explicitly. See merge request: openJiuwen/agent-studio!1023	28 days ago
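The PR lists pass@k and pass^k among its metrics but does not say how its metrics engine computes them. As a hedged illustration only, the sketch below uses the commonly cited combinatorial estimators: pass@k as the probability that at least one of k sampled trials passes, and pass^k as the probability that all k sampled trials pass, given c passing trials out of n. The function names `pass_at_k` and `pass_hat_k` are hypothetical and are not taken from the repository.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    """Validate trial counts: n trials run, c of them passed, k sampled."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k trials drawn without replacement passes."""
    _check(n, c, k)
    # comb(n - c, k) counts the draws made up entirely of failing trials;
    # it is 0 when fewer than k trials failed.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Probability that all k trials drawn without replacement pass (pass^k)."""
    _check(n, c, k)
    return comb(c, k) / comb(n, k)


# Example: a task that passed 3 of 5 trials, evaluated at k = 2.
at_least_one = pass_at_k(n=5, c=3, k=2)   # 1 - 1/10
all_pass = pass_hat_k(n=5, c=3, k=2)      # 3/10
```

The gap between the two values for the same task is one way to see flakiness: a task can be very likely to pass at least once while still being unlikely to pass every time.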
plugin	fix(mcp-stdio): correct discovery and invocation logic for stdio MCP plugins Co-authored-by: michaelhuawei <michael.atamuk@huawei.com> # message auto-generated for no-merge-commit merge: !1074 merge fix/mcp-stdio-plugin-params into develop Created-by: michaelhuawei Commit-by: michaelhuawei Merged-by: ZYQ5333 Description: What type of PR is this? /kind bug What does this PR do / why do we need it: This PR fixes stdio MCP plugin discovery and invocation, which were still broken even after the fix for #835. The root cause was that the backend constructed incorrect `params` for stdio plugins, and the invocation path attempted to execute the `.py` script directly instead of running it via Python. --- ## ✔ 1. Fix: Stdio discovery was broken ### Root cause `_build_safe_stdio_params` incorrectly used: `args = [config.url]` But for stdio plugins: - `config.url` is always `""` - The actual script path, arguments, and environment are stored in: - `config.params["command"]` - `config.params["args"]` - `config.params["env"]` ### Fix Discovery now uses: `script_path = params["command"] or config.url or ""`, `args = params["args"]`, `env = params["env"]` This matches what `StdioClient` expects. --- ## ✔ 2. Fix: Stdio invocation was broken ### Root cause `plugin_tools.py` passed: `command = params["command"]  # e.g. "/path/to/server.py"` This caused an error, because the `.py` file was executed directly instead of via Python. 
### Fix Invocation now treats: - `params["command"]` as the script path - `sys.executable` as the actual executable Updated logic: `script_path = params["command"]`, `extra_args = params["args"]`, `mcp_params["command"] = sys.executable`, `mcp_params["args"] = [script_path] + extra_args` Now the process launches as: `python /path/to/server.py "arg1" "arg2"` --- ## 🎉 Result - Stdio plugin discovery works - Stdio plugin invocation works - Both discovery and execution now correctly use: - Python interpreter - Script path from DB - Args and env from DB This completes the stdio MCP plugin support. --- Which issue(s) this PR fixes: Follow-up to [#835](https://gitcode.com/openJiuwen/agent-studio/issues/835) --- Code review checklist: - [ ] Whether the function's return value is verified - [ ] Whether the SOLID principles / Law of Demeter are followed - [ ] Whether there is a UT test case and the test case is valid (if there is no test case, explain why) - [ ] Whether an API change is involved - [ ] Whether official document modification is involved See merge request: openJiuwen/agent-studio!1074	15 days ago
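The parameter construction described in !1074 can be sketched as a single helper. This is an illustrative reading of the PR text, not the repository's code: the function name `build_stdio_launch` is hypothetical, the fallback to `url` follows the discovery snippet quoted above, and the assumption that missing `args` or `env` default to empty values is not stated in the PR.

```python
import sys
from typing import Any, Mapping


def build_stdio_launch(params: Mapping[str, Any], url: str = "") -> dict[str, Any]:
    """Build launch parameters for a stdio MCP plugin implemented as a Python script.

    `params["command"]` is treated as the script path, and the current Python
    interpreter is used as the actual executable, so the `.py` file is never
    executed directly.
    """
    script_path = params.get("command") or url or ""
    if not script_path:
        raise ValueError("stdio plugin has no script path")
    extra_args = list(params.get("args") or [])  # assumed default: no extra args
    env = dict(params.get("env") or {})          # assumed default: empty env
    return {
        "command": sys.executable,
        "args": [script_path, *extra_args],
        "env": env,
    }


launch = build_stdio_launch(
    {"command": "/path/to/server.py", "args": ["arg1", "arg2"], "env": {"MODE": "dev"}}
)
```

Using `sys.executable` rather than a bare `python` keeps the plugin in the same interpreter and virtual environment as the backend process, which avoids depending on the script's executable bit or shebang line.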
util	fix: check plugin url for SSRF Co-authored-by: kendel11 <zhangdanyang5@huawei.com> # message auto-generated for no-merge-commit merge: !966 From: @kendel11 Reviewed-by: @cyz95, @ZYQ5333 See merge request: openJiuwen/agent-studio!966	1 month ago
workflow	Add Executions Panel for Agent & Workflow Observability Co-authored-by: @aharonamir1 <amir.aharon@huawei.com> # message auto-generated for no-merge-commit merge: !941 merge feature/executions into develop Created-by: aharonamir1 Commit-by: @aharonamir1 Merged-by: ZYQ5333 Description: What type of PR is this? /kind feature This is linked to issue [#770](https://gitcode.com/openJiuwen/agent-studio/issues/770) Self-checklist: (Please self-check and mark an x in the [ ] brackets. We will review your completion status; otherwise the PR cannot be merged.) - [ ] Design: Has the solution corresponding to the PR been reviewed by the Maintainer, and have all review comments been answered and the solution revised accordingly? - [ ] Test: Is the code in the PR fully covered by UT/ST test cases, and have the newly added test cases been committed along with this PR or already committed? - [ ] Verification: Does the PR description include a detailed description of the verification results showing that the expected goals of the corresponding Feature, Refactor, or Bugfix were achieved? - [ ] Interface: Does it involve changes to external interfaces? If so, have the changes been approved by the interface review organization, and have the API comments been updated correctly? - [ ] Document: Does it involve changes to the official website documentation? If so, please submit the materials to the Doc repository promptly. ![image.png](https://raw.gitcode.com/user-images/assets/8766979/2203d800-d221-4554-bd62-af8e41a5b042/image.png 'image.png') ## Summary - Add a new Executions page (`/dashboard/executions`) with split-panel UI: execution list on the left, waterfall timeline on the right - Support real-time monitoring of running workflows/agents with polling and visibility-aware lifecycle - Enable trace summary creation for completed workflows so they appear in the execution history - Display workflow/agent names instead of raw IDs in the execution list ## Backend Changes ### New API Endpoints (`routers/execution.py`) - `POST /get_all_trace_summaries` - Query `TraceSummaryDB` by space, optionally filtered by business type (WORKFLOW/AGENT), returns list with business names - `POST /list_active_executions` - Read from `WorkflowExecutionManager` in-memory registry, enriched with workflow names from `WorkflowBaseDB` - `POST /get_running_traces` - Find running traces via `TraceDetailDB` records that lack a completed `TraceSummaryDB` entry (subquery-based detection) ### Trace Summary for Workflows (`trace_summary_repository.py`) - `create_trace_summary_from_workflow_execution(trace_id)` - New 
method that reads completed workflow data from `WorkflowExecutionDB` and creates a corresponding `TraceSummaryDB` record. This is necessary because workflows save execution data to `WorkflowExecutionDB` (not `TraceDetailDB`), so the existing `create_trace_summary_by_trace_id` (which queries `TraceDetailDB`) could never find workflow data. - `_enrich_with_business_names(data_list, db)` - Batch lookup of workflow/agent names from `WorkflowBaseDB` and `AgentBaseDB`, adds `business_name` field to trace summary responses - `get_trace_summary_list_by_space()` and `get_running_traces_by_space()` now return `business_name` ### Workflow Runner (`workflow_runner.py`) - Call `trace_summary_repository.create_trace_summary_from_workflow_execution(trace_id)` after workflow completion in all code paths (success, `JiuWenExecuteException`, `BaseError`/`JiuWenGraphException`, generic `Exception`) - Extract `trace_id` from `trace_logs[0].trace_id` ### Workflow Execution Manager (`workflow_execution_manager.py`) - Added `start_time` tracking to `WorkflowExecutionRegistration` ### Schemas (`trace_summary.py`) - `TraceSummaryListBySpaceRequest` - Request model for space-level trace queries - `TraceSummaryBriefWithStatus` - Response model with `trace_id`, `business_id`, `business_name`, `business_type`, `create_time`, `duration`, `status` - `ActiveExecutionInfo` - Response model for in-memory active executions with `workflow_name` ## Frontend Changes ### New Components - `ExecutionsPage.tsx` - Main page with MUI Tabs (Workflows/Agents), split panel layout, polling with visibility API, merge of multiple data sources (completed traces + running traces + active executions) - `ExecutionList.tsx` - Left panel showing merged/sorted execution entries with name, status badge, timestamp, duration - `ExecutionWaterfall.tsx` - Right panel with horizontal timeline bars, time axis, color-coded nodes, DFS flattening of hierarchical execution info - `ExecutionStatusBadge.tsx` - Colored pill badges 
(green=Finished, blue+ping=Running, red=Error, orange=Interrupted) - `ExecutionNodeTooltip.tsx` - Dark tooltip with duration/inputs/outputs, text truncated to 300 chars ### API Client (`api-client`) - `executionPanelService.ts` - Service layer with methods for all three new endpoints plus existing debug endpoints; handles 404 as empty arrays - `types.ts` - Added `TraceSummaryBriefWithStatus` and `ActiveExecution` interfaces with `business_name`/`workflow_name` - `config.ts` - Added endpoint URL constants ### Routing & Navigation - `App.tsx` - Lazy import and route for `/dashboard/executions` - `SidebarNew.tsx` - Added Activity icon nav item - Locales - Added `"executions"` key in both `en-US.json` and `zh-CN.json` ## Key Design Decisions 1. Three data sources merged - Completed traces from `TraceSummaryDB`, running traces detected via `TraceDetailDB` subquery, and active workflows from in-memory `WorkflowExecutionManager`. This covers all execution states. 2. Workflow trace summary from WorkflowExecutionDB - Workflows don't write to `TraceDetailDB` (that code path was disabled). Instead of re-enabling it, we read from `WorkflowExecutionDB` which is already populated. 3. Visibility-aware polling - Polling pauses when browser tab is hidden via `document.visibilitychange`, preventing unnecessary API calls. 4. No-flicker polling - Detail polling passes `isPolling=true` to skip loading state, preventing UI flicker on refresh. 
## Test Plan - [ ] Navigate to `/dashboard/executions` and verify the page loads with tabs - [ ] Run a workflow and verify it appears as "Running" in the Workflows tab - [ ] After workflow completes, verify it transitions to "Finished" in the list - [ ] Click on a completed execution and verify waterfall timeline renders - [ ] Hover over waterfall bars and verify tooltips show correct data - [ ] Switch to Agents tab and verify agent executions appear - [ ] Switch browser tabs and verify polling pauses (check Network tab) - [ ] Verify execution list shows workflow/agent names, not IDs See merge request: openJiuwen/agent-studio!941	30 days ago
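The subquery-based detection of running traces mentioned for `POST /get_running_traces` in !941 can be illustrated with a small in-memory SQLite example. This is a sketch under stated assumptions, not the project's query: the table names `trace_detail` and `trace_summary`, their columns, and the function name `find_running_trace_ids` are all hypothetical stand-ins for `TraceDetailDB` and `TraceSummaryDB`, and it assumes a summary row exists only once a trace has completed.

```python
import sqlite3


def find_running_trace_ids(conn: sqlite3.Connection) -> list[str]:
    """Return trace ids that have detail rows but no summary row yet."""
    rows = conn.execute(
        """
        SELECT DISTINCT d.trace_id
        FROM trace_detail AS d
        WHERE NOT EXISTS (
            SELECT 1 FROM trace_summary AS s WHERE s.trace_id = d.trace_id
        )
        ORDER BY d.trace_id
        """
    ).fetchall()
    return [row[0] for row in rows]


def demo() -> list[str]:
    """Populate a throwaway database with one finished and one running trace."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE trace_detail (trace_id TEXT NOT NULL, span_name TEXT);
        CREATE TABLE trace_summary (trace_id TEXT NOT NULL PRIMARY KEY, status TEXT);
        INSERT INTO trace_detail VALUES ('t1', 'start'), ('t1', 'llm'), ('t2', 'start');
        INSERT INTO trace_summary VALUES ('t1', 'finished');
        """
    )
    try:
        return find_running_trace_ids(conn)
    finally:
        conn.close()


running = demo()
```

`NOT EXISTS` is used here instead of `NOT IN` because it behaves predictably even if the subquery column can contain NULL values.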
__init__.py	refactor: rename app directory to openjiuwen_studio	5 months ago