code check fix
Co-authored-by: @aharonamir1<amir.aharon@huawei.com>
# message auto-generated for no-merge-commit merge:
!1066 merge feature/executions into develop
code check fix
Created-by: aharonamir1
Commit-by: @aharonamir1
Merged-by: ZYQ5333
Description: <!-- Thanks for sending a pull request! Here are some tips for you:
1) If this is your first time, please read our contributor guidelines: https://gitcode.com/openJiuwen/openJiuwen/blob/master/CONTRIBUTING.md
2) If you want to contribute your code but don't know who will review and merge, please add label openJiuwen-assistant to the pull request, we will find and do it as soon as possible.
-->
**What type of PR is this?**
/kind bug
**What does this PR do / why do we need it**:
---
Summary
- Show version (except 'draft') for execution
- Fix running execution elapsed time accuracy by computing elapsed_ms server-side and sending server_time_ms for client-server clock offset correction
- Fix stale waterfall showing previous execution's data during a new run
Changes
Backend
- workflow_runner.py — Re-enabled incremental save_trace_details() per span so running nodes appear in the waterfall as they complete
- agent_trace_utils.py — Same incremental write pattern for agent executions
- trace_summary_repository.py — get_running_traces_by_space now returns server-computed elapsed_ms and server_time_ms; get_trace_summary_by_trace_id falls back to live TraceDetailDB data
for in-progress traces
- execution.py — Removed response_model from /get_running_traces to allow server_time_ms and elapsed_ms through without Pydantic stripping
Frontend
- ExecutionsPage.tsx — Calculates timeOffset = Date.now() - server_time_ms on each poll; clears trace state on tab switch; auto-selects running > active > completed
- ExecutionList.tsx — Uses server elapsed_ms for running entries, applies timeOffset to correct clock skew for active entries, deduplicates active/running entries
- executionPanelService.ts / types.ts — Updated response types for new server_time_ms and elapsed_ms fields
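
The clock-offset correction listed above can be sketched as follows. This is a minimal Python illustration of the arithmetic, not the actual backend or frontend code: the function names, the `start_time_ms` parameter, and the clamp to zero are assumptions. Only `elapsed_ms`, `server_time_ms`, and the `timeOffset = Date.now() - server_time_ms` formula come from the change list.

```python
def running_trace_payload(start_time_ms: int, server_now_ms: int) -> dict:
    """Server side: compute elapsed time using only the server's clock."""
    return {
        # Clamp to zero (assumption) so a trace never reports a negative duration.
        "elapsed_ms": max(0, server_now_ms - start_time_ms),
        "server_time_ms": server_now_ms,
    }


def time_offset_ms(client_now_ms: int, server_time_ms: int) -> int:
    """Client side: mirrors timeOffset = Date.now() - server_time_ms."""
    return client_now_ms - server_time_ms


def corrected_elapsed_ms(start_time_ms: int, client_now_ms: int, offset_ms: int) -> int:
    """Client side: move the client clock onto the server timeline, then subtract."""
    return max(0, (client_now_ms - offset_ms) - start_time_ms)


# Example: the client clock runs 90 s ahead of the server.
payload = running_trace_payload(start_time_ms=1_000_000, server_now_ms=1_004_000)
offset = time_offset_ms(client_now_ms=1_094_000, server_time_ms=payload["server_time_ms"])
print(payload["elapsed_ms"])                                # 4000
print(corrected_elapsed_ms(1_000_000, 1_094_000, offset))   # 4000, not the naive 94000
```

Without the offset, the client would compute 1_094_000 - 1_000_000 = 94_000 ms for a run that has only been active for 4 seconds.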
Test plan
- Start a workflow execution — verify it appears as "Running" in the Workflows tab
- Start an agent execution — verify it appears as "Running" in the Agents tab
- While running, verify elapsed time is accurate (not 2x or skewed by clock difference)
- After completion, verify finished duration displays correctly (not a negative number)
- Switch tabs — verify no stale data from previous tab
- Completed executions (from old WorkflowExecutionDB) still display correctly
**Which issue(s) this PR fixes**:
Fixes [#868](https://gitcode.com/openJiuwen/agent-studio/issues/868)
**Self-checklist**: (**Please check carefully, and mark an x in the [ ] brackets. We will review your completion status.**)
+ - [ ] **Design**: Has the solution corresponding to the PR been reviewed by the Maintainer, and have all review comments been replied to and addressed?
+ - [x] **Test**: Is the code in the PR fully covered by UT/ST test cases, and have the newly added test cases been uploaded to the repository with this PR or already been uploaded?
+ - [x] **Verification**: Does the PR description contain a detailed description of the verification results showing that the expected goals of the Feature, Refactor, or Bugfix in this PR were achieved?
+ - [ ] **Interface**: Does the PR involve changes to external interfaces? If so, have the changes been approved by the interface review organization, and have the API annotations been refreshed correctly?
+ - [ ] **Document**: Does the PR involve modifications to the official website documentation? If so, please submit the materials to the Doc repository in a timely manner.
<!-- **Special notes for your reviewers**: -->
<!-- + - [ ] Whether it causes forward compatibility failure -->
<!-- + - [ ] Whether the dependent third-party library change is involved -->
See merge request: openJiuwen/agent-studio!1066
HTTP Node: fix get url from previous node
Co-authored-by: @aharonamir1<amir.aharon@huawei.com>
# message auto-generated for no-merge-commit merge:
!1068 merge http-body-params into develop
HTTP Node: fix get url from previous node
Created-by: aharonamir1
Commit-by: @aharonamir1
Merged-by: ZYQ5333
Description:
**What type of PR is this?**
/kind bug
Issue [#869](https://gitcode.com/openJiuwen/agent-studio/issues/869)
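**Self-checklist**: (**Please self-check and mark an x in the [ ] brackets. We will review your completion status; otherwise the PR cannot be merged.**)
+ - [ ] **Design**: Has the solution corresponding to the PR been reviewed by the Maintainer, and have all review comments been replied to and the solution revised accordingly?
+ - [ ] **Test**: Is the code in the PR fully covered by UT/ST test cases, and have the newly added test cases been submitted with this PR or already been merged?
+ - [ ] **Verification**: Does the PR description contain a detailed description of the verification results showing that the expected goals of the Feature, Refactor, or Bugfix were achieved?
+ - [ ] **Interface**: Does the PR involve changes to external interfaces? If so, have the changes been approved by the interface review organization, and have the API annotations been refreshed correctly?
+ - [ ] **Document**: Does the PR involve changes to the official website documentation? If so, please submit the materials to the Doc repository in a timely manner.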
See merge request: openJiuwen/agent-studio!1068
feat(evaluation): introduce full evaluation system for agents & workflows (suites, tasks, graders, metrics, benchmarks, UI)
Co-authored-by: Michael<michael.atamuk@huawei.com>
Co-authored-by: adi_amir<adi.amir1@huawei.com>
Co-authored-by: nizzan<nizzan.kimhi@huawei.com>
Co-authored-by: @aharonamir1<amir.aharon@huawei.com>
# message auto-generated for no-merge-commit merge:
!1023 merge evaluation into develop
feat(evaluation): introduce full evaluation system for agents & workflows (suites, tasks, graders, metrics, benchmarks, UI)
Created-by: michaelhuawei
Commit-by: Michael;michaelhuawei;aharonamir1;@aharonamir1;nikita-mee;nizzan;adi_amir
Merged-by: ZYQ5333
Description:
**What type of PR is this?**
/kind feature /kind refactor
---
## **What does this PR do / why do we need it**
This PR introduces the **Evaluation System for Agents and Workflows**, a major new module that provides first‑class, systematic evaluation capabilities across all OpenJiuwen workflow patterns. It enables teams to measure correctness, reliability, semantic quality, latency, token usage, and regression behavior for any workflow or agent.
The system solves three long‑standing gaps:
1. **No regression detection** — previously no structured way to verify that workflow changes preserved correctness.
2. **No comparative measurement** — no shared metrics to compare versions of agents/workflows.
3. **No sampling support** — LLM nondeterminism required multi‑trial evaluation, which did not exist.
This feature adds a complete backend + frontend evaluation pipeline, including suites, tasks, graders, metrics, benchmark loading, and a full results UI.
---
## **Which issue(s) this PR fixes**
Fixes #<issue-number>
---
## **What scenarios were tested, and what were the verification results**
**Functional verification**
- Created evaluation suites, added tasks, updated tasks, deleted tasks.
- Loaded all seven benchmark YAML files; validated correct task creation.
- Ran evaluation against workflows and agents with deterministic, model-based, and code-based graders.
- Verified pattern detection across all six structural patterns (Routing, Chaining, Parallelization, Orchestrator‑Worker, Evaluator‑Optimizer, Memory Usage).
- Confirmed correct grader behavior: deterministic checks, LLM judge calls, code-based execution, weight aggregation.
- Confirmed metrics engine correctness: pass/fail, pass@k, pass^k, score distribution, latency stats, token usage, reliability, per-grader breakdown.
- Verified custom aggregate metrics execution, including error handling.
- Confirmed run lifecycle: RUNNING → COMPLETED/FAILED, immutability of completed runs.
- Verified large-suite behavior (50 tasks × 5 trials) and UI rendering of large trace sets.
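
For readers unfamiliar with the two multi-trial metrics named above, here is a minimal sketch of one common way to estimate them from `n` trials of which `c` passed. The PR description does not say which estimator the metrics engine uses, so the combinatorial formulas and the function names `pass_at_k` and `pass_hat_k` are assumptions for illustration only.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """pass@k: chance that at least one of k trials sampled from the n passes."""
    _check(n, c, k)
    # comb(n - c, k) counts the samples made up entirely of failing trials.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """pass^k: chance that all k trials sampled from the n pass."""
    _check(n, c, k)
    # comb(c, k) counts the samples made up entirely of passing trials.
    return comb(c, k) / comb(n, k)


# 5 trials, 3 passed, k = 2
print(round(pass_at_k(5, 3, 2), 3))   # 0.9
print(round(pass_hat_k(5, 3, 2), 3))  # 0.3
```

pass@k rewards a task that succeeds at least once, while pass^k rewards a task that succeeds every time, which is why the two diverge sharply for flaky tasks.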
**Performance verification**
- Execution engine calls scale linearly with tasks × trials.
- Model-based graders correctly issue LLM judge calls per trial.
- No regressions to existing workflow/agent execution performance.
**Reliability verification**
- Flakiness metric validated using multi-trial runs.
- Pattern detection validated with synthetic traces and real workflows.
- Code-based grader error paths tested (exceptions, invalid returns).
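
The two error paths mentioned for code-based graders (exceptions and invalid returns) could be handled along these lines. This is a hypothetical sketch, assuming the grader is run with Python's `exec()` and that the snippet defines a function `grade(output)` returning a number in [0, 1]. The function name `run_code_grader`, that contract, and the result dictionary shape are all illustrative and not taken from the PR.

```python
from typing import Any


def run_code_grader(source: str, output: Any) -> dict:
    """Execute user-supplied grader code and normalise its result.

    Hypothetical contract: `source` defines grade(output) -> number in [0, 1].
    """
    namespace: dict = {}
    try:
        exec(source, namespace)  # note: exec() provides no sandboxing
        grade = namespace.get("grade")
        if not callable(grade):
            return {"score": 0.0, "error": "grader must define grade(output)"}
        score = grade(output)
    except Exception as exc:  # error path 1: the grader code raised
        return {"score": 0.0, "error": f"{type(exc).__name__}: {exc}"}
    # error path 2: the grader returned something that is not a valid score
    if not isinstance(score, (int, float)) or not 0 <= score <= 1:
        return {"score": 0.0, "error": f"invalid return value: {score!r}"}
    return {"score": float(score), "error": None}


ok = run_code_grader("def grade(o):\n    return 1.0 if o == 'ok' else 0.0", "ok")
bad = run_code_grader("def grade(o):\n    raise ValueError('boom')", "ok")
print(ok)   # {'score': 1.0, 'error': None}
print(bad)  # {'score': 0.0, 'error': 'ValueError: boom'}
```

Converting both failure modes into a zero score with an error message keeps one broken grader from aborting the whole run.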
**Frontend verification**
- Full CRUD for suites and tasks.
- Run dialog correctly configures workflow/agent target and trial count.
- Results UI renders Overview, Metrics, Graders, and Traces tabs with correct visibility rules.
- Zustand store state transitions validated.
---
## **Self-checklist**
+ - [x] **Design**: Reviewed with maintainers; all comments addressed.
+ - [x] **Test**: Full UT/ST coverage for harness, graders, metrics, pattern validator, API, and frontend store.
+ - [x] **Verification**: PR description includes detailed verification results for feature, refactor, and bugfix aspects.
+ - [x] **Interface**: Adds new external API endpoints under /evaluation; no breaking changes to existing interfaces.
+ - [x] **Document**: Benchmark usage, suite/task schema, and evaluation workflow documented; docs PR prepared separately.
---
## **Special notes for reviewers**
- This module is **fully additive** — no existing endpoints or execution logic are modified.
- Code-based graders and custom metrics use exec(); this is an accepted constraint for v1 and will be hardened later.
- Large evaluation runs can produce multi‑MB result payloads; pagination is planned for a future release.
- Pattern detection is heuristic; tasks may override pattern_type explicitly.
See merge request: openJiuwen/agent-studio!1023
fix(mcp-stdio): correct discovery and invocation logic for stdio MCP plugins
Co-authored-by: michaelhuawei<michael.atamuk@huawei.com>
# message auto-generated for no-merge-commit merge:
!1074 merge fix/mcp-stdio-plugin-params into develop
fix(mcp-stdio): correct discovery and invocation logic for stdio MCP plugins
Created-by: michaelhuawei
Commit-by: michaelhuawei
Merged-by: ZYQ5333
Description: **What type of PR is this?**
/kind bug
**What does this PR do / why do we need it**:
This PR fixes **stdio MCP plugin discovery and invocation**, which were still broken even after the fix for #835.
The root cause was that the backend constructed incorrect params for stdio plugins, and the invocation path attempted to execute the .py script directly instead of running it via Python.
---
## ✔ 1. Fix: Stdio discovery was broken
### **Root cause**
_build_safe_stdio_params incorrectly used: args = [config.url]
But for stdio plugins:
- config.url is always ""
- The script path, arguments, and environment are stored in:
  - config.params["command"]
  - config.params["args"]
  - config.params["env"]
### **Fix**
Discovery now uses:
script_path = params["command"] or config.url or ""
args = params["args"]
env = params["env"]
This matches what StdioClient expects.
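
As a self-contained illustration of the discovery fix, the resolution logic might look like the sketch below. `PluginConfig` is a hypothetical stand-in for the stored plugin record, `resolve_stdio_discovery` is an illustrative name, and the defaults for missing `args` and `env` are an assumption; only the `params["command"] or config.url or ""` fallback is taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class PluginConfig:
    """Hypothetical stand-in for the stored plugin configuration."""
    url: str = ""
    params: dict = field(default_factory=dict)


def resolve_stdio_discovery(config: PluginConfig) -> tuple[str, list[str], dict[str, str]]:
    """Return (script_path, args, env) for a stdio plugin."""
    params = config.params or {}
    # Prefer the stored command; config.url is empty for stdio plugins.
    script_path = params.get("command") or config.url or ""
    args = list(params.get("args") or [])   # assumption: missing args means none
    env = dict(params.get("env") or {})     # assumption: missing env means empty
    return script_path, args, env


cfg = PluginConfig(params={"command": "/path/to/server.py", "args": ["arg1"], "env": {"A": "1"}})
print(resolve_stdio_discovery(cfg))
# ('/path/to/server.py', ['arg1'], {'A': '1'})
```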
---
## ✔ 2. Fix: Stdio invocation was broken
### **Root cause**
plugin_tools.py passed: command = params["command"] # e.g. "/path/to/server.py"
This caused the launch to fail because the .py file was executed directly instead of via Python.
### **Fix**
Invocation now treats:
- params["command"] as the **script path**
- sys.executable as the **actual executable**
Updated logic:
script_path = params["command"]
extra_args = params["args"]
mcp_params["command"] = sys.executable
mcp_params["args"] = [script_path] + extra_args
Now the process launches as: python /path/to/server.py "arg1" "arg2"
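
The updated invocation logic can be wrapped in a small helper for clarity. This is a sketch under stated assumptions: the helper name `build_stdio_launch_params` and the pass-through of `env` are illustrative, while the use of `sys.executable` and `[script_path] + extra_args` follows the updated logic quoted above.

```python
import sys


def build_stdio_launch_params(params: dict) -> dict:
    """Build launch parameters that run the plugin script via the Python interpreter."""
    script_path = params["command"]              # stored value is the script path
    extra_args = list(params.get("args") or [])
    return {
        "command": sys.executable,               # the real executable is Python itself
        "args": [script_path] + extra_args,
        "env": dict(params.get("env") or {}),    # assumption: env is passed through unchanged
    }


launch = build_stdio_launch_params({"command": "/path/to/server.py", "args": ["arg1", "arg2"]})
print(launch["args"])  # ['/path/to/server.py', 'arg1', 'arg2']
```

Using `sys.executable` rather than a bare `python` runs the plugin with the same interpreter as the backend, which avoids depending on what `python` resolves to on the PATH.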
---
## 🎉 Result
- Stdio plugin **discovery works**
- Stdio plugin **invocation works**
- Both discovery and execution now correctly use:
  - The Python interpreter
  - The script path from the DB
  - Args and env from the DB
This completes the stdio MCP plugin support.
---
**Which issue(s) this PR fixes**:
Follow‑up to [#835](https://gitcode.com/openJiuwen/agent-studio/issues/835)
---
**Code review checklist**:
+ - [ ] Whether the function's return value is verified
+ - [ ] Whether the code complies with the **SOLID principles / Law of Demeter**
+ - [ ] Whether there are UT test cases and they are valid (if there are none, explain why)
+ - [ ] Whether the API change is involved
+ - [ ] Whether official document modification is involved
See merge request: openJiuwen/agent-studio!1074
Add Executions Panel for Agent & Workflow Observability
Co-authored-by: @aharonamir1<amir.aharon@huawei.com>
# message auto-generated for no-merge-commit merge:
!941 merge feature/executions into develop
Add Executions Panel for Agent & Workflow Observability
Created-by: aharonamir1
Commit-by: @aharonamir1
Merged-by: ZYQ5333
Description: **What type of PR is this?**
/kind feature
This is linked to issue [#770](https://gitcode.com/openJiuwen/agent-studio/issues/770)
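**Self-checklist**: (**Please self-check and mark an x in the [ ] brackets. We will review your completion status; otherwise the PR cannot be merged.**)
+ - [ ] **Design**: Has the solution corresponding to the PR been reviewed by the Maintainer, and have all review comments been replied to and the solution revised accordingly?
+ - [ ] **Test**: Is the code in the PR fully covered by UT/ST test cases, and have the newly added test cases been submitted with this PR or already been merged?
+ - [ ] **Verification**: Does the PR description contain a detailed description of the verification results showing that the expected goals of the Feature, Refactor, or Bugfix were achieved?
+ - [ ] **Interface**: Does the PR involve changes to external interfaces? If so, have the changes been approved by the interface review organization, and have the API annotations been refreshed correctly?
+ - [ ] **Document**: Does the PR involve changes to the official website documentation? If so, please submit the materials to the Doc repository in a timely manner.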

## Summary
- Add a new Executions page (/dashboard/executions) with split-panel UI: execution list on the left, waterfall timeline on the right
- Support real-time monitoring of running workflows/agents with polling and visibility-aware lifecycle
- Enable trace summary creation for completed workflows so they appear in the execution history
- Display workflow/agent names instead of raw IDs in the execution list
## Backend Changes
### New API Endpoints (routers/execution.py)
- **POST /get_all_trace_summaries** - Query TraceSummaryDB by space, optionally filtered by business type (WORKFLOW/AGENT), returns list with business names
- **POST /list_active_executions** - Read from WorkflowExecutionManager in-memory registry, enriched with workflow names from WorkflowBaseDB
- **POST /get_running_traces** - Find running traces via TraceDetailDB records that lack a completed TraceSummaryDB entry (subquery-based detection)
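
The subquery-based detection used by /get_running_traces can be illustrated with a small in-memory SQLite example. The table and column names below (`trace_detail`, `trace_summary`, `space_id`, `span_name`) are hypothetical stand-ins for TraceDetailDB and TraceSummaryDB, and the real query may differ. The sketch only shows the idea of selecting traces that have detail rows but no summary row.

```python
import sqlite3


def find_running_trace_ids(conn: sqlite3.Connection, space_id: str) -> list[str]:
    """Return trace IDs with detail rows but no summary row yet."""
    rows = conn.execute(
        """
        SELECT DISTINCT d.trace_id
        FROM trace_detail AS d
        WHERE d.space_id = ?
          AND NOT EXISTS (
              SELECT 1 FROM trace_summary AS s WHERE s.trace_id = d.trace_id
          )
        ORDER BY d.trace_id
        """,
        (space_id,),
    ).fetchall()
    return [row[0] for row in rows]


def demo() -> list[str]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE trace_detail (trace_id TEXT, space_id TEXT, span_name TEXT);
        CREATE TABLE trace_summary (trace_id TEXT PRIMARY KEY, status TEXT);
        INSERT INTO trace_detail VALUES
            ('t1', 's1', 'start'), ('t1', 's1', 'llm'), ('t2', 's1', 'start');
        INSERT INTO trace_summary VALUES ('t1', 'finished');
        """
    )
    return find_running_trace_ids(conn, "s1")


print(demo())  # ['t2']: t1 already has a summary, t2 is still running
```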
### Trace Summary for Workflows (trace_summary_repository.py)
- **create_trace_summary_from_workflow_execution(trace_id)** - New method that reads completed workflow data from WorkflowExecutionDB and creates a corresponding TraceSummaryDB record. This is necessary because workflows save execution data to WorkflowExecutionDB (not TraceDetailDB), so the existing create_trace_summary_by_trace_id (which queries TraceDetailDB) could never find workflow data.
- **_enrich_with_business_names(data_list, db)** - Batch lookup of workflow/agent names from WorkflowBaseDB and AgentBaseDB, adds business_name field to trace summary responses
- **get_trace_summary_list_by_space()** and **get_running_traces_by_space()** now return business_name
### Workflow Runner (workflow_runner.py)
- Call trace_summary_repository.create_trace_summary_from_workflow_execution(trace_id) after workflow completion in all code paths (success, JiuWenExecuteException, BaseError/JiuWenGraphException, generic Exception)
- Extract trace_id from trace_logs[0].trace_id
### Workflow Execution Manager (workflow_execution_manager.py)
- Added start_time tracking to WorkflowExecutionRegistration
### Schemas (trace_summary.py)
- **TraceSummaryListBySpaceRequest** - Request model for space-level trace queries
- **TraceSummaryBriefWithStatus** - Response model with trace_id, business_id, business_name, business_type, create_time, duration, status
- **ActiveExecutionInfo** - Response model for in-memory active executions with workflow_name
## Frontend Changes
### New Components
- **ExecutionsPage.tsx** - Main page with MUI Tabs (Workflows/Agents), split panel layout, polling with visibility API, merge of multiple data sources (completed traces + running traces + active executions)
- **ExecutionList.tsx** - Left panel showing merged/sorted execution entries with name, status badge, timestamp, duration
- **ExecutionWaterfall.tsx** - Right panel with horizontal timeline bars, time axis, color-coded nodes, DFS flattening of hierarchical execution info
- **ExecutionStatusBadge.tsx** - Colored pill badges (green=Finished, blue+ping=Running, red=Error, orange=Interrupted)
- **ExecutionNodeTooltip.tsx** - Dark tooltip with duration/inputs/outputs, text truncated to 300 chars
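
The DFS flattening mentioned for ExecutionWaterfall.tsx can be sketched as a pre-order traversal that records each node's depth. The component itself is TypeScript. This Python version only illustrates the algorithm, and the node fields (`name`, `start_ms`, `duration_ms`, `children`) are assumed for the example.

```python
from typing import Any


def flatten_execution_tree(nodes: list[dict[str, Any]], depth: int = 0) -> list[dict[str, Any]]:
    """Flatten a nested execution tree into waterfall rows, parents before children."""
    rows: list[dict[str, Any]] = []
    for node in nodes:
        # Copy every field except the nested children, and remember the depth.
        row = {key: value for key, value in node.items() if key != "children"}
        row["depth"] = depth
        rows.append(row)
        rows.extend(flatten_execution_tree(node.get("children", []), depth + 1))
    return rows


tree = [{
    "name": "workflow", "start_ms": 0, "duration_ms": 100,
    "children": [
        {"name": "llm", "start_ms": 10, "duration_ms": 50,
         "children": [{"name": "tool", "start_ms": 20, "duration_ms": 5}]},
        {"name": "end", "start_ms": 70, "duration_ms": 30},
    ],
}]
for row in flatten_execution_tree(tree):
    print("  " * row["depth"] + row["name"])
```

Each row keeps its depth, so a renderer can indent the bar or label while drawing one row per node in traversal order.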
### API Client (api-client)
- **executionPanelService.ts** - Service layer with methods for all three new endpoints plus existing debug endpoints; handles 404 as empty arrays
- **types.ts** - Added TraceSummaryBriefWithStatus and ActiveExecution interfaces with business_name/workflow_name
- **config.ts** - Added endpoint URL constants
### Routing & Navigation
- **App.tsx** - Lazy import and route for /dashboard/executions
- **SidebarNew.tsx** - Added Activity icon nav item
- **Locales** - Added "executions" key in both en-US.json and zh-CN.json
## Key Design Decisions
1. **Three data sources merged** - Completed traces from TraceSummaryDB, running traces detected via TraceDetailDB subquery, and active workflows from in-memory WorkflowExecutionManager. This covers all execution states.
2. **Workflow trace summary from WorkflowExecutionDB** - Workflows don't write to TraceDetailDB (that code path was disabled). Instead of re-enabling it, we read from WorkflowExecutionDB, which is already populated.
3. **Visibility-aware polling** - Polling pauses when browser tab is hidden via document.visibilitychange, preventing unnecessary API calls.
4. **No-flicker polling** - Detail polling passes isPolling=true to skip loading state, preventing UI flicker on refresh.
## Test Plan
- [ ] Navigate to /dashboard/executions and verify the page loads with tabs
- [ ] Run a workflow and verify it appears as "Running" in the Workflows tab
- [ ] After workflow completes, verify it transitions to "Finished" in the list
- [ ] Click on a completed execution and verify waterfall timeline renders
- [ ] Hover over waterfall bars and verify tooltips show correct data
- [ ] Switch to Agents tab and verify agent executions appear
- [ ] Switch browser tabs and verify polling pauses (check Network tab)
- [ ] Verify execution list shows workflow/agent names, not IDs
See merge request: openJiuwen/agent-studio!941