agent-studio/backend/openjiuwen_studio/routers · openJiuwen/agent-studio - AtomGit

File	Last commit	Last updated
__init__.py	feat: add VLM model management and reports Co-authored-by: yanx1n<yanx1n@noreply.gitcode.com> # message auto-generated for no-merge-commit merge: !998 feat: add VLM model management and reports From: @yanx1n Reviewed-by: @xiaoyao42, @cyz95 See merge request: openJiuwen/agent-studio!998	1 month ago
agents.py	feature: adapt studio to runtime (pre-refactor version) Co-authored-by: 冯浩<fenghao55@h-partners.com> !843 feature: adapt studio to runtime (pre-refactor version) From: @michealswhite Reviewed-by: @xiaoyao42, @ZYQ5333 See merge request: openJiuwen/agent-studio!843	2 months ago
auth.py	feat(connect): introduce multi‑platform Channels and MCP server for agent and workflow execution across messaging apps and AI assistants Co-authored-by: Michael<michael.atamuk@huawei.com> Co-authored-by: adi_amir<adi.amir1@huawei.com> Co-authored-by: nizzan<nizzan.kimhi@huawei.com> Co-authored-by: @aharonamir1<amir.aharon@huawei.com> # message auto-generated for no-merge-commit merge: !1018 merge operating_from_external_platforms into develop feat(connect): introduce multi‑platform Channels and MCP server for agent and workflow execution across messaging apps and AI assistants Created-by: michaelhuawei Commit-by: michaelhuawei;Michael;Nikita Merkulov;aharonamir1;@aharonamir1;nikita-mee;nizzan;adi_amir Merged-by: ZYQ5333 Description: What type of PR is this? /kind feature --- What does this PR do / why do we need it: * Introduces the new `connect/` module, which makes OpenJiuwen accessible outside the browser in two ways: * Channels (for people): 14 platform adapters that let users run agents and workflows directly from the tools they already use (Telegram, Slack, Discord, WhatsApp, Teams, Email, SMS, voice assistants, CLI, webhooks, etc.), without opening the web UI. * MCP server (for AI assistants): an MCP stdio server that exposes OpenJiuwen agents and workflows as tools to AI assistants such as Claude Desktop and JiuwenClaw, so they can autonomously discover, inspect, and run them during reasoning. * Separates platform-specific concerns (adapters) from OpenJiuwen business logic (shared client library), improving maintainability and making it easy to add new platforms or tools. * Keeps the OpenJiuwen backend unchanged: the connect layer uses only existing REST APIs and Bearer token auth, so this is a purely additive “reach and accessibility” feature with no impact on existing web UI users. 
--- Which issue(s) this PR fixes: Fixes [#811](https://gitcode.com/openJiuwen/agent-studio/issues/811) --- What scenarios were tested, and what were the verification results（Function, performance, reliability, etc.）： * Client library (functional correctness): * Unit tests for `OpenJiuwenClient`, login flow, token verification/refresh, SSE-based execution of agents and workflows, and parsing of agent/workflow results. * Verified that `verify_and_refresh()` correctly handles: valid tokens, 401 + successful refresh, 401 + failed refresh (forces re-login), and non-401 errors (optimistic pass). * MCP server (AI assistants using OpenJiuwen): * 36 unit tests for the 9 MCP tools, mocking at the `client.` boundary to verify: Tool discovery (`tools/list`) exposes correct schemas. * Workflow and agent search, inspection, and execution map correctly to backend APIs. * Errors (401, network failures, timeouts) are surfaced as clear tool error messages. * Manual end-to-end tests with a local MCP client to confirm that multiple tool calls share a single stdio session and reuse the same backend token. * Channels — platform adapters (integration smoke tests): * Telegram / Slack / Discord (outbound): * Started each adapter against a test backend. * Verified login, listing/searching agents and workflows, running an agent, and running a workflow with interactive parameter collection. * Confirmed Telegram `ConversationHandler` correctly manages login and workflow param collection, including cancel and validation error paths. * Webhook-style platforms (WhatsApp, Teams, Twilio, GitHub, Messenger, WeChat, Google Assistant, Alexa): * Deployed adapters behind a public HTTPS URL (ngrok in dev). * Verified signature validation / platform-specific auth, basic commands (start/help), and a full workflow execution round-trip. * Confirmed platform-specific constraints are respected (e.g., response length truncation for SMS/Messenger, timeouts for voice platforms). 
* CLI / Email / HTTP Webhook: * CLI: ran agents and workflows interactively, including multi-step param collection and health checks. * Email: verified IMAP polling, threading via `In-Reply-To`, and that replies are correctly associated with prior conversations. * HTTP Webhook: verified stateless execution with optional `WEBHOOK_API_KEY` guard. * Performance & reliability: * Confirmed that each user command results in a small, bounded number of backend calls (typically 1–3). * Verified SSE execution enforces a 120-second timeout and returns clear timeout errors to users when exceeded. * For Telegram, validated that the per-user `asyncio.Lock` prevents race conditions during concurrent token refresh attempts. --- Self-checklist:（Please check carefully,and mark an x in the [] brackets. We will review your completion status.） + - [x] Design: Has the solution corresponding to the PR been reviewed by the Maintainer, and have all review comments been replied to and revised + - [x] Test: Has the code in the PR been fully covered by UT/ST test cases, and have the newly added test cases been uploaded to the repository along with this PR or already uploaded. + - [x] Verification: Does the PR description contains a detailed description of the verification results regarding the achievement of the expected goals for the Feature, Refactor, and Bugfix to this PR. + - [ ] Interface: Does it involve changes to external interfaces? The corresponding changes have been approved by the interface review organization, and the annotation information for the API has been correctly refreshed. + - [x] Document: Does it involve modifications to the official website documentation? If so, please submit the materials to the Doc repository in a timely manner. (Notes: * No backend API changes are introduced—only new consumers of existing REST endpoints. * Product/Docs updates for “OpenJiuwen Anywhere / Connect” and Channels + MCP usage are prepared separately and can be linked to this PR.) 
See merge request: openJiuwen/agent-studio!1018	28 days ago
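The four token outcomes listed in the auth.py commit message (valid token, 401 with successful refresh, 401 with failed refresh, non-401 error treated optimistically) can be sketched as a small decision function. This is only an illustration: the name `verify_and_refresh_sketch`, the `TokenState` enum, and the callable-based signature are assumptions made here, not the actual `verify_and_refresh()` from the connect client library.

```python
from enum import Enum
from typing import Callable


class TokenState(Enum):
    """Hypothetical outcome labels for this sketch."""
    VALID = "valid"          # token accepted, or the check was inconclusive
    REFRESHED = "refreshed"  # got 401, refresh succeeded
    RELOGIN = "relogin"      # got 401, refresh failed: user must log in again


def verify_and_refresh_sketch(
    verify: Callable[[], int],    # assumed: returns an HTTP status code
    refresh: Callable[[], bool],  # assumed: returns True if refresh worked
) -> TokenState:
    status = verify()
    if status == 401:
        # Only an explicit 401 triggers a refresh attempt.
        return TokenState.REFRESHED if refresh() else TokenState.RELOGIN
    # 2xx is a valid token; any other non-401 error (e.g. 503) passes
    # optimistically so a backend hiccup does not force a re-login.
    return TokenState.VALID
```

Passing the verify and refresh steps in as callables keeps the decision logic testable without a running backend.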
auth_new.py	feat: support obtaining access_token with account and password Co-authored-by: chenchong<chenchong34@huawei.com> # message auto-generated for no-merge-commit merge: !852 feat: support obtaining access_token with account and password From: @Chenchong_RD Reviewed-by: @cyz95, @ZYQ5333 See merge request: openJiuwen/agent-studio!852	2 months ago
common.py	refactor: rename app directory to openjiuwen_studio refactor: rename app directory to openjiuwen_studio	5 months ago
deepsearch.py	fix: resolve HTTP 500 error on deepsearch template upload Co-authored-by: jinduoxia<chenjingheng@huawei.com> # message auto-generated for no-merge-commit merge: !1043 fix: resolve HTTP 500 error on deepsearch template upload From: @cjh_jinduoxia Reviewed-by: @cyz95, @xiaoyao42 See merge request: openJiuwen/agent-studio!1043	23 days ago
deepsearch_knowledge_base.py	feat: add syncing knowledge bases to DeepSearch Co-authored-by: Mmmmroy<le.zhang1@h-partners.com> # message auto-generated for no-merge-commit merge: !846 feat: add syncing knowledge bases to DeepSearch From: @Mmmmroy Reviewed-by: @xiaoyao42, @ZYQ5333 See merge request: openJiuwen/agent-studio!846	2 months ago
deepsearch_logger.py	fix: repair deepsearch report editing Co-authored-by: jinduoxia<chenjingheng@huawei.com> # message auto-generated for no-merge-commit merge: !1031 fix: repair deepsearch report editing From: @cjh_jinduoxia Reviewed-by: @xiaoyao42, @gallonH See merge request: openJiuwen/agent-studio!1031	28 days ago
embedding_models.py	fix: Logs are uniformly managed using the core logging module. Co-authored-by: daifeiping<daifeiping@huawei.com> # message auto-generated for no-merge-commit merge: !305 merge develop into develop fix: Logs are uniformly managed using the core logging module. Created-by: daifeiping Commit-by: daifeiping Merged-by: openJiuwen-bot Description: <!-- Thanks for sending a pull request! Here are some tips for you: 1) If this is your first time, please read our contributor guidelines: https://gitcode.com/openJiuwen/community/blob/master/CONTRIBUTING.md 2) If you want to contribute your code but don't know who will review and merge, please add label `openjiuwen-assistant` to the pull request, we will find and do it as soon as possible. --> What type of PR is this? <!-- Choose one of the labels below to replace `/kind <label>`; the available label types are: - /kind bug - /kind task - /kind feature - /kind refactor - /kind clean_code If the PR description does not follow the specification, run /check-pr after editing it to recheck the PR. --> /kind <label> Self-checklist: (Please self-check and mark an x in the [ ] brackets. We will review your completion status; otherwise the PR cannot be merged.) + - [x] Design: Has the solution corresponding to the PR been reviewed by the Maintainer, and have all review comments been replied to and the solution revised + - [x] Test: Has the code in the PR been fully covered by UT/ST test cases, and have the newly added test cases been uploaded to the repository along with this PR or already uploaded + - [x] Verification: Does the PR description contain a detailed description of the verification results regarding the achievement of the expected goals for the Feature, Refactor, and Bugfix of this PR + - [x] Interface: Does it involve changes to external interfaces? The corresponding changes have been approved by the interface review organization, and the API comments have been correctly refreshed + - [x] Document: Does it involve modifications to the official website documentation? If so, please submit the materials to the Doc repository in a timely manner <!-- Special notes for your reviewers: --> <!-- + - [ ] Whether it causes forward compatibility failure --> <!-- + - [ ] Whether the dependent third-party library change is involved --> See merge request: openJiuwen/agent-studio!305	4 months ago
evaluation.py	feat(evaluation): introduce full evaluation system for agents & workflows (suites, tasks, graders, metrics, benchmarks, UI) Co-authored-by: Michael<michael.atamuk@huawei.com> Co-authored-by: adi_amir<adi.amir1@huawei.com> Co-authored-by: nizzan<nizzan.kimhi@huawei.com> Co-authored-by: @aharonamir1<amir.aharon@huawei.com> # message auto-generated for no-merge-commit merge: !1023 merge evaluation into develop feat(evaluation): introduce full evaluation system for agents & workflows (suites, tasks, graders, metrics, benchmarks, UI) Created-by: michaelhuawei Commit-by: Michael;michaelhuawei;aharonamir1;@aharonamir1;nikita-mee;nizzan;adi_amir Merged-by: ZYQ5333 Description: <!-- Thanks for sending a pull request! Here are some tips for you: 1) If this is your first time, please read our contributor guidelines: [https://gitcode.com/openJiuwen/openJiuwen/blob/master/CONTRIBUTING.md](https://gitcode.com/openJiuwen/openJiuwen/blob/master/CONTRIBUTING.md) 2) If you want to contribute your code but don't know who will review and merge, please add label `openJiuwen-assistant` to the pull request. --> What type of PR is this? /kind feature /kind refactor --- ## What does this PR do / why do we need it This PR introduces the Evaluation System for Agents and Workflows, a major new module that provides first‑class, systematic evaluation capabilities across all OpenJiuwen workflow patterns. It enables teams to measure correctness, reliability, semantic quality, latency, token usage, and regression behavior for any workflow or agent. The system solves three long‑standing gaps: 1. No regression detection — previously no structured way to verify that workflow changes preserved correctness. 2. No comparative measurement — no shared metrics to compare versions of agents/workflows. 3. No sampling support — LLM nondeterminism required multi‑trial evaluation, which did not exist. 
This feature adds a complete backend + frontend evaluation pipeline, including suites, tasks, graders, metrics, benchmark loading, and a full results UI. --- ## Which issue(s) this PR fixes Fixes #<issue-number> --- ## What scenarios were tested, and what were the verification results Functional verification - Created evaluation suites, added tasks, updated tasks, deleted tasks. - Loaded all seven benchmark YAML files; validated correct task creation. - Ran evaluation against workflows and agents with deterministic, model-based, and code-based graders. - Verified pattern detection across all six structural patterns (Routing, Chaining, Parallelization, Orchestrator‑Worker, Evaluator‑Optimizer, Memory Usage). - Confirmed correct grader behavior: deterministic checks, LLM judge calls, code-based execution, weight aggregation. - Confirmed metrics engine correctness: pass/fail, pass@k, pass^k, score distribution, latency stats, token usage, reliability, per-grader breakdown. - Verified custom aggregate metrics execution, including error handling. - Confirmed run lifecycle: RUNNING → COMPLETED/FAILED, immutability of completed runs. - Verified large-suite behavior (50 tasks × 5 trials) and UI rendering of large trace sets. Performance verification - Execution engine calls scale linearly with tasks × trials. - Model-based graders correctly issue LLM judge calls per trial. - No regressions to existing workflow/agent execution performance. Reliability verification - Flakiness metric validated using multi-trial runs. - Pattern detection validated with synthetic traces and real workflows. - Code-based grader error paths tested (exceptions, invalid returns). Frontend verification - Full CRUD for suites and tasks. - Run dialog correctly configures workflow/agent target and trial count. - Results UI renders Overview, Metrics, Graders, and Traces tabs with correct visibility rules. - Zustand store state transitions validated. 
--- ## Self-checklist + - [x] Design: Reviewed with maintainers; all comments addressed. + - [x] Test: Full UT/ST coverage for harness, graders, metrics, pattern validator, API, and frontend store. + - [x] Verification: PR description includes detailed verification results for feature, refactor, and bugfix aspects. + - [x] Interface: Adds new external API endpoints under `/evaluation`; no breaking changes to existing interfaces. + - [x] Document: Benchmark usage, suite/task schema, and evaluation workflow documented; docs PR prepared separately. --- ## Special notes for reviewers - This module is fully additive: no existing endpoints or execution logic are modified. - Code-based graders and custom metrics use `exec()`; this is an accepted constraint for v1 and will be hardened later. - Large evaluation runs can produce multi‑MB result payloads; pagination is planned for a future release. - Pattern detection is heuristic; tasks may override `pattern_type` explicitly. See merge request: openJiuwen/agent-studio!1023	28 days ago
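The evaluation.py commit message lists pass@k and pass^k among the metrics but does not say how they are computed. A minimal sketch, assuming the commonly used combinatorial estimators over n trials with c passes (pass@k: at least one of k sampled trials passes; pass^k: all k sampled trials pass). The function names are hypothetical and the project's metrics engine may define these differently.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # n trials were run, c of them passed, k is the sample size.
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimated probability that at least one of k sampled trials passes."""
    _check(n, c, k)
    # comb(n - c, k) counts the k-subsets made only of failing trials;
    # it is 0 when there are fewer than k failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimated probability that all k sampled trials pass (pass^k)."""
    _check(n, c, k)
    # comb(c, k) counts the k-subsets made only of passing trials.
    return comb(c, k) / comb(n, k)
```

With 5 trials and 3 passes, sampling k = 2 gives pass@k = 1 - 1/10 and pass^k = 3/10, which shows how pass^k penalizes flaky tasks much more strongly than pass@k.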
execution.py	code check fix Co-authored-by: @aharonamir1<amir.aharon@huawei.com> # message auto-generated for no-merge-commit merge: !1066 merge feature/executions into develop code check fix Created-by: aharonamir1 Commit-by: @aharonamir1 Merged-by: ZYQ5333 Description: <!-- Thanks for sending a pull request! Here are some tips for you: 1) If this is your first time, please read our contributor guidelines: https://gitcode.com/openJiuwen/openJiuwen/blob/master/CONTRIBUTING.md 2) If you want to contribute your code but don't know who will review and merge, please add label `openJiuwen-assistant` to the pull request, we will find and do it as soon as possible. --> What type of PR is this? /kind bug What does this PR do / why do we need it: --- Summary - Show version (except 'draft') for execution - Fix running execution elapsed time accuracy by computing elapsed_ms server-side and sending server_time_ms for client-server clock offset correction - Fix stale waterfall showing previous execution's data during a new run Changes Backend - workflow_runner.py — Re-enabled incremental save_trace_details() per span so running nodes appear in the waterfall as they complete - agent_trace_utils.py — Same incremental write pattern for agent executions - trace_summary_repository.py — get_running_traces_by_space now returns server-computed elapsed_ms and server_time_ms; get_trace_summary_by_trace_id falls back to live TraceDetailDB data for in-progress traces - execution.py — Removed response_model from /get_running_traces to allow server_time_ms and elapsed_ms through without Pydantic stripping Frontend - ExecutionsPage.tsx — Calculates timeOffset = Date.now() - server_time_ms on each poll; clears trace state on tab switch; auto-selects running > active > completed - ExecutionList.tsx — Uses server elapsed_ms for running entries, applies timeOffset to correct clock skew for active entries, deduplicates active/running entries - executionPanelService.ts / types.ts — Updated response 
types for new server_time_ms and elapsed_ms fields Test plan - Start a workflow execution; verify it appears as "Running" in the Workflows tab - Start an agent execution; verify it appears as "Running" in the Agents tab - While running, verify elapsed time is accurate (not 2x or skewed by clock difference) - After completion, verify finished duration displays correctly (not a negative number) - Switch tabs; verify no stale data from previous tab - Completed executions (from old WorkflowExecutionDB) still display correctly Which issue(s) this PR fixes: Fixes [#868](https://gitcode.com/openJiuwen/agent-studio/issues/868) Self-checklist: (Please check carefully, and mark an x in the [] brackets. We will review your completion status.) + - [ ] Design: Has the solution corresponding to the PR been reviewed by the Maintainer, and have all review comments been replied to and revised + - [x] Test: Has the code in the PR been fully covered by UT/ST test cases, and have the newly added test cases been uploaded to the repository along with this PR or already uploaded. + - [x] Verification: Does the PR description contain a detailed description of the verification results regarding the achievement of the expected goals for the Feature, Refactor, and Bugfix to this PR. + - [ ] Interface: Does it involve changes to external interfaces? The corresponding changes have been approved by the interface review organization, and the annotation information for the API has been correctly refreshed. + - [ ] Document: Does it involve modifications to the official website documentation? If so, please submit the materials to the Doc repository in a timely manner. <!-- Special notes for your reviewers: --> <!-- + - [ ] Whether it causes forward compatibility failure --> <!-- + - [ ] Whether the dependent third-party library change is involved --> See merge request: openJiuwen/agent-studio!1066	16 days ago
knowledge_base.py	fix: repair weblink knowledge base sync Co-authored-by: Mmmmroy<le.zhang1@h-partners.com> # message auto-generated for no-merge-commit merge: !1048 merge fix/weblink_sync into develop fix: repair weblink knowledge base sync Created-by: Mmmmroy Commit-by: Mmmmroy Merged-by: ZYQ5333 Description: <!-- Thanks for sending a pull request! Here are some tips for you: 1) If this is your first time, please read our contributor guidelines: https://gitcode.com/openJiuwen/community/blob/master/CONTRIBUTING.md 2) If you want to contribute your code but don't know who will review and merge, please add label `openjiuwen-assistant` to the pull request, we will find and do it as soon as possible. --> What type of PR is this? <!-- Choose one of the labels below to replace `/kind <label>`; the available label types are: - /kind bug - /kind task - /kind feature - /kind refactor - /kind clean_code If the PR description does not follow the specification, run /check-pr after editing it to recheck the PR. --> /kind bug Self-checklist: (Please self-check and mark an x in the [ ] brackets. We will review your completion status; otherwise the PR cannot be merged.) + - [ ] Design: Has the solution corresponding to the PR been reviewed by the Maintainer, and have all review comments been replied to and the solution revised + - [ ] Test: Has the code in the PR been fully covered by UT/ST test cases, and have the newly added test cases been uploaded to the repository along with this PR or already uploaded + - [ ] Verification: Does the PR description contain a detailed description of the verification results regarding the achievement of the expected goals for the Feature, Refactor, and Bugfix of this PR + - [ ] Interface: Does it involve changes to external interfaces? The corresponding changes have been approved by the interface review organization, and the API comments have been correctly refreshed + - [ ] Document: Does it involve modifications to the official website documentation? If so, please submit the materials to the Doc repository in a timely manner <!-- Special notes for your reviewers: --> <!-- + - [ ] Whether it causes forward compatibility failure --> <!-- + - [ ] Whether the dependent third-party library change is involved --> See merge request: openJiuwen/agent-studio!1048	22 days ago
memory_base.py	feat(memory):add memory_base backend Co-authored-by: zhangmengyin<zhangmengyin2@huawei.com> # message auto-generated for no-merge-commit merge: !496 feat(memory):add memory_base backend From: @zhangmengyin Reviewed-by: @ZYQ5333, @xiaoyao42 See merge request: openJiuwen/agent-studio!496	3 months ago
models.py	feat: add export of workflow DSL information Co-authored-by: wudawei<wudawei6@h-partners.com> # message auto-generated for no-merge-commit merge: !901 feat: add export of workflow DSL information From: @w1101627533 Reviewed-by: @xiaoyao42, @ZYQ5333 See merge request: openJiuwen/agent-studio!901	2 months ago
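The clock-offset correction described in the execution.py commit message (server sends `server_time_ms` and `elapsed_ms`; the client computes `timeOffset = Date.now() - server_time_ms`) can be illustrated with plain arithmetic. The sketch below is written in Python for consistency with the backend, although the actual client code is TypeScript. The function names and the `started_at_ms` parameter are assumptions for illustration only.

```python
def clock_offset_ms(client_now_ms: int, server_time_ms: int) -> int:
    """How far the client clock runs ahead of the server clock (may be negative)."""
    return client_now_ms - server_time_ms


def server_elapsed_ms(server_time_ms: int, started_at_ms: int) -> int:
    """Elapsed time computed entirely on the server clock."""
    return max(0, server_time_ms - started_at_ms)


def live_elapsed_ms(client_now_ms: int, offset_ms: int, started_at_ms: int) -> int:
    """Elapsed time between polls: map the client clock back onto the
    server clock, then subtract the server-side start timestamp."""
    server_now_ms = client_now_ms - offset_ms
    return max(0, server_now_ms - started_at_ms)


# Example: the client clock is 5 s ahead of the server.
offset = clock_offset_ms(client_now_ms=1_005_000, server_time_ms=1_000_000)
at_poll = server_elapsed_ms(server_time_ms=1_000_000, started_at_ms=990_000)
two_seconds_later = live_elapsed_ms(1_007_000, offset, started_at_ms=990_000)
```

Without the offset, the same client would report 17 s instead of 12 s two seconds after the poll, which is the kind of skew the commit message describes.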
plugin.py	fix(mcp-openapi): plugin creation fails when creating OpenAPI MCP plugin in plugin management Co-authored-by: michaelhuawei<michael.atamuk@huawei.com> # message auto-generated for no-merge-commit merge: !1036 merge fix/mcp-openapi-plugin-creation-838 into develop fix(mcp-openapi): plugin creation fails when creating OpenAPI MCP plugin in plugin management Created-by: michaelhuawei Commit-by: Michael;michaelhuawei Merged-by: ZYQ5333 Description: What type of PR is this? /kind bug --- What does this PR do / why do we need it: This PR fixes Bug #838, where creating an OpenAPI MCP Plugin in the Plugin Management UI resulted in a plugin creation error. The root cause was incorrect URL/file‑path validation across backend and frontend layers. The fixes include: ### Backend - `validate_plugin_url` now strips whitespace and handles empty values safely. - `_validate_openapi_paths` rewritten to correctly support: - Multiple comma‑separated entries - Both URLs and local file paths - SSRF‑safe URL validation via `validate_plugin_url` - File‑path existence checks without over‑restrictive “safe root” rules - `plugin_create` now validates URLs only when the input is actually a URL, not a file path. - `PluginCreate` schema (`model_post_init`) updated: - OPENAPI transport accepts URL or file path - SSE / Streamable HTTP transports require URL only - Clearer error messages for invalid combinations ### Frontend - `isValidUrl` updated to correctly bypass URL parsing for file paths. - `isUrlFieldValid` updated so OpenAPI transport accepts both URLs and file paths. - Prevents false validation errors in the MCP Plugin creation dialog. These changes ensure that OpenAPI MCP plugins can be created successfully using either remote OpenAPI URLs or local OpenAPI spec files. 
--- Which issue(s) this PR fixes: Fixes #838 --- Code review checklist: + - [ ] whether to verify the function's return value + - [ ] Whether to comply with SOLID principle / Demeter's law + - [ ] Whether there is UT test case && the test case is valid (if no test case, explain why) + - [ ] Whether the API change is involved + - [ ] Whether official document modification is involved See merge request: openJiuwen/agent-studio!1036	23 days ago
prompt_debug_router.py	fix: stop space_id from losing leading zeros due to type conversion Co-authored-by: wangxin<wangxin375@huawei.com> # message auto-generated for no-merge-commit merge: !1014 fix: stop space_id from losing leading zeros due to type conversion From: @programmegirl Reviewed-by: @cyz95, @xiaoyao42 See merge request: openJiuwen/agent-studio!1014	1 month ago
prompt_llm_router.py	fix: vulnerability fix Co-authored-by: 冯浩<fenghao55@h-partners.com> # message auto-generated for no-merge-commit merge: !744 fix: vulnerability fix From: @michealswhite Reviewed-by: @ZYQ5333, @gallonH See merge request: openJiuwen/agent-studio!744	3 months ago
prompt_router.py	fix: stop space_id from losing leading zeros due to type conversion Co-authored-by: wangxin<wangxin375@huawei.com> # message auto-generated for no-merge-commit merge: !1014 fix: stop space_id from losing leading zeros due to type conversion From: @programmegirl Reviewed-by: @cyz95, @xiaoyao42 See merge request: openJiuwen/agent-studio!1014	1 month ago
prompt_tuning_router.py	fix: vulnerability fix Co-authored-by: 冯浩<fenghao55@h-partners.com> # message auto-generated for no-merge-commit merge: !744 fix: vulnerability fix From: @michealswhite Reviewed-by: @ZYQ5333, @gallonH See merge request: openJiuwen/agent-studio!744	3 months ago
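The plugin.py commit message says `_validate_openapi_paths` now accepts multiple comma-separated entries that may be either URLs or local file paths. A minimal sketch of that splitting and classification step follows. The helper names are hypothetical, and the SSRF-safe URL check performed by `validate_plugin_url` is deliberately not reproduced here.

```python
import os
from urllib.parse import urlsplit


def is_url(entry: str) -> bool:
    """Treat only http(s) entries with a host as URLs; everything else is a path."""
    parts = urlsplit(entry)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def split_openapi_entries(raw: str) -> list[tuple[str, str]]:
    """Split a comma-separated value, strip whitespace, drop empty entries,
    and tag each entry as "url" or "file"."""
    classified = []
    for entry in (raw or "").split(","):
        entry = entry.strip()
        if entry:
            classified.append(("url" if is_url(entry) else "file", entry))
    return classified


def missing_files(classified: list[tuple[str, str]]) -> list[str]:
    """Return file entries that do not exist on disk (existence check only)."""
    return [value for kind, value in classified
            if kind == "file" and not os.path.isfile(value)]
```

Classifying before validating mirrors the fix described above: URL validation runs only on entries that are actually URLs, so a local spec path no longer fails URL parsing.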
register.py	feat(evaluation): introduce full evaluation system for agents & workflows (suites, tasks, graders, metrics, benchmarks, UI) Co-authored-by: Michael<michael.atamuk@huawei.com> Co-authored-by: adi_amir<adi.amir1@huawei.com> Co-authored-by: nizzan<nizzan.kimhi@huawei.com> Co-authored-by: @aharonamir1<amir.aharon@huawei.com> # message auto-generated for no-merge-commit merge: !1023 merge evaluation into develop feat(evaluation): introduce full evaluation system for agents & workflows (suites, tasks, graders, metrics, benchmarks, UI) Created-by: michaelhuawei Commit-by: Michael;michaelhuawei;aharonamir1;@aharonamir1;nikita-mee;nizzan;adi_amir Merged-by: ZYQ5333 Description: <!-- Thanks for sending a pull request! Here are some tips for you: 1) If this is your first time, please read our contributor guidelines: [https://gitcode.com/openJiuwen/openJiuwen/blob/master/CONTRIBUTING.md](https://gitcode.com/openJiuwen/openJiuwen/blob/master/CONTRIBUTING.md) 2) If you want to contribute your code but don't know who will review and merge, please add label `openJiuwen-assistant` to the pull request. --> What type of PR is this? /kind feature /kind refactor --- ## What does this PR do / why do we need it This PR introduces the Evaluation System for Agents and Workflows, a major new module that provides first‑class, systematic evaluation capabilities across all OpenJiuwen workflow patterns. It enables teams to measure correctness, reliability, semantic quality, latency, token usage, and regression behavior for any workflow or agent. The system solves three long‑standing gaps: 1. No regression detection — previously no structured way to verify that workflow changes preserved correctness. 2. No comparative measurement — no shared metrics to compare versions of agents/workflows. 3. No sampling support — LLM nondeterminism required multi‑trial evaluation, which did not exist. 
This feature adds a complete backend + frontend evaluation pipeline, including suites, tasks, graders, metrics, benchmark loading, and a full results UI. --- ## Which issue(s) this PR fixes Fixes #<issue-number> --- ## What scenarios were tested, and what were the verification results Functional verification - Created evaluation suites, added tasks, updated tasks, deleted tasks. - Loaded all seven benchmark YAML files; validated correct task creation. - Ran evaluation against workflows and agents with deterministic, model-based, and code-based graders. - Verified pattern detection across all six structural patterns (Routing, Chaining, Parallelization, Orchestrator‑Worker, Evaluator‑Optimizer, Memory Usage). - Confirmed correct grader behavior: deterministic checks, LLM judge calls, code-based execution, weight aggregation. - Confirmed metrics engine correctness: pass/fail, pass@k, pass^k, score distribution, latency stats, token usage, reliability, per-grader breakdown. - Verified custom aggregate metrics execution, including error handling. - Confirmed run lifecycle: RUNNING → COMPLETED/FAILED, immutability of completed runs. - Verified large-suite behavior (50 tasks × 5 trials) and UI rendering of large trace sets. Performance verification - Execution engine calls scale linearly with tasks × trials. - Model-based graders correctly issue LLM judge calls per trial. - No regressions to existing workflow/agent execution performance. Reliability verification - Flakiness metric validated using multi-trial runs. - Pattern detection validated with synthetic traces and real workflows. - Code-based grader error paths tested (exceptions, invalid returns). Frontend verification - Full CRUD for suites and tasks. - Run dialog correctly configures workflow/agent target and trial count. - Results UI renders Overview, Metrics, Graders, and Traces tabs with correct visibility rules. - Zustand store state transitions validated. 
--- ## Self-checklist + - [x] Design: Reviewed with maintainers; all comments addressed. + - [x] Test: Full UT/ST coverage for harness, graders, metrics, pattern validator, API, and frontend store. + - [x] Verification: PR description includes detailed verification results for feature, refactor, and bugfix aspects. + - [x] Interface: Adds new external API endpoints under `/evaluation`; no breaking changes to existing interfaces. + - [x] Document: Benchmark usage, suite/task schema, and evaluation workflow documented; docs PR prepared separately. --- ## Special notes for reviewers - This module is fully additive: no existing endpoints or execution logic are modified. - Code-based graders and custom metrics use `exec()`; this is an accepted constraint for v1 and will be hardened later. - Large evaluation runs can produce multi‑MB result payloads; pagination is planned for a future release. - Pattern detection is heuristic; tasks may override `pattern_type` explicitly. See merge request: openJiuwen/agent-studio!1023	28 days ago
related_member.py	refactor: rename app directory to openjiuwen_studio refactor: rename app directory to openjiuwen_studio	5 months ago
runtimes.py	fix: add agent_detail endpoint to runtime; frontend queries the opening message via agent_detail Co-authored-by: wangxin<wangxin375@huawei.com> # message auto-generated for no-merge-commit merge: !959 fix: add agent_detail endpoint to runtime; frontend queries the opening message via agent_detail From: @programmegirl Reviewed-by: @ZYQ5333, @cyz95 See merge request: openJiuwen/agent-studio!959	1 month ago
space.py	fix: ruff lint Co-authored-by: caoyuzhe 00430192<smile.caoyuzhe@huawei.com> # message auto-generated for no-merge-commit merge: !445 fix: ruff lint From: @cyz95 Reviewed-by: @gallonH, @ZYQ5333 See merge request: openJiuwen/agent-studio!445	3 months ago
system_model.py	feat: Distributed deployment Co-authored-by: 曹凯淇<caokaiqi1@huawei.com> # message auto-generated for no-merge-commit merge: !692 feat: Distributed deployment From: @KevinLLLove Reviewed-by: @xiaoyao42, @cyz95 See merge request: openJiuwen/agent-studio!692	3 months ago
tags.py	fix: resolve DFX issues Co-authored-by: wangxin<wangxin375@huawei.com> # message auto-generated for no-merge-commit merge: !743 fix: resolve DFX issues From: @programmegirl Reviewed-by: @xiaoyao42, @ZYQ5333 See merge request: openJiuwen/agent-studio!743	3 months ago
triggers.py	fix(triggers): running triggers fail (SpaceInfo for system_trigger is wrongly validated) Co-authored-by: michaelhuawei<michael.atamuk@huawei.com> # message auto-generated for no-merge-commit merge: !1055 merge fix/trigger-edit-version-dropdown into develop fix(triggers): running triggers fail (SpaceInfo for system_trigger is wrongly validated) Created-by: michaelhuawei Commit-by: michaelhuawei Merged-by: ZYQ5333 Description: What type of PR is this? /kind bug What does this PR do / why do we need it: After the previous Triggers PR, testers reported that all trigger runs were failing. The root cause was in `check_user_space`: ### Problem Trigger-fired executions run under a special user: `system_trigger`. This user bypasses space validation, but the function returned an incomplete SpaceInfo: - missing `spacename` - missing `description` - missing `role_type` Downstream logic expects these fields to exist, causing trigger execution to fail. ### Fix Updated `check_user_space` to return a fully populated SpaceInfo for `system_trigger`: - `spacename="System Trigger"` - `description=""` - `role_type=RoleType.SUPER_USER` This ensures trigger executions run correctly without requiring space validation. Which issue(s) this PR fixes: No formal issue created (reported by testers) Code review checklist: + - [ ] whether to verify the function's return value + - [ ] Whether to comply with SOLID principle / Demeter's law + - [ ] Whether there is UT test case && the test case is valid (if no test case, explain why) + - [ ] Whether the API change is involved + - [ ] Whether official document modification is involved See merge request: openJiuwen/agent-studio!1055	19 days ago
users.py	refactor: rename app directory to openjiuwen_studio refactor: rename app directory to openjiuwen_studio	5 months ago
vlm_models.py	feat: add VLM model management and reports Co-authored-by: yanx1n<yanx1n@noreply.gitcode.com> # message auto-generated for no-merge-commit merge: !998 feat: add VLM model management and reports From: @yanx1n Reviewed-by: @xiaoyao42, @cyz95 See merge request: openJiuwen/agent-studio!998	1 month ago
workflows.py	feat: add export of workflow DSL information Co-authored-by: wudawei<wudawei6@h-partners.com> # message auto-generated for no-merge-commit merge: !901 feat: add export of workflow DSL information From: @w1101627533 Reviewed-by: @xiaoyao42, @ZYQ5333 See merge request: openJiuwen/agent-studio!901	2 months ago
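The triggers.py fix (return a fully populated SpaceInfo for the `system_trigger` user instead of a partial one) can be sketched as follows. Only the three field values `spacename="System Trigger"`, `description=""`, and `role_type=RoleType.SUPER_USER` come from the commit message. The `SpaceInfo` dataclass layout, the `space_id` field, the `MEMBER` role, the `memberships` lookup, and the function name are assumptions made for a self-contained example, not the project's actual definitions.

```python
from dataclasses import dataclass
from enum import Enum


class RoleType(Enum):
    SUPER_USER = "super_user"
    MEMBER = "member"  # hypothetical second role, for the example only


@dataclass
class SpaceInfo:
    space_id: str  # hypothetical field
    spacename: str
    description: str
    role_type: RoleType


SYSTEM_TRIGGER_USER = "system_trigger"


def check_user_space_sketch(
    user_id: str,
    space_id: str,
    memberships: dict[tuple[str, str], SpaceInfo],  # assumed lookup table
) -> SpaceInfo:
    if user_id == SYSTEM_TRIGGER_USER:
        # Trigger-fired runs skip space validation but still get every
        # field that downstream code reads.
        return SpaceInfo(
            space_id=space_id,
            spacename="System Trigger",
            description="",
            role_type=RoleType.SUPER_USER,
        )
    try:
        return memberships[(user_id, space_id)]
    except KeyError:
        raise PermissionError(f"user {user_id!r} has no access to space {space_id!r}")
```

Returning the same complete type on the bypass path means callers never need a special case for trigger executions.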