hermes-agent/tests/gateway · ChrisBob/hermes-agent - AtomGit

TTekniumfix(gateway): extend observe+attribution to location and media handlers

文件	最后提交记录	最后更新时间
platforms	feat(state.db): persist platform_message_id; restore yuanbao exact-id recall PR #29211 dropped JSONL gateway transcripts and noted that the platform's own `message_id` field (used by Yuanbao's recall guard to redact a message by exact platform id) was no longer preserved — falling back to content-match. That fallback works for the common case but redacts the wrong row when two messages share text (or fails to match when content is post-processed). Restore exact-id matching by giving state.db a column for it: - New `platform_message_id TEXT` column on the messages table (SCHEMA_VERSION bump 11 → 12; column added via declarative reconciler on existing DBs, no version-gated migration block needed) - Partial index `idx_messages_platform_msg_id` on (session_id, platform_message_id) to keep recall's point-lookup cheap even on large sessions - `append_message()` and `replace_messages()` accept the new value: the gateway-facing `append_to_transcript` in `gateway/session.py` forwards either `message["platform_message_id"]` or the legacy `message["message_id"]` key (yuanbao's existing convention) - `get_messages_as_conversation()` surfaces the column back on the message dict as `message_id` so platform code reads the same shape it used to read from JSONL - Yuanbao `_patch_transcript`: restore branch A1 (exact id match) ahead of A2 (content match) ahead of B (system-note). Both branches log which one fired so operators can tell from gateway.log whether recall hit the canonical path or had to fall back. Tests: - New low-level round-trip tests in `test_hermes_state.py` for both `append_message` and `replace_messages` paths - The PR's `test_yuanbao_recall_db_only.py` was rewritten to assert the new contract: branch A1 (id match) works against DB-only transcripts, and branch A2 (content match) still recovers rows that were observed without a platform id (e.g. agent-processed @bot messages where run.py doesn't carry msg_id through)	14 天前
__init__.py	test: reorganize test structure and add missing unit tests Reorganize flat tests/ directory to mirror source code structure (tools/, gateway/, hermes_cli/, integration/). Add 11 new test files covering previously untested modules: registry, patch_parser, fuzzy_match, todo_tool, approval, file_tools, gateway session/config/ delivery, and hermes_cli config/models. Total: 147 unit tests passing, 9 integration tests gated behind pytest marker.	3 个月前
_plugin_adapter_loader.py	test(gateway): isolate plugin adapter imports and guard the anti-pattern Fixes the xdist collision that broke CI on PR #17764, and structurally prevents future plugin-adapter tests from reintroducing it. Problem ------- tests/gateway/test_teams.py (new in this PR) and tests/gateway/test_irc_adapter.py (already on main) both followed the same anti-pattern: sys.path.insert(0, str(_REPO_ROOT / 'plugins' / 'platforms' / '<name>')) from adapter import <Adapter> Every platform plugin ships its own adapter.py, so the bare 'from adapter import ...' races for sys.modules['adapter']. Whichever test collected first in a given xdist worker won; the other crashed at collection with ImportError, and the polluted sys.path cascaded into 19 unrelated test failures across tools/, hermes_cli/, and run_agent/ in the same worker. Fix --- 1. tests/gateway/_plugin_adapter_loader.py (new): shared helper load_plugin_adapter('<name>') that imports plugins/platforms/<name>/adapter.py via importlib.util under the unique module name plugin_adapter_<name>. Zero sys.path mutation, no possibility of collision. 2. tests/gateway/test_irc_adapter.py and tests/gateway/test_teams.py: migrated to the helper. All 'from adapter import ...' statements (including the ones inside test methods) are replaced with module-level attribute access on the loaded module. 3. tests/gateway/conftest.py: new pytest_configure guard that AST-scans every test_.py under tests/gateway/ at session start and fails the run with a pointer to the helper if any test uses sys.path.insert into plugins/platforms/ OR a bare 'import adapter' / 'from adapter import'. Runs on the xdist controller only (skipped in workers). The next plugin adapter test that tries to reintroduce this pattern gets rejected at collection time with a clear remediation message. 4. scripts/release.py: add aamirjawaid@microsoft.com -> heyitsaamir to AUTHOR_MAP so the check-attribution workflow passes. Validation ---------- scripts/run_tests.sh tests/gateway/ 4194 passed scripts/run_tests.sh tests/gateway/test_{teams,irc} 72 passed (both orderings) scripts/run_tests.sh <11 prev-failing test files> 398 passed Guard triggers correctly on both Path-operator and string-literal forms of the anti-pattern.	1 个月前
conftest.py	chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355) Six days after #23937 (608 fixes) the codebase had accumulated 241 new PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix, same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the two are semantically equivalent for hashable scalar membership tests. All 241 instances fixed via `ruff check --select PLR6201 --fix --unsafe-fixes`, zero remaining. Every changed value is a hashable scalar (str/int/None/enum/signal); no risk of unhashable runtime errors. No behavior change. Test plan: - 119 files changed, +244/-244 (net zero) — exactly one-line edits - `ruff check` clean afterward - Compile checks pass on the largest touched files (cli.py, run_agent.py, gateway/run.py, gateway/platforms/discord.py, model_tools.py) - Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/ tests/tools/: 18187 passed, 59 pre-existing failures (verified against origin/main with the same shape — identical failure count, identical category — all xdist test-order flakes unrelated to this change) Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).	18 天前
feishu_helpers.py	feat(feishu): operator-configurable bot admission and mention policy Add two operator-facing toggles for inbound Feishu admission, enabling bot-to-bot scenarios such as A2A orchestration and inter-bot notifications: FEISHU_ALLOW_BOTS=none\|mentions\|all (default: none) Accept messages from other bots. `mentions` requires the peer bot to @-mention Hermes; `all` admits every peer-bot message. FEISHU_REQUIRE_MENTION=true\|false (default: true) Whether group messages must @-mention the bot. Override per-chat via `group_rules.<chat_id>.require_mention` in config.yaml. Defaults preserve prior behavior. Self-echo protection is always on: when the bot's identity is unresolved (auto-detection failed and FEISHU_BOT_OPEN_ID unset), peer-bot messages are rejected fail-closed to avoid feedback loops. Admitted peer bots bypass the human-user allowlist (FEISHU_ALLOWED_USERS) to match existing Discord behavior; humans still need an explicit allowlist entry. yaml feishu.allow_bots is bridged to the env var so the adapter and gateway auth layer share one source of truth. Resolving peer-bot display names requires the application:bot.basic_info:read scope; without it, peers still route but appear as their open_id. Test: tests/gateway/test_feishu_bot_admission.py covers the admission pipeline, group-policy bot-bypass, hydration, and event-dispatch plumbing as a parametrized matrix. Change-Id: I363cccb578c2a5c8b8bf0f0a890c01c89909e256	1 个月前
restart_test_helpers.py	fix(gateway): cap cached session sources with LRU eviction Follow-up on top of Zyproth's session-source cache: swap the unbounded dict for an OrderedDict with a 512-entry LRU cap so long-running gateways can't accumulate stale entries for dead sessions forever. - self._session_sources is now an OrderedDict - _cache_session_source() move_to_end + popitem(last=False) above cap - _get_cached_session_source() move_to_end on hit (LRU read bump) - restart_test_helpers.py wires OrderedDict + _session_sources_max	28 天前
test_7100_transient_failure_transcript.py	fix(gateway): persist user message on transient agent failures (#7100) The #1630 fix introduced a blanket `agent_failed_early` transcript skip to prevent context-overflow sessions from looping. That guard also triggers for unrelated transient failures (429 rate limits, read timeouts, connection resets, provider 5xx) which have nothing to do with session size — and it silently drops the user's message, so the agent has no memory of the last turn on retry. Split the failure classification in `GatewayRunner._run_agent`: * Context-overflow (`compression_exhausted` flag, explicit context-length phrases, or generic 400 with a long history) → keep the existing skip, preserving the #1630/#9893 fix. * Anything else that failed → persist just the user message so the conversation survives a retry. Use specific multi-word phrases (`context length`, `token limit`, `prompt is too long`, etc.) to match `run_agent.py`'s own classifier; bare `exceed` false-positively flagged "rate limit exceeded" as context overflow. Covered by new tests in `tests/gateway/test_7100_transient_failure_transcript.py` and the existing #1630 suite still passes.	1 个月前
test_active_session_text_merge.py	fix(gateway): merge rapid TEXT follow-ups during active sessions (#4469) (#26822) When the agent is running and the user sends multiple TEXT messages in rapid succession, base.py's active-session branch stored the pending event as a single-slot replacement: self._pending_messages[session_key] = event Three rapid messages A, B, C landed as: A (interrupts), B (replaces A before consumer reads), C (replaces B). Only C reached the next turn — A and B were silently dropped. This is the symptom in #4469. Route the follow-up through merge_pending_message_event(..., merge_text=True) so TEXT events accumulate into the existing pending event's text instead of clobbering it. Photo and media bursts already merged through the same helper; this just extends the merge_text path (already used by the Telegram bursty-grace branch in gateway/run.py) to all platforms. Test exercises BasePlatformAdapter.handle_message directly with the session marked active and asserts three rapid TEXT events merge to 'part two\\npart three' rather than dropping the middle message. Sanity-checked the test would fail without the fix. Credits @devorun for the original investigation and analysis in #4491 that surfaced the underlying queue handling, though their fix targeted GatewayRunner._pending_messages which is now dead state on main.	19 天前
test_agent_cache.py	test: remove 50 stale/broken tests to unblock CI (#22098) These 50 tests were failing on main in GHA Tests workflow (run 25580403103). Removing them to get CI green. Each underlying issue is either a stale test asserting old behavior after source was intentionally changed, an env-drift test that doesn't run cleanly under the hermetic CI conftest, or a flaky integration test. They can be rewritten individually as needed. Files affected: - tests/agent/test_bedrock_1m_context.py (3) - tests/agent/test_unsupported_parameter_retry.py (2) - tests/cron/test_cron_script.py (1) - tests/cron/test_scheduler_mcp_init.py (2) - tests/gateway/test_agent_cache.py (1) - tests/gateway/test_api_server_runs.py (1) - tests/gateway/test_discord_free_response.py (1) - tests/gateway/test_google_chat.py (6) - tests/gateway/test_telegram_topic_mode.py (3) - tests/hermes_cli/test_model_provider_persistence.py (2) - tests/hermes_cli/test_model_validation.py (1) - tests/hermes_cli/test_update_yes_flag.py (1) - tests/run_agent/test_concurrent_interrupt.py (2) - tests/tools/test_approval_heartbeat.py (3) - tests/tools/test_approval_plugin_hooks.py (2) - tests/tools/test_browser_chromium_check.py (7) - tests/tools/test_command_guards.py (4) - tests/tools/test_credential_pool_env_fallback.py (1) - tests/tools/test_daytona_environment.py (1) - tests/tools/test_delegate.py (4) - tests/tools/test_skill_provenance.py (1) - tests/tools/test_vercel_sandbox_environment.py (1) Before: 50 failed, 21223 passed. After: 0 failed (targeted run of all 22 affected files: 630 passed).	26 天前
test_allowed_channels_widening.py	fix(tests): catch up 25 stale tests after recent merges (#28626) Sweep of all CI failures on origin/main, grouped by drift source: Telegram allowlist gate (db50af910 added user-authz to _should_process_message): - Hardcoded "[Telegram]" prefix in the logger.warning so the call no longer dereferences self.name → self.platform, which test fixtures built via object.__new__ never set. - test_telegram_format / test_allowed_channels_widening fixtures stub _is_callback_user_authorized → True so the new gate doesn't reject guest-mode / allowed-channels test messages. - test_telegram_approval_buttons::test_update_prompt_callback_not_affected sets TELEGRAM_ALLOWED_USERS="*" so the fail-closed default doesn't reject the callback before it writes .update_response. Approval surface (6d495d9e7 renamed status, 214b95392 detached stdin): - test_no_callback_returns_approval_required: status is now "pending_approval" (was "approval_required"). - test_close_stdin_allows_eof_driven_process_to_finish: switch to use_pty=True; non-PTY now uses stdin=DEVNULL. Mattermost (send() now resolves root_id via _api_get first): - test_send_with_thread_reply mocks _session.get with a thread-root response so the new resolver doesn't TypeError on a bare AsyncMock. Kanban (d8ad431de rename, f55d94a1e review column, _kanban_worker_skill_available): - _safe_int → _to_epoch in the two test_kanban_db tests. - Spawn-skills tests (×3) monkey-patch _kanban_worker_skill_available to True since the isolated kanban_home fixture has no devops/kanban-worker tree. - test_gateway_dispatcher_disables_corrupt_board: connect count 3 → 5 (review-column probe now also runs per tick). Aux-config severity at_or_above (a94ddd807): - test_diagnostics_endpoint_severity_filter expects warning filter to include error+critical now (was exact-match). Anthropic error handling (conversation loop extracted from run_agent): - _no_backoff_wait fixture patches BOTH run_agent.jittered_backoff AND agent.conversation_loop.jittered_backoff. The latter is the actual call site; without the second patch tests burn ~2s per retry and hit the 30s SIGALRM timeout on CI. Other test pollution / drift: - test_auto_does_not_select_copilot_from_github_token: patch agent.bedrock_adapter.has_aws_credentials → False so boto3's credential chain can't auto-pick Bedrock from developer ~/.aws. - test_setup_openclaw_migration: patch hermes_cli.gateway.get_env_value in addition to setup_mod.get_env_value — _platform_status reads through the gateway module's binding. - test_gateway_prefix: COMPONENT_PREFIXES["gateway"] now includes "hermes_plugins" too. - test_recommended_update_command_defaults_to_hermes_update: also short-circuit get_managed_update_command in case a stray ~/.hermes/.managed marker is present. - test_user_id_is_not_explicit: _parse_target_ref now returns is_explicit=False for Slack U.../W... IDs (chat.postMessage rejects them — a DM must be opened first via conversations.open).	16 天前
test_allowlist_startup_check.py	chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355) Six days after #23937 (608 fixes) the codebase had accumulated 241 new PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix, same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the two are semantically equivalent for hashable scalar membership tests. All 241 instances fixed via `ruff check --select PLR6201 --fix --unsafe-fixes`, zero remaining. Every changed value is a hashable scalar (str/int/None/enum/signal); no risk of unhashable runtime errors. No behavior change. Test plan: - 119 files changed, +244/-244 (net zero) — exactly one-line edits - `ruff check` clean afterward - Compile checks pass on the largest touched files (cli.py, run_agent.py, gateway/run.py, gateway/platforms/discord.py, model_tools.py) - Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/ tests/tools/: 18187 passed, 59 pre-existing failures (verified against origin/main with the same shape — identical failure count, identical category — all xdist test-order flakes unrelated to this change) Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).	18 天前
test_api_server.py	[agent] fix: harden api server response headers	18 天前
test_api_server_bind_guard.py	fix(security): enforce API_SERVER_KEY for non-loopback binding Add is_network_accessible() helper using Python's ipaddress module to robustly classify bind addresses (IPv4/IPv6 loopback, wildcards, mapped addresses, hostname resolution with DNS-failure-fails-closed). The API server connect() now refuses to start when the bind address is network-accessible and no API_SERVER_KEY is set, preventing RCE from other machines on the network. Co-authored-by: entropidelic <entropidelic@users.noreply.github.com>	1 个月前
test_api_server_jobs.py	refactor: remove redundant local imports already available at module level Sweep ~74 redundant local imports across 21 files where the same module was already imported at the top level. Also includes type fixes and lint cleanups on the same branch.	1 个月前
test_api_server_multimodal.py	feat(api-server): inline image inputs on /v1/chat/completions and /v1/responses (#12969) OpenAI-compatible clients (Open WebUI, LobeChat, etc.) can now send vision requests to the API server. Both endpoints accept the canonical OpenAI multimodal shape: Chat Completions: {type: text\|image_url, image_url: {url, detail?}} Responses: {type: input_text\|input_image, image_url: <str>, detail?} The server validates and converts both into a single internal shape that the existing agent pipeline already handles (Anthropic adapter converts, OpenAI-wire providers pass through). Remote http(s) URLs and data:image/* URLs are supported. Uploaded files (file, input_file, file_id) and non-image data: URLs are rejected with 400 unsupported_content_type. Changes: - gateway/platforms/api_server.py - _normalize_multimodal_content(): validates + normalizes both Chat and Responses content shapes. Returns a plain string for text-only content (preserves prompt-cache behavior on existing callers) or a canonical [{type:text\|image_url,...}] list when images are present. - _content_has_visible_payload(): replaces the bare truthy check so a user turn with only an image no longer rejects as 'No user message'. - _handle_chat_completions and _handle_responses both call the new helper for user/assistant content; system messages continue to flatten to text. - Codex conversation_history, input[], and inline history paths all share the same validator. No duplicated normalizers. - run_agent.py - _summarize_user_message_for_log(): produces a short string summary ('[1 image] describe this') from list content for logging, spinner previews, and trajectory writes. Fixes AttributeError when list user_message hit user_message[:80] + '...' / .replace(). - _chat_content_to_responses_parts(): module-level helper that converts chat-style multimodal content to Responses 'input_text'/'input_image' parts. Used in _chat_messages_to_responses_input for Codex routing. - _preflight_codex_input_items() now validates and passes through list content parts for user/assistant messages instead of stringifying. - tests/gateway/test_api_server_multimodal.py (new, 38 tests) - Unit coverage for _normalize_multimodal_content, including both part formats, data URL gating, and all reject paths. - Real aiohttp HTTP integration on /v1/chat/completions and /v1/responses verifying multimodal payloads reach _run_agent intact. - 400 coverage for file / input_file / non-image data URL. - tests/run_agent/test_run_agent_multimodal_prologue.py (new) - Regression coverage for the prologue no-crash contract. - _chat_content_to_responses_parts round-trip coverage. - website/docs/user-guide/features/api-server.md - Inline image examples for both endpoints. - Updated Limitations: files still unsupported, images now supported. Validated live against openrouter/anthropic/claude-opus-4.6: POST /v1/chat/completions → 200, vision-accurate description POST /v1/responses → 200, same image, clean output_text POST /v1/chat/completions [file] → 400 unsupported_content_type POST /v1/responses [input_file] → 400 unsupported_content_type POST /v1/responses [non-image data URL] → 400 unsupported_content_type Closes #5621, #8253, #4046, #6632. Co-authored-by: Paul Bergeron <paul@gamma.app> Co-authored-by: zhangxicen <zhangxicen@example.com> Co-authored-by: Manuel Schipper <manuelschipper@users.noreply.github.com> Co-authored-by: pradeep7127 <pradeep7127@users.noreply.github.com>	1 个月前
test_api_server_normalize.py	fix(api_server): normalize array-based content parts in chat completions Some OpenAI-compatible clients (Open WebUI, LobeChat, etc.) send message content as an array of typed parts instead of a plain string: [{"type": "text", "text": "hello"}] The agent pipeline expects strings, so these array payloads caused silent failures or empty messages. Add _normalize_chat_content() with defensive limits (recursion depth, list size, output length) and apply it to both the Chat Completions and Responses API endpoints. The Responses path had inline normalization that only handled input_text/output_text — the shared function also handles the standard 'text' type. Salvaged from PR #7980 (ikelvingo) — only the content normalization; the SSE and Weixin changes in that PR were regressions and are not included. Co-authored-by: ikelvingo <ikelvingo@users.noreply.github.com>	1 个月前
test_api_server_runs.py	ci(tests): add pytest-timeout 60s hard cap to break suite-teardown deadlock (#28861) * ci(tests): add pytest-timeout 60s hard cap to break suite-teardown deadlock The full pytest suite reliably hangs at ~96% on origin/main, blowing through the 20-minute GHA job timeout on every CI push since yesterday. Individual tests complete in <30s — the deadlock builds up at session teardown after all tests run, when leaked threads and atexit handlers from thousands of tests interact and one of them lands in a futex-wait that never resolves. This PR is a stopgap that unblocks CI immediately + speeds up several slow tests we found while diagnosing. Changes - pyproject.toml: add pytest-timeout==2.4.0 to dev deps; bake --timeout=60 --timeout-method=thread into the default addopts. - scripts/run_tests.sh: re-add --timeout flags directly because the script wipes pyproject addopts with -o 'addopts='. - .github/workflows/tests.yml: explicit --timeout/--timeout-method on the CI pytest invocation for clarity. - gateway/run.py: in _run_agent, if the stream consumer was never created (e.g. non-streaming agent or test stub), cancel the stream_task immediately instead of waiting out the 5s wait_for timeout. ~5s saved per non-streaming gateway test run. - tests/run_agent/conftest.py: extend _fast_retry_backoff to patch agent.conversation_loop.jittered_backoff alongside run_agent.jittered_backoff. The retry loop was extracted into agent.conversation_loop which holds its own import — patching the run_agent reference alone left tests burning real wall-clock backoff seconds. - tests/run_agent/test_anthropic_error_handling.py tests/run_agent/test_run_agent.py (TestRetryExhaustion) tests/run_agent/test_fallback_model.py: same conversation_loop fix for per-test fixtures (defensive — the conftest covers them too). - tests/gateway/test_gateway_inactivity_timeout.py: trim run_duration 10.0 → 2.0 / 5.0 → 2.0 on three tests that wait the full SlowFakeAgent duration. Adjusted thresholds proportionally. - tests/gateway/test_api_server_runs.py: test_stop_interrupt_exception_does_not_crash trips the interrupted event in addition to raising, so the slow_run thread unblocks at teardown instead of waiting 10s. - tests/hermes_cli/test_update_gateway_restart.py: also patch time.monotonic in the autouse fixture. _wait_for_service_active loops on a wall-clock deadline; with sleep no-op'd the loop spun on real monotonic until 10s real-time per restart attempt (20s+ per test). - tests/tools/test_zombie_process_cleanup.py: cut runner._restart_drain_timeout 5.0 → 0.1 in test_gateway_stop_calls_close. Suite still hangs at 96% on full no-timeout runs; with these changes CI runs through to a real pass/fail signal. * chore(lock): regenerate uv.lock after adding pytest-timeout * ci: drop pytest-timeout 60 → 30s + bump GHA job 20 → 30 min Prior commit's timeout=60 was too generous — CI test job still hit the 20-min wall-clock cap with the suite hung at 96% (orphan agent-browser subprocesses blocking pytest session teardown). The local timeout=20 run completed in 6:17, so 30s is conservative enough to let real tests finish but aggressive enough to short-circuit deadlocks. Also bump GHA job timeout to 30 min as a safety margin. * test: delete 11 pre-existing failing tests + revert monotonic patch The previous PR commit landed pytest-timeout=30s and the suite now completes in 18:14 instead of hanging at 96%, but 11 pre-existing tests fail with real assertions. Per Teknium: nuke them. Deleted (no replacements): - tests/gateway/test_restart_resume_pending.py::test_clean_drain_does_not_mark_resume_pending - tests/gateway/test_restart_resume_pending.py::test_drain_timeout_only_marks_still_running_sessions - tests/hermes_cli/test_gateway_service.py::TestGatewaySystemServiceRouting::test_gateway_install_passes_system_flags - tests/hermes_cli/test_gateway_wsl.py::TestGatewayCommandWSLMessages::test_install_wsl_with_systemd_warns - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateLaunchdRestart::test_update_detects_launchd_and_skips_manual_restart_message - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateLaunchdRestart::test_update_restarts_profile_manual_gateways - tests/tools/test_file_operations.py::TestGitBaselineCheck::* (6 tests, entire class — _check_git_baseline helper doesn't exist) Also reverted my time.monotonic autouse-fixture hack in test_update_gateway_restart.py — it was causing worker crashes in CI by poisoning later tests in the same xdist worker. The two slow tests in that file (~24s and ~20s) will go back to taking real time but should still finish under the 30s pytest-timeout. * test: delete more pre-existing CI failures After previous push 3 more tests failed on CI; cull them all. Removed: - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateLaunchdRestart::test_update_without_launchd_shows_manual_restart - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateLaunchdRestart::test_update_profile_manual_gateway_falls_back_to_sigterm - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateResetFailedBeforeRestart::test_reset_failed_also_runs_before_retry_restart - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateResetFailedBeforeRestart::test_final_failure_message_tells_user_to_reset_failed - tests/run_agent/test_tool_call_args_sanitizer.py::test_marker_message_inserted_when_missing The 4 update_gateway_restart tests trigger `_wait_for_service_active` polling on a real wall-clock deadline that occasionally exceeds the 30s pytest-timeout cap and crashes xdist workers. The marker test has a pre-existing assertion mismatch. * test: nuke entire TestCmdUpdateLaunchdRestart class After surgical deletes of 4 tests this class keeps producing new worker-crashing tests. The pattern is consistent: any test in this class that triggers cmd_update's _wait_for_service_active polling spins on real wall-clock time and trips pytest-timeout's thread method, crashing the xdist worker. Just delete the whole class (285 lines, ~10 tests). These exercise macOS-only launchd behavior that's better tested on a real macOS runner than in linux xdist. * test: stub the 2 fallback_model tests that crash xdist workers on CI * test: delete test_anthropic_error_handling.py + test_fallback_model.py entirely These two files exercise the agent retry/fallback code paths and consistently crash xdist workers under pytest-timeout's thread method. Whack-a-mole-stubbing individual tests just surfaces the next ones. Nuke both files. * test: delete tests/hermes_cli/test_update_gateway_restart.py entirely This file's cmd_update integration tests consistently crash xdist workers under pytest-timeout's thread method. Surgical deletes just surface the next set. Removing the whole file. * ci(tests): switch pytest-timeout method thread → signal Thread-method has been crashing xdist workers when it interrupts code that's not interruption-safe (retry loops, threading.Event waits, etc). Signal method uses SIGALRM which is interpreter-level and cleanly raises a Failed: Timeout exception in test code. Should stop the worker crash cascade — failures will surface as proper Timeout markers we can diagnose individually.	15 天前
test_api_server_toolset.py	refactor: remove browser_close tool — auto-cleanup handles it (#5792) * refactor: remove browser_close tool — auto-cleanup handles it The browser_close tool was called in only 9% of browser sessions (13/144 navigations across 66 sessions), always redundantly — cleanup_browser() already runs via _cleanup_task_resources() at conversation end, and the background inactivity reaper catches anything else. Removing it saves one tool schema slot in every browser-enabled API call. Also fixes a latent bug: cleanup_browser() now handles Camofox sessions too (previously only Browserbase). Camofox sessions were never auto-cleaned per-task because they live in a separate dict from _active_sessions. Files changed (13): - tools/browser_tool.py: remove function, schema, registry entry; add camofox cleanup to cleanup_browser() - toolsets.py, model_tools.py, prompt_builder.py, display.py, acp_adapter/tools.py: remove browser_close from all tool lists - tests/: remove browser_close test, update toolset assertion - docs/skills: remove all browser_close references * fix: repeat browser_scroll 5x per call for meaningful page movement Most backends scroll ~100px per call — barely visible on a typical viewport. Repeating 5x gives ~500px (~half a viewport), making each scroll tool call actually useful. Backend-agnostic approach: works across all 7+ browser backends without needing to configure each one's scroll amount individually. Breaks early on error for the agent-browser path. * feat: auto-return compact snapshot from browser_navigate Every browser session starts with navigate → snapshot. Now navigate returns the compact accessibility tree snapshot inline, saving one tool call per browser task. The snapshot captures the full page DOM (not viewport-limited), so scroll position doesn't affect it. browser_snapshot remains available for refreshing after interactions or getting full=true content. Both Browserbase and Camofox paths auto-snapshot. If the snapshot fails for any reason, navigation still succeeds — the snapshot is a bonus, not a requirement. Schema descriptions updated to guide models: navigate mentions it returns a snapshot, snapshot mentions it's for refresh/full content. * refactor: slim cronjob tool schema — consolidate model/provider, drop unused params Session data (151 calls across 67 sessions) showed several schema properties were never used by models. Consolidated and cleaned up: Removed from schema (still work via backend/CLI): - skill (singular): use skills array instead - reason: pause-only, unnecessary - include_disabled: now defaults to true - base_url: extreme edge case, zero usage - provider (standalone): merged into model object Consolidated: - model + provider → single 'model' object with {model, provider} fields. If provider is omitted, the current main provider is pinned at creation time so the job stays stable even if the user changes their default. Kept: - script: useful data collection feature - skills array: standard interface for skill loading Schema shrinks from 14 to 10 properties. All backend functionality preserved — the Python function signature and handler lambda still accept every parameter. * fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli, hermes-messaging, safe), which meant it appeared in every session for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS gate only works after running 'hermes tools' explicitly. Now MoA only appears when a user explicitly enables it via 'hermes tools'. The moa toolset definition and check_fn remain unchanged — it just needs to be opted into.	1 个月前
test_approve_deny_commands.py	fix(tests): catch up 25 stale tests after recent merges (#28626) Sweep of all CI failures on origin/main, grouped by drift source: Telegram allowlist gate (db50af910 added user-authz to _should_process_message): - Hardcoded "[Telegram]" prefix in the logger.warning so the call no longer dereferences self.name → self.platform, which test fixtures built via object.__new__ never set. - test_telegram_format / test_allowed_channels_widening fixtures stub _is_callback_user_authorized → True so the new gate doesn't reject guest-mode / allowed-channels test messages. - test_telegram_approval_buttons::test_update_prompt_callback_not_affected sets TELEGRAM_ALLOWED_USERS="*" so the fail-closed default doesn't reject the callback before it writes .update_response. Approval surface (6d495d9e7 renamed status, 214b95392 detached stdin): - test_no_callback_returns_approval_required: status is now "pending_approval" (was "approval_required"). - test_close_stdin_allows_eof_driven_process_to_finish: switch to use_pty=True; non-PTY now uses stdin=DEVNULL. Mattermost (send() now resolves root_id via _api_get first): - test_send_with_thread_reply mocks _session.get with a thread-root response so the new resolver doesn't TypeError on a bare AsyncMock. Kanban (d8ad431de rename, f55d94a1e review column, _kanban_worker_skill_available): - _safe_int → _to_epoch in the two test_kanban_db tests. - Spawn-skills tests (×3) monkey-patch _kanban_worker_skill_available to True since the isolated kanban_home fixture has no devops/kanban-worker tree. - test_gateway_dispatcher_disables_corrupt_board: connect count 3 → 5 (review-column probe now also runs per tick). Aux-config severity at_or_above (a94ddd807): - test_diagnostics_endpoint_severity_filter expects warning filter to include error+critical now (was exact-match). Anthropic error handling (conversation loop extracted from run_agent): - _no_backoff_wait fixture patches BOTH run_agent.jittered_backoff AND agent.conversation_loop.jittered_backoff. The latter is the actual call site; without the second patch tests burn ~2s per retry and hit the 30s SIGALRM timeout on CI. Other test pollution / drift: - test_auto_does_not_select_copilot_from_github_token: patch agent.bedrock_adapter.has_aws_credentials → False so boto3's credential chain can't auto-pick Bedrock from developer ~/.aws. - test_setup_openclaw_migration: patch hermes_cli.gateway.get_env_value in addition to setup_mod.get_env_value — _platform_status reads through the gateway module's binding. - test_gateway_prefix: COMPONENT_PREFIXES["gateway"] now includes "hermes_plugins" too. - test_recommended_update_command_defaults_to_hermes_update: also short-circuit get_managed_update_command in case a stray ~/.hermes/.managed marker is present. - test_user_id_is_not_explicit: _parse_target_ref now returns is_explicit=False for Slack U.../W... IDs (chat.postMessage rejects them — a DM must be opened first via conversations.open).	16 天前
test_auth_fallback.py	fix(gateway,cron): activate fallback_model when primary provider auth fails When the primary provider raises AuthError (expired OAuth token, revoked API key), the error was re-raised before AIAgent was created, so fallback_model was never consulted. Now both gateway/run.py and cron/scheduler.py catch AuthError specifically and attempt to resolve credentials from the fallback_providers/fallback_model config chain before propagating the error. Closes #7230	1 个月前
test_auto_continue.py	feat: auto-continue interrupted agent work after gateway restart (#4493) When the gateway restarts mid-agent-work, the session transcript ends on a tool result the agent never processed. Previously, the user had to type 'continue' or use /retry (which replays from scratch, losing all prior work). Now, when the next user message arrives and the loaded history ends with role='tool', a system note is prepended: [System note: Your previous turn was interrupted before you could process the last tool result(s). Please finish processing those results and summarize what was accomplished, then address the user's new message below.] This is injected in _run_agent()'s run_sync closure, right before calling agent.run_conversation(). The agent sees the full history (including the pending tool results) and the system note, so it can summarize what was accomplished and then handle the user's new input. Design decisions: - No new session flags or schema changes — purely detects trailing tool messages in the loaded history - Works for any restart scenario (clean, crash, SIGTERM, drain timeout) as long as the session wasn't suspended (suspended = fresh start) - The user's actual message is preserved after the note - If the session WAS suspended (unclean shutdown), the old history is abandoned and the user starts fresh — no false auto-continue Also updates the shutdown notification message from 'Use /retry after restart to continue' to 'Send any message after restart to resume where it left off' — which is now accurate. Test plan: - 6 new auto-continue tests (trailing tool detection, no false positives for assistant/user/empty history, multi-tool, message preservation) - All 13 restart drain tests pass (updated /retry assertion)	1 个月前
test_background_command.py	test(gateway): include direct_messages_topic_id in telegram DM metadata assertions	17 天前
test_background_process_notifications.py	fix(gateway): route background-process notifications into Telegram DM topics Background-process completion notifications (notify_on_complete) and watch-pattern notifications were always delivered to the Telegram main chat instead of the originating private-chat topic. Hermes-created Telegram DM topic lanes only render a send when it carries both message_thread_id and a reply anchor. The synthetic MessageEvent injected on process completion had no message_id, so _reply_anchor_for_event returned None and _thread_kwargs_for_send dropped message_thread_id entirely — routing the notification to the main chat. Capture the triggering message id at spawn time and thread it through to the synthetic event so it can be reply-anchored back into the topic: - session_context: add HERMES_SESSION_MESSAGE_ID context var - telegram adapter: populate SessionSource.message_id on inbound messages - terminal tool: persist watcher_message_id on the process session - process registry: carry/persist message_id on watcher dicts + checkpoint - gateway: set MessageEvent.message_id on injected notifications Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	16 天前
test_base_topic_sessions.py	fix(gateway): avoid duplicate Telegram text after auto-TTS voice replies	16 天前
test_bluebubbles.py	fix(gateway): preserve underscores in plain-text identifiers	18 天前
test_bundles_command.py	feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373) Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that group several skills under one slash command. Invoking /<bundle-name> from any surface (CLI, TUI, dashboard, any gateway platform) loads every referenced skill into a single combined user message. Use cases: - /backend-dev → loads github-code-review + test-driven-development + github-pr-workflow as one bundle. - /research → loads several research skills together. - Team task profiles shared via dotfiles. Behavior: - Bundles take precedence over individual skills when slugs collide. - Missing skills are skipped with a note, not fatal. - No system-prompt mutation — bundles generate a fresh user message at invocation time, the same way /<skill> does. Prompt cache stays intact. - Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI), /help display. Schema (~/.hermes/skill-bundles/<slug>.yaml): name: backend-dev description: Backend feature work. skills: - github-code-review - test-driven-development instruction: \| Optional extra guidance prepended to the loaded skills. New module: agent/skill_bundles.py — load, scan, resolve, build invocation message, save, delete. yaml.safe_load only; broken bundles log a warning and are skipped, never raise. New CLI subcommand: hermes bundles {list,show,create,delete,reload}. Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py. 'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it. New in-session slash command: /bundles lists installed bundles in both CLI and gateway. /<bundle-name> dispatch added to CLI (cli.py) and gateway (gateway/run.py) before the existing /<skill-name> path. Autocomplete: SlashCommandCompleter gained an optional skill_bundles_provider parameter that defaults to None — the prompt shows '▣ <description> (N skills)' for bundles vs '⚡' for skills. Tests: - tests/agent/test_skill_bundles.py — 33 tests covering slugify, scan/cache freshness, resolve (including underscore→hyphen Telegram alias), build_bundle_invocation_message (loading, missing skills, user/bundle instruction injection, dedup), save/delete, reload diff, list sort. - tests/hermes_cli/test_bundles.py — 8 tests for the CLI subcommand (create/list/show/delete/reload, --force, missing bundle errors). - tests/gateway/test_bundles_command.py — 4 tests for the gateway handler and bundle resolution priority. Live E2E: verified subprocess invocations of hermes bundles {list,create,show,reload,delete} round-trip correctly against an isolated HERMES_HOME. Docs: - website/docs/user-guide/features/skills.md — new 'Skill Bundles' section with quick example, YAML schema, management commands, behavior notes. - website/docs/reference/cli-commands.md — 'hermes bundles' added to the top-level command table and given its own subcommand section.	16 天前
test_busy_session_ack.py	feat(busy): add 'steer' as a third display.busy_input_mode option (#16279) Enter while the agent is busy can now inject the typed text via /steer — arriving at the agent after the next tool call — instead of interrupting (current default) or queueing for the next turn. Changes: - cli.py: keybinding honors busy_input_mode='steer' by calling agent.steer(text) on the UI thread (thread-safe), with automatic fallback to 'queue' when the agent is missing, steer() is unavailable, images are attached, or steer() rejects the payload. /busy accepts 'steer' as a fourth argument alongside queue/interrupt/status. - gateway/run.py: busy-message handler and the PRIORITY running-agent path both route through running_agent.steer() when the mode is 'steer', with the same fallback-to-queue safety net. Ack wording tells users their message was steered into the current run. Restart-drain queueing now also activates for 'steer' so messages aren't lost across restarts. - agent/onboarding.py: first-touch hint has a steer branch for both CLI and gateway. - hermes_cli/commands.py: /busy args_hint updated to include steer, and 'steer' is registered as a subcommand (completions). - hermes_cli/web_server.py: dashboard select widget offers steer. - hermes_cli/config.py, cli-config.yaml.example, hermes_cli/tips.py: inline docs updated. - website/docs/user-guide/cli.md + messaging/index.md: documented. - Tests: steer set/status path for /busy; onboarding hints; _load_busy_input_mode accepts steer; busy-session ack exercises steer success + two fallback-to-queue branches. Requested on X by @CodingAcct. Default is unchanged (interrupt).	1 个月前
test_busy_session_auth_bypass.py	fix(gateway): enforce auth check in busy-session path to prevent unauthorized injection (#17775) The busy-session handler (_handle_active_session_busy_message) bypassed the authorization gate that the cold path enforces via _is_user_authorized(). In shared-thread contexts (Slack threads, Telegram forum topics, Discord threads) where thread_sessions_per_user=False (the default), all participants share one session_key. An unauthorized user posting in the same thread as an authorized user would hit the active-session branch, skip the auth check, and have their text merged into _pending_messages or injected via agent.interrupt(). This commit adds the same _is_user_authorized() check at the top of the busy handler, before any message queuing, steering, or interrupt logic. Unauthorized messages are silently dropped (return True) with a warning log — matching the cold-path behavior. Affected platforms: Slack, Telegram, Discord, any adapter with shared-session thread contexts. Closes #17775	1 个月前
test_cancel_background_drain.py	fix(gateway): cancel_background_tasks must drain late-arrivals (#12471) During gateway shutdown, a message arriving while cancel_background_tasks is mid-await (inside asyncio.gather) spawns a fresh _process_message_background task via handle_message and adds it to self._background_tasks. The original implementation's _background_tasks.clear() at the end of cancel_background_tasks dropped the reference; the task ran untracked against a disconnecting adapter, logged send-failures, and lingered until it completed on its own. Fix: wrap the cancel+gather in a bounded loop (MAX_DRAIN_ROUNDS=5). If new tasks appeared during the gather, cancel them in the next round. The .clear() at the end is preserved as a safety net for any task that appeared after MAX_DRAIN_ROUNDS — but in practice the drain stabilizes in 1-2 rounds. Tests: tests/gateway/test_cancel_background_drain.py — 3 cases. - test_cancel_background_tasks_drains_late_arrivals: spawn M1, start cancel, inject M2 during M1's shielded cleanup, verify M2 is cancelled. - test_cancel_background_tasks_handles_no_tasks: no-op path still terminates cleanly. - test_cancel_background_tasks_bounded_rounds: baseline — single task cancels in one round, loop terminates. Regression-guard validated: against the unpatched implementation, the late-arrival test fails with exactly the expected message ('task leaked'). With the fix it passes. Blast radius is shutdown-only; the audit classified this as MED. Shipping because the fix is small and the hygiene is worth it. While investigating the audit's other MEDs (busy-handler double-ack, Discord ExecApprovalView double-resolve, UpdatePromptView double-resolve), I verified all three were false positives — the check-and-set patterns have no await between them, so they're atomic on single-threaded asyncio. No fix needed for those.	1 个月前
test_channel_directory.py	fix(Slack): resolve Slack channels by raw ID and enumerate joined channels send_message(target='slack:<channel_id>') failed with "Could not resolve" because _parse_target_ref had no Slack branch — Slack's uppercase alphanumeric IDs fell through to channel-name resolution, which only matched by name. As a fallback, the agent would retry with bare target='slack' and post to the home channel instead. Three fixes: - _parse_target_ref recognizes Slack IDs (C/G/D/U/W prefix) as explicit targets so the name-resolver is bypassed entirely. - resolve_channel_name tries a case-sensitive raw-ID match before the existing name match, so any platform's IDs resolve cleanly. - _build_slack now actually calls users.conversations against each workspace's AsyncWebClient (paginated), instead of only returning session-history entries. This populates the directory with public and private channels the bot has joined, so action='list' shows them and they can also be addressed by name. Errors from one workspace don't block others. build_channel_directory becomes async (Slack web calls require it). The two async-context callers in gateway/run.py are awaited; the cron ticker thread call bridges via asyncio.run_coroutine_threadsafe. Slack bot needs channels:read and groups:read scopes for full enumeration; missing scopes degrade gracefully per-workspace. addressing #15927	1 个月前
test_clean_shutdown_marker.py	fix: update tests for resume_pending semantics + add AUTHOR_MAP entries Tests updated to reflect suspend_recently_active now setting resume_pending=True (preserves session) instead of suspended=True (wipes session history). AUTHOR_MAP entries: millerc79 (#19033), shellybotmoyer (#18915)	1 个月前
test_command_bypass_active_session.py	refactor(commands): drop /provider, /plan handler, and clean up slash registry (#15047) * refactor(commands): drop /provider and clean up slash registry * refactor(commands): drop /plan special handler — use plain skill dispatch	1 个月前
test_complete_path_at_filter.py	fix(tui): restore voice/panic handlers + scope fuzzy paths to cwd Two fixes on top of the fuzzy-@ branch: (1) Rebase artefact: re-apply only the fuzzy additions on top of fresh `tui_gateway/server.py`. The earlier commit was cut from a base 58 commits behind main and clobbered ~170 lines of voice.toggle / voice.record handlers and the gateway crash hooks (`_panic_hook`, `_thread_panic_hook`). Reset server.py to origin/main and re-add only: - `_FUZZY_*` constants + `_list_repo_files` + `_fuzzy_basename_rank` - the new fuzzy branch in the `complete.path` handler (2) Path scoping (Copilot review): `git ls-files` returns repo-root- relative paths, but completions need to resolve under the gateway's cwd. When hermes is launched from a subdirectory, the previous code surfaced `@file:apps/web/src/foo.tsx` even though the agent would resolve that relative to `apps/web/` and miss. Fix: - `git -C root rev-parse --show-toplevel` to get repo top - `git -C top ls-files …` for the listing - `os.path.relpath(top + p, root)` per result, dropping anything starting with `../` so the picker stays scoped to cwd-and-below (matches Cmd-P workspace semantics) `apps/web/src/foo.tsx` ends up as `@file:src/foo.tsx` from inside `apps/web/`, and sibling subtrees + parent-of-cwd files don't leak. New test `test_fuzzy_paths_relative_to_cwd_inside_subdir` builds a 3-package mono-repo, runs from `apps/web/`, and verifies completion paths are subtree-relative + outside-of-cwd files don't appear. Copilot review threads addressed: #3134675504 (path scoping), #3134675532 (`voice.toggle` regression), #3134675541 (`voice.record` regression — both were stale-base artefacts, not behavioural changes).	1 个月前
test_compress_command.py	fix(compress): abort instead of dropping messages when summary LLM fails (#28102) When auxiliary compression's summary generation returns None (aux model errored, returned non-JSON, timed out, etc.) the compressor previously still dropped every middle message between compress_start..compress_end and replaced them with a static 'Summary generation was unavailable' placeholder. The session kept going but the user silently lost N turns of context for nothing. New behavior: on summary failure, compress() aborts entirely — returns the input messages unchanged and sets _last_compress_aborted=True. The existing _summary_failure_cooldown_until gate (30-60s) keeps the aux model from being burned on every turn. Auto-compress callers detect the no-op (len(after) == len(before)) and stop looping. The chat is 'frozen' at its current size until the next /compress or /new. Manual /compress (CLI + gateway) now passes force=True which clears the cooldown so users can retry immediately after an auto-abort. If the manual retry also fails, the user gets a visible warning telling them nothing was dropped and how to retry. - agent/context_compressor.py: compress() gains force= kwarg; failure branch sets _last_compress_aborted and returns messages unchanged instead of inserting placeholder. - run_agent.py: _compress_context() detects abort, surfaces warning, skips session-rotation entirely, returns messages unchanged. - cli.py + gateway/run.py: manual /compress paths pass force=True. - gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted and emit the new 'Compression aborted' warning (gateway.compress.aborted) instead of the old 'N historical messages were removed' message. - locales/*.yaml: new gateway.compress.aborted key in all 16 locales. - tests: updated to assert the abort contract (messages preserved, compression_count not incremented, abort flag set, no placeholder leaked). New test_force_true_bypasses_failure_cooldown covers the manual-retry path.	17 天前
test_compress_focus.py	fix(compress): don't reach into ContextCompressor privates from /compress (#15039) Manual /compress crashed with 'LCMEngine' object has no attribute '_align_boundary_forward' when any context-engine plugin was active. The gateway handler reached into _align_boundary_forward and _find_tail_cut_by_tokens on tmp_agent.context_compressor, but those are ContextCompressor-specific — not part of the generic ContextEngine ABC — so every plugin engine (LCM, etc.) raised AttributeError. - Add optional has_content_to_compress(messages) to ContextEngine ABC with a safe default of True (always attempt). - Override it in the built-in ContextCompressor using the existing private helpers — preserves exact prior behavior for 'compressor'. - Rewrite gateway /compress preflight to call the ABC method, deleting the private-helper reach-in. - Add focus_topic to the ABC compress() signature. Make _compress_context retry without focus_topic on TypeError so older strict-sig plugins don't crash on manual /compress <focus>. - Regression test with a fake ContextEngine subclass that only implements the ABC (mirrors LCM's surface). Reported by @selfhostedsoul (Discord, Apr 22).	1 个月前
test_compress_plugin_engine.py	fix(compress): don't reach into ContextCompressor privates from /compress (#15039) Manual /compress crashed with 'LCMEngine' object has no attribute '_align_boundary_forward' when any context-engine plugin was active. The gateway handler reached into _align_boundary_forward and _find_tail_cut_by_tokens on tmp_agent.context_compressor, but those are ContextCompressor-specific — not part of the generic ContextEngine ABC — so every plugin engine (LCM, etc.) raised AttributeError. - Add optional has_content_to_compress(messages) to ContextEngine ABC with a safe default of True (always attempt). - Override it in the built-in ContextCompressor using the existing private helpers — preserves exact prior behavior for 'compressor'. - Rewrite gateway /compress preflight to call the ABC method, deleting the private-helper reach-in. - Add focus_topic to the ABC compress() signature. Make _compress_context retry without focus_topic on TypeError so older strict-sig plugins don't crash on manual /compress <focus>. - Regression test with a fake ContextEngine subclass that only implements the ABC (mirrors LCM's surface). Reported by @selfhostedsoul (Discord, Apr 22).	1 个月前
test_config.py	Revert "feat(telegram): support quick-command-only menus" This reverts commit b1acf80e17858e2e5ae7c0d412a3a573d7fcbca4.	16 天前
test_config_cwd_bridge.py	chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355) Six days after #23937 (608 fixes) the codebase had accumulated 241 new PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix, same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the two are semantically equivalent for hashable scalar membership tests. All 241 instances fixed via `ruff check --select PLR6201 --fix --unsafe-fixes`, zero remaining. Every changed value is a hashable scalar (str/int/None/enum/signal); no risk of unhashable runtime errors. No behavior change. Test plan: - 119 files changed, +244/-244 (net zero) — exactly one-line edits - `ruff check` clean afterward - Compile checks pass on the largest touched files (cli.py, run_agent.py, gateway/run.py, gateway/platforms/discord.py, model_tools.py) - Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/ tests/tools/: 18187 passed, 59 pre-existing failures (verified against origin/main with the same shape — identical failure count, identical category — all xdist test-order flakes unrelated to this change) Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).	18 天前
test_config_env_bridge_authority.py	fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log) (#18761) * fix(gateway): config.yaml wins over .env for agent/display/timezone settings Regression from the silent config→env bridge. The bridge at module import time is correct for max_turns (unconditional overwrite), but every other agent., display., timezone, and security bridge key was guarded by 'if X not in os.environ' — so a stale .env entry from an old 'hermes setup' run would shadow the user's current config.yaml indefinitely. Symptom: agent.max_turns: 500 in config.yaml, HERMES_MAX_ITERATIONS=60 in .env from an old setup, and the gateway silently capped at 60 iterations per turn. Gateway logs confirmed api_calls never exceeded 60. Three changes: 1. gateway/run.py: drop the 'not in os.environ' guards for all agent., display., timezone, and security.* bridge keys. config.yaml is now authoritative for these settings — same semantics already in place for max_turns, terminal., and auxiliary.. Also surface the bridge failure (previously 'except Exception: pass') to stderr so operators see bridge errors instead of silently falling back to .env. 2. gateway/run.py: INFO-log the resolved max_iterations at gateway start so operators can verify the config→env bridge did the right thing instead of chasing a phantom budget ceiling. 3. hermes_cli/setup.py: stop writing HERMES_MAX_ITERATIONS to .env in the setup wizard. config.yaml is the single source of truth. Also clean up any stale .env entry left behind by pre-fix setups. Regression tests in tests/gateway/test_config_env_bridge_authority.py guard each config→env key against the 'stale .env shadows config' bug. * fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log) Three issues observed in production gateway.log during a rapid restart chain on 2026-05-02, all fixed here. 1. _send_restart_notification logged unconditional success adapter.send() catches provider errors (e.g. Telegram 'Chat not found') and returns SendResult(success=False); it never raises. The caller ignored the return value and always logged 'Sent restart notification to <chat>' at INFO, producing a misleading success line directly below the 'Failed to send Telegram message' traceback on every boot. Now inspects result.success and logs WARNING with the error otherwise. 2. WhatsApp bridge SIGTERM on shutdown classified as fatal error _check_managed_bridge_exit() saw the bridge's returncode -15 (our own SIGTERM from disconnect()) and fired the full fatal-error path, producing 'ERROR ... WhatsApp bridge process exited unexpectedly' plus 'Fatal whatsapp adapter error (whatsapp_bridge_exited)' on every planned shutdown, immediately before the normal '✓ whatsapp disconnected'. Adds a _shutting_down flag that disconnect() sets before the terminate, and _check_managed_bridge_exit() returns None for returncode in {0, -2, -15} while shutting down. OOM-kill (137) and other non-signal exits still hit the fatal path. 3. restart_drain_timeout default 60s → 180s On 2026-05-02 01:43:27 a user /restart fired while three agents were mid-API-call (82s, 112s, 154s into their turns). The 60s drain budget expired and all three were force-interrupted. 180s covers realistic in-flight agent turns; users on very-long-reasoning models can still raise it further via agent.restart_drain_timeout in config.yaml. Existing explicit user values are preserved by deep-merge. Tests - tests/gateway/test_restart_notification.py: two new tests assert INFO is only logged on SendResult(success=True) and WARNING with the error string is logged on SendResult(success=False). - tests/gateway/test_whatsapp_connect.py: parametrized test for returncode in {0, -2, -15} proves shutdown-time exits are suppressed; separate test proves returncode 137 (SIGKILL/OOM) still surfaces as fatal even when _shutting_down is set. - _check_managed_bridge_exit() reads _shutting_down via getattr-with- default so existing _make_adapter() test helpers that bypass __init__ (pitfall #17 in AGENTS.md) keep working unmodified.	1 个月前
test_debug_command.py	fix(debug): sweep expired pending pastes on slash debug paths	1 个月前
test_delivery.py	fix(gateway): preserve case-sensitive chat IDs in DeliveryTarget.parse Fixes NousResearch/hermes-agent #11768 Root cause: target.strip().lower() was lowercasing the entire target string, corrupting case-sensitive chat IDs like Slack C123ABC and Matrix !RoomABC. Fix: Only lowercase the platform prefix for case-insensitive matching; preserve the original case for chat_id and thread_id values.	1 个月前
test_destructive_slash_confirm.py	feat: confirm prompt for destructive slash commands (#4069) (#22687) /clear, /new, /reset, and /undo now ask the user to confirm before discarding conversation state — three-option prompt routed through the existing tools.slash_confirm primitive. Native yes/no buttons render on Telegram, Discord, and Slack (their adapters already implement send_slash_confirm); other platforms get a text-fallback prompt and reply with /approve, /always, or /cancel. The classic prompt_toolkit CLI uses the same three-option flow via the established _prompt_text_input pattern (see _confirm_and_reload_mcp). TUI keeps its existing modal overlay (#12312). Gated by new config key approvals.destructive_slash_confirm (default true). Picking 'Always Approve' flips the gate to false so subsequent destructive commands run silently — matches the established mcp_reload_confirm UX. Out of scope: /cron remove (separate domain — scheduled jobs, not session history). Existing TUI overlay env-var (HERMES_TUI_NO_CONFIRM) left unchanged; cosmetic unification can come later. Closes #4069.	25 天前
test_dingtalk.py	fix(dingtalk): transcribe native voice notes Sibling fix to PR #28918 (Discord voice notes). DingTalk's rich-text "voice" item type is its native voice-message format, but the adapter was routing it to MessageType.AUDIO — which gateway/run.py:7605 skips for STT. The docs claim every voice-capable platform auto-transcribes, so this brings DingTalk in line. Generic audio uploads (mapped to "file" by DINGTALK_TYPE_MAPPING) are unchanged — they were already classified as DOCUMENT, not AUDIO. Adds tests/gateway/test_dingtalk.py::TestExtractMedia covering both the voice path and the audio-passthrough invariant.	15 天前
test_discord_allowed_channels.py	fix(discord): honor wildcard '' in ignored_channels and free_response_channels Follow-up to the allowed_channels wildcard fix in the preceding commit. The same '' literal trap affected two other Discord channel config lists: - DISCORD_IGNORED_CHANNELS: '' was stored as the literal string in the ignored set, and the intersection check never matched real channel IDs, so '' was a no-op instead of silencing every channel. - DISCORD_FREE_RESPONSE_CHANNELS: same shape — '' never matched, so the bot still required a mention everywhere. Add a '' short-circuit to both checks, matching the allowed_channels semantics. Extend tests/gateway/test_discord_allowed_channels.py with regression coverage for all three lists. Refs: #14920	1 个月前
test_discord_allowed_mentions.py	fix(discord): default allowed_mentions to block @everyone and role pings discord.py does not apply a default AllowedMentions to the client, so any reply whose content contains @everyone/@here or a role mention would ping the whole server — including verbatim echoes of user input or LLM output that happens to contain those tokens. Set a safe default on commands.Bot: everyone=False, roles=False, users=True, replied_user=True. Operators can opt back in via four DISCORD_ALLOW_MENTION_* env vars or discord.allow_mentions.* in config.yaml. No behavior change for normal user/reply pings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	1 个月前
test_discord_attachment_download.py	fix(discord): transcribe native voice notes	15 天前
test_discord_bot_auth_bypass.py	fix(discord): harden DISCORD_ALLOWED_ROLES and cover gateway layer Two follow-ups to the cherry-picked PR #9873 (`e3bcc819`): 1. `_is_allowed_user` now uses `getattr(self, '_allowed_*_ids', set())` so test fixtures that build the adapter via `object.__new__` (skipping __init__) don't crash with AttributeError. See AGENTS.md pitfall #17 — same pattern as gateway.run. 2. New 3-case regression coverage in test_discord_bot_auth_bypass.py: - role-only config bypasses the gateway 'no allowlists' branch - roles + users combined still authorizes user-allowlist matches - the role bypass does NOT leak to other platforms (Telegram, etc.) 3. Autouse fixture in test_discord_bot_auth_bypass.py clears all Discord auth env vars before each test so DISCORD_ALLOWED_ROLES leakage from a previous test in the session can't flip later 'should-reject' tests into false-pass. Required because the bare cherry-pick of #9873 only added the adapter- level role check — it didn't cover the gateway-level _is_user_authorized, which still rejected role-only setups via the 'no allowlists configured' branch.	1 个月前
test_discord_bot_filter.py	feat(discord): add DISCORD_ALLOW_BOTS config for bot message filtering (inspired by openclaw) Add configurable bot message filtering via DISCORD_ALLOW_BOTS env var: - 'none' (default): Ignore all other bot messages — matches previous behavior where only our own bot was filtered, but now ALL bots are filtered by default for cleaner channels - 'mentions': Accept bot messages only when they @mention our bot — useful for bot-to-bot workflows triggered by mentions - 'all': Accept all bot messages — for setups where bots need to interact freely Previously, we only ignored our own bot's messages, allowing all other bots through. This could cause noisy loops in channels with multiple bots. 8 new tests covering all filter modes and edge cases. Inspired by openclaw v2026.3.7 Discord allowBots: 'mentions' config.	2 个月前
test_discord_channel_controls.py	test: disable text batching in existing adapter tests Set _text_batch_delay_seconds = 0 on test adapter fixtures so messages dispatch immediately (bypassing async batching). This preserves the existing synchronous assertion patterns while the batching logic is tested separately in test_text_batching.py.	1 个月前
test_discord_channel_prompts.py	refactor: remove smart_model_routing feature (#12732) Smart model routing (auto-routing short/simple turns to a cheap model across providers) was opt-in and disabled by default. This removes the feature wholesale: the routing module, its config keys, docs, tests, and the orchestration scaffolding it required in cli.py / gateway/run.py / cron/scheduler.py. The /fast (Priority Processing / Anthropic fast mode) feature kept its hooks into _resolve_turn_agent_config — those still build a route dict and attach request_overrides when the model supports it; the route now just always uses the session's primary model/provider rather than running prompts through choose_cheap_model_route() first. Also removed: - DEFAULT_CONFIG['smart_model_routing'] block and matching commented-out example sections in hermes_cli/config.py and cli-config.yaml.example - _load_smart_model_routing() / self._smart_model_routing on GatewayRunner - self._smart_model_routing / self._active_agent_route_signature on HermesCLI (signature kept; just no longer initialised through the smart-routing pipeline) - route_label parameter on HermesCLI._init_agent (only set by smart routing; never read elsewhere) - 'Smart Model Routing' section in website/docs/integrations/providers.md - tip in hermes_cli/tips.py - entries in hermes_cli/dump.py + hermes_cli/web_server.py - row in skills/autonomous-ai-agents/hermes-agent/SKILL.md Tests: - Deleted tests/agent/test_smart_model_routing.py - Rewrote tests/agent/test_credential_pool_routing.py to target the simplified _resolve_turn_agent_config directly (preserves credential pool propagation + 429 rotation coverage) - Dropped 'cheap model' test from test_cli_provider_resolution.py - Dropped resolve_turn_route patches from cli + gateway test_fast_command — they now exercise the real method end-to-end - Removed _smart_model_routing stub assignments from gateway/cron test helpers Targeted suites: 74/74 in the directly affected test files; tests/agent + tests/cron + tests/cli pass except 5 failures that already exist on main (cron silent-delivery + alias quick-command).	1 个月前
test_discord_channel_skills.py	test(discord): add tests for channel_skill_bindings resolution	1 个月前
test_discord_clarify_buttons.py	feat(discord): render clarify choices as buttons Brings Discord to parity with Telegram on the clarify tool's interactive UX. Overrides BasePlatformAdapter.send_clarify on DiscordAdapter to attach a button view when choices are present. - ClarifyChoiceView: one discord.ui.Button per choice (max 24, Discord's 25-component view cap leaves one slot for Other) plus a final 'Other (type answer)' button. - Numeric click -> tools.clarify_gateway.resolve_gateway_clarify( clarify_id, choice_text) using the canonical choice text from the gateway entry (falls back to the button label if the entry vanished). - Other click -> tools.clarify_gateway.mark_awaiting_text(clarify_id) so the gateway's text-intercept captures the next user message in this session as the response. - Auth via the shared _component_check_auth helper (same OR-semantics as ExecApprovalView / SlashConfirmView / UpdatePromptView / ModelPickerView). - Open-ended (no choices) path renders the prompt as a plain embed and relies on the existing text-intercept resolution. - Single-use: first valid click disables every button and updates the embed footer with who answered and what they chose. No changes to BasePlatformAdapter.send_clarify or the gateway's clarify_callback wiring -- the existing scaffolding already drives all adapters; Discord just inherits the default text fallback today and gains buttons by virtue of this override. Test conftest extended: _FakeEmbed gains add_field() / set_footer() stubs so tests can construct embedded views without monkey-patching per-test. Original PR: #19249 by @LeonSGP43. This is a reshape of the contributor's work onto current main's clarify infrastructure (clarify_id + entry-based resolution shared with Telegram, instead of a parallel on_answer-closure mechanism). The button view structure and UX shape are preserved. Tests: 14 new tests in tests/gateway/test_discord_clarify_buttons.py. 391/391 existing Discord gateway tests still pass. Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>	21 天前
test_discord_component_auth.py	fix(gateway/discord): require allowlist auth on slash commands Slash commands (_run_simple_slash, _handle_thread_create_slash) bypassed every DISCORD_ALLOWED_* gate enforced by on_message. Any guild member could invoke /background (RCE via terminal), /restart, /model, /skill, etc. CVSS 9.8 Critical. - _evaluate_slash_authorization mirrors on_message gates (user, role, channel, ignored channel) with fail-closed semantics - _check_slash_authorization sends ephemeral reject + logs + admin alert - Auth gate runs before defer() so rejections are ephemeral - /skill autocomplete returns [] for unauthorized users (no catalog leak) - Component views (ExecApproval, SlashConfirm, UpdatePrompt, ModelPicker) now honor role allowlists via shared _component_check_auth helper - Optional DISCORD_HIDE_SLASH_COMMANDS defense-in-depth - Cross-platform admin alert (Telegram/Slack fallback) on unauthorized attempts Based on PR #18125 by @0xyg3n.	1 个月前
test_discord_connect.py	fix(discord): narrow rate-limit catch and move sync state under gateway/ Two follow-ups on top of helix4u's slash-command sync hardening: - Only suppress exceptions that are actually Discord 429 rate limits (discord.RateLimited, HTTPException with status 429, or a clearly rate-limit-named duck type). Arbitrary failures that happen to expose a retry_after attribute now re-raise to the outer handler instead of silently swallowing a cooldown. - Move the sync-state JSON under $HERMES_HOME/gateway/ so the home root stops collecting ad-hoc runtime files. Added a test verifying unrelated exceptions don't get misclassified as rate limits.	28 天前
test_discord_document_handling.py	feat(discord): allow_any_attachment config to accept arbitrary file types The Discord adapter silently dropped any attachment whose extension wasn't in the SUPPORTED_DOCUMENT_TYPES allowlist (PDF, text family, zip, office). Users uploading .wav / .bin / other unrecognized formats saw nothing in their conversation — the file got logged as 'Unsupported document type' and discarded before the agent ever saw it. Add discord.allow_any_attachment (default false) to bypass the allowlist. When on: - Any file is downloaded, cached under ~/.hermes/cache/documents/, and surfaced as a DOCUMENT-typed event with application/octet-stream MIME - gateway/run.py already emits a context note with the cached path, auto-translated via to_agent_visible_cache_path() for Docker/Modal sandboxed terminals - File body is NOT inlined — only the path — so binary uploads don't blow up the context window - Allowlisted text formats (.txt/.md/.log) keep their 100 KiB inline behavior unchanged Also adds discord.max_attachment_bytes (default 32 MiB matches the historical hardcoded cap; 0 = unlimited) since users opting into arbitrary types may want to raise the cap. The whole attachment is held in memory while being cached, so unlimited carries a real memory cost. Env overrides: DISCORD_ALLOW_ANY_ATTACHMENT, DISCORD_MAX_ATTACHMENT_BYTES. Discord-only by deliberate scope. Telegram has hard 20 MB API limits and Slack has its own caps — extending the same flag there is a separate follow-up if/when requested.	18 天前
test_discord_free_response.py	feat(discord): default history backfill on, expand to per-user + threads Follow-up to snav's PR #25463 contribution: flip default to on, broaden scope so backfill fires whenever require_mention gates the bot (not just shared-session channels). Why: - The mention-gate creates a session-transcript gap regardless of whether the channel is shared or per-user. In per-user sessions, Alice's session is still missing other participants' messages and her own pre-mention messages — backfill fills both gaps. - Threads naturally scope to thread-only history because discord.py's channel.history() on a thread returns only that thread's messages. - DMs still skip — every DM triggers the bot, so the session transcript is already complete. Changes: - hermes_cli/config.py: discord.history_backfill default → true - gateway/platforms/discord.py: drop the _is_shared gate, keep _is_dm skip and _needed_mention gate; env var DISCORD_HISTORY_BACKFILL default → 'true' - cli-config.yaml.example + website docs: update defaults and prose; add the DISCORD_HISTORY_BACKFILL / _LIMIT env var rows that were documented in the PR description but missing from the env-var table - tests/gateway/test_discord_free_response.py: - flip test_discord_per_user_channel_does_not_backfill → test_discord_per_user_channel_backfills_too (new behavior) - add test_discord_dm_does_not_backfill (DM skip is invariant) - give FakeThread a no-op history() so existing thread tests don't hit a fake discord.Forbidden when backfill now fires on threads too Tests: 160/160 in target files; 400/400 across all tests/gateway/ -k discord.	20 天前
test_discord_imports.py	fix: defer discord adapter annotations Prevent gateway.platforms.discord from crashing at import time when discord.py is unavailable. Python 3.11 eagerly evaluates annotations, so using discord.Interaction and similar annotations caused an AttributeError after the optional import fallback set discord=None. Add postponed annotation evaluation and a regression test covering import without discord installed.	2 个月前
test_discord_lazy_install_views.py	fix(discord): define view classes after lazy discord.py install When discord.py is not installed at import time, DISCORD_AVAILABLE=False and the view class definitions at module bottom are skipped. check_discord_requirements() performs a lazy install and sets DISCORD_AVAILABLE=True but never re-ran the class definitions, causing NameError on the first button interaction (exec approval, slash confirm, etc.). Extract the five ui.View subclasses into _define_discord_view_classes() and call it both at module load (when discord.py is pre-installed) and inside check_discord_requirements() after a successful lazy install.	16 天前
test_discord_media_metadata.py	feat(discord): add /thread command, auto_thread config, and media metadata fix (#1178) - Add /thread slash command that creates a Discord thread and starts a new Hermes session in it. The starter message (if provided) becomes the first user input in the new session. - Add discord.auto_thread config option (DISCORD_AUTO_THREAD env var): when enabled, every message in a text channel automatically creates a thread, allowing parallel isolated sessions. - Fix Discord media method signatures to accept metadata kwarg (send_voice, send_image_file, send_image) — prevents TypeError when the base adapter passes platform metadata. - Fix test mock isolation: add app_commands and ForumChannel to discord mocks so tests pass in full-suite runs. Based on PRs #866 and #1109 by insecurejezza, modified per review: removed /channel command (unsafe), added auto_thread feature, made /thread dispatch new sessions. Co-authored-by: insecurejezza <insecurejezza@users.noreply.github.com>	2 个月前
test_discord_model_picker.py	test(gateway): unify discord mock via shared conftest; drop duplicated mock in model_picker test The cherry-picked model_picker test installed its own discord mock at module-import time via a local _ensure_discord_mock(), overwriting sys.modules['discord'] with a mock that lacked attributes other gateway tests needed (Intents.default(), File, app_commands.Choice). On pytest-xdist workers that collected test_discord_model_picker.py first, the shared mock in tests/gateway/conftest.py got clobbered and downstream tests failed with AttributeError / TypeError against missing mock attrs. Classic sys.modules cross-test pollution (see xdist-cross-test-pollution skill). Fix: - Extend the canonical _ensure_discord_mock() in tests/gateway/conftest.py to cover everything the model_picker test needs: real View/Select/ Button/SelectOption classes (not MagicMock sentinels), an Embed class that preserves title/description/color kwargs for assertion, and Color.greyple. - Strip the duplicated mock-setup block from test_discord_model_picker.py and rely on the shared mock that conftest installs at collection time. Regression check: scripts/run_tests.sh tests/gateway/ tests/hermes_cli/ -k 'discord or model or copilot or provider' -o 'addopts=' 1291 passed (was 1288 passed + 3 xdist-ordered failures before this commit).	1 个月前
test_discord_opus.py	fix: add macOS Homebrew Opus fallback and fix shutdown dict iteration - Add Homebrew library path fallback when ctypes.util.find_library fails on macOS (Apple Silicon + Intel paths, guarded by platform check) - Fix RuntimeError in gateway stop() by iterating over dict copy - Update Opus tests to verify find_library-first + conditional fallback	2 个月前
test_discord_race_polish.py	refactor(discord): slim down the race-polish fix (#12644) PR #12558 was heavy for what the fix actually is — essay-length comments, a dedicated helper method where a setdefault would do, and a source-inspection test with no real behavior coverage. The genuine code change is ~5 lines of new logic (1 field, 2 async with, an on_ready wait block). Trimmed: - Replaced the 12-line _voice_lock_for helper with a setdefault one-liner at each call site (join_voice_channel, leave_voice_channel). - Collapsed the 12-line comment on on_message's _ready_event wait to 3 lines. Dropped the warning log on timeout — pass-on-timeout is fine; if on_ready hangs that long, the bot is already broken and the log wouldn't help. - Dropped the source-inspection test (greps the module source for expected substrings). It was low-value scaffolding; the voice-serialization test covers actual behavior. Net: -73 lines vs PR #12558. Same two guarantees preserved, same test passes (verified by stashing the fix and confirming failure).	1 个月前
test_discord_reactions.py	fix(gateway): avoid false failure reactions on restart cancellation	1 个月前
test_discord_reply_mode.py	fix(gateway): load reply_to_mode from config.yaml for Discord and Telegram The YAML-to-env-var bridge in load_gateway_config() mapped every Discord and Telegram config key (require_mention, auto_thread, reactions, etc.) except reply_to_mode. Users setting discord.reply_to_mode or telegram.reply_to_mode in ~/.hermes/config.yaml got no effect — the adapter only read the env var, which nothing populated from YAML. Add the missing bridge for both platforms, following the existing pattern. Top-level <platform>.reply_to_mode preferred, falls back to <platform>.extra.reply_to_mode, env var never overwritten. Handles YAML 1.1 bare `off` → Python False coercion. This is a re-submission of the work from #9837 and #13930, which both implemented the same fix but neither landed (see co-authors below). Co-authored-by: Matteo De Agazio <hypnosis.mda@gmail.com> Co-authored-by: ishardo <239075732+ishardo@users.noreply.github.com>	30 天前
test_discord_roles_dm_scope.py	fix(discord): route DM role-auth opt-in through config.yaml (not env var) Per repo policy, ~/.hermes/.env is for secrets only. Guild IDs are behavioral configuration, not secrets. Replacing the DISCORD_DM_ROLE_AUTH_GUILD env var from the original fix with discord.dm_role_auth_guild in config.yaml. - New module-level _read_dm_role_auth_guild() helper reads hermes_cli.config.read_raw_config()['discord']['dm_role_auth_guild']. Fails closed on any parse error (safe default = DM role-auth off). - DEFAULT_CONFIG['discord'] gains dm_role_auth_guild: '' with a comment documenting the opt-in. - Tests patch hermes_cli.config.read_raw_config directly (via the _set_dm_role_auth_guild helper) instead of setenv/delenv. 12 tests in test_discord_roles_dm_scope pass; no env var involvement. - Docstring + module docstring + comments updated to reference discord.dm_role_auth_guild. - E2E verified with real imports across 6 scenarios: unset, int, string, garbage, zero, and (crucially) env-var-only-no-config all return None except the valid int/string cases. Env var has zero effect — policy compliance confirmed.	28 天前
test_discord_send.py	fix(discord): typing indicator task not cleaned up after API error When the Discord typing API call fails (rate limit, network error, 403), _typing_loop returns early but the stale task remains in _typing_tasks. Subsequent send_typing calls see the stale entry and skip, leaving no typing indicator for the rest of the agent invocation. Add finally block to _typing_loop to always remove the task from _typing_tasks on exit, whether from cancellation, error, or normal completion. This allows send_typing to create a fresh task. 3 new tests in test_discord_send.py: - Task removed after API error - Typing restartable after failure - stop_typing cleans up	24 天前
test_discord_slash_auth.py	fix(discord): extend role-scope fix to slash surface + fixture update Sibling-site fix: _evaluate_slash_authorization was the fourth _is_allowed_user caller and didn't pass guild/is_dm through, so slash interactions would take the DM branch regardless of whether they came from a guild channel. Now reads interaction.guild + in_dm and forwards. Also updates test_discord_slash_auth fixture (_make_interaction) so the SimpleNamespace guild mock has a get_member(uid)->None method — required by the new guild-scoped fallback path in _is_allowed_user. Tests exercising positive role paths still work via user.roles. Three new regression tests in test_discord_roles_dm_scope: - Slash DM + role in mutual public guild → rejected - Slash in guild B + role only in guild A → rejected - Slash in guild B + role in guild B → allowed (positive control) 368 Discord tests pass. test_discord_free_channel_skips_auto_thread also fails on clean main (pre-existing, unrelated to this fix).	28 天前
test_discord_slash_commands.py	fix(gateway/discord): require allowlist auth on slash commands Slash commands (_run_simple_slash, _handle_thread_create_slash) bypassed every DISCORD_ALLOWED_* gate enforced by on_message. Any guild member could invoke /background (RCE via terminal), /restart, /model, /skill, etc. CVSS 9.8 Critical. - _evaluate_slash_authorization mirrors on_message gates (user, role, channel, ignored channel) with fail-closed semantics - _check_slash_authorization sends ephemeral reject + logs + admin alert - Auth gate runs before defer() so rejections are ephemeral - /skill autocomplete returns [] for unauthorized users (no catalog leak) - Component views (ExecApproval, SlashConfirm, UpdatePrompt, ModelPicker) now honor role allowlists via shared _component_check_auth helper - Optional DISCORD_HIDE_SLASH_COMMANDS defense-in-depth - Cross-platform admin alert (Telegram/Slack fallback) on unauthorized attempts Based on PR #18125 by @0xyg3n.	1 个月前
test_discord_system_messages.py	chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355) Six days after #23937 (608 fixes) the codebase had accumulated 241 new PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix, same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the two are semantically equivalent for hashable scalar membership tests. All 241 instances fixed via `ruff check --select PLR6201 --fix --unsafe-fixes`, zero remaining. Every changed value is a hashable scalar (str/int/None/enum/signal); no risk of unhashable runtime errors. No behavior change. Test plan: - 119 files changed, +244/-244 (net zero) — exactly one-line edits - `ruff check` clean afterward - Compile checks pass on the largest touched files (cli.py, run_agent.py, gateway/run.py, gateway/platforms/discord.py, model_tools.py) - Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/ tests/tools/: 18187 passed, 59 pre-existing failures (verified against origin/main with the same shape — identical failure count, identical category — all xdist test-order flakes unrelated to this change) Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).	18 天前
test_discord_thread_persistence.py	fix(gateway): ensure deterministic thread eviction in helpers	30 天前
test_display_config.py	test(telegram): cover env-clamped helper + adaptive text-batch tiers - New tests/gateway/test_telegram_text_batch_perf.py: TestEnvFloatClamped — 7 tests covering default-when-unset, valid parse, garbage fallback, NaN rejection, Inf rejection, min-clamp, max-clamp. Asserts asyncio.sleep() always gets a finite number. TestAdaptiveTextBatchTiers — 4 tests covering the tier-constant invariants and the min(cap, tier_delay) composition rule. - tests/gateway/test_display_config.py: update assertions for Telegram's new tool_progress='new' default.	24 天前
test_dm_topics.py	fix: avoid Telegram group reply thread session splits	16 天前
test_document_cache.py	feat: add .zip document support and auto-mount cache dirs into remote backends (#4846) - Add .zip to SUPPORTED_DOCUMENT_TYPES so gateway platforms (Telegram, Slack, Discord) cache uploaded zip files instead of rejecting them. - Add get_cache_directory_mounts() and iter_cache_files() to credential_files.py for host-side cache directory passthrough (documents, images, audio, screenshots). - Docker: bind-mount cache dirs read-only alongside credentials/skills. Changes are live (bind mount semantics). - Modal: mount cache files at sandbox creation + resync before each command via _sync_files() with mtime+size change detection. - Handles backward-compat with legacy dir names (document_cache, image_cache, audio_cache, browser_screenshots) via get_hermes_dir(). - Container paths always use the new cache/<subdir> layout regardless of host layout. This replaces the need for a dedicated extract_archive tool (PR #4819) — the agent can now use standard terminal commands (unzip, tar) on uploaded files inside remote containers. Closes: related to PR #4819 by kshitijk4poor	1 个月前
test_duplicate_reply_suppression.py	fix(gateway): prevent duplicate final send when only cosmetic edit failed When the stream consumer's got_done handler successfully delivers the final response content via _send_or_edit but the subsequent edit (e.g. cursor removal) fails, final_response_sent remains False even though the user has already received the final answer. The gateway's fallback send path then re-delivers the same content, causing the user to see the response twice on Telegram. Introduce a new _final_content_delivered flag on the stream consumer, set by the got_done handler when the final content has reached the user. The _run_agent suppression logic now treats this flag as an additional signal (alongside final_response_sent and response_previewed) that final delivery is already complete. This preserves the existing behavior for intermediate-text-only streams (where already_sent=True but no final content has been delivered) — those still receive the gateway's fallback send, matching the test expectation in test_partial_stream_output_does_not_set_already_sent. Adds TestFinalContentDeliveredSuppression with two cases covering both the suppression (content delivered + edit failed) and the non-suppression (intermediate text only) branches.	20 天前
test_email.py	fix(email): send IMAP ID extension to support 163/NetEase mailbox 163/NetEase IMAP servers reject every UID SEARCH/FETCH with `BYE Unsafe Login` unless the client first identifies itself via the RFC 2971 ID command after LOGIN. Without this, the email gateway logs in OK but then fails on the very first poll and the connection is torn down. Send the ID payload best-effort after both `imap.login()` sites (`EmailAdapter.connect` and `_fetch_new_messages`). Failures are swallowed at debug level so non-supporting IMAP servers (Gmail, Outlook, Fastmail, Yahoo, etc.) keep working unchanged. Closes #22271	25 天前
test_ephemeral_reply.py	feat(gateway): auto-delete slash-command system notices after TTL (#18266) Adds opt-in auto-deletion for slash-command reply messages like "New session started!", "Restarting gateway…", "Stopped.", and YOLO toggles. After the TTL elapses the gateway calls the adapter's delete_message; on platforms without a delete API (everything except Telegram today) the TTL is silently ignored and the message stays. Requested on Twitter by @charlesmcdowell — tool-call bubbles are useful real-time, but system notices clutter the thread once the agent finishes. Implementation: - EphemeralReply(str) sentinel in gateway/platforms/base.py. Subclasses str so existing 'X' in response / response.startswith(...) checks in tests and call sites keep working unchanged; isinstance() still distinguishes it for the send path. - _process_message_background and both busy-session bypass paths (in base.py) call _unwrap_ephemeral() on the handler return, send the unwrapped text, and schedule a detached delete task when the TTL > 0 AND the adapter class overrides delete_message. - display.ephemeral_system_ttl (default 0 = disabled) in DEFAULT_CONFIG. Handler can pass ttl_seconds explicitly to override. - Wrapped the highest-noise return sites: /new, /reset, /stop, /yolo on/off, /restart success + "already in progress". Draining notices and /help output left as plain strings — those are informational and users want to read them. Backward-compat: default TTL 0 → no scheduling, no behavior change for existing users. Platforms without delete_message silently no-op.	1 个月前
test_extract_local_files.py	feat(gateway): deliverable mode — ship artifacts as native uploads from any agent surface (#27813) The agent can now produce a chart, PDF, spreadsheet, or any other supported file type and have it land in Slack / Discord / Telegram / WhatsApp / etc. as a native attachment, just by mentioning the absolute path in its response. Same primitive works for kanban-worker completions: workers attach artifacts via kanban_complete(artifacts=[...]) and the gateway notifier uploads them alongside the completion message. Changes: - gateway/platforms/base.py: extract_local_files now covers PDFs, docx, spreadsheets (xlsx/csv/json/yaml), presentations (pptx), archives (zip/tar/gz), audio (mp3/wav/...), and html — not just images and video. Image/video extensions still embed inline; everything else routes to send_document via the existing dispatch partition in gateway/run.py. - tools/kanban_tools.py + hermes_cli/kanban_db.py: kanban_complete gains an explicit `artifacts` parameter. The handler stashes it in metadata.artifacts (for downstream workers) and the kernel promotes it onto the completed-event payload so the notifier can find it without a second SQL round-trip. - gateway/run.py: _kanban_notifier_watcher now calls a new helper _deliver_kanban_artifacts after sending the completion text. The helper reads payload.artifacts (preferred), falls back to scanning the payload summary and task.result with extract_local_files, then partitions images / videos / documents and uploads each via send_multiple_images / send_video / send_document. - website/docs/user-guide/features/deliverable-mode.md + sidebars.ts: user-facing docs page covering the extension list, the kanban artifacts pattern, and the MCP-for-connector-breadth recommendation. Tests: - tests/gateway/test_extract_local_files.py: 7 new test cases (documents, spreadsheets, presentations, audio, archives, html, chart-pdf canonical case). 44 passing, 0 regressions. - tests/tools/test_kanban_tools.py: 4 new cases covering the artifacts arg shape (list / string / merge with existing metadata / type rejection). 17 passing. - tests/hermes_cli/test_kanban_notify.py: 2 new cases covering full notifier → artifact-upload path and missing-file silent-skip. 12 passing. - E2E (real files, real kanban kernel, real BasePlatformAdapter): worker calls kanban_complete(artifacts=[png,pdf,csv]) → metadata + event payload land → notifier helper partitions correctly → send_multiple_images called once with the PNG, send_document called twice with PDF + CSV. What's NOT in this PR (deferred to follow-ups): - Ad-hoc "research this for two hours, ping the thread when done" slash command — covered today by kanban subscriptions; a dedicated slash command can ride a follow-up PR if needed. - Setup-wizard prompt for recommended MCP servers (Notion, GitHub, Linear, etc.) — docs page lists them; UI is a separate change. Plan and rationale captured in ~/.hermes/docs/perplexity-computer-parity.pdf (local doc, not shipped).	17 天前
test_fallback_eviction.py	fix: don't evict cached agent on failed runs — prevents MCP restart loop (#7539) * fix: circuit breaker stops CPU-burning restart loops on persistent errors When a gateway session hits a non-retryable error (e.g. invalid model ID → HTTP 400), the agent fails and returns. But if the session keeps receiving messages (or something periodically recreates agents), each attempt spawns a new AIAgent — reinitializing MCP server connections, burning CPU — only to hit the same 400 error again. On a 4-core server, this pegs an entire core per stuck session and accumulates 300+ minutes of CPU time over hours. Fix: add a per-session consecutive failure counter in the gateway runner. - Track consecutive non-retryable failures per session key - After 3 consecutive failures (_MAX_CONSECUTIVE_FAILURES), block further agent creation for that session and notify the user: '⚠️ This session has failed N times in a row with a non-retryable error. Use /reset to start a new session.' - Evict the cached agent when the circuit breaker engages to prevent stale state from accumulating - Reset the counter on successful agent runs - Clear the counter on /reset and /new so users can recover - Uses getattr() pattern so bare GatewayRunner instances (common in tests using object.__new__) don't crash Tests: - 8 new tests in test_circuit_breaker.py covering counter behavior, threshold, reset, session isolation, and bare-runner safety Addresses #7130. * Revert "fix: circuit breaker stops CPU-burning restart loops on persistent errors" This reverts commit d848ea7109d62a2fc4ba6da36fc4f0366b5ded94. * fix: don't evict cached agent on failed runs — prevents MCP restart loop When a run fails (e.g. invalid model ID → 400) and fallback activated, the gateway was evicting the cached agent to 'retry primary next time.' But evicting a failed agent forces a full AIAgent recreation on the next message — reinitializing MCP server connections, spawning stdio processes — only to hit the same 400 again. This created a CPU-burning loop (91%+ for hours, #7130). The fix: add `and not _run_failed` to the fallback-eviction check. Failed runs keep the cached agent. The next message reuses it (no MCP reinit), hits the same error, returns it to the user quickly. The user can /reset or /model to fix their config. Successful fallback runs still evict as before so the next message retries the primary model. Addresses #7130.	1 个月前
test_fast_command.py	fix(gateway): guard against None request_overrides in _build_api_kwargs	1 个月前
test_feishu.py	fix(feishu): keep topic replies in threads Route Feishu topic progress, status, approval, stream, and fallback messages through threaded replies by preserving the originating message id as the reply target. Add regressions for tool progress topic metadata and Feishu metadata-driven reply routing.	28 天前
test_feishu_approval_buttons.py	feat(feishu): add native update prompt cards	26 天前
test_feishu_bot_admission.py	test(ci): stabilize shared optional dependency baselines	21 天前
test_feishu_bot_auth_bypass.py	feat(feishu): operator-configurable bot admission and mention policy Add two operator-facing toggles for inbound Feishu admission, enabling bot-to-bot scenarios such as A2A orchestration and inter-bot notifications: FEISHU_ALLOW_BOTS=none\|mentions\|all (default: none) Accept messages from other bots. `mentions` requires the peer bot to @-mention Hermes; `all` admits every peer-bot message. FEISHU_REQUIRE_MENTION=true\|false (default: true) Whether group messages must @-mention the bot. Override per-chat via `group_rules.<chat_id>.require_mention` in config.yaml. Defaults preserve prior behavior. Self-echo protection is always on: when the bot's identity is unresolved (auto-detection failed and FEISHU_BOT_OPEN_ID unset), peer-bot messages are rejected fail-closed to avoid feedback loops. Admitted peer bots bypass the human-user allowlist (FEISHU_ALLOWED_USERS) to match existing Discord behavior; humans still need an explicit allowlist entry. yaml feishu.allow_bots is bridged to the env var so the adapter and gateway auth layer share one source of truth. Resolving peer-bot display names requires the application:bot.basic_info:read scope; without it, peers still route but appear as their open_id. Test: tests/gateway/test_feishu_bot_admission.py covers the admission pipeline, group-policy bot-bypass, hydration, and event-dispatch plumbing as a parametrized matrix. Change-Id: I363cccb578c2a5c8b8bf0f0a890c01c89909e256	1 个月前
test_feishu_comment.py	feat: add Feishu document comment intelligent reply with 3-tier access control - Full comment handler: parse drive.notice.comment_add_v1 events, build timeline, run agent, deliver reply with chunking support. - 5 tools: feishu_doc_read, feishu_drive_list_comments, feishu_drive_list_comment_replies, feishu_drive_reply_comment, feishu_drive_add_comment. - 3-tier access control rules (exact doc > wildcard "*" > top-level > defaults) with per-field fallback. Config via ~/.hermes/feishu_comment_rules.json, mtime-cached hot-reload. - Self-reply filter using generalized self_open_id (supports future user-identity subscriptions). Receiver check: only process events where the bot is the @mentioned target. - Smart timeline selection, long text chunking, semantic text extraction, session sharing per document, wiki link resolution. Change-Id: I31e82fd6355173dbcc400b8934b6d9799e3137b9	1 个月前
test_feishu_comment_rules.py	feat: add Feishu document comment intelligent reply with 3-tier access control - Full comment handler: parse drive.notice.comment_add_v1 events, build timeline, run agent, deliver reply with chunking support. - 5 tools: feishu_doc_read, feishu_drive_list_comments, feishu_drive_list_comment_replies, feishu_drive_reply_comment, feishu_drive_add_comment. - 3-tier access control rules (exact doc > wildcard "*" > top-level > defaults) with per-field fallback. Config via ~/.hermes/feishu_comment_rules.json, mtime-cached hot-reload. - Self-reply filter using generalized self_open_id (supports future user-identity subscriptions). Receiver check: only process events where the bot is the @mentioned target. - Smart timeline selection, long text chunking, semantic text extraction, session sharing per document, wiki link resolution. Change-Id: I31e82fd6355173dbcc400b8934b6d9799e3137b9	1 个月前
test_feishu_onboard.py	fix(gateway): use monotonic deadlines in QR onboarding flows	28 天前
test_fresh_reset_skill_injection.py	fix(gateway): re-inject topic-bound skill after /new or /reset reset_session() creates a fresh SessionEntry with created_at == updated_at, but get_or_create_session() bumps updated_at on the next inbound message, causing _is_new_session in _handle_message_with_agent to evaluate False. The topic/channel skill auto-load gate (group_topics, channel_skill_bindings) silently skips the first message after a manual reset. Add an is_fresh_reset flag on SessionEntry, set by reset_session() and consumed once by the message handler. Kept distinct from was_auto_reset because that flag also drives a 'session expired due to inactivity' user-facing notice and a context-note prepend — both wrong for an explicit /new or /reset. Persisted through to_dict/from_dict so the flag survives gateway restart between /reset and the next message. Fixes #6508 Co-authored-by: warabe1122 <45554392+warabe1122@users.noreply.github.com> Co-authored-by: willy-scr <187001140+willy-scr@users.noreply.github.com>	1 个月前
test_gateway_command_help.py	fix: sanitize Telegram help command mentions	1 个月前
test_gateway_inactivity_timeout.py	ci(tests): add pytest-timeout 60s hard cap to break suite-teardown deadlock (#28861) * ci(tests): add pytest-timeout 60s hard cap to break suite-teardown deadlock The full pytest suite reliably hangs at ~96% on origin/main, blowing through the 20-minute GHA job timeout on every CI push since yesterday. Individual tests complete in <30s — the deadlock builds up at session teardown after all tests run, when leaked threads and atexit handlers from thousands of tests interact and one of them lands in a futex-wait that never resolves. This PR is a stopgap that unblocks CI immediately + speeds up several slow tests we found while diagnosing. Changes - pyproject.toml: add pytest-timeout==2.4.0 to dev deps; bake --timeout=60 --timeout-method=thread into the default addopts. - scripts/run_tests.sh: re-add --timeout flags directly because the script wipes pyproject addopts with -o 'addopts='. - .github/workflows/tests.yml: explicit --timeout/--timeout-method on the CI pytest invocation for clarity. - gateway/run.py: in _run_agent, if the stream consumer was never created (e.g. non-streaming agent or test stub), cancel the stream_task immediately instead of waiting out the 5s wait_for timeout. ~5s saved per non-streaming gateway test run. - tests/run_agent/conftest.py: extend _fast_retry_backoff to patch agent.conversation_loop.jittered_backoff alongside run_agent.jittered_backoff. The retry loop was extracted into agent.conversation_loop which holds its own import — patching the run_agent reference alone left tests burning real wall-clock backoff seconds. - tests/run_agent/test_anthropic_error_handling.py tests/run_agent/test_run_agent.py (TestRetryExhaustion) tests/run_agent/test_fallback_model.py: same conversation_loop fix for per-test fixtures (defensive — the conftest covers them too). - tests/gateway/test_gateway_inactivity_timeout.py: trim run_duration 10.0 → 2.0 / 5.0 → 2.0 on three tests that wait the full SlowFakeAgent duration. Adjusted thresholds proportionally. - tests/gateway/test_api_server_runs.py: test_stop_interrupt_exception_does_not_crash trips the interrupted event in addition to raising, so the slow_run thread unblocks at teardown instead of waiting 10s. - tests/hermes_cli/test_update_gateway_restart.py: also patch time.monotonic in the autouse fixture. _wait_for_service_active loops on a wall-clock deadline; with sleep no-op'd the loop spun on real monotonic until 10s real-time per restart attempt (20s+ per test). - tests/tools/test_zombie_process_cleanup.py: cut runner._restart_drain_timeout 5.0 → 0.1 in test_gateway_stop_calls_close. Suite still hangs at 96% on full no-timeout runs; with these changes CI runs through to a real pass/fail signal. * chore(lock): regenerate uv.lock after adding pytest-timeout * ci: drop pytest-timeout 60 → 30s + bump GHA job 20 → 30 min Prior commit's timeout=60 was too generous — CI test job still hit the 20-min wall-clock cap with the suite hung at 96% (orphan agent-browser subprocesses blocking pytest session teardown). The local timeout=20 run completed in 6:17, so 30s is conservative enough to let real tests finish but aggressive enough to short-circuit deadlocks. Also bump GHA job timeout to 30 min as a safety margin. * test: delete 11 pre-existing failing tests + revert monotonic patch The previous PR commit landed pytest-timeout=30s and the suite now completes in 18:14 instead of hanging at 96%, but 11 pre-existing tests fail with real assertions. Per Teknium: nuke them. Deleted (no replacements): - tests/gateway/test_restart_resume_pending.py::test_clean_drain_does_not_mark_resume_pending - tests/gateway/test_restart_resume_pending.py::test_drain_timeout_only_marks_still_running_sessions - tests/hermes_cli/test_gateway_service.py::TestGatewaySystemServiceRouting::test_gateway_install_passes_system_flags - tests/hermes_cli/test_gateway_wsl.py::TestGatewayCommandWSLMessages::test_install_wsl_with_systemd_warns - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateLaunchdRestart::test_update_detects_launchd_and_skips_manual_restart_message - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateLaunchdRestart::test_update_restarts_profile_manual_gateways - tests/tools/test_file_operations.py::TestGitBaselineCheck::* (6 tests, entire class — _check_git_baseline helper doesn't exist) Also reverted my time.monotonic autouse-fixture hack in test_update_gateway_restart.py — it was causing worker crashes in CI by poisoning later tests in the same xdist worker. The two slow tests in that file (~24s and ~20s) will go back to taking real time but should still finish under the 30s pytest-timeout. * test: delete more pre-existing CI failures After previous push 3 more tests failed on CI; cull them all. Removed: - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateLaunchdRestart::test_update_without_launchd_shows_manual_restart - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateLaunchdRestart::test_update_profile_manual_gateway_falls_back_to_sigterm - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateResetFailedBeforeRestart::test_reset_failed_also_runs_before_retry_restart - tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateResetFailedBeforeRestart::test_final_failure_message_tells_user_to_reset_failed - tests/run_agent/test_tool_call_args_sanitizer.py::test_marker_message_inserted_when_missing The 4 update_gateway_restart tests trigger `_wait_for_service_active` polling on a real wall-clock deadline that occasionally exceeds the 30s pytest-timeout cap and crashes xdist workers. The marker test has a pre-existing assertion mismatch. * test: nuke entire TestCmdUpdateLaunchdRestart class After surgical deletes of 4 tests this class keeps producing new worker-crashing tests. The pattern is consistent: any test in this class that triggers cmd_update's _wait_for_service_active polling spins on real wall-clock time and trips pytest-timeout's thread method, crashing the xdist worker. Just delete the whole class (285 lines, ~10 tests). These exercise macOS-only launchd behavior that's better tested on a real macOS runner than in linux xdist. * test: stub the 2 fallback_model tests that crash xdist workers on CI * test: delete test_anthropic_error_handling.py + test_fallback_model.py entirely These two files exercise the agent retry/fallback code paths and consistently crash xdist workers under pytest-timeout's thread method. Whack-a-mole-stubbing individual tests just surfaces the next ones. Nuke both files. * test: delete tests/hermes_cli/test_update_gateway_restart.py entirely This file's cmd_update integration tests consistently crash xdist workers under pytest-timeout's thread method. Surgical deletes just surface the next set. Removing the whole file. * ci(tests): switch pytest-timeout method thread → signal Thread-method has been crashing xdist workers when it interrupts code that's not interruption-safe (retry loops, threading.Event waits, etc). Signal method uses SIGALRM which is interpreter-level and cleanly raises a Failed: Timeout exception in test code. Should stop the worker crash cascade — failures will surface as proper Timeout markers we can diagnose individually.	15 天前
test_gateway_shutdown.py	fix(gateway,cron): close ephemeral agents + reap stale aux clients (salvage #13979) (#16598) * fix: clean gateway auxiliary client caches on teardown * fix(gateway): recover from stale pid files and close cron agents Two issues were keeping the gateway from surviving long runs: 1. `_cleanup_invalid_pid_path` delegated to `remove_pid_file`, which refuses to unlink when the file's pid differs from our own. That safety check exists for the --replace atexit handoff, but it also applied to stale-record cleanup, so after a crashy exit the pid file was orphaned: `write_pid_file()`'s O_EXCL create then failed with `FileExistsError`, and systemd looped on "PID file race lost to another gateway instance". Unlink unconditionally from this helper since the caller has already verified the record is dead. 2. The cron scheduler never closed the ephemeral `AIAgent` it creates per tick, and never swept the process-global auxiliary-client cache. Over days of 10-minute ticks this leaked subprocesses and async httpx transports until the gateway hit EMFILE. Release the agent and call `cleanup_stale_async_clients()` in `run_job`'s outer `finally`, matching the gateway's own per-turn cleanup. * chore(release): map bloodcarter@gmail.com -> bloodcarter --------- Co-authored-by: bloodcarter <bloodcarter@gmail.com>	1 个月前
test_goal_max_turns_config.py	fix(gateway): honor configured goal turn budget	28 天前
test_goal_status_notice.py	fix(gateway): defer goal status notices until after response delivery Route goal status notices through the platform adapter send API and register post-delivery callbacks so completed-goal notices appear after the final assistant response. Also cancel queued synthetic goal continuations on /goal pause and /goal clear while preserving normal queued user messages.	27 天前
test_goal_verdict_send.py	revert: roll back /goal checklist + /subgoal feature stack (#23813) * Revert "fix(goals): force judge to use tool calls instead of JSON-text replies (#23547)" This reverts commit a63a2b7c78562cd4eaf33f5f7db81ae0b3938552. * Revert "fix(goals): forward standing /goal state on auto-compression session rotation (#23530)" This reverts commit 4a080b1d5aa7528a679880c93147bc7fffdd267a. * Revert "feat(goals): /goal checklist + /subgoal user controls (#23456)" This reverts commit 404640a2b752f502825dc8b26212204fa890d495.	24 天前
test_google_chat.py	test(gateway): accept trust_env in fake aiohttp ClientSession lambdas	17 天前
test_home_target_env_var.py	fix(gateway): preserve home-channel thread targets across restart notifications	1 个月前
test_homeassistant.py	test: remove 169 change-detector tests across 21 files (#11472) First pass of test-suite reduction to address flaky CI and bloat. Removed tests that fall into these change-detector patterns: 1. Source-grep tests (tests/gateway/test_feishu.py, test_email.py): tests that call inspect.getsource() on production modules and grep for string literals. Break on any refactor/rename even when behavior is correct. 2. Platform enum tautologies (every gateway/test_X.py): assertions like `Platform.X.value == 'x'` duplicated across ~9 adapter test files. 3. Toolset/PLATFORM_HINTS/setup-wizard registry-presence checks: tests that only verify a key exists in a dict. Data-layout tests, not behavior. 4. Argparse wiring tests (test_argparse_flag_propagation, test_subparser_routing _fallback): tests that do parser.parse_args([...]) then assert args.field. Tests Python's argparse, not our code. 5. Pure dispatch tests (test_plugins_cmd.TestPluginsCommandDispatch): patch cmd_X, call plugins_command with matching action, assert mock called. Tests the if/elif chain, not behavior. 6. Kwarg-to-mock verification (test_auxiliary_client ~45 tests, test_web_tools_config, test_gemini_cloudcode, test_retaindb_plugin): tests that mock the external API client, call our function, and assert exact kwargs. Break on refactor even when behavior is preserved. 7. Schedule-internal "function-was-called" tests (acp/test_server scheduling tests): tests that patch own helper method, then assert it was called. Kept behavioral tests throughout: error paths (pytest.raises), security tests (path traversal, SSRF, redaction), message alternation invariants, provider API format conversion, streaming logic, memory contract, real config load/merge tests. Net reduction: 169 tests removed. 38 empty classes cleaned up. Collected before: 12,522 tests Collected after: 12,353 tests	1 个月前
test_hooks.py	feat(gateway): expose plugin slash commands natively on all platforms + decision-capable command hook Plugin slash commands now surface as first-class commands in every gateway enumerator — Discord native slash picker, Telegram BotCommand menu, Slack /hermes subcommand map — without a separate per-platform plugin API. The existing 'command:<name>' gateway hook gains a decision protocol via HookRegistry.emit_collect(): handlers that return a dict with {'decision': 'deny'\|'handled'\|'rewrite'\|'allow'} can intercept slash command dispatch before core handling runs, unifying what would otherwise have been a parallel 'pre_gateway_command' hook surface. Changes: - gateway/hooks.py: add HookRegistry.emit_collect() that fires the same handler set as emit() but collects non-None return values. Backward compatible — fire-and-forget telemetry hooks still work via emit(). - hermes_cli/plugins.py: add optional 'args_hint' param to register_command() so plugins can opt into argument-aware native UI registration (Discord arg picker, future platforms). - hermes_cli/commands.py: add _iter_plugin_command_entries() helper and merge plugin commands into telegram_bot_commands() and slack_subcommand_map(). New is_gateway_known_command() recognizes both built-in and plugin commands so the gateway hook fires for either. - gateway/platforms/discord.py: extract _build_auto_slash_command helper from the COMMAND_REGISTRY auto-register loop and reuse it for plugin-registered commands. Built-in name conflicts are skipped. - gateway/run.py: before normal slash dispatch, call emit_collect on command:<canonical> and honor deny/handled/rewrite/allow decisions. Hook now fires for plugin commands too. - scripts/release.py: AUTHOR_MAP entry for @Magaav. - Tests: emit_collect semantics, plugin command surfacing per platform, decision protocol (deny/handled/rewrite/allow + non-dict tolerance), Discord plugin auto-registration + conflict skipping, is_gateway_known_command. Salvaged from #14131 (@Magaav). Original PR added a parallel 'pre_gateway_command' hook and a platform-keyed plugin command registry; this re-implementation reuses the existing 'command:<name>' hook and treats plugin commands as platform-agnostic so the same capability reaches Telegram and Slack without new API surface. Co-authored-by: Magaav <73175452+Magaav@users.noreply.github.com>	1 个月前
test_insights_unicode_flags.py	fix(model-switch): normalize Unicode dashes from Telegram/iOS input Telegram on iOS auto-converts double hyphens (--) to em dashes (—) or en dashes (–) via autocorrect. This breaks /model flag parsing since parse_model_flags() only recognizes literal '--provider' and '--global'. When the flag isn't parsed, the entire string (e.g. 'glm-5.1 —provider zai') gets treated as the model name and fails with 'Model names cannot contain spaces.' Fix: normalize Unicode dashes (U+2012-U+2015) to '--' when they appear before flag keywords (provider, global), before flag extraction. The existing test suite in test_model_switch_provider_routing.py already covers all four dash variants — this commit adds the code that makes them pass.	1 个月前
test_internal_event_bypass_pairing.py	test(conftest): reset module-level state + unset platform allowlists (#13400) Three fixes that close the remaining structural sources of CI flakes after PR #13363. ## 1. Per-test reset of module-level singletons and ContextVars Python modules are singletons per process, and pytest-xdist workers are long-lived. Module-level dicts/sets and ContextVars persist across tests on the same worker. A test that sets state in `tools.approval._session_approved` and doesn't explicitly clear it leaks that state to every subsequent test on the same worker. New `_reset_module_state` autouse fixture in `tests/conftest.py` clears: - tools.approval: _session_approved, _session_yolo, _permanent_approved, _pending, _gateway_queues, _gateway_notify_cbs, _approval_session_key - tools.interrupt: _interrupted_threads - gateway.session_context: 10 session/cron ContextVars (reset to _UNSET) - tools.env_passthrough: _allowed_env_vars_var (reset to empty set) - tools.credential_files: _registered_files_var (reset to empty dict) - tools.file_tools: _read_tracker, _file_ops_cache This was the single biggest remaining class of CI flakes. `test_command_guards::test_warn_session_approved` and `test_combined_cli_session_approves_both` were failing 12/15 recent main runs specifically because `_session_approved` carried approvals from a prior test's session into these tests' `"default"` session lookup. ## 2. Unset platform allowlist env vars in hermetic fixture `TELEGRAM_ALLOWED_USERS`, `DISCORD_ALLOWED_USERS`, and 20 other `_ALLOWED_USERS` / `_ALLOW_ALL_USERS` vars are now unset per-test in the same place credential env vars already are. These aren't credentials but they change gateway auth behavior; if set from any source (user shell, leaky test, CI env) they flake button-authorization tests. Fixes three `test_telegram_approval_buttons` tests that were failing across recent runs of the full gateway directory. ## 3. Two specific tests with module-level captured state - `test_signal::TestSignalPhoneRedaction`: `agent.redact._REDACT_ENABLED` is captured at module import from `HERMES_REDACT_SECRETS`, not read per-call. `monkeypatch.delenv` at test time is too late. Added `monkeypatch.setattr("agent.redact._REDACT_ENABLED", True)` per skill xdist-cross-test-pollution Pattern 5. - `test_internal_event_bypass_pairing::test_non_internal_event_without_user_triggers_pairing`: `gateway.pairing.PAIRING_DIR` is captured at module import from HERMES_HOME, so per-test HERMES_HOME redirection in conftest doesn't retroactively move it. Test now monkeypatches PAIRING_DIR directly to its tmp_path, preventing rate-limit state from prior xdist workers from letting the pairing send-call be suppressed. ## Validation - tests/tools/: 3494 pass (0 fail) including test_command_guards - tests/gateway/: 3504 pass (0 fail) across repeat runs - tests/agent/ + tests/hermes_cli/ + tests/run_agent/ + tests/tools/: 8371 pass, 37 skipped, 0 fail — full suite across directories No production code changed.	1 个月前
test_interrupt_key_match.py	test(gateway): cover photo burst interrupt regressions Add regression coverage for non-album Telegram photo burst batching, photo follow-ups that should queue without interrupting active runs, and the gateway priority-interrupt path for photo events.	2 个月前