feat(computer-use): cua-driver backend, universal any-model schema
Background macOS desktop control via cua-driver MCP — does NOT steal the
user's cursor or keyboard focus, works with any tool-capable model.
Replaces the Anthropic-native computer_20251124 approach from the
abandoned #4562 with a generic OpenAI function-calling schema plus SOM
(set-of-mark) captures so Claude, GPT, Gemini, and open models can all
drive the desktop via numbered element indices.
- tools/computer_use/ package — swappable ComputerUseBackend ABC +
CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary).
- Universal computer_use tool with one schema for all providers.
Actions: capture (som/vision/ax), click, double_click, right_click,
middle_click, drag, scroll, type, key, wait, list_apps, focus_app.
- Multimodal tool-result envelope (_multimodal=True, OpenAI-style
content: [text, image_url] parts) that flows through
handle_function_call into the tool message. Anthropic adapter converts
into native tool_result image blocks; OpenAI-compatible providers
get the parts list directly.
- Image eviction in convert_messages_to_anthropic: only the 3 most
recent screenshots carry real image data; older ones become text
placeholders to cap per-turn token cost.
- Context compressor image pruning: old multimodal tool results have
their image parts stripped instead of being skipped.
- Image-aware token estimation: each image counts as a flat 1500 tokens
instead of its base64 char length (~1MB would have registered as
~250K tokens before).
- COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset
is active.
- Session DB persistence strips base64 from multimodal tool messages.
- Trajectory saver normalises multimodal messages to text-only.
- hermes tools post-setup installs cua-driver via the upstream script
and prints permission-grant instructions.
- CLI approval callback wired so destructive computer_use actions go
through the same prompt_toolkit approval dialog as terminal commands.
- Hard safety guards at the tool level: blocked type patterns
(curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash,
force delete, lock screen, log out).
- Skill apple/macos-computer-use/SKILL.md — universal (model-agnostic)
workflow guide.
- Docs: user-guide/features/computer-use.md plus reference catalog
entries.
44 new tests in tests/tools/test_computer_use.py covering schema
shape (universal, not Anthropic-native), dispatch routing, safety
guards, multimodal envelope, Anthropic adapter conversion, screenshot
eviction, context compressor pruning, image-aware token estimation,
run_agent helpers, and universality guarantees.
469/469 pass across tests/tools/test_computer_use.py + the affected
agent/ test suites.
- model_tools.py provider-gating: the tool is available to every
provider. Providers without multi-part tool message support will see
text-only tool results (graceful degradation via text_summary).
- Anthropic server-side clear_tool_uses_20250919 — deferred;
client-side eviction + compressor pruning cover the same cost ceiling
without a beta header.
- macOS only. cua-driver uses private SkyLight SPIs
(SLEventPostToPid, SLPSPostEventRecordTo,
_AXObserverAddNotificationAndCheckRemote) that can break on any macOS
update. Pin with HERMES_CUA_DRIVER_VERSION.
- Requires Accessibility + Screen Recording permissions — the post-setup
prints the Settings path.
Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-
native schema). Credit @0xbyt4 for the original #3816 groundwork whose
context/eviction/token design is preserved here in generic form.
feat(sessions): opt-in per-session JSON snapshot writer
PR #29182 deleted the per-session JSON snapshot writer outright because
state.db is canonical and the snapshots had no in-tree consumer. Some
users have external tooling that reads ~/.hermes/sessions/session_{sid}.json
directly, so reintroduce the writer behind a config flag that defaults
to off.
- Add sessions.write_json_snapshots (default False) to DEFAULT_CONFIG
- Restore AIAgent._save_session_log + _clean_session_content as
gated methods. When the flag is off the call is a fast no-op; when
on, the writer behaves as before (atomic write, truncation guard
preserved, REASONING_SCRATCHPAD → think tag normalization)
- Re-derive the target path from agent.session_id on each call so
/branch and /compress re-points happen automatically — no need
to restore the explicit re-point bookkeeping at call sites
- Wire the single call site in _persist_session (the cleanup-on-exit
hook). Did NOT restore the 7 intra-turn calls the original PR deleted
— those were redundant writes within the same turn that doubled disk
I/O without adding any persistence guarantee _persist_session does
not already provide
- Read the flag once at agent init via load_config(), cache as
agent._session_json_enabled
- Update TestNoSessionJsonSnapshot → TestSessionJsonSnapshotOptIn
to pin behavior: default off (no file), opt-in true (file written),
no-op method on default agents, logs_dir retained unconditionally
- Update CONTRIBUTING.md and the bundled hermes-agent skill to
document the flag and its default
feat(kanban): configure worktree paths and branches
Salvages #26496 by @aqilaziz. Adds branch_name column + CLI flag so
tasks with workspace_kind='worktree' can pin a target branch on
create. Schema migration added to _migrate_add_optional_columns.
- Task.branch_name field + DB column + migration
- create_task accepts branch_name kwarg
- hermes kanban create --branch <name> flag
- kanban show output includes 'Branch: <name>' when set
Cherry-picked the substantive commit (a7558cf27); the PR's tip was
an unrelated service-path-dirs commit. Resolved 2 INSERT-column-list
and show-output conflicts alongside main's session_id and
max_runtime_seconds additions; kept all three.
feat(gateway): skill-aware slash commands, paginated /commands, Telegram 100-cap (#3934)
* feat(gateway): skill-aware slash commands, paginated /commands, Telegram 100-cap
Map active skills to Telegram's slash command menu so users can
discover and invoke skills directly. Three changes:
1. Telegram menu now includes active skill commands alongside built-in
commands, capped at 100 entries (Telegram Bot API limit). Overflow
commands remain callable but hidden from the picker. Logged at
startup when cap is hit.
2. New /commands [page] gateway command for paginated browsing of all
commands + skills. /help now shows first 10 skill commands and
points to /commands for the full list.
3. When a user types a slash command that matches a disabled or
uninstalled skill, they get actionable guidance:
- Disabled: 'Enable it with: hermes skills config'
- Optional (not installed): 'Install with: hermes skills install official/<path>'
Built on ideas from PR #3921 by @kshitijk4poor.
* chore: move 21 niche skills to optional-skills
Move specialized/niche skills from built-in (skills/) to optional
(optional-skills/) to reduce the default skill count. Users can
install them with: hermes skills install official/<category>/<name>
Moved skills (21):
- mlops: accelerate, chroma, faiss, flash-attention,
hermes-atropos-environments, huggingface-tokenizers, instructor,
lambda-labs, llava, nemo-curator, pinecone, pytorch-lightning,
qdrant, saelens, simpo, slime, tensorrt-llm, torchtitan
- research: domain-intel, duckduckgo-search
- devops: inference-sh cli
Built-in skills: 96 → 75
Optional skills: 22 → 43
* fix: only include repo built-in skills in Telegram menu, not user-installed
User-installed skills (from hub or manually added) stay accessible via
/skills and by typing the command directly, but don't get registered
in the Telegram slash command picker. Only skills whose SKILL.md is
under the repo's skills/ directory are included in the menu.
This keeps the Telegram menu focused on the curated built-in set while
user-installed skills remain discoverable through /skills and /commands.
chore(skills): move heavy training skills + outlines to optional-skills (#22912)
These skills require heavy GPU/CUDA stacks or are niche enough that they shouldn't
be active by default. Moved to optional-skills/ where users opt-in via
hermes skills install official/....
Moved:
- mlops/training/axolotl
- mlops/training/trl-fine-tuning
- mlops/training/unsloth
- mlops/inference/outlines
Counts: 91 -> 87 built-in, 72 -> 76 optional.
Auto-regenerated docs (per-skill pages + catalogs) reflect the move.
chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355)
Six days after #23937 (608 fixes) the codebase had accumulated 241 new
PLR6201 violations. Same mechanical x in (...) → x in {...} fix,
same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the
two are semantically equivalent for hashable scalar membership tests.
All 241 instances fixed via `ruff check --select PLR6201 --fix
--unsafe-fixes`, zero remaining. Every changed value is a hashable
scalar (str/int/None/enum/signal); no risk of unhashable runtime
errors. No behavior change.
Test plan:
- 119 files changed, +244/-244 (net zero) — exactly one-line edits
- ruff check clean afterward
- Compile checks pass on the largest touched files (cli.py, run_agent.py,
gateway/run.py, gateway/platforms/discord.py, model_tools.py)
- Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/
tests/tools/: 18187 passed, 59 pre-existing failures (verified against
origin/main with the same shape — identical failure count, identical
category — all xdist test-order flakes unrelated to this change)
Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).