atomcode/crates/atomcode-core/tests/fixtures · AtomCode/atomcode - AtomGit

bangxu test(core): session fixture invariant tests (benchmark automation Stage 1)

File	Last commit	Last updated
session_404_recovery.jsonl	test(core): session fixture invariant tests (benchmark automation Stage 1)	1 month ago
session_p0_sprint_clean.jsonl	test(core): session fixture invariant tests (benchmark automation Stage 1)	1 month ago

Commit message (shared by both files):

test(core): session fixture invariant tests (benchmark automation Stage 1)

Commits two clean hermes session jsonls to tests/fixtures/:
- session_p0_sprint_clean.jsonl (2026-04-22 21-11-34, 7 turns, full post-P0 behavior: #5 auto-stop nudge fired, multi-step task completed, markdown summary, no continuation placeholders)
- session_404_recovery.jsonl (2026-04-22 20-12-44, 4 turns, #4 path ranking + #14b read_file preferred-over-bash-cat exercised)

New tests/session_fixture_invariants_test.rs runs 6 assertions per fixture when present:
- no `(continuing...)` / `(completed)` / `Continue.` placeholders (would indicate continuation recovery mechanism was re-introduced)
- no "summarize and stop instead of continuing" directive (old #5 nudge wording that caused weak models to skip user-requested steps)
- no `sed -i` / `perl -pi` / `awk -i inplace` shell-workaround tool calls (P0 #2 anti-bypass regression check)
- every `bash` ToolResult output carries `exit: N` / `killed:` marker (P0 #3 exit-code-in-marker regression check)
- meta: collector sees Assistant content (catches jsonl schema drift that would silently make all other asserts trivially pass)

Explicit limits: does NOT catch regressions whose fixture is never refreshed, and does NOT run a real replay harness.

Stage 2 (a real ReplayProvider + minimal AgentLoop test harness driving recorded responses back through the framework) is tracked as P1 #14d in project_095x_roadmap.md, ~1.5 days of infra work; deferred until regression evidence warrants the investment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
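
The invariants named in the commit message could be approximated as plain substring scans over text that has already been extracted from a fixture. The following is a minimal sketch under that assumption, not the contents of session_fixture_invariants_test.rs: the helper names `find_forbidden` and `has_exit_marker` are hypothetical, and the jsonl schema (which this page does not show) is deliberately left out.

```rust
// Hypothetical sketch: function names and the substring-scan approach are
// assumptions; the real test file and the fixture schema are not shown here.

/// Placeholder strings listed in the commit message.
const PLACEHOLDERS: [&str; 3] = ["(continuing...)", "(completed)", "Continue."];

/// In-place edit commands listed in the commit message.
const SHELL_WORKAROUNDS: [&str; 3] = ["sed -i", "perl -pi", "awk -i inplace"];

/// Returns every pattern from `patterns` that occurs somewhere in `text`.
fn find_forbidden<'a>(text: &str, patterns: &[&'a str]) -> Vec<&'a str> {
    patterns
        .iter()
        .copied()
        .filter(|p| text.contains(*p))
        .collect()
}

/// Returns true when a bash tool output carries a `killed:` marker or an
/// `exit: ` marker immediately followed by an ASCII digit.
fn has_exit_marker(output: &str) -> bool {
    if output.contains("killed:") {
        return true;
    }
    output.match_indices("exit: ").any(|(start, marker)| {
        output[start + marker.len()..]
            .chars()
            .next()
            .map_or(false, |c| c.is_ascii_digit())
    })
}
```

A test built on these helpers would assert that `find_forbidden` returns an empty list for assistant content and tool-call commands, and that `has_exit_marker` holds for every bash output. A bare substring match on `Continue.` can also fire on ordinary prose, so a real check may need to compare whole message bodies instead.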