File · Last commit · Last updated
test(core): session fixture invariant tests (benchmark automation Stage 1)

Commits two clean hermes session jsonls to tests/fixtures/:
- session_p0_sprint_clean.jsonl (2026-04-22 21-11-34, 7 turns, full post-P0 behavior: #5 auto-stop nudge fired, multi-step task completed, markdown summary, no continuation placeholders)
- session_404_recovery.jsonl (2026-04-22 20-12-44, 4 turns, exercises #4 path ranking + #14b read_file preferred over bash cat)

New tests/session_fixture_invariants_test.rs runs 6 assertions per fixture when the fixture is present:
- no (continuing...) / (completed) / Continue. placeholders (their presence would indicate the continuation recovery mechanism was re-introduced)
- no "summarize and stop instead of continuing" directive (old #5 nudge wording that caused weak models to skip user-requested steps)
- no sed -i / perl -pi / awk -i inplace shell-workaround tool calls (P0 #2 anti-bypass regression check)
- every bash ToolResult output carries an exit: N / killed: marker (P0 #3 exit-code-in-marker regression check)
- meta: the collector sees Assistant content (catches jsonl schema drift that would otherwise silently make all other asserts pass trivially)

Explicit limits: does NOT catch regressions whose fixture is never refreshed, and does NOT run a real replay harness. Stage 2 (a real ReplayProvider + minimal AgentLoop test harness driving recorded responses back through the framework) is tracked as P1 #14d in project_095x_roadmap.md; at ~1.5 days of infra work, it is deferred until regression evidence warrants the investment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 month ago
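The invariants listed in this commit can be sketched as plain string checks over an already-parsed session. This is a minimal, hypothetical Rust sketch, not the contents of tests/session_fixture_invariants_test.rs: the Record enum, check_fixture, and has_exit_marker are illustrative names. It assumes the jsonl has already been split into user turns, assistant text, bash commands, and bash outputs (the real fixture schema is not shown here), and that a continuation placeholder is an assistant message consisting solely of the placeholder text.

```rust
// Hypothetical sketch of the fixture invariants; names and the record
// shape are assumptions, not the project's actual test code.

/// Simplified view of one session record (assumed; real jsonl schema not shown).
#[derive(Debug)]
enum Record<'a> {
    User(&'a str),       // user turn, including any injected nudge text
    Assistant(&'a str),  // assistant message content
    BashCall(&'a str),   // command string of a bash tool call
    BashResult(&'a str), // output of the matching bash ToolResult
}

/// Whole-message placeholders left by the removed continuation-recovery path.
const PLACEHOLDERS: [&str; 3] = ["(continuing...)", "(completed)", "Continue."];

/// Old #5 auto-stop nudge wording that must no longer appear.
const OLD_NUDGE: &str = "summarize and stop instead of continuing";

/// In-place shell edit workarounds that the P0 #2 anti-bypass check forbids.
const SHELL_WORKAROUNDS: [&str; 3] = ["sed -i", "perl -pi", "awk -i inplace"];

/// True if the output contains "killed:" or "exit: " followed by a digit.
fn has_exit_marker(output: &str) -> bool {
    if output.contains("killed:") {
        return true;
    }
    output.match_indices("exit: ").any(|(i, m)| {
        output[i + m.len()..]
            .chars()
            .next()
            .map_or(false, |c| c.is_ascii_digit())
    })
}

/// Runs every invariant over one parsed fixture; returns the first violation.
fn check_fixture(records: &[Record]) -> Result<(), String> {
    let mut saw_assistant = false;
    for record in records {
        // The old nudge could appear as injected user text or be echoed back.
        if let Record::User(text) | Record::Assistant(text) = record {
            if text.contains(OLD_NUDGE) {
                return Err("old #5 nudge wording found".to_string());
            }
        }
        match record {
            Record::Assistant(text) => {
                let trimmed = text.trim();
                if !trimmed.is_empty() {
                    saw_assistant = true;
                }
                if PLACEHOLDERS.iter().any(|p| *p == trimmed) {
                    return Err(format!("continuation placeholder: {trimmed}"));
                }
            }
            Record::BashCall(command) => {
                if let Some(w) = SHELL_WORKAROUNDS.iter().find(|w| command.contains(**w)) {
                    return Err(format!("shell workaround in bash call: {w}"));
                }
            }
            Record::BashResult(output) => {
                if !has_exit_marker(output) {
                    return Err("bash result lacks exit:/killed: marker".to_string());
                }
            }
            Record::User(_) => {}
        }
    }
    // Meta check: with no assistant content, every check above passes vacuously.
    if !saw_assistant {
        return Err("meta: no Assistant content seen (schema drift?)".to_string());
    }
    Ok(())
}
```

In a real test each committed fixture would be loaded and parsed before calling check_fixture. The closing meta check mirrors the commit's point: if a schema change left the collector empty, every other check would pass without having inspected anything.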