DeepSeek-Reasonix/benchmarks/spike-tdd-kernel · runningW/DeepSeek-Reasonix - AtomGit

GitHub · fix(jobs): close stop() race condition; drop useless \$ escapes (#288)

File	Last commit	Last updated
bench-latency.mjs	docs(rfc-25): land spike artifacts referenced from RFC #25. The four feasibility experiments for the kernel-level red-green TDD invariant (RFC #25) produced reports, raw data, and reproducer scripts. The RFC body and its summary comment cite paths under benchmarks/spike-tdd-kernel/; without these files the references are dead links. Contents: latency.md / latency.json / bench-latency.mjs (Exp 4); test-id-spec.md (Exp 2); tdd-eval.md / tdd-eval.json / tdd-eval.mjs (Exp 3); cost-results.md / cost-results.json / cost.mjs (Exp 1); work-estimate.md (3-stage plan). All reports include their own pass thresholds, decisions, and RFC-text-change implications. Reproducer scripts read the project .env and run against deepseek-v4-flash / deepseek-chat; total cost was ~$0.001 for the original spike pass. Local-only tracking-issue-draft.md is intentionally excluded; it gets posted as a real GitHub issue once the FCP closes.	28 days ago
cost-results.json	docs(rfc-25): land spike artifacts referenced from RFC #25	28 days ago
cost-results.md	docs(rfc-25): land spike artifacts referenced from RFC #25	28 days ago
cost.mjs	docs(rfc-25): land spike artifacts referenced from RFC #25	28 days ago
latency.json	docs(rfc-25): land spike artifacts referenced from RFC #25	28 days ago
latency.md	docs(rfc-25): land spike artifacts referenced from RFC #25	28 days ago
tdd-eval.json	docs(rfc-25): land spike artifacts referenced from RFC #25	28 days ago
tdd-eval.md	docs(rfc-25): land spike artifacts referenced from RFC #25	28 days ago
tdd-eval.mjs	fix(jobs): close stop() race condition; drop useless \$ escapes (#288). (a) fix(benchmark): drop useless \$ escapes in tdd-eval prompt string. The h1 task description string used `\${file}::\${fullName}` inside a regular `"..."` literal. The backslashes are no-ops (JS quietly drops unrecognized escapes in regular strings) but trip CodeQL's js/useless-regexp-character-escape rule. Identical resulting string, cleaner source. (b) fix(jobs): wait for actual close event after SIGKILL, not a fixed timer. stop() had two timing races. Both manifested as the same flake: the returned snapshot's `running` flag could still be true even though the test (and the user) had just told us to stop the job. 1. The SIGKILL phase used a fixed 800ms timer, then returned regardless of whether the OS had reaped the process tree. Under Windows scheduler load, taskkill /T /F on a 3-level tree (npm → node → vite) can take over a second to propagate before Node's `close` event fires. 2. The SIGTERM phase awaited `readyPromise`, which is dual-purpose: it fires on either a startup ready-signal regex match OR child exit. If the job had already matched a ready signal, `readyPromise` was already resolved, so the SIGTERM grace race short-circuited immediately and we'd jump to SIGKILL with zero pause. Adds `closedPromise`, which fires only on close/error, never on a ready signal, and uses it for both the SIGTERM grace race and the post-SIGKILL wait, with a 5s ceiling on the latter so a wedged kernel can't hang us indefinitely.	23 days ago
test-id-spec.md	docs(rfc-25): land spike artifacts referenced from RFC #25	28 days ago
work-estimate.md	docs(rfc-25): land spike artifacts referenced from RFC #25	28 days ago
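
The stop() fix described in the tdd-eval.mjs commit message (#288) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: only `closedPromise`, `stop()`, the `running` flag, and the 5s post-SIGKILL ceiling come from the commit message. `trackJob`, `settlesWithin`, `demo`, the option names, and the 2s grace default are hypothetical. It also signals a single process rather than a whole tree, so the Windows `taskkill /T /F` path mentioned in the commit is not covered.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical job wrapper; the field layout is illustrative only.
function trackJob(child) {
  const job = { child, running: true };
  // closedPromise settles only on "close" or "error". A startup
  // ready-signal match never resolves it, so it is safe to race against.
  job.closedPromise = new Promise((resolve) => {
    const done = () => {
      job.running = false;
      resolve();
    };
    child.on("close", done);
    child.on("error", done);
  });
  return job;
}

// Resolves true if `promise` settles within `ms`, otherwise false.
function settlesWithin(promise, ms) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(false), ms);
  });
  return Promise.race([promise.then(() => true), timeout]).finally(() =>
    clearTimeout(timer)
  );
}

// Stop a job: SIGTERM, wait up to graceMs for a real close, then SIGKILL
// and wait for the close event itself, bounded by killCeilingMs.
async function stop(job, { graceMs = 2000, killCeilingMs = 5000 } = {}) {
  if (job.running) {
    job.child.kill("SIGTERM");
    const exited = await settlesWithin(job.closedPromise, graceMs);
    if (!exited) {
      job.child.kill("SIGKILL");
      // Wait for the OS to reap the process instead of a fixed timer.
      await settlesWithin(job.closedPromise, killCeilingMs);
    }
  }
  // `running` only flips to false once close/error was actually observed.
  return { running: job.running };
}

// Example use against a real child process (defined here, not invoked).
async function demo() {
  const child = spawn(process.execPath, ["-e", "setInterval(() => {}, 1000)"]);
  const snapshot = await stop(trackJob(child));
  console.log(snapshot);
}
```

Calling `demo()` spawns an idle Node process and stops it. Racing the grace period against `closedPromise` rather than a ready-signal promise is what keeps the SIGTERM phase from short-circuiting once the job has already reported ready.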