File    Last commit message    Last updated
feat: preserve raw bytes when anonymization is a no-op When the anonymizer doesn't change a slice's text, the streamer used to push Buffer.from(out, "utf8"), which loses any invalid-UTF-8 bytes in the input (replaced by U+FFFD via StringDecoder). Files mistakenly classified as text (binary blobs without a known extension, text with stray non-UTF-8 bytes, BOMs) came out corrupted even though nothing in the term list matched. Track the raw chunk bytes alongside the decoded `pending`. On flush, where we have every byte buffered, emit the original buffer directly when the output equals the input, so a pure passthrough is bit-exact. In the streaming OVERLAP path, do the same when the decode for that slice round-trips losslessly; fall back to encoded output otherwise (unchanged from before for that case). Also add the "missing_content" locale entry for the /api/anonymize-preview route. 25 days ago
Improve error handling 23 days ago
Fix 9 bugs and add 103 tests for core anonymization, config, and routing (#669) 1 month ago
Fix 9 bugs and add 103 tests for core anonymization, config, and routing (#669) 1 month ago
Fix 9 bugs and add 103 tests for core anonymization, config, and routing (#669) 1 month ago
Fix 9 bugs and add 103 tests for core anonymization, config, and routing (#669) 1 month ago
Fix 9 bugs and add 103 tests for core anonymization, config, and routing (#669) 1 month ago
repo change + daily stat improvements 18 days ago
Fix 9 bugs and add 103 tests for core anonymization, config, and routing (#669) 1 month ago
Standardize error responses with consistent format and human-readable messages (#667) 1 month ago
fix: include file path in cache ETag Without the path, two different files in the same repo (same sha, same anonymization options) shared an ETag. If a browser ever sent the cached ETag for one file while requesting another, the server would have returned 304 against the wrong cache entry. Fold the path into the ETag so each file has its own fingerprint. Follow-up to b3c1030 (#439). 26 days ago
fix persistence bugs 23 days ago
fix persistence bugs 23 days ago
Fix 9 bugs and add 103 tests for core anonymization, config, and routing (#669) 1 month ago
improve binary file detection: content sniffing + jsonl support Files like .jsonl that mime-types doesn't know fell through to application/octet-stream and rendered as "Unsupported binary file" in the viewer. Replace istextorbinary with isbinaryfile for content-based detection, and use mime-types for name-based classification with a textual application/* allowlist. The streaming transformer now defers classification when the name is inconclusive and sniffs the first chunk before emitting "transform", so route.ts and AnonymizedFile.ts get a content-aware Content-Type. Whitelists .jsonl and .ndjson to short-circuit dataset files. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> 23 days ago
fix: resolve eslint unused-var and useless-assignment warnings Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> 23 days ago
Replace isomorphic-dompurify with sanitize-html for Node 21 compat (#663) 1 month ago
Fix 9 bugs and add 103 tests for core anonymization, config, and routing (#669) 1 month ago
fix test 23 days ago
Fix streamer crash and misclassified transient GitHub errors Add missing error handler on the anonymizer transform stream in the streamer route; without it, an upstream error tears down the pipe and the anonymizer emits an unhandled error that crashes the process (surfacing as ECONNRESET to the main server). Classify transient network errors (ReadError, ECONNRESET, ETIMEDOUT) as upstream_error/502 instead of file_not_found/404 so they are distinguishable in logs and don't cache-poison downstream. Update handleError tests to match the existing sanitization behavior that returns internal_error for non-AnonymousError instances. 22 days ago
Fix 9 bugs and add 103 tests for core anonymization, config, and routing (#669) 1 month ago
Fix 9 bugs and add 103 tests for core anonymization, config, and routing (#669) 1 month ago
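
The "preserve raw bytes when anonymization is a no-op" commit listed above describes a flush-time rule: when the anonymized text equals the decoded input, emit the original bytes instead of re-encoding. A minimal sketch of that idea follows, assuming Node.js. The names `anonymize`, `flushSlice`, and the `"XXXX"` mask are hypothetical and stand in for the repository's real code, which is not shown in this listing.

```typescript
import { StringDecoder } from "node:string_decoder";

// Hypothetical stand-in for the real anonymizer: masks every occurrence of each term.
function anonymize(text: string, terms: string[]): string {
  let out = text;
  for (const term of terms) {
    if (term.length === 0) continue; // an empty term would match everywhere
    out = out.split(term).join("XXXX");
  }
  return out;
}

// Flush-time rule (illustrative): every byte of the slice is buffered in `raw`,
// so a no-op anonymization can return `raw` itself and stay bit-exact.
function flushSlice(raw: Buffer, terms: string[]): Buffer {
  const decoder = new StringDecoder("utf8");
  // Invalid UTF-8 sequences decode to U+FFFD, which is where re-encoding loses bytes.
  const decoded = decoder.write(raw) + decoder.end();
  const out = anonymize(decoded, terms);
  return out === decoded ? raw : Buffer.from(out, "utf8");
}

// Usage: 0xff is not valid UTF-8, so re-encoding the decoded text would change it.
const raw = Buffer.from([0x68, 0x69, 0xff, 0x0a]); // "hi", one invalid byte, "\n"
const untouched = flushSlice(raw, ["secret"]); // no term matches: original bytes
const masked = flushSlice(raw, ["hi"]); // a term matches: re-encoded output
console.log(untouched.equals(raw), masked.toString("utf8").startsWith("XXXX"));
```

For the streaming path the commit mentions, a comparable guard under the same assumptions would be `Buffer.from(decoded, "utf8").equals(raw)`, which holds only when the decode round-trips losslessly.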