Filesystem Scan Cache Architecture Contract
This document defines the current contract for the shared filesystem scan cache implemented in Rust (crates/pi-natives/src/fs_cache.rs) and consumed by native discovery/search APIs exposed to packages/coding-agent.
What this cache is
The cache stores full directory-scan entry lists (GlobMatch[]) keyed by scan scope, traversal policy, and requested metadata detail. Higher-level operations (glob filtering, fuzzyFind scoring, and cached grep candidate selection) run against those cached entries.
Primary goals:
- avoid repeated filesystem walks for repeated discovery/search calls
- keep consistency across native discovery/search flows when they share the same scan policy
- allow explicit staleness recovery for empty results and explicit invalidation after file mutations
Ownership and public surface
- Cache implementation and policy:
crates/pi-natives/src/fs_cache.rs - Native consumers:
crates/pi-natives/src/glob.rscrates/pi-natives/src/fd.rs(fuzzyFind)crates/pi-natives/src/grep.rs(cached directory mode only)
- JS binding/export:
packages/natives/native/index.d.ts(invalidateFsScanCache)packages/natives/native/index.js
- Coding-agent mutation invalidation helpers:
packages/coding-agent/src/tools/fs-cache-invalidation.ts
Cache key partitioning (hard contract)
Each entry is keyed by:
- canonicalized
rootdirectory path include_hiddenbooleanuse_gitignorebooleanskip_node_modulesbooleandetail(ScanDetail::MinimalorScanDetail::Full)
Implications:
- Hidden and non-hidden scans do not share entries.
- Gitignore-respecting and ignore-disabled scans do not share entries.
- Scans that prune
node_modulesdo not share entries with scans that include it. - Minimal scans (path + file type only) do not share entries with full scans (mtime + regular-file size metadata).
follow_linksis part ofScanOptionsused to build the walker, but is not currently part ofCacheKey; calls that differ only byfollow_linkscan share a cache entry.
Consumers must pass stable semantics for hidden/gitignore/node_modules/detail behavior; changing any keyed flag creates a different cache partition.
Scan collection behavior
Cache population uses ignore::WalkBuilder configured by include_hidden, use_gitignore, skip_node_modules, and follow_links:
- sorted by file path
.gitis always prunednode_modulesis pruned at traversal time whenskip_node_modules=true- cancellation is checked before the walk and every 128 visited entries per parallel visitor
ScanDetail::Minimalrecords normalized relative path and file type onlyScanDetail::Fullalso records mtime and regular-file size
Search roots for cache scans are resolved by fs_cache::resolve_search_path:
- relative paths are resolved against current cwd
- target must be an existing directory
- root is canonicalized when possible
Freshness and eviction policy
Global policy (environment-overridable):
FS_SCAN_CACHE_TTL_MS(default1000)FS_SCAN_EMPTY_RECHECK_MS(default200)FS_SCAN_CACHE_MAX_ENTRIES(default16)
Behavior:
get_or_scan(...)- if TTL is
0: bypass cache entirely, always fresh scan (cache_age_ms = 0) - on cache hit within TTL: return cloned cached entries + non-zero
cache_age_ms - on expired hit: evict key, rescan, store fresh entry
- if TTL is
force_rescan(..., store=false): remove any matching key, scan fresh, and do not repopulate cacheforce_rescan(..., store=true): remove any matching key, scan fresh, then store the new entry- max entry enforcement is oldest-first eviction by
created_atafter insert
Empty-result fast recheck (separate from normal hits)
Normal cache hit:
- a cache hit inside TTL returns cached entries and does nothing else.
Empty-result fast recheck:
- this is a caller-side policy using
ScanResult.cache_age_ms - if filtered/query result is empty and cached scan age is at least
empty_recheck_ms(), caller performs oneforce_rescan(..., store=true)and retries - intended to reduce stale-negative results when files were added while the cache is still inside TTL
Current consumers:
glob: rechecks when filtered matches are empty and scan age exceeds thresholdfuzzyFind(fd.rs): rechecks only when query is non-empty and scored matches are emptygrep: rechecks when cached directory candidate file list is empty
Consumer defaults and cache usage
Cache is opt-in on exposed scan/search APIs (cache?: boolean, default false).
Current defaults in native APIs:
glob:hidden=false,gitignore=true,cache=false;node_modulesis included only whenincludeNodeModules=trueor the pattern mentionsnode_modules; full detail is used only whensortByMtime=truefuzzyFind:hidden=false,gitignore=true,cache=false,node_modulesis skipped,follow_links=true, minimal detailgrep:hidden=true,gitignore=true,cache=false; cached directory mode skipsnode_modulesunless the glob mentionsnode_modules; minimal detail
Coding-agent callers today:
- High-volume mention candidate discovery enables cache:
packages/coding-agent/src/utils/file-mentions.ts
- Mutation flows invalidate through
packages/coding-agent/src/tools/fs-cache-invalidation.ts. - Tool-level search integration (
packages/coding-agent/src/tools/search.ts) currently calls nativegrepwithcache: false.
Invalidation contract
Native invalidation entrypoint:
invalidateFsScanCache(path?: string)- with
path: remove cache entries whose root is a prefix of the target path - without path: clear all scan cache entries
- with
Path handling details:
- relative invalidation paths are resolved against cwd
- invalidation attempts canonicalization
- if target does not exist (for example after delete), fallback canonicalizes the parent and reattaches the filename when possible
- this preserves invalidation behavior for create/delete/rename where one side may not exist
Coding-agent mutation flow responsibilities
Coding-agent code must invalidate after successful filesystem mutations.
Central helpers:
invalidateFsScanAfterWrite(path)invalidateFsScanAfterDelete(path)invalidateFsScanAfterRename(oldPath, newPath)(invalidates both sides when paths differ)
Current mutation callsites include:
packages/coding-agent/src/tools/write.tspackages/coding-agent/src/edit/hashline/filesystem.tspackages/coding-agent/src/edit/modes/patch.tspackages/coding-agent/src/edit/modes/replace.ts
Rule: if a flow mutates filesystem content or location and bypasses these helpers, cache staleness bugs are expected.
Adding a new cache consumer safely
When introducing cache use in a new scanner/search path:
-
Use stable scan policy inputs
- decide hidden/gitignore/node_modules/detail semantics first
- pass them consistently to
get_or_scan/force_rescanso cache partitions are intentional
-
Treat cache data as pre-filtered only by traversal policy
- apply tool-specific filtering (glob patterns, type filters, scoring) after retrieval
- never assume cached entries already reflect your higher-level filters
-
Implement empty-result fast recheck only for stale-negative risk
- use
scan.cache_age_ms >= empty_recheck_ms() - retry once with
force_rescan(..., store=true, ...) - keep this path separate from normal cache-hit logic
- use
-
Respect no-cache mode explicitly
- when caller disables cache, call
force_rescan(..., store=false, ...)or use an uncached streaming walker - do not populate shared cache in a no-cache request path
- when caller disables cache, call
-
Wire mutation invalidation for any new write path
- after successful write/edit/delete/rename, call the coding-agent invalidation helper
- for rename/move, invalidate both old and new paths
-
Do not add per-call TTL knobs
- current contract is global policy only (env-configured), no per-request TTL override
Known boundaries
- Cache scope is process-local in-memory (
DashMap), not persisted across process restarts. - Cache stores scan entries, not final tool results.
glob/fuzzyFind/cachedgrepshare scan entries only when key dimensions (root,hidden,gitignore,skip_node_modules,detail) match..gitis always excluded at scan collection time regardless of caller options.