Natives Text/Search Pipeline
This document maps the @oh-my-pi/pi-natives text/search/code surface from generated JS/TS exports to Rust N-API modules and back to JS result objects.
Terminology follows docs/natives-architecture.md:
- Generated binding: public API in
packages/natives/native/index.d.ts. - Rust module layer: N-API exports in
crates/pi-natives/src/*. - Shared scan cache:
fs_cache-backed directory-entry cache used by discovery/search flows.
Implementation files
packages/natives/native/index.d.tscrates/pi-natives/src/grep.rscrates/pi-natives/src/glob.rscrates/pi-natives/src/glob_util.rscrates/pi-natives/src/fs_cache.rscrates/pi-natives/src/fd.rscrates/pi-natives/src/ast.rscrates/pi-natives/src/text.rscrates/pi-natives/src/highlight.rscrates/pi-natives/src/tokens.rs
JS API ↔ Rust export mapping
| JS API | Rust export (#[napi], snake_case -> camelCase) |
Rust module |
|---|---|---|
grep(options, onMatch?) |
grep |
grep.rs |
search(content, options) |
search |
grep.rs |
hasMatch(content, pattern, ignoreCase?, multiline?) |
hasMatch |
grep.rs |
fuzzyFind(options) |
fuzzyFind |
fd.rs |
glob(options, onMatch?) |
glob |
glob.rs |
invalidateFsScanCache(path?) |
invalidateFsScanCache |
fs_cache.rs |
astGrep(options) |
astGrep |
ast.rs |
astEdit(options) |
astEdit |
ast.rs |
wrapTextWithAnsi(text, width, tabWidth) |
wrapTextWithAnsi |
text.rs |
truncateToWidth(text, maxWidth, ellipsis, pad, tabWidth) |
truncateToWidth |
text.rs |
sliceWithWidth(line, startCol, length, strict, tabWidth) |
sliceWithWidth |
text.rs |
extractSegments(line, beforeEnd, afterStart, afterLen, strictAfter, tabWidth) |
extractSegments |
text.rs |
visibleWidth(text, tabWidth) |
visibleWidth |
text.rs |
highlightCode(code, lang, colors) |
highlightCode |
highlight.rs |
supportsLanguage(lang) |
supportsLanguage |
highlight.rs |
getSupportedLanguages() |
getSupportedLanguages |
highlight.rs |
countTokens(input, encoding?) |
countTokens |
tokens.rs |
Pipeline overview by subsystem
1) Regex search (grep, search, hasMatch)
Input/options flow
- Callers invoke generated native exports directly; there is no package-local TS wrapper that renames
searchtosearchContent. - Rust option structs in
grep.rsdeserialize camelCase fields (ignoreCase,maxCount,contextBefore,contextAfter,maxColumns,timeoutMs). grepcreatesCancelTokenfromtimeoutMs+AbortSignaland runs insidetask::blocking("grep", ...).searchandhasMatchoperate on provided string/Uint8Arraycontent and do not scan the filesystem.
Execution branches
- In-memory branch
search->search_sync/ search helpers over provided content bytes.hasMatchcompiles/checks pattern against provided content and returns a boolean.- No filesystem scan, no
fs_cache.
- Single-file branch
grepresolves path, checks metadata is file, and searches that file.
- Directory branch
- Optional cache lookup via
fs_cache::get_or_scanwhencache: true. - Fresh scan via
fs_cache::force_rescanwhencache: false. - Optional empty-result recheck when cached results are older than the empty-result recheck threshold.
- Entry filtering: file-only + optional glob filter (
glob_util) + optional type filter mapping (js,ts,rust, etc.).
- Optional cache lookup via
Search/collection semantics
- Regex engine:
grep_regex::RegexMatcherBuilderwithignoreCaseandmultiline. - Context resolution:
contextBefore/contextAfteroverride legacycontext.- Non-content modes do not collect context.
- Output modes:
content-> oneGrepMatchper hit.countandfilesWithMatchesmap to count-style entries (lineNumber=0,line="",matchCountset).offsetandmaxCountare applied during aggregation across sorted file results.- Directory searches use parallel filesystem walking/searching, then aggregate per-file results to preserve global offset/limit semantics in the returned result and callback stream.
Result shaping back to JS
- Rust
SearchResult/GrepResultfields map to TS interfaces via N-API object conversion. - Counters are clamped before crossing N-API where needed.
GrepResult.limitReachedis optional and emitted when true.- Streaming callback receives each shaped
GrepMatchfor content or count-style entries.
Failure behavior
searchreturnsSearchResult.errorfor regex/search failures instead of throwing.greprejects on hard errors such as invalid path, invalid glob/regex, or cancellation timeout/abort.hasMatchreturns a boolean on success and throws on invalid pattern/UTF-8 conversion errors.- File open/search errors in multi-file scans are skipped per-file; scan continues.
Malformed regex handling
grep.rs sanitizes braces before regex compile:
- Invalid repetition-like braces are escaped (
{/}->\{/\}) when they cannot form{N},{N,},{N,M}. - This prevents common literal-template fragments (for example
${platform}) from failing as malformed repetition. - Remaining invalid regex syntax still returns a regex error.
2) File discovery (glob) and fuzzy path search (fuzzyFind)
glob and fuzzyFind share fs_cache scans; matching logic differs.
glob flow
- Caller passes
GlobOptionsdirectly.patternandpathare required in the generated type. - Rust resolves the search path and compiles pattern via
glob_util::compile_glob. - Entry source:
cache=true->get_or_scan+ optional stale-emptyforce_rescan.cache=false->force_rescan(..., store=false)(fresh only).
- Filtering:
- skip
.gitalways; - skip
node_modulesunless requested (includeNodeModules) or pattern mentionsnode_modules; - apply glob match;
- apply file-type filter; symlink
file/dirfilters resolve target metadata.
- skip
- Optional sort by mtime descending (
sortByMtime) before truncating tomaxResults.
fuzzyFind flow
- Rust implementation lives in
fd.rs; generated export isfuzzyFind. - Shared scan source from
fs_cachewith the same cache/no-cache split and stale-empty recheck policy. - Scoring:
- exact / starts-with / contains / subsequence-based fuzzy score;
- separator/punctuation-normalized scoring path;
- directory bonus and deterministic tie-break (
score desc, thenpath asc).
- Symlink entries are excluded from fuzzy results.
Failure behavior
- Invalid glob pattern returns an error from
glob_util::compile_glob. - Search root must resolve to an existing directory for directory discovery flows.
- Cancellation/timeouts propagate as abort errors via
CancelToken::heartbeat()checks in loops.
Malformed glob handling
glob_util::build_glob_pattern is tolerant:
- normalizes
\to/, - auto-prefixes simple recursive patterns with
**/whenrecursive=true, - auto-closes unbalanced
{...alternation groups before compile.
3) AST search/edit (astGrep, astEdit)
ast.rs exposes syntax-aware code search and rewrite operations.
astGrep(options)returns matches with byte/line/column coordinates and optional metavariable bindings.astEdit(options)returns replacement changes, per-file counts, searched/touched file counts, parse errors, and whether edits were applied.dryRundefaults to true for edit options in the generated documentation.- Options include language override, path/glob/selector, strictness, limits, parse-error policy,
signal, andtimeoutMs.
These exports are direct native APIs used by tooling; they are not mediated by a TS wrapper in packages/natives.
4) Shared scan/cache lifecycle (fs_cache)
fs_cache stores scan results as normalized relative entries (path, fileType, optional mtime and regular-file size) keyed by:
- canonical search root,
include_hidden,use_gitignore,skip_node_modules,- scan detail (
MinimalvsFull).
follow_links affects a fresh scan but is not currently part of the cache key.
Cache state transitions
- Miss / disabled
- TTL is
0or key absent/expired -> fresh collection.
- TTL is
- Hit
- Entry age is within TTL -> return cached entries +
cache_age_ms.
- Entry age is within TTL -> return cached entries +
- Stale-empty recheck
- If query yields zero matches and cache age exceeds the empty-result threshold, force one rescan.
- Invalidation
invalidateFsScanCache(path?):- no arg: clear all keys;
- path arg: remove keys for roots affected by that path.
Stale-result tradeoff
- Cache favors low-latency repeated scans over immediate consistency.
- TTL window can return stale positives/negatives.
- Empty-result recheck reduces stale negatives for older cached scans at the cost of one extra scan.
- Explicit invalidation is the intended correctness hook after file mutations.
5) ANSI text utilities (text)
These are pure, in-memory utilities.
Boundaries and responsibilities
text.rsowns terminal-cell semantics:- ANSI sequence parsing,
- grapheme-aware width and slicing,
- wrap/truncate/slice behavior,
- explicit tab-width parameter on width-sensitive APIs.
grep.rsline truncation (maxColumns) is separate:- simple character-boundary truncation of matched lines with
..., - not ANSI-state-preserving and not terminal-cell width aware.
- simple character-boundary truncation of matched lines with
Key behaviors
wrapTextWithAnsi: wraps by visible width, carries active SGR codes across wrapped lines.truncateToWidth: visible-cell truncation with ellipsis policy (Unicode,Ascii,Omit), optional right padding.sliceWithWidth: column slicing with optional strict width enforcement.extractSegments: extracts before/after segments around an overlay while restoring ANSI state for theaftersegment.sanitizeText(ANSI/control/surrogate stripping with line-ending normalization) no longer lives intext.rs; it moved to@oh-my-pi/pi-utilsas a pure-JS implementation inpackages/utils/src/sanitize-text.ts. The native binding was removed in the same change because the JS version was competitive on the benchmarked workloads, and keeping a Rust copy forced every caller (includingpi-utils) to pull in@oh-my-pi/pi-natives.visibleWidth: counts visible terminal cells using caller-supplied tab width.
Failure behavior
Text functions generally return deterministic transformed output; errors are limited to N-API argument/string conversion boundaries.
6) Syntax highlighting (highlight)
highlight.rs is pure transformation; it does not use the filesystem scan cache.
Flow
- Caller passes
code, optionallang, and ANSI color palette. - Rust resolves syntax by token/name lookup, extension lookup, alias table fallback, then plain-text fallback.
- Each line is parsed with syntect
ParseStateand scope stack. - Scopes map to semantic color categories and ANSI color codes are injected/reset.
Failure behavior
- Per-line parse failure does not fail the call: that line is appended unhighlighted and processing continues.
- Unknown/unsupported language falls back to plain text syntax.
7) Token counting (tokens)
countTokens(input, encoding?) is an in-memory utility.
inputmay be a single string or an array of strings.- Arrays return one aggregate count and are encoded in parallel in Rust.
- Default encoding is
O200kBase;Cl100kBaseis also available. - The implementation uses ordinary tokenization, not special-token handling.
Pure utility vs filesystem-dependent flows
| Flow | Filesystem access | Shared cache | Notes |
|---|---|---|---|
search / hasMatch |
No | No | regex on provided bytes/string only |
text module functions |
No | No | ANSI/width utilities only |
highlight module functions |
No | No | syntax + ANSI coloring only |
countTokens |
No | No | tokenization only |
astGrep / astEdit |
Yes | No | syntax-aware file search/edit |
glob |
Yes | Optional | directory scans + glob filtering |
fuzzyFind |
Yes | Optional | directory scans + fuzzy scoring |
grep (file/dir path) |
Yes | Optional in dir mode | ripgrep over files, optional filters/callback |
End-to-end lifecycle summary
- Caller invokes generated native export with typed options.
- Rust validates/normalizes options and builds matcher/search config.
- For filesystem flows, entries are scanned (cache hit/miss/rescan where applicable) then filtered/scored/searched.
- Worker loops periodically call cancel heartbeat; timeout/abort can terminate execution.
- Rust shapes outputs into N-API objects (
lineNumber,matchCount,limitReached, etc.). - Generated bindings return typed JS objects and optional per-match callbacks for
grep/glob.