Lexer Agent Guide

Use this file for work under lexer/ together with the repository-level AGENTS.md.

Core Metadata

Attribute Value
Name Lexer
Purpose Converts source text into a stream of tokens and provides a unified lexical interface for multiple language modes. Day-to-day changes apply only to ETS (ETSLexer and ETS token tables); TS/AS/JS paths are out of scope.
Primary Language C++

Change Frequency and Scope

  • Lexer is the foundation for Parser: it turns source into a token stream that Parser consumes. This directory is at the front of the pipeline; once its interface is stable, all later stages depend on it.
  • This directory is rarely modified. Most work happens in parser, varbinder, checker, and compiler/lowering. Lexer is touched only when adding or changing keywords, token kinds, or lexical rules; such changes require updating scripts/ and regenerating code, plus careful regression testing.
  • In-scope for changes: only ETSLexer and ETS-related token/keyword tables. TSLexer, ASLexer, and JS lexer paths are out of scope.

Directory Layout

lexer/
├── *.cpp, *.h           # Lexer core and per-language implementations; [in scope] ETSLexer; TSLexer/ASLexer rarely changed
├── token/               # Token, source location, numeric literal parsing
├── regexp/              # Regex literal lexing
├── scripts/             # Token and keyword tables (keywords.yaml, tokens.yaml, Ruby)
└── templates/           # Codegen templates (tokenType, keywords, token.inl, etc. .erb)

Responsibilities

  • Token kinds and keywords: Generated from YAML + Ruby under scripts/ into keywords*.cpp/h, tokenType, etc.; scripts are the single source of truth.
  • Multi-language lexing: ETSLexer (ArkTS/ETS) is the active target for changes; TSLexer, ASLexer, and shared token/regexp/number logic are rarely modified.
  • Source locations: token/sourceLocation records line/column and offsets for parser and diagnostics.

Dependencies

  • Used by: parser, util (diagnostics, options, paths).
  • Depends on: no other ets2panda front-end modules (only C++ stdlib and project infrastructure).

Extending or Modifying

  • New or changed token kind or keyword: Update scripts/tokens.yaml or scripts/keywords.yaml, run the corresponding Ruby scripts to regenerate .h/.cpp, and run regression (Parser and later stages depend on the token stream).
  • ETS lexing or keywords: Change ETS-related tables and ETSLexer; extend ETS entries in scripts/ if needed. Other language modes (TS/AS/JS) are out of scope; avoid changing this layer unless necessary.

Spec Alignment Rules

  • Token/keyword changes that affect language behavior must map to the latest technical-preview spec grammar.
  • If lexer changes imply parser grammar differences, keep parser/docs updates in the same patch.