| File | Last commit | Last updated |
|---|---|---|
| | feat(tokenizer): expose tokenizerSource and wire Helm charts. Rename EPP tokenizer plugin JSON key to tokenizerSource for source/identity split; render tokenizerSource in all routing profile Helm branches; document the option in values and example profiles; replace prediction sidecar noqa ANN001 with explicit type annotations. Co-authored-by: lileqi <lileqi@huawei.com> | 10 days ago |
| | feat(tokenizer): parallelize sidecar render paths with configurable worker pool. The tokenizer sidecar runs one asyncio event loop. RenderCompletion and RenderChatCompletion awaited the vLLM render path inline, so nearly all per-request work (pydantic build, Jinja chat-template render, token post-processing) ran on the event loop and serialized concurrent requests. Under load (8 concurrency, 16k/32k prompts) this blew past the GIE DataProducer 400ms deadline, while only the direct Tokenize RPC was parallel. Add a direct, thread-offloaded tokenization path for the plain-text / no-tools / no-multimodal hot case, mirroring the existing Tokenize path: render_completion -> provider.encode inside asyncio.to_thread; render_chat_completion -> provider.apply_chat_template inside asyncio.to_thread. Complex requests (tools, multimodal, token-id prompts, truncation) still fall back to the vLLM render path. The direct path is token-identical to vLLM render for plain text, verified across short and long inputs. Parallelism comes from the HF fast tokenizer (Rust) releasing the GIL during encode; measured ~3-4x throughput scaling at concurrency 8. On a 4-core box, 32k chat p99 drops from ~203ms (serial) to ~60ms. Make the worker pool configurable and self-tuning: add --thread-pool-size CLI flag (precedence: CLI > TOKENIZER_THREAD_POOL_SIZE env > default min(cpu*2,32)), the pool backs asyncio.to_thread; Helm: default the tokenizer sidecar to cpu 4 (request+limit) and auto-derive TOKENIZER_THREAD_POOL_SIZE from resources.limits.cpu (one worker per core, the measured sweet spot), overridable via threadPoolSize or extraEnv. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> | 10 days ago |
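
The worker-pool commit message describes a size precedence (CLI > `TOKENIZER_THREAD_POOL_SIZE` env > `min(cpu*2, 32)`) and a pool that backs `asyncio.to_thread`. The following is a minimal sketch of that idea, not the sidecar's actual code: `resolve_pool_size`, `encode_offloaded`, `render_many`, and the toy `fake_encode` are hypothetical names introduced here, and installing the pool via `loop.set_default_executor` is an assumption about how the pool might be wired.

```python
import asyncio
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def resolve_pool_size(cli_value: Optional[int] = None) -> int:
    """Pick the pool size: CLI value, then env var, then min(cpu * 2, 32)."""
    if cli_value is not None:
        return cli_value
    env_value = os.environ.get("TOKENIZER_THREAD_POOL_SIZE")
    if env_value:
        return int(env_value)
    return min((os.cpu_count() or 1) * 2, 32)


def fake_encode(text: str) -> List[int]:
    """Hypothetical stand-in for a tokenizer's encode call (not a real tokenizer)."""
    return [ord(ch) for ch in text]


async def encode_offloaded(encode: Callable[[str], List[int]], text: str) -> List[int]:
    # asyncio.to_thread runs the call on the loop's default executor,
    # so the event loop stays free to accept other requests.
    return await asyncio.to_thread(encode, text)


async def render_many(prompts: List[str], pool_size: int) -> List[List[int]]:
    # Assumption: the configured pool is installed as the default executor,
    # which is the executor asyncio.to_thread submits work to.
    loop = asyncio.get_running_loop()
    loop.set_default_executor(ThreadPoolExecutor(max_workers=pool_size))
    return await asyncio.gather(*(encode_offloaded(fake_encode, p) for p in prompts))


if __name__ == "__main__":
    size = resolve_pool_size()
    print(asyncio.run(render_many(["hello", "world"], pool_size=size)))
```

With a pure-Python encoder like `fake_encode`, the threads still contend for the GIL; the scaling reported in the commit depends on the tokenizer releasing the GIL during encode.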