kalosm/interfaces/kalosm-sample/src/structured_parser · fl/kalosm - AtomGit

GitHub: Add caching for wasm file downloads and limit the hypothetical chunker length (#417)

File	Last commit	Last updated
arc_linked_list.rs	fix parsing signs and optimize separated parser	1 year ago
float.rs	Fix clippy lints (#381)	11 months ago
index.rs	Remote chat, remote structured generation models, and single file gguf chat model loading (#319) * add chat template support and remove the VectorSpace trait * move sampling and chat templates to kalosm llama * update kalosm-llama unstructured generation to the new interface * restore structured generation module * Restore llama implementation of structured generation * clean up kalosm-llama clippy lints * restore llama chat and structured chat implementation * improve infer chat example * add support for remote chat models * support constraints for openai remote models * load the tokenizer from the gguf file if a huggingface tokenizer is not present * Fix tokenizer conversion * restore chat struct * Fix chat implementation with llama * remove tokio from language model * Create chat and text completion extension traits * add task helper to the chat extension trait * update kalosm-language to new task interface * make llama callable * add with_constraints method to task * fix task example * update examples to new chat and task api * set tools to none to fix llama chat template * Add helpers for the default parser for a specific type and model combo * simplify constrained rust type example * restore prompt annealing * fix structured example * document text completion model * document new chat api * update task documentation * Fix tokenizer gguf * fix custom llama source example * fix remaining tests * add logging to remote examples * Clippy fixes * More clippy fixes * use function call in docs more constantly * fix remaining doc tests	1 year ago
integer.rs	Optimize fusor (#393) * rename fusor core * create fusor error type * try into for gguf value * remove candle * create index select kernel * refactor quantized implementation * add dequantize kernel template * add where cond * fuse dequantize and visit tiled ops * fix hello example * Automatically spawn polling thread * llama port compiles * matmul almost working * fuzzing small matrixes passes * fix fuzz matmul test * rope test passing * fix metadata loading * Fix shape calculation for index select * fix qmatmul shape * Fix shape calculation for mat mul * remove some logging * fix compute graph deadlock * batched matrix multiplication * fix cache * fix attention mask * building the compute graph works * fix graphviz * remove recursion from resolve * handle > 3 dimensions in map tiled * make timing info optional * stable wgpu * fx hash * Fix passes after nodes are garbage collected * remove log * add an extra_assertions flag * add a sleep to the device poll loop * Fix more merging bugs * fix cycles * model runs without panicking * fix qmatmul output buffer size * add type assertions * Fix rectangular qmatmul * tokens generating * fix softmax test * fix index select on large arrays * fix rms norm * matmul failing * use fused multiply add in the matmul kernel * add strides to matmul * handle non-contiguous tensors in qmatmul * Fix attention mechanism. It works!!! * fix timing queries * more graphviz fixes * fuse matmul kernels * clippy fix * Fix kernel fusion * refactor mid level representation * just remove queries * more benchmarks * more candle benchmarks * create Operation trait * fix tests * add device to the workgroup size constraints function * implement operation for reduce * fix some lints * implement operation for resize * remove check_bounds_contiguous * add index select to MIR * clean up formatting * remove log * remove more logs * bench larger inputs * add dequantize to MIR * clean up * MIR for qmatmul * add matmul to MIR * fix formatting * queue all operations before running anything * fix tests * fix cached tensors * simplify dependencies * fuse multiple unrelated kernels * rename values * fix merging * linearize size for reduce * remove logs * move output into a separate method * tests passing * merge adjacent non-related kernels without synchronization * remove visit * disable merging and bump wgpu * use pipeline cache * cache most compilation steps * Fix bench dependencies * move the caches to the device * fix tensor partialeq * memory coalescing in visit_tiled * remove log * More consistent performance * re-enable non-conflicting merges * add kernel name for debugging * cargo update * faster builds for infer example * double tokens per second * only materialize every other layer * more detailed pair wise name * fix pairwise bench * add a many dimensional pairwise benchmark * skip empty dimensions in tiled map * scale tile size down as the rank scales up * materialize every layer * add support for custom operations * better round up method * faster reduce kernel * unroll reduction in softmax kernel * vectorized softmax load * add a separate case for large softmax * implement the same special case for reduce * custom operations are sync * don't repeat dequantize * label everything * cache dequantize rms norm * where cond custom opt * bench larger qmatmul * split out sgemv variant * initial attempt at sgemv * match braces * fix dispatching * tests passing * use subgroupAdd function * clean up imports * faster sgemv * slightly faster sgemv * simd sgemv * add unrolled dequantize variants * 70% faster sgemv * split chunk size and vector size * implement vectorized sgemm * add more qmatmul benchmarks * skip second sum pass if this is a single subgroup * pull out dequantize_vec4_block * test and fix vec4 dequant * more optimized q6k dequantize * use the same pattern for unrolled * specialized vec4 q6k dequant * add more qmatmul fuzzing tests * longer fuzzing * fix dequantize q6k * Fix fuzz_de_quantize vec4 test * specialized dequantize_vec4_block q4k implementation * restore multi-operation fusion * more flexible sgemv kernel * interleaved blocks * fix sgemv * ignore tokenizer.json * slightly cleaner q6k dequantize * specialized q6k sgemv kernel * remove log * add a link to the llama.cpp kernel * make q6k work with multiple rows at once * make preloading optional in q6k gemv * specialized q4k gemv implementation * first value correct * simplify scale calculation * fix q4k * remove log * slightly faster * specialized q_n gemv * add q5_0 * fix llama.cpp link * cache downloads for tests * cache qmatmul bench file * faster q_n kernels * add specialized q8_0 gemv kernel * double dispatch size * bump dependencies and move closer to wasm compat * fix compilation * disable zero initialization * same configuration for tests * slightly faster Q4k * explicit vectorization * unroll loops * fix kalosm llama * fix dispatch size * faster q8_0 gemv * refactor matmul impl * vectorized sgemm multiply * revert changes to kalosm-llama * undo kalosm-language cargo.toml changes * restore ocr changes * fix formatting * fix tokenizers * clippy fix * fix dependencies * fix clippy and formatting * fix formatting * fix clippy * fix tests	9 months ago
literal.rs	fix CI tests	2 years ago
map.rs	Fix the required next tokens for repeat parsers	1 year ago
mod.rs	Add gemma 270m (#397) * add gemma 270m * remove comment * fix clippy * more clippy fixes	8 months ago
one_line.rs	Remote chat, remote structured generation models, and single file gguf chat model loading (#319) * add chat template support and remove the VectorSpace trait * move sampling and chat templates to kalosm llama * update kalosm-llama unstructured generation to the new interface * restore structured generation module * Restore llama implementation of structured generation * clean up kalosm-llama clippy lints * restore llama chat and structured chat implementation * improve infer chat example * add support for remote chat models * support constraints for openai remote models * load the tokenizer from the gguf file if a huggingface tokenizer is not present * Fix tokenizer conversion * restore chat struct * Fix chat implementation with llama * remove tokio from language model * Create chat and text completion extension traits * add task helper to the chat extension trait * update kalosm-language to new task interface * make llama callable * add with_constraints method to task * fix task example * update examples to new chat and task api * set tools to none to fix llama chat template * Add helpers for the default parser for a specific type and model combo * simplify constrained rust type example * restore prompt annealing * fix structured example * document text completion model * document new chat api * update task documentation * Fix tokenizer gguf * fix custom llama source example * fix remaining tests * add logging to remote examples * Clippy fixes * More clippy fixes * use function call in docs more constantly * fix remaining doc tests	1 year ago
or.rs	fix required next parsing for Or combinator	1 year ago
parse.rs	Fix booleans in derived parsers (#343)	1 year ago
regex.rs	Fix regex constraints	1 year ago
repeat.rs	fix parsing signs and optimize separated parser	1 year ago
schema.rs	Fix clippy lints (#381)	11 months ago
sentence.rs	simplify parse	1 year ago
separated.rs	fix parsing signs and optimize separated parser	1 year ago
stop_on.rs	Add caching for wasm file downloads and limit the hypothetical chunker length (#417) * cache downloads in wasm * more type coercion * add a length limit to the hypothetical chunker * a bit of cleanup * fix formatting * remove logs * more resilient progress updates * fix caching progress * fix clippy	4 months ago
string.rs	fix parsing signs and optimize separated parser	1 year ago
then.rs	implement prompt healing	1 year ago
word.rs	simplify parse	1 year ago