

Kalosm Vision

Kalosm Vision is a collection of image models and utilities for the Kalosm framework. It includes utilities for generating images from text and segmenting images into objects.

Image Generation

You can use the [Wuerstchen] model to generate images from text:

use futures_util::StreamExt;
use kalosm_vision::{Wuerstchen, WuerstchenInferenceSettings};

#[tokio::main]
async fn main() {
    let model = Wuerstchen::builder().build().await.unwrap();
    let settings = WuerstchenInferenceSettings::new(
        "a cute cat with a hat in a room covered with fur with incredible detail",
    );

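    // `run` returns a stream of generated samples.
    let mut images = model.run(settings);
    while let Some(image) = images.next().await {
        // Save the sample as `<sample number>.png` if it contains a generated image.
        if let Some(buf) = image.generated_image() {
            buf.save(&format!("{}.png", image.sample_num())).unwrap();
        }
    }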
}

Image Segmentation

Kalosm supports image segmentation with the [SegmentAnything] model. Use the [SegmentAnything::segment_everything] method to segment a whole image into objects, or the [SegmentAnything::segment_from_points] method to segment the objects located at specific points:

use kalosm::vision::*;

let model = SegmentAnything::builder().build().unwrap();
let image = image::open("examples/landscape.jpg").unwrap();
// Place the goal point at the horizontal center, a quarter of the way down the image.
let x = image.width() / 2;
let y = image.height() / 4;
let images = model
    .segment_from_points(
        SegmentAnythingInferenceSettings::new(image)
            .add_goal_point(x, y),
    )
    .unwrap();

images.save("out.png").unwrap();
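
The goal point in the example above is given in pixel coordinates computed from the image size. As a minimal sketch, the same arithmetic can be wrapped in a helper that takes the position as a fraction of the width and height. The name `point_at_fraction` is hypothetical and is not part of Kalosm; it uses only the standard library.

```rust
/// Hypothetical helper (not part of Kalosm): converts a position given as
/// fractions of the image size into pixel coordinates.
fn point_at_fraction(width: u32, height: u32, fx: f32, fy: f32) -> (u32, u32) {
    // Clamp the fractions so the result always lies inside the image.
    let fx = fx.clamp(0.0, 1.0);
    let fy = fy.clamp(0.0, 1.0);
    // Scale by the largest valid index so that 1.0 maps to the last pixel.
    let x = (fx * width.saturating_sub(1) as f32).round() as u32;
    let y = (fy * height.saturating_sub(1) as f32).round() as u32;
    (x, y)
}
```

For a 640x480 image, `point_at_fraction(640, 480, 0.5, 0.25)` yields `(320, 120)`, the same point as `width / 2` and `height / 4` above. The resulting coordinates can then be passed to `add_goal_point`.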