Instant, controllable, local pre-trained AI models in Rust

分支37Tags5
文件最后提交记录最后更新时间
remove winit linux code 2 年前
Unified conformance test library (#434) * Extract fusor conformance library * Include mismatch position in conformance errors * Remove remaining crate-local fusor tests * Add missing conformance coverage * Broaden conformance test coverage * Use variable-size conformance fuzz shapes * fix formatting * fix tests * larger fuzz ranges * more tests * better cpu parity * fix clippy * fix formatting * fix clippy * refactor: replace clippy #[allow] suppressions with real fixes Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx, AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for the conformance comparator return type, and rewrite needless_range_loop sites in tests/common/mod.rs to use iter_mut().enumerate(). * chore: ignore .claude scheduler artifacts * fix(conformance): skip f16 tests on GPU adapters without SHADER_F16 Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so the f16 shader fails to validate. Filter the device list per test rather than removing GPU coverage entirely — Mac Metal still runs the f16 path. * fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh near their domain edges. 1e-4 covers the observed gap without masking real regressions; the macOS Metal adapter still passes comfortably. * fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a different fuzz seed. lavapipe asin precision is limited near the asymptotes where the derivative blows up. 1e-3 covers observed drift; algorithmic regressions would diverge by orders of magnitude. * fix transcribe example * fix(conformance): stabilize CI edge cases * fix(ci): avoid brittle benchmark formatter * fix(conformance): cover Windows WARP tanh drift * fix tanh * more ci fixes * more software gpu backend fixes * looser bounds for trig * relative tolerance * more relative comparisons * passing on warp28 天前
Unified conformance test library (#434) * Extract fusor conformance library * Include mismatch position in conformance errors * Remove remaining crate-local fusor tests * Add missing conformance coverage * Broaden conformance test coverage * Use variable-size conformance fuzz shapes * fix formatting * fix tests * larger fuzz ranges * more tests * better cpu parity * fix clippy * fix formatting * fix clippy * refactor: replace clippy #[allow] suppressions with real fixes Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx, AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for the conformance comparator return type, and rewrite needless_range_loop sites in tests/common/mod.rs to use iter_mut().enumerate(). * chore: ignore .claude scheduler artifacts * fix(conformance): skip f16 tests on GPU adapters without SHADER_F16 Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so the f16 shader fails to validate. Filter the device list per test rather than removing GPU coverage entirely — Mac Metal still runs the f16 path. * fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh near their domain edges. 1e-4 covers the observed gap without masking real regressions; the macOS Metal adapter still passes comfortably. * fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a different fuzz seed. lavapipe asin precision is limited near the asymptotes where the derivative blows up. 1e-3 covers observed drift; algorithmic regressions would diverge by orders of magnitude. * fix transcribe example * fix(conformance): stabilize CI edge cases * fix(ci): avoid brittle benchmark formatter * fix(conformance): cover Windows WARP tanh drift * fix tanh * more ci fixes * more software gpu backend fixes * looser bounds for trig * relative tolerance * more relative comparisons * passing on warp28 天前
Floneum qwen embed and rbert cpu support (#425) * qwen embed working * Add 3×4 outer-loop unrolled quantized matmul for m≥2 Process 3 LHS rows simultaneously so each weight block is loaded once and reused across all 3 rows, reducing memory traffic by ~3×. 2-4.4× speedup for prompt processing (m≥2); m=1 path unchanged. * Deduplicate activation quantization in parallel m=1 path Each thread now quantizes the activation row once instead of re-quantizing in every 32-column chunk. 14-23% speedup for m=1. * Add x86 AVX2 SIMD for BlockQ4_0 and BlockQ8_0 vec_dot Use _mm256_maddubs_epi16 with the sign trick (abs(x) * sign(y,x)) for signed i8×i8 dot products. Runtime AVX2 detection with scalar fallback for older x86_64 CPUs. * Add x86 AVX2 SIMD for BlockQ8_0 activation quantization Use AVX2 for the full quantize pipeline on x86_64: max-abs reduction, scale+round via _mm256_cvtps_epi32, and saturating pack i32→i16→i8. Runtime AVX2 detection with scalar fallback. On aarch64, the compiler auto-vectorizes the scalar code better than explicit NEON intrinsics (confirmed by benchmarks showing 7-11% regression with explicit NEON). * Remove unused process_row_integer_range function * cpu support for rbert * better parallelization * remove uninit unchecked * start refactoring * reduce unsafe * fix formatting * clean up conditional * more refactoring * a bit more cleanup * more formatting + clippy * fix clippy in fusion bench * fix flash attention * fix flash * fix tests * fix clippy * make device week2 个月前
feat: add language selection support to transcription tasks and streams (#391) * feat: add language selection support to transcription tasks and streams * fix fmt issues10 个月前
Unified conformance test library (#434) * Extract fusor conformance library * Include mismatch position in conformance errors * Remove remaining crate-local fusor tests * Add missing conformance coverage * Broaden conformance test coverage * Use variable-size conformance fuzz shapes * fix formatting * fix tests * larger fuzz ranges * more tests * better cpu parity * fix clippy * fix formatting * fix clippy * refactor: replace clippy #[allow] suppressions with real fixes Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx, AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for the conformance comparator return type, and rewrite needless_range_loop sites in tests/common/mod.rs to use iter_mut().enumerate(). * chore: ignore .claude scheduler artifacts * fix(conformance): skip f16 tests on GPU adapters without SHADER_F16 Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so the f16 shader fails to validate. Filter the device list per test rather than removing GPU coverage entirely — Mac Metal still runs the f16 path. * fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh near their domain edges. 1e-4 covers the observed gap without masking real regressions; the macOS Metal adapter still passes comfortably. * fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a different fuzz seed. lavapipe asin precision is limited near the asymptotes where the derivative blows up. 1e-3 covers observed drift; algorithmic regressions would diverge by orders of magnitude. * fix transcribe example * fix(conformance): stabilize CI edge cases * fix(ci): avoid brittle benchmark formatter * fix(conformance): cover Windows WARP tanh drift * fix tanh * more ci fixes * more software gpu backend fixes * looser bounds for trig * relative tolerance * more relative comparisons * passing on warp28 天前
update publish script 1 年前
add additional bert presets 2 年前
improve the readme 1 年前
Unified conformance test library (#434) * Extract fusor conformance library * Include mismatch position in conformance errors * Remove remaining crate-local fusor tests * Add missing conformance coverage * Broaden conformance test coverage * Use variable-size conformance fuzz shapes * fix formatting * fix tests * larger fuzz ranges * more tests * better cpu parity * fix clippy * fix formatting * fix clippy * refactor: replace clippy #[allow] suppressions with real fixes Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx, AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for the conformance comparator return type, and rewrite needless_range_loop sites in tests/common/mod.rs to use iter_mut().enumerate(). * chore: ignore .claude scheduler artifacts * fix(conformance): skip f16 tests on GPU adapters without SHADER_F16 Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so the f16 shader fails to validate. Filter the device list per test rather than removing GPU coverage entirely — Mac Metal still runs the f16 path. * fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh near their domain edges. 1e-4 covers the observed gap without masking real regressions; the macOS Metal adapter still passes comfortably. * fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a different fuzz seed. lavapipe asin precision is limited near the asymptotes where the derivative blows up. 1e-3 covers observed drift; algorithmic regressions would diverge by orders of magnitude. * fix transcribe example * fix(conformance): stabilize CI edge cases * fix(ci): avoid brittle benchmark formatter * fix(conformance): cover Windows WARP tanh drift * fix tanh * more ci fixes * more software gpu backend fixes * looser bounds for trig * relative tolerance * more relative comparisons * passing on warp28 天前
Unified conformance test library (#434) * Extract fusor conformance library * Include mismatch position in conformance errors * Remove remaining crate-local fusor tests * Add missing conformance coverage * Broaden conformance test coverage * Use variable-size conformance fuzz shapes * fix formatting * fix tests * larger fuzz ranges * more tests * better cpu parity * fix clippy * fix formatting * fix clippy * refactor: replace clippy #[allow] suppressions with real fixes Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx, AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for the conformance comparator return type, and rewrite needless_range_loop sites in tests/common/mod.rs to use iter_mut().enumerate(). * chore: ignore .claude scheduler artifacts * fix(conformance): skip f16 tests on GPU adapters without SHADER_F16 Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so the f16 shader fails to validate. Filter the device list per test rather than removing GPU coverage entirely — Mac Metal still runs the f16 path. * fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh near their domain edges. 1e-4 covers the observed gap without masking real regressions; the macOS Metal adapter still passes comfortably. * fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a different fuzz seed. lavapipe asin precision is limited near the asymptotes where the derivative blows up. 1e-3 covers observed drift; algorithmic regressions would diverge by orders of magnitude. * fix transcribe example * fix(conformance): stabilize CI edge cases * fix(ci): avoid brittle benchmark formatter * fix(conformance): cover Windows WARP tanh drift * fix tanh * more ci fixes * more software gpu backend fixes * looser bounds for trig * relative tolerance * more relative comparisons * passing on warp28 天前
Unified conformance test library (#434) * Extract fusor conformance library * Include mismatch position in conformance errors * Remove remaining crate-local fusor tests * Add missing conformance coverage * Broaden conformance test coverage * Use variable-size conformance fuzz shapes * fix formatting * fix tests * larger fuzz ranges * more tests * better cpu parity * fix clippy * fix formatting * fix clippy * refactor: replace clippy #[allow] suppressions with real fixes Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx, AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for the conformance comparator return type, and rewrite needless_range_loop sites in tests/common/mod.rs to use iter_mut().enumerate(). * chore: ignore .claude scheduler artifacts * fix(conformance): skip f16 tests on GPU adapters without SHADER_F16 Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so the f16 shader fails to validate. Filter the device list per test rather than removing GPU coverage entirely — Mac Metal still runs the f16 path. * fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh near their domain edges. 1e-4 covers the observed gap without masking real regressions; the macOS Metal adapter still passes comfortably. * fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a different fuzz seed. lavapipe asin precision is limited near the asymptotes where the derivative blows up. 1e-3 covers observed drift; algorithmic regressions would diverge by orders of magnitude. * fix transcribe example * fix(conformance): stabilize CI edge cases * fix(ci): avoid brittle benchmark formatter * fix(conformance): cover Windows WARP tanh drift * fix tanh * more ci fixes * more software gpu backend fixes * looser bounds for trig * relative tolerance * more relative comparisons * passing on warp28 天前
create book skafolding 2 年前
create book skafolding 2 年前
Rename Floneum to Kalosm in README Updated project name from 'Floneum' to 'Kalosm' and corrected minor text errors.2 小时前
Remote chat, remote structured generation models, and single file gguf chat model loading (#319) * add chat template support and remove the VectorSpace trait * move sampling and chat templates to kalosm llama * update kalosm-llama unstructured generation to the new interface * restore structured generation module * Restore llama implementation of structured generation * clean up kalosm-llama clippy lints * restore llama chat and structured chat implementation * improve infer chat example * add support for remote chat models * support constraints for openai remote models * load the tokenizer from the gguf file if a huggingface tokenizer is not present * Fix tokenizer conversion * restore chat struct * Fix chat implementation with llama * remove tokio from language model * Create chat and text completion extension traits * add task helper to the chat extension trait * update kalosm-language to new task interface * make llama callable * add with_constraints method to task * fix task example * update examples to new chat and task api * set tools to none to fix llama chat template * Add helpers for the default parser for a specific type and model combo * simplify constrained rust type example * restore prompt annealing * fix structured example * document text completion model * document new chat api * update task documentation * Fix tokenizer gguf * fix custom llama source example * fix remaining tests * add logging to remote examples * Clippy fixes * More clippy fixes * use function call in docs more constantly * fix remaining doc tests1 年前

Kalosm

Kalosm is an ecosystem of crates that make it easy to develop applications that use local or remote AI models. There are try main projects in this repo:

  • Kalosm: A simple interface for pre-trained models in rust
  • Fusor: A runtime for quantized ML inference. Fusor uses WGPU to run models on any accelerator natively or in the browser

Kalosm

Kalosm is a simple interface for pre-trained models in Rust. It makes it easy to interact with pre-trained, language, audio, and image models.

Model Support

Kalosm supports a variety of models. Here is a list of the models that are currently supported:

Model Modality Size Description Quantized CUDA + Metal Accelerated Example
Llama Text 1b-70b General purpose language model llama 3 chat
Mistral Text 7-13b General purpose language model mistral chat
Phi Text 2b-4b Small reasoning focused language model phi 3 chat
Whisper Audio 20MB-1GB Audio transcription model live whisper transcription
RWuerstchen Image 5gb Image generation model rwuerstchen image generation
TrOcr Image 3gb Optical character recognition model Text Recognition
Segment Anything Image 50MB-400MB Image segmentation model Image Segmentation
Bert Text 100MB-1GB Text embedding model Semantic Search

Utilities

Kalosm also supports a variety of utilities around pre-trained models. These include:

Performance

Kalosm uses the candle machine learning library to run models in pure rust. It supports quantized and accelerated models with performance on par with llama.cpp:

Mistral 7b

Accelerator Kalosm llama.cpp
Metal (M2) 39 t/s 27 t/s

Structured Generation

Kalosm supports structured generation with arbitrary parsers. It uses a custom parser engine and sampler and structure-aware acceleration to make structure generation even faster than uncontrolled text generation. You can take any rust type and add #[derive(Parse, Schema)] to make it usable with structured generation:

use kalosm::language::*;

/// A fictional character
#[derive(Parse, Schema, Clone, Debug)]
struct Character {
    /// The name of the character
    #[parse(pattern = "[A-Z][a-z]{2,10} [A-Z][a-z]{2,10}")]
    name: String,
    /// The age of the character
    #[parse(range = 1..=100)]
    age: u8,
    /// A description of the character
    #[parse(pattern = "[A-Za-z ]{40,200}")]
    description: String,
}

#[tokio::main]
async fn main() {
    // First create a model. Chat models tend to work best with structured generation
    let model = Llama::phi_3().await.unwrap();
    // Then create a task with the parser as constraints
    let task = model.task("You generate realistic JSON placeholders for characters")
        .typed();
    // Finally, run the task
    let mut stream = task(&"Create a list of random characters", &model);
    stream.to_std_out().await.unwrap();
    let characters: [Character; 10] = stream.await.unwrap();
    println!("{characters:?}");
}

https://github.com/user-attachments/assets/8900f57d-55c8-4d4a-a67b-73beab1e5155

In addition to regex, you can provide your own grammar to generate structured data. This lets you constrain the response to any structure you want including complex data structures like JSON, HTML, and XML.

Kalosm Quickstart!

This quickstart will get you up and running with a simple chatbot. Let's get started!

A more complete guide for Kalosm is available on the Kalosm website, and examples are available in the examples folder.

  1. Install rust
  2. Create a new project:
cargo new kalosm-hello-world
cd ./kalosm-hello-world
  1. Add Kalosm as a dependency
# You can use `--features language,metal`, `--features language,cuda`, or `--features language,mkl` if your machine supports an accelerator
cargo add kalosm --features language
cargo add tokio --features full
  1. Add this code to your main.rs file
use kalosm::language::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
  let model = Llama::phi_3().await?;
  let mut chat = model.chat()
    .with_system_prompt("You are a pirate called Blackbeard");

  loop {
    chat(&prompt_input("\n> ")?)
      .to_std_out()
      .await?;
  }
}
  1. Run your application with:
cargo run --release

chat bot demo

Fusor

⚠️ Fusor is still early in development and is not ready for production use. Fusor will serve as the backend for Kalosm in the 0.5 release to enable web and AMD support

Fusor is a WGPU runtime for quantized ML inference. Fusor works with the gguf file format to load quantized models. It targets uses WebGpu to target many different accelerators including Nvidia GPUs, AMD GPUs, and Metal. Most ML frameworks contain hand optimized kernels that perform a series of operations together. Fusor uses a kernel fusion compiler to make merge custom operation chains into an optimized kernel without dropping down to the shader code. This compiles to a single kernel:

fn exp_add_one(tensor: Tensor<2, f32>) -> Tensor<2, f32> {
  1. + (-tensor).exp()
}

Community

If you are interested in either project, you can join the discord to discuss the project and get help.

Contributing

  • Report issues on our issue tracker.
  • Help other users in the discord
  • If you are interested in contributing, feel free to reach out on discord

项目介绍

本地人工智能工作流的图形编辑器【此简介由AI生成】

定制我的领域
262.19 K130访问 GitHub

下载使用量

0

项目总下载次数(含Clone、Pull、 zip 包及 release 下载),每日凌晨更新

语言类型

Rust99.97%
Shell0.03%