Instant, controllable, local pre-trained AI models in Rust

GGitHubRename Floneum to Kalosm in README

文件	最后提交记录	最后更新时间
.cargo	remove winit linux code	2 年前
.github	Unified conformance test library (#434) * Extract fusor conformance library * Include mismatch position in conformance errors * Remove remaining crate-local fusor tests * Add missing conformance coverage * Broaden conformance test coverage * Use variable-size conformance fuzz shapes * fix formatting * fix tests * larger fuzz ranges * more tests * better cpu parity * fix clippy * fix formatting * fix clippy * refactor: replace clippy #[allow] suppressions with real fixes Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx, AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for the conformance comparator return type, and rewrite needless_range_loop sites in tests/common/mod.rs to use iter_mut().enumerate(). * chore: ignore .claude scheduler artifacts * fix(conformance): skip f16 tests on GPU adapters without SHADER_F16 Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so the f16 shader fails to validate. Filter the device list per test rather than removing GPU coverage entirely — Mac Metal still runs the f16 path. * fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh near their domain edges. 1e-4 covers the observed gap without masking real regressions; the macOS Metal adapter still passes comfortably. * fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a different fuzz seed. lavapipe asin precision is limited near the asymptotes where the derivative blows up. 1e-3 covers observed drift; algorithmic regressions would diverge by orders of magnitude. * fix transcribe example * fix(conformance): stabilize CI edge cases * fix(ci): avoid brittle benchmark formatter * fix(conformance): cover Windows WARP tanh drift * fix tanh * more ci fixes * more software gpu backend fixes * looser bounds for trig * relative tolerance * more relative comparisons * passing on warp	28 天前
fusor-ml	Unified conformance test library (#434) * Extract fusor conformance library * Include mismatch position in conformance errors * Remove remaining crate-local fusor tests * Add missing conformance coverage * Broaden conformance test coverage * Use variable-size conformance fuzz shapes * fix formatting * fix tests * larger fuzz ranges * more tests * better cpu parity * fix clippy * fix formatting * fix clippy * refactor: replace clippy #[allow] suppressions with real fixes Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx, AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for the conformance comparator return type, and rewrite needless_range_loop sites in tests/common/mod.rs to use iter_mut().enumerate(). * chore: ignore .claude scheduler artifacts * fix(conformance): skip f16 tests on GPU adapters without SHADER_F16 Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so the f16 shader fails to validate. Filter the device list per test rather than removing GPU coverage entirely — Mac Metal still runs the f16 path. * fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh near their domain edges. 1e-4 covers the observed gap without masking real regressions; the macOS Metal adapter still passes comfortably. * fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a different fuzz seed. lavapipe asin precision is limited near the asymptotes where the derivative blows up. 1e-3 covers observed drift; algorithmic regressions would diverge by orders of magnitude. * fix transcribe example * fix(conformance): stabilize CI edge cases * fix(ci): avoid brittle benchmark formatter * fix(conformance): cover Windows WARP tanh drift * fix tanh * more ci fixes * more software gpu backend fixes * looser bounds for trig * relative tolerance * more relative comparisons * passing on warp	28 天前
interfaces	Floneum qwen embed and rbert cpu support (#425) * qwen embed working * Add 3×4 outer-loop unrolled quantized matmul for m≥2 Process 3 LHS rows simultaneously so each weight block is loaded once and reused across all 3 rows, reducing memory traffic by ~3×. 2-4.4× speedup for prompt processing (m≥2); m=1 path unchanged. * Deduplicate activation quantization in parallel m=1 path Each thread now quantizes the activation row once instead of re-quantizing in every 32-column chunk. 14-23% speedup for m=1. * Add x86 AVX2 SIMD for BlockQ4_0 and BlockQ8_0 vec_dot Use _mm256_maddubs_epi16 with the sign trick (abs(x) * sign(y,x)) for signed i8×i8 dot products. Runtime AVX2 detection with scalar fallback for older x86_64 CPUs. * Add x86 AVX2 SIMD for BlockQ8_0 activation quantization Use AVX2 for the full quantize pipeline on x86_64: max-abs reduction, scale+round via _mm256_cvtps_epi32, and saturating pack i32→i16→i8. Runtime AVX2 detection with scalar fallback. On aarch64, the compiler auto-vectorizes the scalar code better than explicit NEON intrinsics (confirmed by benchmarks showing 7-11% regression with explicit NEON). * Remove unused process_row_integer_range function * cpu support for rbert * better parallelization * remove uninit unchecked * start refactoring * reduce unsafe * fix formatting * clean up conditional * more refactoring * a bit more cleanup * more formatting + clippy * fix clippy in fusion bench * fix flash attention * fix flash * fix tests * fix clippy * make device week	2 个月前
media	feat: add language selection support to transcription tasks and streams (#391) * feat: add language selection support to transcription tasks and streams * fix fmt issues	10 个月前
models	Unified conformance test library (#434) * Extract fusor conformance library * Include mismatch position in conformance errors * Remove remaining crate-local fusor tests * Add missing conformance coverage * Broaden conformance test coverage * Use variable-size conformance fuzz shapes * fix formatting * fix tests * larger fuzz ranges * more tests * better cpu parity * fix clippy * fix formatting * fix clippy * refactor: replace clippy #[allow] suppressions with real fixes Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx, AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for the conformance comparator return type, and rewrite needless_range_loop sites in tests/common/mod.rs to use iter_mut().enumerate(). * chore: ignore .claude scheduler artifacts * fix(conformance): skip f16 tests on GPU adapters without SHADER_F16 Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so the f16 shader fails to validate. Filter the device list per test rather than removing GPU coverage entirely — Mac Metal still runs the f16 path. * fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh near their domain edges. 1e-4 covers the observed gap without masking real regressions; the macOS Metal adapter still passes comfortably. * fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a different fuzz seed. lavapipe asin precision is limited near the asymptotes where the derivative blows up. 1e-3 covers observed drift; algorithmic regressions would diverge by orders of magnitude. * fix transcribe example * fix(conformance): stabilize CI edge cases * fix(ci): avoid brittle benchmark formatter * fix(conformance): cover Windows WARP tanh drift * fix tanh * more ci fixes * more software gpu backend fixes * looser bounds for trig * relative tolerance * more relative comparisons * passing on warp	28 天前
scripts	update publish script	1 年前
src	add additional bert presets	2 年前
.gitattributes	improve the readme	1 年前
.gitignore	Unified conformance test library (#434) * Extract fusor conformance library * Include mismatch position in conformance errors * Remove remaining crate-local fusor tests * Add missing conformance coverage * Broaden conformance test coverage * Use variable-size conformance fuzz shapes * fix formatting * fix tests * larger fuzz ranges * more tests * better cpu parity * fix clippy * fix formatting * fix clippy * refactor: replace clippy #[allow] suppressions with real fixes Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx, AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for the conformance comparator return type, and rewrite needless_range_loop sites in tests/common/mod.rs to use iter_mut().enumerate(). * chore: ignore .claude scheduler artifacts * fix(conformance): skip f16 tests on GPU adapters without SHADER_F16 Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so the f16 shader fails to validate. Filter the device list per test rather than removing GPU coverage entirely — Mac Metal still runs the f16 path. * fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh near their domain edges. 1e-4 covers the observed gap without masking real regressions; the macOS Metal adapter still passes comfortably. * fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a different fuzz seed. lavapipe asin precision is limited near the asymptotes where the derivative blows up. 1e-3 covers observed drift; algorithmic regressions would diverge by orders of magnitude. * fix transcribe example * fix(conformance): stabilize CI edge cases * fix(ci): avoid brittle benchmark formatter * fix(conformance): cover Windows WARP tanh drift * fix tanh * more ci fixes * more software gpu backend fixes * looser bounds for trig * relative tolerance * more relative comparisons * passing on warp	28 天前
Cargo.lock	Unified conformance test library (#434) * Extract fusor conformance library * Include mismatch position in conformance errors * Remove remaining crate-local fusor tests * Add missing conformance coverage * Broaden conformance test coverage * Use variable-size conformance fuzz shapes * fix formatting * fix tests * larger fuzz ranges * more tests * better cpu parity * fix clippy * fix formatting * fix clippy * refactor: replace clippy #[allow] suppressions with real fixes Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx, AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for the conformance comparator return type, and rewrite needless_range_loop sites in tests/common/mod.rs to use iter_mut().enumerate(). * chore: ignore .claude scheduler artifacts * fix(conformance): skip f16 tests on GPU adapters without SHADER_F16 Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so the f16 shader fails to validate. Filter the device list per test rather than removing GPU coverage entirely — Mac Metal still runs the f16 path. * fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh near their domain edges. 1e-4 covers the observed gap without masking real regressions; the macOS Metal adapter still passes comfortably. * fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a different fuzz seed. lavapipe asin precision is limited near the asymptotes where the derivative blows up. 1e-3 covers observed drift; algorithmic regressions would diverge by orders of magnitude. * fix transcribe example * fix(conformance): stabilize CI edge cases * fix(ci): avoid brittle benchmark formatter * fix(conformance): cover Windows WARP tanh drift * fix tanh * more ci fixes * more software gpu backend fixes * looser bounds for trig * relative tolerance * more relative comparisons * passing on warp	28 天前
Cargo.toml	Unified conformance test library (#434) * Extract fusor conformance library * Include mismatch position in conformance errors * Remove remaining crate-local fusor tests * Add missing conformance coverage * Broaden conformance test coverage * Use variable-size conformance fuzz shapes * fix formatting * fix tests * larger fuzz ranges * more tests * better cpu parity * fix clippy * fix formatting * fix clippy * refactor: replace clippy #[allow] suppressions with real fixes Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx, AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for the conformance comparator return type, and rewrite needless_range_loop sites in tests/common/mod.rs to use iter_mut().enumerate(). * chore: ignore .claude scheduler artifacts * fix(conformance): skip f16 tests on GPU adapters without SHADER_F16 Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so the f16 shader fails to validate. Filter the device list per test rather than removing GPU coverage entirely — Mac Metal still runs the f16 path. * fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh near their domain edges. 1e-4 covers the observed gap without masking real regressions; the macOS Metal adapter still passes comfortably. * fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a different fuzz seed. lavapipe asin precision is limited near the asymptotes where the derivative blows up. 1e-3 covers observed drift; algorithmic regressions would diverge by orders of magnitude. * fix transcribe example * fix(conformance): stabilize CI edge cases * fix(ci): avoid brittle benchmark formatter * fix(conformance): cover Windows WARP tanh drift * fix tanh * more ci fixes * more software gpu backend fixes * looser bounds for trig * relative tolerance * more relative comparisons * passing on warp	28 天前
LICENSE-APACHE	create book skafolding	2 年前
LICENSE-MIT	create book skafolding	2 年前
README.md	Rename Floneum to Kalosm in README Updated project name from 'Floneum' to 'Kalosm' and corrected minor text errors.	8 小时前
rustfmt.toml	Remote chat, remote structured generation models, and single file gguf chat model loading (#319) * add chat template support and remove the VectorSpace trait * move sampling and chat templates to kalosm llama * update kalosm-llama unstructured generation to the new interface * restore structured generation module * Restore llama implementation of structured generation * clean up kalosm-llama clippy lints * restore llama chat and structured chat implementation * improve infer chat example * add support for remote chat models * support constraints for openai remote models * load the tokenizer from the gguf file if a huggingface tokenizer is not present * Fix tokenizer conversion * restore chat struct * Fix chat implementation with llama * remove tokio from language model * Create chat and text completion extension traits * add task helper to the chat extension trait * update kalosm-language to new task interface * make llama callable * add with_constraints method to task * fix task example * update examples to new chat and task api * set tools to none to fix llama chat template * Add helpers for the default parser for a specific type and model combo * simplify constrained rust type example * restore prompt annealing * fix structured example * document text completion model * document new chat api * update task documentation * Fix tokenizer gguf * fix custom llama source example * fix remaining tests * add logging to remote examples * Clippy fixes * More clippy fixes * use function call in docs more constantly * fix remaining doc tests	1 年前

自动翻译

Kalosm

Kalosm 是一个 crate 生态系统，可轻松开发使用本地或远程 AI 模型的应用程序。此仓库中有三个主要项目：

Kalosm：Rust 中预训练模型的简易接口
Fusor：用于量化 ML 推理的运行时。Fusor 使用 WGPU 在任何加速器上原生运行模型，或在浏览器中运行

Kalosm

Kalosm 是 Rust 中预训练模型的简易接口。它使与预训练的语言、音频和图像模型交互变得简单。

模型支持

Kalosm 支持多种模型。以下是当前支持的模型列表：

模型	模态	大小	描述	已量化	CUDA + Metal 加速	示例
Llama	文本	1b-70b	通用语言模型	✅	✅	llama 3 聊天
Mistral	文本	7-13b	通用语言模型	✅	✅	mistral 聊天
Phi	文本	2b-4b	小型推理专用语言模型	✅	✅	phi 3 聊天
Whisper	音频	20MB-1GB	音频转录模型	✅	✅	实时 Whisper 转录
RWuerstchen	图像	5gb	图像生成模型	❌	✅	RWuerstchen 图像生成
TrOcr	图像	3gb	光学字符识别模型	❌	✅	文本识别
Segment Anything	图像	50MB-400MB	图像分割模型	❌	❌	图像分割
Bert	文本	100MB-1GB	文本嵌入模型	❌	✅	语义搜索

实用工具

Kalosm 还支持围绕预训练模型的多种实用工具。这些工具包括：

性能表现

Kalosm 使用 candle 机器学习库在纯 Rust 环境中运行模型。它支持量化和加速模型，性能与 llama.cpp 相当：

Mistral 7b

加速器	Kalosm	llama.cpp
Metal (M2)	39 t/s	27 t/s

结构化生成

Kalosm 支持使用任意解析器进行结构化生成。它采用自定义解析器引擎、采样器以及结构感知加速技术，使得结构化生成比无控制的文本生成速度更快。您可以获取任何 Rust 类型，并添加 #[derive(Parse, Schema)]，使其可用于结构化生成：

use kalosm::language::*;

/// A fictional character
#[derive(Parse, Schema, Clone, Debug)]
struct Character {
    /// The name of the character
    #[parse(pattern = "[A-Z][a-z]{2,10} [A-Z][a-z]{2,10}")]
    name: String,
    /// The age of the character
    #[parse(range = 1..=100)]
    age: u8,
    /// A description of the character
    #[parse(pattern = "[A-Za-z ]{40,200}")]
    description: String,
}

#[tokio::main]
async fn main() {
    // First create a model. Chat models tend to work best with structured generation
    let model = Llama::phi_3().await.unwrap();
    // Then create a task with the parser as constraints
    let task = model.task("You generate realistic JSON placeholders for characters")
        .typed();
    // Finally, run the task
    let mut stream = task(&"Create a list of random characters", &model);
    stream.to_std_out().await.unwrap();
    let characters: [Character; 10] = stream.await.unwrap();
    println!("{characters:?}");
}

https://github.com/user-attachments/assets/8900f57d-55c8-4d4a-a67b-73beab1e5155

除了正则表达式外，您还可以提供自定义语法来生成结构化数据。这使您能够将响应约束为任何所需的结构，包括 JSON、HTML 和 XML 等复杂数据结构。

Kalosm 快速入门！

本快速入门将帮助您搭建并运行一个简单的聊天机器人。让我们开始吧！

有关 Kalosm 的更完整指南，请访问 Kalosm 网站，示例可在 examples 文件夹中找到。

安装 rust
创建新项目：

cargo new kalosm-hello-world
cd ./kalosm-hello-world

将 Kalosm 添加为依赖项

# You can use `--features language,metal`, `--features language,cuda`, or `--features language,mkl` if your machine supports an accelerator
cargo add kalosm --features language
cargo add tokio --features full

将此代码添加到您的 main.rs 文件中

use kalosm::language::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
  let model = Llama::phi_3().await?;
  let mut chat = model.chat()
    .with_system_prompt("You are a pirate called Blackbeard");

  loop {
    chat(&prompt_input("\n> ")?)
      .to_std_out()
      .await?;
  }
}

使用以下命令运行应用程序：

cargo run --release

聊天机器人演示

Fusor

⚠️ Fusor 仍处于早期开发阶段，尚未准备好投入生产使用。在 0.5 版本中，Fusor 将作为 Kalosm 的后端，以支持 Web 和 AMD。

Fusor 是一个用于量化机器学习推理的 WGPU 运行时。Fusor 可与 gguf 文件格式配合使用，以加载量化模型。它旨在通过 WebGpu 支持多种不同的加速器，包括 Nvidia GPU、AMD GPU 和 Metal。大多数机器学习框架都包含手动优化的内核，这些内核可协同执行一系列操作。Fusor 使用内核融合编译器，将自定义操作链合并为优化的内核，而无需深入编写着色器代码。这会编译为单个内核：

fn exp_add_one(tensor: Tensor<2, f32>) -> Tensor<2, f32> {
  1. + (-tensor).exp()
}

社区

如果您对任一项目感兴趣，可加入 discord 参与项目讨论并获取帮助。

贡献

在我们的 issue tracker 上报告问题。
在 discord 中帮助其他用户
若您有意参与贡献，欢迎在 discord 上联系我们

项目介绍

本地人工智能工作流的图形编辑器【此简介由AI生成】

Apache-2.0 Rust 1.54 K提交数ai candle constrained-generation dioxus floneum-v3 kalosm llama llamacpp llm mistral rust transcription whisper

定制我的领域

README

262.19 K130访问 GitHub

下载使用量

项目总下载次数（含Clone、Pull、 zip 包及 release 下载），每日凌晨更新

发行版

kalosm-0.4最新版本

2025年2月10日发布

查看全部发行版

语言类型

Rust99.97%

Shell0.03%

kalosm:Instant, controllable, local pre-trained AI models in Rust

Kalosm

Kalosm

模型支持

实用工具

性能表现

结构化生成

Kalosm 快速入门！

Fusor

社区

贡献

项目介绍

下载使用量

语言类型

目录