File | Last commit | Last updated
Unified conformance test library (#434) * Extract fusor conformance library * Include mismatch position in conformance errors * Remove remaining crate-local fusor tests * Add missing conformance coverage * Broaden conformance test coverage * Use variable-size conformance fuzz shapes * fix formatting * fix tests * larger fuzz ranges * more tests * better cpu parity * fix clippy * fix formatting * fix clippy * refactor: replace clippy #[allow] suppressions with real fixes Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx, AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for the conformance comparator return type, and rewrite needless_range_loop sites in tests/common/mod.rs to use iter_mut().enumerate(). * chore: ignore .claude scheduler artifacts * fix(conformance): skip f16 tests on GPU adapters without SHADER_F16 Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so the f16 shader fails to validate. Filter the device list per test rather than removing GPU coverage entirely — Mac Metal still runs the f16 path. * fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh near their domain edges. 1e-4 covers the observed gap without masking real regressions; the macOS Metal adapter still passes comfortably. * fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a different fuzz seed. lavapipe asin precision is limited near the asymptotes where the derivative blows up. 1e-3 covers observed drift; algorithmic regressions would diverge by orders of magnitude. 
* fix transcribe example * fix(conformance): stabilize CI edge cases * fix(ci): avoid brittle benchmark formatter * fix(conformance): cover Windows WARP tanh drift * fix tanh * more ci fixes * more software gpu backend fixes * looser bounds for trig * relative tolerance * more relative comparisons * passing on warp | 29 days ago
Get whisper running in wasm and enable support for gpus without subgroups or f16 (#406) * make rwhisper wasm compatable * move around traits to minimize whisper dependencies * fix transcription task * fix import FutureWasmNotSend * more wasm fixes * fix sgemv * reduce with or without subgroups * fix subgroup reduction * softmax without subgroups * tiled map without subgroups * quantized without subgroups * all tests passing without subgroups * fix sgemv dispatch * restore subgroupless CI workflows * only require f16 support for quantization support * don't require f16 support for qmatmul * fix kalosm model-types dev dependancies * fix kalosm-model-types tests * fix softmax bounds without subgroups * better required limits * fix clippy * more clippy fixes * shorter odyssey test * more clippy fixes * fix solve when workgroups are not supported * exclude fusor core - these tests pass locally, but are too slow to run in CI * exclude inference tests on windows * exclude kalosm-learning on windows which doesn't have f16 support | 6 months ago
Implement whisper with fusor (#405) * sliding window view * implement conv and pool * add zeros function * more layers * faster cache * remove candle dependency from whisper * Port loading logic * start conversion * more progress converting * add casting from u32 tensors * add variance and mean functions * Initial whisper implementation working * use rustfft instead of manually computing the fft * re-use some allocations * fix bert * clean up some warnings * fix max seq len * delete floneum * clean up remains of unquantized variant * fix whisper * faster final layer whisper * use sgemv for small n values * more matmul bench shapes * larger n limit * smarter resize lowering * limit the buffer re-use cache * allow larger allocations * fix embedding tests * add a bunch more sources * fix formatting * fix clippy * fix cargo check * restore transcribe file * fix doc examples | 6 months ago
Refactor compute graph (#409) * switch whisper to f16 activations * fix casts before reduce * fix f16 reduce and softmax * start moving to petgraph * refactor compute graph * refactor visitors * move cache and reference count onto the nodes * fix initial reference count * fix check_life * fix rwhisper * add shared var builder * add into methods for gguf * fix formatting * fix formatting * fix clippy * fix vision * fix doc tests | 5 months ago
reorganize packages | 2 years ago
Add support for distil-large-v3 and quantized-distil-large-v3 | 2 years ago
Floneum cpu (#424) * fix the summary chunker * start * wip * some progress * abox * start simding * add some operations * pull out ResolveTensor * refactor * more refactoring * fill in missing ops * remove ResolvedTensorMut * add benchmarks * faster * matmul op * reduce operations * refactor into multiple files * fuse ops * more ops * clean up comments * remove unsafe * remove more unsafe * initial fusor crate * refactoring fusor cpu * tensor type * move more ops to tensor * more refactoring * more refactoring * clean up some unused code * remove most of fusor * slice assign op * quantized cpu * optimize qmatmul * wip * new types crate * use layout type in cpu * pull out rank * as_slice for gpuor * better add impl * partially borrowed ops * refactor pairwise * more gpuor ops * more ops * batched matmul * implement some composite ops * reduce ops * normalization ops * reshape ops * move most of the logic onto layout itself * move sliding window into the layout type * a bunch more composite ops * more methods * more ops * rename qmatmul * batched cpu qmatmul * quantized support for gpuor * rename to be closer to fusor-core * start porting rwhisper to cpu * wip whisper * optimize a bit * more optimization * more optimization * rename eval * more lazy * remove some logs * move all matches to dispatch * more dispatch * fix add_ and other ops * use layout instead * get rid of ResolveTensor * remove useless Expr methods * fix fusor * shape on cpu tensor * as ref tensor * remove Expr trait * remove expr from fusor * owned and copy cpu tensors * fix fusor/fusor * start migrating fusor * fix fusor * rwhisper compiles * fix reshape * fix cpu map layout * remove debug * remove submodule * simd qmatmul * nightly optimizations * switch llama to fusor * remove const generic from QMatrix * don't use rayon for parallelization * MapLayout is currently a concrete tensor * simd gather * less matching in slow softmax * wip * remove SimdComparisonOp * fix fusor * lazy maplayout * fix 
maplayout * rwhisper compiles * fix rwhisper * wip * llama running! * fix vision on cpu * wip q5k * start gpu impl * fix out of memory * q5k_sgemv * fix formatting * fix some warnings * remove some dead code * refactor from array * Use rustversion to conditionally enable nightly NEON intrinsics The vdotq_s32 intrinsic requires the nightly-only stdarch_neon_dotprod feature. This change uses rustversion to detect nightly builds and only enables the optimized NEON code path when building with nightly Rust. On stable Rust, the scalar fallback implementation is used instead. * fix formatting * fix nightly formatting * import SimdElement trait for x86 gather operations * fix unsafe blocks for Rust 2024 edition * fix clippy warnings in test modules * fix tensor shape handling in conv1d and layer_norm loading * skip qwen download in CI * fix gelu in warp | 3 months ago
Make kalosm-sound usable in wasm (#407) * make kalosm-sound usable in wasm * smaller default tiny en model * fix cargo check * forward input features * fix examples | 6 months ago