kalosm/fusor-ml/cpu · fl/kalosm - AtomGit

Latest commit by GitHub: Unified conformance test library (#434)

File	Last commit	Last updated
benches	Unified conformance test library (#434)	28 days ago
  * Extract fusor conformance library
  * Include mismatch position in conformance errors
  * Remove remaining crate-local fusor tests
  * Add missing conformance coverage
  * Broaden conformance test coverage
  * Use variable-size conformance fuzz shapes
  * fix formatting
  * fix tests
  * larger fuzz ranges
  * more tests
  * better cpu parity
  * fix clippy
  * fix formatting
  * fix clippy
  * refactor: replace clippy #[allow] suppressions with real fixes. Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx, AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for the conformance comparator return type, and rewrite needless_range_loop sites in tests/common/mod.rs to use iter_mut().enumerate().
  * chore: ignore .claude scheduler artifacts
  * fix(conformance): skip f16 tests on GPU adapters without SHADER_F16. Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so the f16 shader fails to validate. Filter the device list per test rather than removing GPU coverage entirely; Mac Metal still runs the f16 path.
  * fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity. Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh near their domain edges. 1e-4 covers the observed gap without masking real regressions; the macOS Metal adapter still passes comfortably.
  * fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift. First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a different fuzz seed. lavapipe asin precision is limited near the asymptotes where the derivative blows up. 1e-3 covers observed drift; algorithmic regressions would diverge by orders of magnitude.
  * fix transcribe example
  * fix(conformance): stabilize CI edge cases
  * fix(ci): avoid brittle benchmark formatter
  * fix(conformance): cover Windows WARP tanh drift
  * fix tanh
  * more ci fixes
  * more software gpu backend fixes
  * looser bounds for trig
  * relative tolerance
  * more relative comparisons
  * passing on warp
src	Unified conformance test library (#434), same commit as benches	28 days ago
Cargo.toml	Floneum cpu (#424)	3 months ago
  * fix the summary chunker
  * start
  * wip
  * some progress
  * abox
  * start simding
  * add some operations
  * pull out ResolveTensor
  * refactor
  * more refactoring
  * fill in missing ops
  * remove ResolvedTensorMut
  * add benchmarks
  * faster
  * matmul op
  * reduce operations
  * refactor into multiple files
  * fuse ops
  * more ops
  * clean up comments
  * remove unsafe
  * remove more unsafe
  * initial fusor crate
  * refactoring fusor cpu
  * tensor type
  * move more ops to tensor
  * more refactoring
  * more refactoring
  * clean up some unused code
  * remove most of fusor
  * slice assign op
  * quantized cpu
  * optimize qmatmul
  * wip
  * new types crate
  * use layout type in cpu
  * pull out rank
  * as_slice for gpuor
  * better add impl
  * partially borrowed ops
  * refactor pairwise
  * more gpuor ops
  * more ops
  * batched matmul
  * implement some composite ops
  * reduce ops
  * normalization ops
  * reshape ops
  * move most of the logic onto layout itself
  * move sliding window into the layout type
  * a bunch more composite ops
  * more methods
  * more ops
  * rename qmatmul
  * batched cpu qmatmul
  * quantized support for gpuor
  * rename to be closer to fusor-core
  * start porting rwhisper to cpu
  * wip whisper
  * optimize a bit
  * more optimization
  * more optimization
  * rename eval
  * more lazy
  * remove some logs
  * move all matches to dispatch
  * more dispatch
  * fix add_ and other ops
  * use layout instead
  * get rid of ResolveTensor
  * remove useless Expr methods
  * fix fusor
  * shape on cpu tensor
  * as ref tensor
  * remove Expr trait
  * remove expr from fusor
  * owned and copy cpu tensors
  * fix fusor/fusor
  * start migrating fusor
  * fix fusor
  * rwhisper compiles
  * fix reshape
  * fix cpu map layout
  * remove debug
  * remove submodule
  * simd qmatmul
  * nightly optimizations
  * switch llama to fusor
  * remove const generic from QMatrix
  * don't use rayon for parallelization
  * MapLayout is currently a concrete tensor
  * simd gather
  * less matching in slow softmax
  * wip
  * remove SimdComparisonOp
  * fix fusor
  * lazy maplayout
  * fix maplayout
  * rwhisper compiles
  * fix rwhisper
  * wip
  * llama running!
  * fix vision on cpu
  * wip q5k
  * start gpu impl
  * fix out of memory
  * q5k_sgemv
  * fix formatting
  * fix some warnings
  * remove some dead code
  * refactor from array
  * Use rustversion to conditionally enable nightly NEON intrinsics. The vdotq_s32 intrinsic requires the nightly-only stdarch_neon_dotprod feature. This change uses rustversion to detect nightly builds and only enables the optimized NEON code path when building with nightly Rust. On stable Rust, the scalar fallback implementation is used instead.
  * fix formatting
  * fix nightly formatting
  * import SimdElement trait for x86 gather operations
  * fix unsafe blocks for Rust 2024 edition
  * fix clippy warnings in test modules
  * fix tensor shape handling in conv1d and layer_norm loading
  * skip qwen download in CI
  * fix gelu in warp
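The #434 commit messages describe conformance checks that report the mismatch position and that moved from fixed absolute bounds (1e-4, then 1e-3 for inverse trig on lavapipe) toward relative tolerances. The following is a minimal sketch of such a comparator, assuming a combined absolute-plus-relative bound `|actual - expected| <= atol + rtol * |expected|`. `compare` and `Mismatch` are hypothetical names, not the fusor conformance library's API, and the NaN/infinity handling is an assumption.

```rust
/// Hypothetical error type: the commit only states that conformance errors
/// include the mismatch position.
#[derive(Debug, PartialEq)]
struct Mismatch {
    index: usize,
    expected: f32,
    actual: f32,
}

/// Returns the first element where `actual` falls outside
/// `atol + rtol * |expected|` of `expected`.
fn compare(expected: &[f32], actual: &[f32], atol: f32, rtol: f32) -> Result<(), Mismatch> {
    assert_eq!(expected.len(), actual.len(), "length mismatch");
    for (index, (&e, &a)) in expected.iter().zip(actual).enumerate() {
        let ok = if e.is_nan() || a.is_nan() {
            // Assumption: NaN only matches NaN.
            e.is_nan() && a.is_nan()
        } else if e.is_infinite() || a.is_infinite() {
            // Assumption: infinities must match exactly (same sign).
            e == a
        } else {
            (a - e).abs() <= atol + rtol * e.abs()
        };
        if !ok {
            return Err(Mismatch { index, expected: e, actual: a });
        }
    }
    Ok(())
}
```

A purely absolute bound tends to be too loose for small outputs and too strict for large ones; the `rtol` term lets one threshold scale with the magnitude of the reference value, which is one plausible reading of the "relative tolerance" entries.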
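The SHADER_F16 entry in #434 explains that lavapipe lacks `wgpu::Features::SHADER_F16`, so the fix filters the device list per test instead of dropping GPU coverage. Below is a std-only sketch of that filtering idea. `Features`, `DType`, `TestDevice`, and `devices_for` are hypothetical stand-ins; the real code queries the wgpu adapter, which this sketch does not call.

```rust
/// Hypothetical stand-in for an adapter's feature bitset (the real code uses
/// `wgpu::Features`; this sketch has no wgpu dependency).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Features(u32);

impl Features {
    const NONE: Features = Features(0);
    const SHADER_F16: Features = Features(1);

    fn contains(self, other: Features) -> bool {
        self.0 & other.0 == other.0
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DType {
    F16,
    F32,
}

/// A device a conformance test could run on.
#[derive(Debug)]
enum TestDevice {
    Cpu,
    Gpu { name: &'static str, features: Features },
}

impl TestDevice {
    fn name(&self) -> &'static str {
        match self {
            TestDevice::Cpu => "cpu",
            TestDevice::Gpu { name, .. } => *name,
        }
    }

    /// CPU always qualifies; a GPU runs f16 tests only if it reports SHADER_F16.
    fn supports(&self, dtype: DType) -> bool {
        match (self, dtype) {
            (TestDevice::Gpu { features, .. }, DType::F16) => {
                features.contains(Features::SHADER_F16)
            }
            _ => true,
        }
    }
}

/// Filters per test, so other tests keep their GPU coverage.
fn devices_for(dtype: DType, all: &[TestDevice]) -> Vec<&TestDevice> {
    all.iter().filter(|d| d.supports(dtype)).collect()
}
```

With this shape, an adapter like lavapipe drops out of f16 tests only, while an adapter that reports the feature (the commit cites Mac Metal) still runs the f16 path.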
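The #424 entry on rustversion notes that `vdotq_s32` needs the nightly-only `stdarch_neon_dotprod` feature, so stable builds use a scalar fallback. The sketch below models what such a fallback has to compute: in `vdotq_s32`, each of the four i32 accumulator lanes gains the dot product of four consecutive i8 pairs. The function names are hypothetical. In the real crate the NEON path is gated with the rustversion crate (for example its `#[rustversion::nightly]` attribute); that gating is omitted here to keep the sketch std-only.

```rust
/// Scalar model of NEON `vdotq_s32`: lane `i` of the accumulator gains the
/// dot product of bytes `4*i..4*i + 4` of `a` and `b`.
fn vdotq_s32_scalar(acc: [i32; 4], a: &[i8; 16], b: &[i8; 16]) -> [i32; 4] {
    let mut out = acc;
    for ((slot, qa), qb) in out.iter_mut().zip(a.chunks_exact(4)).zip(b.chunks_exact(4)) {
        *slot += qa
            .iter()
            .zip(qb)
            .map(|(&x, &y)| i32::from(x) * i32::from(y))
            .sum::<i32>();
    }
    out
}

/// i8 dot product: 16-byte blocks go through the lane model, and the
/// leftover tail is summed directly. (Very long inputs could overflow i32.)
fn dot_i8(a: &[i8], b: &[i8]) -> i32 {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    let mut acc = [0i32; 4];
    let mut ca = a.chunks_exact(16);
    let mut cb = b.chunks_exact(16);
    for (x, y) in (&mut ca).zip(&mut cb) {
        let x: &[i8; 16] = x.try_into().expect("chunk is 16 bytes");
        let y: &[i8; 16] = y.try_into().expect("chunk is 16 bytes");
        acc = vdotq_s32_scalar(acc, x, y);
    }
    let tail: i32 = ca
        .remainder()
        .iter()
        .zip(cb.remainder())
        .map(|(&x, &y)| i32::from(x) * i32::from(y))
        .sum();
    acc.iter().sum::<i32>() + tail
}
```

Keeping the scalar code lane-shaped like the intrinsic makes it easy to check the nightly NEON path against the stable fallback on the same inputs.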