| Add support for Qwen 2.5 Vision (#382)
* implement qwen vision embed and patch merger
* implement qwen vision block
* calculate the rope index of images and videos
* add get_window_index
* fix get window index
* unwrap less
* Create media source api
* integrate the new media support into the language model trait
* Create QwenVisionTransformer
* implement QwenVisionTransformer::forward
* fix formatting
* fix loading qwen 2.5 vl
* fix rot_pos_emb
* add image preprocessing utilities
* fix vision rope
* fix mask
* Fix feed forward
* qwen vision forward working
* unwrap less
* clean up
* create tensor tools cli
* fix cli
* fix fuse tokenizer
* move parse into its own module
* Use llama.cpp compatible tensor names
* add preset
* load qwen vision metadata from the gguf file
* fix loading the vision encoder
* test process image
* forward eps and add more tests
* fix image processing
* implement image chat templating
* full pipeline running
* fix formatting
* use 3d rope index
* fix dimension_sections decoding
* qwen vl rope working
* remove logs
* fix rope tests
* fix rope size
* fix rope index to tensor conversion
* Fix rope updates
* normalize image input
* match image resize behavior
* fix fullatt_block calculation
* vision model works
* remove logs
* add more qwen vl presets
* fix some clippy lints
* fix clippy
* Fix ToChatMessage
* expose image processing hints
* remove unwraps
* fix unwraps in tests
* fix more examples | 11 个月前 |
| Unified conformance test library (#434)
* Extract fusor conformance library
* Include mismatch position in conformance errors
* Remove remaining crate-local fusor tests
* Add missing conformance coverage
* Broaden conformance test coverage
* Use variable-size conformance fuzz shapes
* fix formatting
* fix tests
* larger fuzz ranges
* more tests
* better cpu parity
* fix clippy
* fix formatting
* fix clippy
* refactor: replace clippy #[allow] suppressions with real fixes
Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx,
AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for
the conformance comparator return type, and rewrite needless_range_loop sites
in tests/common/mod.rs to use iter_mut().enumerate().
* chore: ignore .claude scheduler artifacts
* fix(conformance): skip f16 tests on GPU adapters without SHADER_F16
Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so
the f16 shader fails to validate. Filter the device list per test rather
than removing GPU coverage entirely — Mac Metal still runs the f16 path.
* fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity
Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh
near their domain edges. 1e-4 covers the observed gap without masking real
regressions; the macOS Metal adapter still passes comfortably.
* fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift
First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a
different fuzz seed. lavapipe asin precision is limited near the asymptotes
where the derivative blows up. 1e-3 covers observed drift; algorithmic
regressions would diverge by orders of magnitude.
* fix transcribe example
* fix(conformance): stabilize CI edge cases
* fix(ci): avoid brittle benchmark formatter
* fix(conformance): cover Windows WARP tanh drift
* fix tanh
* more ci fixes
* more software gpu backend fixes
* looser bounds for trig
* relative tolerance
* more relative comparisons
* passing on warp | 28 天前 |
| Unified conformance test library (#434)
* Extract fusor conformance library
* Include mismatch position in conformance errors
* Remove remaining crate-local fusor tests
* Add missing conformance coverage
* Broaden conformance test coverage
* Use variable-size conformance fuzz shapes
* fix formatting
* fix tests
* larger fuzz ranges
* more tests
* better cpu parity
* fix clippy
* fix formatting
* fix clippy
* refactor: replace clippy #[allow] suppressions with real fixes
Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx,
AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for
the conformance comparator return type, and rewrite needless_range_loop sites
in tests/common/mod.rs to use iter_mut().enumerate().
* chore: ignore .claude scheduler artifacts
* fix(conformance): skip f16 tests on GPU adapters without SHADER_F16
Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so
the f16 shader fails to validate. Filter the device list per test rather
than removing GPU coverage entirely — Mac Metal still runs the f16 path.
* fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity
Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh
near their domain edges. 1e-4 covers the observed gap without masking real
regressions; the macOS Metal adapter still passes comfortably.
* fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift
First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a
different fuzz seed. lavapipe asin precision is limited near the asymptotes
where the derivative blows up. 1e-3 covers observed drift; algorithmic
regressions would diverge by orders of magnitude.
* fix transcribe example
* fix(conformance): stabilize CI edge cases
* fix(ci): avoid brittle benchmark formatter
* fix(conformance): cover Windows WARP tanh drift
* fix tanh
* more ci fixes
* more software gpu backend fixes
* looser bounds for trig
* relative tolerance
* more relative comparisons
* passing on warp | 28 天前 |
| Unified conformance test library (#434)
* Extract fusor conformance library
* Include mismatch position in conformance errors
* Remove remaining crate-local fusor tests
* Add missing conformance coverage
* Broaden conformance test coverage
* Use variable-size conformance fuzz shapes
* fix formatting
* fix tests
* larger fuzz ranges
* more tests
* better cpu parity
* fix clippy
* fix formatting
* fix clippy
* refactor: replace clippy #[allow] suppressions with real fixes
Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx,
AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for
the conformance comparator return type, and rewrite needless_range_loop sites
in tests/common/mod.rs to use iter_mut().enumerate().
* chore: ignore .claude scheduler artifacts
* fix(conformance): skip f16 tests on GPU adapters without SHADER_F16
Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so
the f16 shader fails to validate. Filter the device list per test rather
than removing GPU coverage entirely — Mac Metal still runs the f16 path.
* fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity
Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh
near their domain edges. 1e-4 covers the observed gap without masking real
regressions; the macOS Metal adapter still passes comfortably.
* fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift
First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a
different fuzz seed. lavapipe asin precision is limited near the asymptotes
where the derivative blows up. 1e-3 covers observed drift; algorithmic
regressions would diverge by orders of magnitude.
* fix transcribe example
* fix(conformance): stabilize CI edge cases
* fix(ci): avoid brittle benchmark formatter
* fix(conformance): cover Windows WARP tanh drift
* fix tanh
* more ci fixes
* more software gpu backend fixes
* looser bounds for trig
* relative tolerance
* more relative comparisons
* passing on warp | 28 天前 |
| Unified conformance test library (#434)
* Extract fusor conformance library
* Include mismatch position in conformance errors
* Remove remaining crate-local fusor tests
* Add missing conformance coverage
* Broaden conformance test coverage
* Use variable-size conformance fuzz shapes
* fix formatting
* fix tests
* larger fuzz ranges
* more tests
* better cpu parity
* fix clippy
* fix formatting
* fix clippy
* refactor: replace clippy #[allow] suppressions with real fixes
Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx,
AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for
the conformance comparator return type, and rewrite needless_range_loop sites
in tests/common/mod.rs to use iter_mut().enumerate().
* chore: ignore .claude scheduler artifacts
* fix(conformance): skip f16 tests on GPU adapters without SHADER_F16
Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so
the f16 shader fails to validate. Filter the device list per test rather
than removing GPU coverage entirely — Mac Metal still runs the f16 path.
* fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity
Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh
near their domain edges. 1e-4 covers the observed gap without masking real
regressions; the macOS Metal adapter still passes comfortably.
* fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift
First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a
different fuzz seed. lavapipe asin precision is limited near the asymptotes
where the derivative blows up. 1e-3 covers observed drift; algorithmic
regressions would diverge by orders of magnitude.
* fix transcribe example
* fix(conformance): stabilize CI edge cases
* fix(ci): avoid brittle benchmark formatter
* fix(conformance): cover Windows WARP tanh drift
* fix tanh
* more ci fixes
* more software gpu backend fixes
* looser bounds for trig
* relative tolerance
* more relative comparisons
* passing on warp | 28 天前 |
| Unified conformance test library (#434)
* Extract fusor conformance library
* Include mismatch position in conformance errors
* Remove remaining crate-local fusor tests
* Add missing conformance coverage
* Broaden conformance test coverage
* Use variable-size conformance fuzz shapes
* fix formatting
* fix tests
* larger fuzz ranges
* more tests
* better cpu parity
* fix clippy
* fix formatting
* fix clippy
* refactor: replace clippy #[allow] suppressions with real fixes
Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx,
AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for
the conformance comparator return type, and rewrite needless_range_loop sites
in tests/common/mod.rs to use iter_mut().enumerate().
* chore: ignore .claude scheduler artifacts
* fix(conformance): skip f16 tests on GPU adapters without SHADER_F16
Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so
the f16 shader fails to validate. Filter the device list per test rather
than removing GPU coverage entirely — Mac Metal still runs the f16 path.
* fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity
Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh
near their domain edges. 1e-4 covers the observed gap without masking real
regressions; the macOS Metal adapter still passes comfortably.
* fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift
First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a
different fuzz seed. lavapipe asin precision is limited near the asymptotes
where the derivative blows up. 1e-3 covers observed drift; algorithmic
regressions would diverge by orders of magnitude.
* fix transcribe example
* fix(conformance): stabilize CI edge cases
* fix(ci): avoid brittle benchmark formatter
* fix(conformance): cover Windows WARP tanh drift
* fix tanh
* more ci fixes
* more software gpu backend fixes
* looser bounds for trig
* relative tolerance
* more relative comparisons
* passing on warp | 28 天前 |
| Unified conformance test library (#434)
* Extract fusor conformance library
* Include mismatch position in conformance errors
* Remove remaining crate-local fusor tests
* Add missing conformance coverage
* Broaden conformance test coverage
* Use variable-size conformance fuzz shapes
* fix formatting
* fix tests
* larger fuzz ranges
* more tests
* better cpu parity
* fix clippy
* fix formatting
* fix clippy
* refactor: replace clippy #[allow] suppressions with real fixes
Bundle args into structs (FlushBatch, MmaParams, TileALoadCtx, TileBLoadCtx,
AttentionInputs, BertShape, QMatMulFuzz), introduce CompareFut type alias for
the conformance comparator return type, and rewrite needless_range_loop sites
in tests/common/mod.rs to use iter_mut().enumerate().
* chore: ignore .claude scheduler artifacts
* fix(conformance): skip f16 tests on GPU adapters without SHADER_F16
Linux CI's lavapipe adapter doesn't expose wgpu::Features::SHADER_F16, so
the f16 shader fails to validate. Filter the device list per test rather
than removing GPU coverage entirely — Mac Metal still runs the f16 path.
* fix(conformance): bump inverse-trig tolerance to 1e-4 for lavapipe parity
Linux CI's lavapipe adapter diverges from libm by ~6e-5 on asin/acos/atanh/acosh
near their domain edges. 1e-4 covers the observed gap without masking real
regressions; the macOS Metal adapter still passes comfortably.
* fix(conformance): bump inverse-trig tolerance to 1e-3 for lavapipe drift
First CI run showed asin diverging 5.6e-5; second run hit 2.1e-4 on a
different fuzz seed. lavapipe asin precision is limited near the asymptotes
where the derivative blows up. 1e-3 covers observed drift; algorithmic
regressions would diverge by orders of magnitude.
* fix transcribe example
* fix(conformance): stabilize CI edge cases
* fix(ci): avoid brittle benchmark formatter
* fix(conformance): cover Windows WARP tanh drift
* fix tanh
* more ci fixes
* more software gpu backend fixes
* looser bounds for trig
* relative tolerance
* more relative comparisons
* passing on warp | 28 天前 |
| Floneum cpu (#424)
* fix the summary chunker
* start
* wip
* some progress
* abox
* start simding
* add some operations
* pull out ResolveTensor
* refactor
* more refactoring
* fill in missing ops
* remove ResolvedTensorMut
* add benchmarks
* faster
* matmul op
* reduce operations
* refactor into multiple files
* fuse ops
* more ops
* clean up comments
* remove unsafe
* remove more unsafe
* initial fusor crate
* refactoring fusor cpu
* tensor type
* move more ops to tensor
* more refactoring
* more refactoring
* clean up some unused code
* remove most of fusor
* slice assign op
* quantized cpu
* optimize qmatmul
* wip
* new types crate
* use layout type in cpu
* pull out rank
* as_slice for gpuor
* better add impl
* partially borrowed ops
* refactor pairwise
* more gpuor ops
* more ops
* batched matmul
* implement some composite ops
* reduce ops
* normalization ops
* reshape ops
* move most of the logic onto layout itself
* move sliding window into the layout type
* a bunch more composite ops
* more methods
* more ops
* rename qmatmul
* batched cpu qmatmul
* quantized support for gpuor
* rename to be closer to fusor-core
* start porting rwhisper to cpu
* wip whisper
* optimize a bit
* more optimization
* more optimization
* rename eval
* more lazy
* remove some logs
* move all matches to dispatch
* more dispatch
* fix add_ and other ops
* use layout instead
* get rid of ResolveTensor
* remove useless Expr methods
* fix fusor
* shape on cpu tensor
* as ref tensor
* remove Expr trait
* remove expr from fusor
* owned and copy cpu tensors
* fix fusor/fusor
* start migrating fusor
* fix fusor
* rwhisper compiles
* fix reshape
* fix cpu map layout
* remove debug
* remove submodule
* simd qmatmul
* nightly optimizations
* switch llama to fusor
* remove const generic from QMatrix
* don't use rayon for parallelization
* MapLayout is currently a concrete tensor
* simd gather
* less matching in slow softmax
* wip
* remove SimdComparisonOp
* fix fusor
* lazy maplayout
* fix maplayout
* rwhisper compiles
* fix rwhisper
* wip
* llama running!
* fix vision on cpu
* wip q5k
* start gpu impl
* fix out of memory
* q5k_sgemv
* fix formatting
* fix some warnings
* remove some dead code
* refactor from array
* Use rustversion to conditionally enable nightly NEON intrinsics
The vdotq_s32 intrinsic requires the nightly-only
stdarch_neon_dotprod feature. This change uses rustversion
to detect nightly builds and only enables the optimized
NEON code path when building with nightly Rust. On stable
Rust, the scalar fallback implementation is used instead.
* fix formatting
* fix nightly formatting
* import SimdElement trait for x86 gather operations
* fix unsafe blocks for Rust 2024 edition
* fix clippy warnings in test modules
* fix tensor shape handling in conv1d and layer_norm loading
* skip qwen download in CI
* fix gelu in warp | 3 个月前 |