Fusor ML
Fusor is a WGPU-based ML runtime that uses kernel fusion to make custom operations both ergonomic and fast. The goal is for it to serve as the web and AMD runtime for Kalosm once it is stable enough.
Status
Basic operations work and simple kernel fusion is implemented, but the project is not yet production ready.
Features:
- Elementwise ops
- Fuse Elementwise ops together
- MatMul
- Reduce ops
- Fuse Elementwise ops into Reduce ops
- PairWise ops
- Fuse Elementwise ops into PairWise ops
- Analyze buffer usage for in-place ops
- Memory move/cat/etc ops
- Cast ops
- Fuse PairWise ops together?
- Fuse parallel Reduce ops?
- Fuse PairWise ops with two of the same input into an elementwise op
- Dynamically apply fusion based on runtime throughput data
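To make the idea behind the fusion items above concrete, here is a minimal CPU-side sketch in plain Rust. It is not Fusor's API: `FusedElementwise`, `then`, `run`, and `run_sum` are hypothetical names invented for illustration, and a real WGPU runtime would presumably emit a single GPU kernel rather than call closures. The sketch only shows the core trick: elementwise ops are recorded lazily, then applied in one pass over the data, optionally feeding straight into a reduction so that no intermediate buffer is written.

```rust
/// A chain of elementwise operations that is recorded instead of run eagerly.
/// Hypothetical type for illustration only; not part of Fusor's API.
pub struct FusedElementwise {
    ops: Vec<Box<dyn Fn(f32) -> f32>>,
}

impl FusedElementwise {
    /// Start with an empty chain (the identity function).
    pub fn new() -> Self {
        Self { ops: Vec::new() }
    }

    /// Record another elementwise op. Nothing is computed yet.
    pub fn then(mut self, op: impl Fn(f32) -> f32 + 'static) -> Self {
        self.ops.push(Box::new(op));
        self
    }

    /// Apply every recorded op to a single element, in order.
    fn apply(&self, x: f32) -> f32 {
        self.ops.iter().fold(x, |acc, op| op(acc))
    }

    /// Elementwise + elementwise fusion: one pass over the input and one
    /// output buffer, no matter how many ops were recorded.
    pub fn run(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|&x| self.apply(x)).collect()
    }

    /// Elementwise-into-reduce fusion: the chain is applied while summing,
    /// so the transformed values are never stored.
    pub fn run_sum(&self, input: &[f32]) -> f32 {
        input.iter().map(|&x| self.apply(x)).sum()
    }
}
```

For example, `FusedElementwise::new().then(|x| x + 1.0).then(|x| x * 2.0)` computes `(x + 1) * 2` for each element in a single traversal, where an unfused runtime would write out the result of `x + 1` before reading it back for the multiply.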
Operations required for a Llama implementation:
- RmsNorm
- MatMul
- Rope
- Unsqueeze
- Cat
- Reshape
- Transpose
- Softmax
- Narrow
- SiLU
- Arange
- Sin
- Cos
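Several of the operations listed above decompose into the reduce and elementwise primitives from the feature list. The plain-Rust reference functions below sketch that decomposition using the standard definitions of RmsNorm, Softmax, and SiLU on a single row of `f32` values. They are CPU illustrations only: the function names and signatures are assumptions made for this example and do not reflect how Fusor exposes or implements these ops.

```rust
/// RmsNorm over one row: a sum-of-squares reduction followed by an
/// elementwise scale by the learned weight.
pub fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len());
    // Reduce: mean of squares (the square is an elementwise op fused in).
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let scale = 1.0 / (mean_sq + eps).sqrt();
    // Elementwise: normalize and apply the weight.
    x.iter().zip(weight).map(|(v, w)| v * scale * w).collect()
}

/// Softmax over one row: a max reduction, an elementwise exp, a sum
/// reduction, and an elementwise divide.
pub fn softmax(x: &[f32]) -> Vec<f32> {
    // Subtracting the row maximum keeps exp() from overflowing.
    let max = x.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = x.iter().map(|v| (v - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// SiLU is purely elementwise: x * sigmoid(x).
pub fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}
```

Written this way, RmsNorm is one reduction with a fused elementwise prologue and epilogue, which is the pattern the "Fuse Elementwise ops into Reduce ops" feature targets.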