Introduction

This project provides calling samples for different scenarios. After setting up the environment, you can try to run them according to actual scenarios:

Sample Algorithm Description
Quantize model using MIN-MAX algorithm Min-Max Simple quantization based on extrema, best for beginners
Quantize model using AWQ algorithm AWQ Activation-aware weight quantization, suitable for large model PTQ
Quantize model using GPTQ algorithm GPTQ Weight quantization based on second-order information, layer-by-layer optimization
Quantize model using SmoothQuant algorithm SmoothQuant W8A8 quantization that smooths activation distribution
Quantize model using Cast direct conversion algorithm Cast HiFloat8 data direct conversion
Quantize model using Quantile algorithm Quantile HiFloat8 quantile quantization
Quantize model using OFMR algorithm OFMR Output feature Min-Max quantization
Quantize model using MXQuant algorithm MXQuant Micro-scaling floating-point quantization (MXFP8/MXFP4)
Quantize model using FlatQuant algorithm (Experimental) FlatQuant Quantization that flattens distribution through affine transformation

Note: Samples marked as "Experimental" depend on content under amct_pytorch/experimental/. Build the package with bash build.sh --torch --experimental (or --pkg --experimental) before use.