| [AMD] Emulate Float8E4M3FN with Float16 on CDNA3 and below (#7186)
gfx942 has its own FP8 variants rather than the OCP ones, which is a
common pitfall. Starting with gfx950, we switch to the OCP FP8 variants,
so gfx942 is a one-generation special case.
This commit enables emulating Float8E4M3FN with FP16, as we already do
for Float8E5M2, for better portability, along with a performance
remark. | 11 months ago |