[AMD] Handle denorms properly for exp2 and exp (#3816)
This PR enables denorm flushing for tl.math.exp2 and preserves denorms
for tl.math.exp, matching their behavior on the NVIDIA backend.
More specifically,
- denorm flushing for tl.math.exp2 with f32 inputs is controlled by
__CUDA_FTZ or __HIP_FTZ, and the default is to flush denorms.
These flags can be set by developers but are not exposed as kernel
arguments.
tl.math.exp2(f32) | NV | NV | AMD | AMD
-- | -- | -- | -- | --
control flag | __CUDA_FTZ=1 (default) | __CUDA_FTZ=0 | __HIP_FTZ=1 (default) | __HIP_FTZ=0
device lib | __nv_exp2f | __nv_exp2f | |
llvm intrinsics | llvm.nvvm.ex2.approx.ftz.f | llvm.nvvm.ex2.approx.f | llvm.amdgcn.exp2.f32 | llvm.exp2.f32
ptx | ex2.approx.ftz.f32 | ex2.approx.f32 | |
sass/amdgcn | MUFU.EX2 | MUFU.EX2<br>and instructions to<br>check and adjust for<br>denorms | v_exp_f32 | v_exp_f32<br>and instructions<br>to check and<br>adjust for<br>denorms
- denorms are preserved for tl.math.exp2 with f64 inputs
tl.math.exp2(f64) | NV | AMD
-- | -- | --
device lib | __nv_exp2 | __ocml_exp2_f64
- denorms are preserved for tl.math.exp with both f32 and f64 inputs.
Note that tl.math.exp(f32) on the NV path is lowered directly to inline
PTX without the .ftz modifier.
tl.math.exp(f32) | NV | AMD
-- | -- | --
llvm intrinsics | | llvm.exp2.f32
ptx | ex2.approx.f32 |
tl.math.exp(f64) | NV | AMD
-- | -- | --
device lib | __nv_exp | __ocml_exp_f64
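
Since tl.math.exp(f32) ends up on an exp2 primitive on both paths, the lowering relies on the identity exp(x) = exp2(x * log2(e)). Below is a small host-side Python sketch of that identity, with hypothetical helper names (`f32`, `exp_via_exp2`). It models the arithmetic only, not the generated code, and shows why preserving denorms matters: an input such as -88 produces a result below the smallest normal f32 value.

```python
import math
import struct

LOG2E = 1.4426950408889634  # log2(e)
F32_MIN_NORMAL = 2.0 ** -126  # smallest positive normal f32 value


def f32(v):
    """Round a Python float to the nearest f32 value (denormals kept)."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def exp_via_exp2(x):
    """Compute exp(x) as exp2(x * log2(e)), rounded to f32 at the end."""
    return f32(math.pow(2.0, x * LOG2E))


def is_f32_denormal(v):
    """True if v is a nonzero value below the f32 normal range."""
    return 0.0 < abs(v) < F32_MIN_NORMAL
```

In this model, `is_f32_denormal(exp_via_exp2(-88.0))` is true, so a flushing exp2 would turn that result into zero, whereas the denorm-preserving path keeps a small nonzero value.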