Libdevice Developer Guide

SIMT Compilation Mode Example

Triton kernel example with SIMT compilation mode

# Enable libdevice SIMT compilation
import os
os.environ['TRITON_ENABLE_LIBDEVICE_SIMT'] = '1'

import triton
import triton.language as tl
import triton.language.extra.cann.libdevice as libdevice
import torch
import torch_npu  # registers the .npu() device methods on torch tensors

@triton.jit
def triton_kernel(input, output, XBLOCK: tl.constexpr, XBLOCK_SUB: tl.constexpr):
    offset = tl.program_id(0) * XBLOCK
    base = tl.arange(0, XBLOCK_SUB)
    loops: tl.constexpr = XBLOCK // XBLOCK_SUB
    for loop in range(loops):
        x0 = offset + (loop * XBLOCK_SUB) + base
        x = tl.load(input + (x0), None)
        y = libdevice.abs(x)
        tl.store(output + (x0), y, None)

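dtype, shape, ncore, xblock, xblock_sub = [torch.int32, (128, 4096), 512, 1024, 1024]
# torch.randn cannot create integer tensors, so draw the int32 input with torch.randint
input = torch.randint(-1000, 1000, shape, dtype=dtype).npu()
# the kernel overwrites every element, so the output only needs to be allocated
output = torch.empty(shape, dtype=dtype).npu()
# ncore * xblock == 128 * 4096, so the grid covers each element exactly once
triton_kernel[ncore, 1, 1](input, output, xblock, xblock_sub, force_simt_only=True)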
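
The index arithmetic in the kernel above can be checked on the host without an NPU. The following is a minimal pure-Python sketch, not part of the library; the helper name covered_indices is hypothetical. It replays the offset computation for every program and confirms that the launch touches each flat element index exactly once.

```python
def covered_indices(ncore, xblock, xblock_sub):
    """Return every flat index the launch touches, in program order."""
    indices = []
    for pid in range(ncore):                      # one program per grid entry
        offset = pid * xblock                     # mirrors tl.program_id(0) * XBLOCK
        for loop in range(xblock // xblock_sub):  # mirrors the in-kernel loop
            start = offset + loop * xblock_sub
            # mirrors offset + loop * XBLOCK_SUB + tl.arange(0, XBLOCK_SUB)
            indices.extend(range(start, start + xblock_sub))
    return indices


shape, ncore, xblock, xblock_sub = (128, 4096), 512, 1024, 1024
numel = shape[0] * shape[1]
# Every element of the (128, 4096) tensor is visited once, in order.
assert covered_indices(ncore, xblock, xblock_sub) == list(range(numel))
```

If ncore * xblock were smaller than the number of elements, the tail of the tensor would be left unprocessed; if it were larger, the unmasked loads and stores would run past the end of the buffers.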

1. triton.language.extra.cann.abs

OP Overview

Computes the absolute value of the input parameter.

Prototype:

triton.language.extra.cann.abs(x, _builder=None)

Return Value: tl.tensor, containing the absolute value of the input parameter.

Supported Datatypes:int32, float32

3. triton.language.extra.cann.acos

OP Overview

Computes the inverse cosine (arccos) of the input parameter.

Prototype:

triton.language.extra.cann.acos(x, _builder=None)

Return Value: tl.tensor, containing the inverse cosine of the input parameter, in the range [0, π] radians.

Supported Datatypes:float32

4. triton.language.extra.cann.acosh

OP Overview

Computes the inverse hyperbolic cosine of the input parameter.

Prototype:

triton.language.extra.cann.acosh(x, _builder=None)

Return Value: tl.tensor, containing the inverse hyperbolic cosine of the input parameter, in the range [0, +∞).

Supported Datatypes:float32

5. triton.language.extra.cann.add_rd

OP Overview

Floating-point addition with round-down (toward negative infinity) rounding mode.

Prototype:

triton.language.extra.cann.add_rd(x, y, _builder=None)

Return Value: tl.tensor, containing the addition result rounded down.

Supported Datatypes:float32

6. triton.language.extra.cann.add_rn

OP Overview

Floating-point addition with round-to-nearest-even rounding mode.

Prototype:

triton.language.extra.cann.add_rn(x, y, _builder=None)

Return Value: tl.tensor, containing the addition result rounded to the nearest representable value, with ties rounded to even.

Supported Datatypes:float32

7. triton.language.extra.cann.add_ru

OP Overview

Floating-point addition with round-up (toward positive infinity) rounding mode.

Prototype:

triton.language.extra.cann.add_ru(x, y, _builder=None)

Return Value: tl.tensor, containing the addition result rounded up.

Supported Datatypes:float32

8. triton.language.extra.cann.add_rz

OP Overview

Floating-point addition with round-toward-zero rounding mode.

Prototype:

triton.language.extra.cann.add_rz(x, y, _builder=None)

Return Value: tl.tensor, containing the addition result rounded toward zero.

Supported Datatypes:float32
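
The four rounding modes of add_rd, add_rn, add_ru and add_rz differ only when the exact sum is not representable in float32. The sketch below emulates them on the host with the standard library; it is an illustration under stated assumptions, not the device implementation. The helper names (add_f32, _step) are hypothetical, inputs are assumed to be finite float32 values, and the "rn" path goes through a double first, which can differ from a true single rounding in rare halfway cases.

```python
import struct
from fractions import Fraction


def _bits(x):
    """Bit pattern of x after rounding it to the nearest float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _from_bits(b):
    return struct.unpack("<f", struct.pack("<I", b))[0]


def _step(x, up):
    """Neighbouring float32 of finite x: toward +inf if up, else toward -inf."""
    if x == 0.0:
        return _from_bits(0x00000001 if up else 0x80000001)
    away_from_zero = (x > 0) == up
    return _from_bits(_bits(x) + 1 if away_from_zero else _bits(x) - 1)


def add_f32(x, y, mode):
    """Add exactly, then round once to float32 with mode rn, rd, ru or rz."""
    exact = Fraction(x) + Fraction(y)
    near = _from_bits(_bits(float(exact)))  # a float32 adjacent to the exact sum
    if mode == "rn" or Fraction(near) == exact:
        return near
    if mode == "rz":                        # toward zero: down if positive, up if negative
        mode = "rd" if exact > 0 else "ru"
    if mode == "rd":
        return near if Fraction(near) < exact else _step(near, up=False)
    return near if Fraction(near) > exact else _step(near, up=True)


tiny = 2.0 ** -30                           # far below one ulp of 1.0 in float32
print([add_f32(1.0, tiny, m) for m in ("rd", "rn", "ru", "rz")])
```

Only the round-up result moves to the next float32 above 1.0 here; for the negated operands it is the round-down result that moves instead.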

9. triton.language.extra.cann.asin

OP Overview

Computes the inverse sine (arcsin) of the input parameter.

Prototype:

triton.language.extra.cann.asin(x, _builder=None)

Return Value: tl.tensor, containing the inverse sine of the input parameter, in the range [-π/2, π/2] radians.

Supported Datatypes:float32

10. triton.language.extra.cann.asinh

OP Overview

Computes the inverse hyperbolic sine of the input parameter.

Prototype:

triton.language.extra.cann.asinh(x, _builder=None)

Return Value: tl.tensor, containing the inverse hyperbolic sine of the input parameter.

Supported Datatypes:float32

11. triton.language.extra.cann.atan

OP Overview

Computes the inverse tangent (arctan) of the input parameter.

Prototype:

triton.language.extra.cann.atan(x, _builder=None)

Return Value: tl.tensor, containing the inverse tangent of the input parameter, in the range [-π/2, π/2] radians.

Supported Datatypes:float32

12. triton.language.extra.cann.atan2

OP Overview

Two-argument inverse tangent function, computes the arctangent of x / y.

Prototype:

triton.language.extra.cann.atan2(x, y, _builder=None)

Return Value: tl.tensor, containing the arctangent of x / y, in the range [-π, π] radians.

Supported Datatypes:float32
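
A short host-side illustration of why the two-argument form exists: a plain arctangent of the quotient loses the quadrant, while atan2 keeps it. This sketch uses Python's math.atan2 as a stand-in and assumes the argument order described above (first argument is the numerator); atan2_ref is a hypothetical name.

```python
import math


def atan2_ref(x, y):
    """Reference for atan2(x, y): the angle of x / y with quadrant information."""
    return math.atan2(x, y)  # math.atan2 takes the numerator first


# Same quotient (-1), different quadrants.
print(atan2_ref(1.0, -1.0))   # second quadrant
print(atan2_ref(-1.0, 1.0))   # fourth quadrant
print(math.atan(1.0 / -1.0))  # single-argument form cannot tell them apart
```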

13. triton.language.extra.cann.atanh

OP Overview

Inverse hyperbolic tangent function, computes the inverse hyperbolic tangent of the input parameter.

Prototype:

triton.language.extra.cann.atanh(x, _builder=None)

Return Value: tl.tensor, containing the inverse hyperbolic tangent of the input parameter. The input is expected to lie in the range [-1, 1].

Supported Datatypes:float32

14. triton.language.extra.cann.brev

OP Overview

Bit reversal function, reverses the bit order of a 32-bit integer.

Prototype:

triton.language.extra.cann.brev(x, _builder=None)

Return Value: tl.tensor, containing the 32-bit integer with reversed bit order.

Supported Datatypes:int32
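
As a host-side reference for the bit reversal described above, the following pure-Python sketch treats the input as an unsigned 32-bit pattern. The function name brev_ref is hypothetical and the code is only meant for checking expected values.

```python
def brev_ref(x):
    """Reverse the order of the 32 bits of x (interpreted as unsigned)."""
    x &= 0xFFFFFFFF
    # Write the value as 32 binary digits, reverse the string, parse it back.
    return int(format(x, "032b")[::-1], 2)


print(hex(brev_ref(0x00000001)))
print(hex(brev_ref(0x0000FFFF)))
```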

15. triton.language.extra.cann.byte_perm

OP Overview

Byte permutation operation, selects bytes from two 32-bit integers to form a new integer.

Prototype:

triton.language.extra.cann.byte_perm(x, y, s, _builder=None)

The bytes of the input integers x and y are indexed as follows:

input[0] = x<7:0>     input[1] = x<15:8>
input[2] = x<23:16>   input[3] = x<31:24>
input[4] = y<7:0>     input[5] = y<15:8>
input[6] = y<23:16>   input[7] = y<31:24>

The byte selection parameter s is a 32-bit integer. Only its low 16 bits are used: each 4-bit group holds one 3-bit selector, as follows:

selector[0] = s<2:0>    selector[1] = s<6:4>
selector[2] = s<10:8>   selector[3] = s<14:12>

Return Value: tl.tensor, where return[n] := input[selector[n]] and n is the index of the byte in the output integer.

Supported Datatypes:int32
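
The selection rule above can be written out as a small host-side reference. This is a pure-Python sketch of the rule exactly as stated in this section (the name byte_perm_ref is hypothetical); it is useful for working out the selector value needed for a given byte shuffle.

```python
def byte_perm_ref(x, y, s):
    """Build a 32-bit integer from bytes of x and y chosen by selector s."""
    # input[0..3] are the bytes of x, input[4..7] the bytes of y, low byte first.
    inputs = [(x >> (8 * i)) & 0xFF for i in range(4)]
    inputs += [(y >> (8 * i)) & 0xFF for i in range(4)]
    result = 0
    for n in range(4):
        selector = (s >> (4 * n)) & 0x7     # s<2:0>, s<6:4>, s<10:8>, s<14:12>
        result |= inputs[selector] << (8 * n)
    return result


x, y = 0x33221100, 0x77665544
print(hex(byte_perm_ref(x, y, 0x3210)))  # identity on x
print(hex(byte_perm_ref(x, y, 0x0123)))  # byte-swapped x
print(hex(byte_perm_ref(x, y, 0x7654)))  # identity on y
```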

16. triton.language.extra.cann.ceil

OP Overview

Ceiling operation, returns the smallest integer greater than or equal to x.

Prototype:

triton.language.extra.cann.ceil(x, _builder=None)

Return Value: tl.tensor, containing the ceiling result.

Supported Datatypes:float32

17. triton.language.extra.cann.clz

OP Overview

Counts the number of leading zeros in a 32-bit integer.

Prototype:

triton.language.extra.cann.clz(x, _builder=None)

Return Value: tl.tensor, containing the number of leading zeros in the input parameter. Range: [0, 32].

Supported Datatypes:int32
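
A host-side reference for the leading-zero count, sketched in pure Python. The name clz_ref is hypothetical; the input is masked to its 32-bit pattern so that negative values count zero leading zeros.

```python
def clz_ref(x):
    """Number of leading zero bits in the 32-bit pattern of x."""
    x &= 0xFFFFFFFF
    # bit_length() is the position of the highest set bit (0 for x == 0).
    return 32 - x.bit_length()


print(clz_ref(0), clz_ref(1), clz_ref(0x80000000))
```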

18. triton.language.extra.cann.copysign

OP Overview

Generates a floating-point number with magnitude equal to the magnitude of x and sign equal to the sign of y.

Prototype:

triton.language.extra.cann.copysign(x, y, _builder=None)

Return Value: tl.tensor, containing a floating-point number with magnitude equal to the magnitude of x and sign equal to the sign of y.

Supported Datatypes:float32

19. triton.language.extra.cann.cos

OP Overview

Computes the cosine of the input parameter (in radians).

Prototype:

triton.language.extra.cann.cos(x, _builder=None)

Return Value: tl.tensor, containing the cosine of the input parameter.

Supported Datatypes:float32

20. triton.language.extra.cann.cosh

OP Overview

Computes the hyperbolic cosine of the input parameter.

Prototype:

triton.language.extra.cann.cosh(x, _builder=None)

Return Value: tl.tensor, containing the hyperbolic cosine of the input parameter.

21. triton.language.extra.cann.cyl_bessel_i0

OP Overview

Computes the modified Bessel function of the first kind, order 0, of the input parameter.

Prototype:

triton.language.extra.cann.cyl_bessel_i0(x, _builder=None)

Return Value: tl.tensor, containing the modified Bessel function of the first kind, order 0, of the input parameter.

Supported Datatypes:float32
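
For checking results on the host, the modified Bessel function of the first kind of order 0 can be evaluated from its power series, I0(x) = Σ (x/2)^(2k) / (k!)². The sketch below is a plain double-precision series sum, not the device algorithm; the name i0_ref and the fixed term count are assumptions, and the series is only practical for moderate |x|.

```python
def i0_ref(x, terms=60):
    """Power-series evaluation of I0(x); adequate for moderate |x|."""
    q = (x / 2.0) ** 2
    total, term = 0.0, 1.0          # term_k = q**k / (k!)**2, starting at k = 0
    for k in range(terms):
        total += term
        term *= q / ((k + 1) ** 2)  # ratio of consecutive series terms
    return total


print(i0_ref(0.0), i0_ref(1.0))
```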

22. triton.language.extra.cann.div_rd

OP Overview

Floating-point division with round-down (toward negative infinity) rounding mode.

Prototype:

triton.language.extra.cann.div_rd(x, y, _builder=None)

Return Value: tl.tensor, containing the division result.

Supported Datatypes:float32

23. triton.language.extra.cann.div_rn

OP Overview

Floating-point division with round-to-nearest-even rounding mode.

Prototype:

triton.language.extra.cann.div_rn(x, y, _builder=None)

Return Value: tl.tensor, containing the division result.

Supported Datatypes:float32

24. triton.language.extra.cann.div_ru

OP Overview

Floating-point division with round-up (toward positive infinity) rounding mode.

Prototype:

triton.language.extra.cann.div_ru(x, y, _builder=None)

Return Value: tl.tensor, containing the division result.

Supported Datatypes:float32

25. triton.language.extra.cann.div_rz

OP Overview

Floating-point division with round-toward-zero rounding mode.

Prototype:

triton.language.extra.cann.div_rz(x, y, _builder=None)

Return Value: tl.tensor, containing the division result.

Supported Datatypes:float32

26. triton.language.extra.cann.erfinv

OP Overview

Inverse error function, finds the value y such that x = erf(y).

Prototype:

triton.language.extra.cann.erfinv(x, _builder=None)

Return Value: tl.tensor, containing the inverse error function of the input parameter.

Supported Datatypes:float32
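
The defining relation x = erf(y) gives a direct way to validate erfinv outputs on the host. The sketch below inverts math.erf with Newton's method; this is an illustrative technique chosen for brevity, not the method used on the device, and erfinv_ref is a hypothetical name. It assumes -1 < x < 1.

```python
import math


def erfinv_ref(x, iterations=60):
    """Solve erf(y) = x for y with Newton's method (requires -1 < x < 1)."""
    if not -1.0 < x < 1.0:
        raise ValueError("x must lie strictly between -1 and 1")
    sign = -1.0 if x < 0 else 1.0
    target = abs(x)                 # erf is odd, so solve for |x| and restore the sign
    y = 0.0
    for _ in range(iterations):
        error = math.erf(y) - target
        slope = 2.0 / math.sqrt(math.pi) * math.exp(-y * y)  # d/dy erf(y)
        y -= error / slope
    return sign * y


y = erfinv_ref(0.5)
print(y, math.erf(y))
```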

27. triton.language.extra.cann.exp10

OP Overview

Base-10 exponential function, computes 10 raised to the power of x.

Prototype:

triton.language.extra.cann.exp10(x, _builder=None)

Return Value: tl.tensor, containing the result of 10 raised to the power of x.

Supported Datatypes:float32

28. triton.language.extra.cann.exp2

OP Overview

Base-2 exponential function, computes 2 raised to the power of x.

Prototype:

triton.language.extra.cann.exp2(x, _builder=None)

Return Value: tl.tensor, containing the result of 2 raised to the power of x.

Supported Datatypes:float32

29. triton.language.extra.cann.exp

OP Overview

Exponential function, computes e raised to the power of x.

Prototype:

triton.language.extra.cann.exp(x, _builder=None)

Return Value: tl.tensor, containing the result of e raised to the power of x.

Supported Datatypes:float32

30. triton.language.extra.cann.expm1

OP Overview

Computes e raised to the power of x, minus 1.

Prototype:

triton.language.extra.cann.expm1(x, _builder=None)

Return Value: tl.tensor, containing the result of e raised to the power of x, minus 1.
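
A dedicated expm1 exists because computing exp(x) - 1 directly loses most significant digits when x is close to zero. The host-side sketch below shows the effect in double precision with Python's math module; the same cancellation applies, more severely, in float32.

```python
import math

x = 1e-10
naive = math.exp(x) - 1.0   # exp(x) is rounded near 1.0, then the 1.0 cancels
accurate = math.expm1(x)    # evaluated without the cancellation

print(abs(naive - x) / x)     # relative error of the naive form
print(abs(accurate - x) / x)  # relative error of expm1
```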

31. triton.language.extra.cann.fast_dividef

OP Overview

Fast approximate division.

Prototype:

triton.language.extra.cann.fast_dividef(x, y, _builder=None)

Return Value: tl.tensor, containing the result of fast approximate division.

Supported Datatypes:float32

32. triton.language.extra.cann.fast_expf

OP Overview

Fast approximate exponential function.

Prototype:

triton.language.extra.cann.fast_expf(x, _builder=None)

Return Value: tl.tensor, containing the result of fast approximate exponential function.

Supported Datatypes:float32

33. triton.language.extra.cann.fdim

OP Overview

Computes the positive difference between x and y. When x > y, returns x - y; otherwise returns 0.

Prototype:

triton.language.extra.cann.fdim(x, y, _builder=None)

Return Value: tl.tensor, containing the positive difference between x and y.

Supported Datatypes:float32

34. triton.language.extra.cann.ffs

OP Overview

Finds the first (least significant) bit set to 1 and returns its position, counting from 1.

Prototype:

triton.language.extra.cann.ffs(x, _builder=None)

Return Value: tl.tensor, containing the 1-based position of the lowest bit set to 1, or 0 if no bit is set. Range: [0, 32].

Supported Datatypes:int32
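
A host-side reference sketch for ffs in pure Python. It assumes the convention implied by the documented range [0, 32]: positions are counted from 1 and an input of 0 yields 0. The name ffs_ref is hypothetical.

```python
def ffs_ref(x):
    """1-based position of the lowest set bit of the 32-bit pattern, 0 if none."""
    x &= 0xFFFFFFFF
    lowest = x & -x             # isolates the lowest set bit (0 when x == 0)
    return lowest.bit_length()


print(ffs_ref(0), ffs_ref(1), ffs_ref(8), ffs_ref(0x80000000))
```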

35. triton.language.extra.cann.float_as_int

OP Overview

Reinterprets the bit pattern of a floating-point number as a 32-bit integer. No numeric conversion is performed.

Prototype:

triton.language.extra.cann.float_as_int(x, _builder=None)

Return Value: tl.tensor, containing the bit pattern of the floating-point number reinterpreted as a 32-bit integer.
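
The difference between reinterpreting bits and converting a value can be seen with Python's struct module. This is a host-side sketch only; float_as_int_ref is a hypothetical name, and the result is read back as a signed 32-bit integer.

```python
import struct


def float_as_int_ref(x):
    """Reinterpret the float32 bit pattern of x as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", x))[0]


print(hex(float_as_int_ref(1.0)))  # IEEE 754 encoding of 1.0
print(int(1.0))                    # numeric conversion, shown for contrast
```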

36. triton.language.extra.cann.floor

OP Overview

Floor operation, returns the largest integer less than or equal to x.

Prototype:

triton.language.extra.cann.floor(x, _builder=None)

Return Value: tl.tensor, containing the floor result.

Supported Datatypes:float32

37. triton.language.extra.cann.fma

OP Overview

Fused multiply-add, computes x × y + z.

Prototype:

triton.language.extra.cann.fma(x, y, z, _builder=None)

Return Value: tl.tensor, containing the result of fused multiply-add.

Supported Datatypes:float32
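
What distinguishes a fused multiply-add from a separate multiply and add is that the product is not rounded before the addition. The sketch below emulates both in float32 on the host, using exact rational arithmetic for the fused path; it assumes the usual single-rounding definition of fma, and the helper names are hypothetical.

```python
import struct
from fractions import Fraction


def f32(v):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def fma_f32(x, y, z):
    """x * y + z computed exactly, then rounded once to float32."""
    return f32(float(Fraction(x) * Fraction(y) + Fraction(z)))


def mul_add_f32(x, y, z):
    """x * y rounded to float32, then + z rounded to float32 again."""
    return f32(f32(x * y) + z)


x = y = 1.0 + 2.0 ** -12
z = -(1.0 + 2.0 ** -11)
print(fma_f32(x, y, z))      # keeps the low-order bit of the product
print(mul_add_f32(x, y, z))  # the product's low-order bit was rounded away
```

With these inputs the exact product is 1 + 2^-11 + 2^-24, which lies halfway between two float32 values, so the unfused path rounds it to 1 + 2^-11 and the final result collapses to zero.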

38. triton.language.extra.cann.fma_rd

OP Overview

Fused multiply-add operation with round-down rounding mode.

Prototype:

triton.language.extra.cann.fma_rd(x, y, z, _builder=None)

Return Value: tl.tensor, containing the result of fused multiply-add.

Supported Datatypes:float32

39. triton.language.extra.cann.fma_rn

OP Overview

Fused multiply-add operation with round-to-nearest-even rounding mode.

Prototype:

triton.language.extra.cann.fma_rn(x, y, z, _builder=None)

Return Value: tl.tensor, containing the result of fused multiply-add.

Supported Datatypes:float32

40. triton.language.extra.cann.fma_ru

OP Overview

Fused multiply-add operation with round-up rounding mode.

Prototype:

triton.language.extra.cann.fma_ru(x, y, z, _builder=None)

Return Value: tl.tensor, containing the result of fused multiply-add.

Supported Datatypes:float32

41. triton.language.extra.cann.fma_rz

OP Overview

Fused multiply-add operation with round-toward-zero rounding mode.

Prototype:

triton.language.extra.cann.fma_rz(x, y, z, _builder=None)

Return Value: tl.tensor, containing the result of fused multiply-add.

Supported Datatypes:float32

42. triton.language.extra.cann.fmod

OP Overview

Floating-point modulo, computes the remainder of x / y, with the same sign as x.

Prototype:

triton.language.extra.cann.fmod(x, y, _builder=None)

Return Value: tl.tensor, containing the floating-point modulo result.

Supported Datatypes:float32

43. triton.language.extra.cann.hadd

OP Overview

Computes the average of x and y.

Prototype:

triton.language.extra.cann.hadd(x, y, _builder=None)

Return Value: tl.tensor, containing the average of x and y.

Supported Datatypes:int32

44. triton.language.extra.cann.hypot

OP Overview

Computes the length of the hypotenuse of a right triangle with legs x and y, that is, sqrt(x² + y²).

Prototype:

triton.language.extra.cann.hypot(x, y, _builder=None)

Return Value: tl.tensor, containing sqrt(x² + y²).

Supported Datatypes:float32

45. triton.language.extra.cann.lgamma

OP Overview

Computes the natural logarithm of the absolute value of the gamma function for input x.

Prototype:

triton.language.extra.cann.lgamma(x, _builder=None)

Return Value: tl.tensor, containing the natural logarithm of the absolute value of the gamma function for input x.

Supported Datatypes:float32

46. triton.language.extra.cann.log10

OP Overview

Computes the base-10 logarithm of input x.

Prototype:

triton.language.extra.cann.log10(x, _builder=None)

Return Value: tl.tensor, containing the base-10 logarithm of input x.

Supported Datatypes:float32

47. triton.language.extra.cann.log2

OP Overview

Computes the base-2 logarithm of input x.

Prototype:

triton.language.extra.cann.log2(x, _builder=None)

Return Value: tl.tensor, containing the base-2 logarithm of input x.

Supported Datatypes:float32

48. triton.language.extra.cann.log

OP Overview

Computes the natural (base-e) logarithm of input x.

Prototype:

triton.language.extra.cann.log(x, _builder=None)

Return Value: tl.tensor, containing the natural logarithm of input x.

Supported Datatypes:float32

49. triton.language.extra.cann.mul24

OP Overview

Computes the lower 24-bit multiplication result of x and y.

Prototype:

triton.language.extra.cann.mul24(x, y, _builder=None)

Return Value: tl.tensor, containing the lower 24-bit multiplication result of x and y.

Supported Datatypes:int32

50. triton.language.extra.cann.mul_rd

OP Overview

Floating-point multiplication with round-down rounding mode.

Prototype:

triton.language.extra.cann.mul_rd(x, y, _builder=None)

Return Value: tl.tensor, containing the floating-point multiplication result.

Supported Datatypes:float32

51. triton.language.extra.cann.mul_rn

OP Overview

Floating-point multiplication with round-to-nearest-even rounding mode.

Prototype:

triton.language.extra.cann.mul_rn(x, y, _builder=None)

Return Value: tl.tensor, containing the floating-point multiplication result.

Supported Datatypes:float32

52. triton.language.extra.cann.mul_ru

OP Overview

Floating-point multiplication with round-up rounding mode.

Prototype:

triton.language.extra.cann.mul_ru(x, y, _builder=None)

Return Value: tl.tensor, containing the floating-point multiplication result.

Supported Datatypes:float32

53. triton.language.extra.cann.mul_rz

OP Overview

Floating-point multiplication with round-toward-zero rounding mode.

Prototype:

triton.language.extra.cann.mul_rz(x, y, _builder=None)

Return Value: tl.tensor, containing the floating-point multiplication result.

Supported Datatypes:float32

54. triton.language.extra.cann.mulhi

OP Overview

Computes the high 32 bits of the multiplication result of x and y.

Prototype:

triton.language.extra.cann.mulhi(x, y, _builder=None)

Return Value: tl.tensor, containing the high 32 bits of the multiplication result of x and y.

Supported Datatypes:int32
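
A host-side reference for mulhi, sketched with Python's arbitrary-precision integers. It assumes signed 32-bit operands, in line with the int32 datatype listed above; the names mulhi_ref and to_signed32 are hypothetical.

```python
def to_signed32(v):
    """Interpret the low 32 bits of v as a signed two's-complement integer."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def mulhi_ref(x, y):
    """High 32 bits of the 64-bit product of two signed 32-bit integers."""
    product = to_signed32(x) * to_signed32(y)  # exact, no overflow in Python
    return to_signed32(product >> 32)


print(mulhi_ref(0x40000000, 4), mulhi_ref(3, 4), mulhi_ref(-1, 1))
```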

55. triton.language.extra.cann.nearbyint

OP Overview

Converts x to the nearest integer.

Prototype:

triton.language.extra.cann.nearbyint(x, _builder=None)

Return Value: tl.tensor, containing the nearest integer.

Supported Datatypes:float32

56. triton.language.extra.cann.nextafter

OP Overview

Computes the next representable floating-point number from x toward y.

Prototype:

triton.language.extra.cann.nextafter(x, y, _builder=None)

Return Value: tl.tensor, containing the next representable floating-point number.

Supported Datatypes:float32
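
For float32, the next representable number can be found by stepping the integer bit pattern by one. The sketch below is a host-side illustration for finite inputs that are already exact float32 values; nextafter_f32 is a hypothetical name (Python's own math.nextafter works on doubles, so it is not used here).

```python
import struct


def nextafter_f32(x, y):
    """Next float32 after x in the direction of y (finite float32 inputs)."""
    if x != x or y != y:                   # NaN in, NaN out
        return float("nan")
    if x == y:
        return y
    if x == 0.0:                           # step onto the smallest subnormal
        bits = 0x00000001 if y > 0 else 0x80000001
    else:
        bits = struct.unpack("<I", struct.pack("<f", x))[0]
        away_from_zero = (y > x) == (x > 0)
        bits += 1 if away_from_zero else -1
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(nextafter_f32(1.0, 2.0) - 1.0)  # one ulp above 1.0
print(1.0 - nextafter_f32(1.0, 0.0))  # one ulp below 1.0 is half as large
```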

57. triton.language.extra.cann.popc

OP Overview

Counts the number of bits set to 1 in x.

Prototype:

triton.language.extra.cann.popc(x, _builder=None)

Return Value: tl.tensor, containing the number of bits set to 1 in x. Range: [0, 32].

Supported Datatypes:int32

58. triton.language.extra.cann.pow

OP Overview

Power function, computes x raised to the power of y.

Prototype:

triton.language.extra.cann.pow(x, y, _builder=None)

Return Value: tl.tensor, containing x raised to the power of y.

Supported Datatypes:float32

59. triton.language.extra.cann.rcp_rd

OP Overview

Floating-point reciprocal with round-down rounding mode.

Prototype:

triton.language.extra.cann.rcp_rd(x, _builder=None)

Return Value: tl.tensor, containing 1 / x.

Supported Datatypes:float32

60. triton.language.extra.cann.rcp_rn

OP Overview

Floating-point reciprocal with round-to-nearest-even rounding mode.

Prototype:

triton.language.extra.cann.rcp_rn(x, _builder=None)

Return Value: tl.tensor, containing 1 / x.

Supported Datatypes:float32

61. triton.language.extra.cann.rcp_ru

OP Overview

Floating-point reciprocal with round-up rounding mode.

Prototype:

triton.language.extra.cann.rcp_ru(x, _builder=None)

Return Value: tl.tensor, containing 1 / x.

Supported Datatypes:float32

62. triton.language.extra.cann.rcp_rz

OP Overview

Floating-point reciprocal with round-toward-zero rounding mode.

Prototype:

triton.language.extra.cann.rcp_rz(x, _builder=None)

Return Value: tl.tensor, containing 1 / x.

Supported Datatypes:float32

63. triton.language.extra.cann.remainder

OP Overview

Computes the remainder of x divided by y, where r = x - ny, and n is the nearest integer to x / y.

Prototype:

triton.language.extra.cann.remainder(x, y, _builder=None)

Return Value: tl.tensor, containing the remainder of x divided by y.

Supported Datatypes:float32
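
remainder and fmod are easy to confuse: fmod truncates the quotient and keeps the sign of x, while remainder rounds the quotient to the nearest integer, so its result can have the opposite sign. Python's math module offers the same pair of definitions and serves here as a host-side stand-in.

```python
import math

for x, y in [(5.5, 2.0), (-5.5, 2.0)]:
    # fmod: x - trunc(x / y) * y, remainder: x - nearest(x / y) * y
    print(x, y, math.fmod(x, y), math.remainder(x, y))
```

For x = 5.5 and y = 2.0 the quotient is 2.75, so fmod subtracts 2 × 2.0 and remainder subtracts 3 × 2.0.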

64. triton.language.extra.cann.rhadd

OP Overview

Computes the rounded average of x and y.

Prototype:

triton.language.extra.cann.rhadd(x, y, _builder=None)

Return Value: tl.tensor, containing the rounded average of x and y.

Supported Datatypes:int32
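
The difference between hadd and rhadd shows up only when x + y is odd. The sketch below assumes the conventional definitions (x + y) >> 1 for the plain average and (x + y + 1) >> 1 for the rounded average, computed without intermediate overflow; this section does not spell out the rounding direction, so treat that as an assumption. The names hadd_ref and rhadd_ref are hypothetical.

```python
def hadd_ref(x, y):
    """Average of two int32 values, rounded toward negative infinity (assumed)."""
    return (x + y) >> 1      # Python integers do not overflow


def rhadd_ref(x, y):
    """Average of two int32 values, rounded up on ties (assumed)."""
    return (x + y + 1) >> 1


print(hadd_ref(3, 4), rhadd_ref(3, 4))
print(hadd_ref(2**31 - 1, 2**31 - 1))  # no overflow in the intermediate sum
```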

65. triton.language.extra.cann.rint

OP Overview

Computes the nearest integer to x using round-to-nearest-even rounding mode.

Prototype:

triton.language.extra.cann.rint(x, _builder=None)

Return Value: tl.tensor, containing the nearest integer to x.

Supported Datatypes:float32

66. triton.language.extra.cann.round

OP Overview

Computes the nearest integer to x using round-to-nearest-even rounding mode.

Prototype:

triton.language.extra.cann.round(x, _builder=None)

Return Value: tl.tensor, containing the nearest integer to x.

Supported Datatypes:float32

67. triton.language.extra.cann.rsqrt

OP Overview

Computes the reciprocal square root of x.

Prototype:

triton.language.extra.cann.rsqrt(x, _builder=None)

Return Value: tl.tensor, containing the reciprocal square root of x.

Supported Datatypes:float32

68. triton.language.extra.cann.rsqrt_rn

OP Overview

Computes the reciprocal square root of x using round-to-nearest-even rounding mode.

Prototype:

triton.language.extra.cann.rsqrt_rn(x, _builder=None)

Return Value: tl.tensor, containing the reciprocal square root of x.

Supported Datatypes:float32

69. triton.language.extra.cann.sad

OP Overview

Computes |x-y|+z, where x and y are signed integers and z is an unsigned integer.

Prototype:

triton.language.extra.cann.sad(x, y, z, _builder=None)

Return Value: tl.tensor, containing |x-y|+z.

Supported Datatypes:int32

70. triton.language.extra.cann.saturatef

OP Overview

Clamps x to the range [+0.0, 1.0].

Prototype:

triton.language.extra.cann.saturatef(x, _builder=None)

Return Value: tl.tensor, containing the saturated value of x, in the range [+0.0, 1.0].

Supported Datatypes:float32

71. triton.language.extra.cann.signbit

OP Overview

Extracts the sign bit of x.

Prototype:

triton.language.extra.cann.signbit(x, _builder=None)

Return Value: tl.tensor, containing the sign bit of x.

Supported Datatypes:float32

72. triton.language.extra.cann.sin

OP Overview

Computes the sine of the input parameter x (in radians).

Prototype:

triton.language.extra.cann.sin(x, _builder=None)

Return Value: tl.tensor, containing the sine of input x.

Supported Datatypes:float32

73. triton.language.extra.cann.sinh

OP Overview

Computes the hyperbolic sine of input parameter x.

Prototype:

triton.language.extra.cann.sinh(x, _builder=None)

Return Value: tl.tensor, containing the hyperbolic sine of input x.

Supported Datatypes:float32

74. triton.language.extra.cann.sqrt

OP Overview

Computes the square root of x.

Prototype:

triton.language.extra.cann.sqrt(x, _builder=None)

Return Value: tl.tensor, containing the square root of x.

Supported Datatypes:float32

75. triton.language.extra.cann.tan

OP Overview

Computes the tangent of input parameter x (in radians).

Prototype:

triton.language.extra.cann.tan(x, _builder=None)

Return Value: tl.tensor, containing the tangent of input x.

Supported Datatypes:float32

76. triton.language.extra.cann.tanh

OP Overview

Computes the hyperbolic tangent of input parameter x.

Prototype:

triton.language.extra.cann.tanh(x, _builder=None)

Return Value: tl.tensor, containing the hyperbolic tangent of input x.

Supported Datatypes:float32

77. triton.language.extra.cann.trunc

OP Overview

Truncation operation, rounds toward zero to the nearest integer.

Prototype:

triton.language.extra.cann.trunc(x, _builder=None)

Return Value: tl.tensor, containing the truncation result.

Supported Datatypes:float32
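
The rounding-to-integer functions in this guide (ceil, floor, trunc, rint) agree on whole numbers and differ on fractional ones. The host-side sketch below compares the corresponding Python operations on a few inputs; the built-in round is used as a stand-in for round-to-nearest-even, and the helper name rounding_family is hypothetical.

```python
import math


def rounding_family(x):
    """Return the ceil, floor, trunc and round-half-even results for x."""
    return {
        "ceil": math.ceil(x),    # smallest integer >= x
        "floor": math.floor(x),  # largest integer <= x
        "trunc": math.trunc(x),  # drop the fractional part (toward zero)
        "rint": round(x),        # nearest integer, ties to even
    }


for value in (2.5, -2.5, 1.5):
    print(value, rounding_family(value))
```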