feat(runtime/autotune): add AsyncCompileMode to support parallel compilation in autotuning
Co-authored-by: Xuan Peng <pengxuan9@huawei.com>
# message auto-generated for no-merge-commit merge:
!1268 merge feat/async-compile-0210 into main
feat(runtime/autotune): add AsyncCompileMode to support parallel compilation in autotuning
Created-by: HinPeng
Commit-by: Xuan Peng
Merged-by: ascend-robot
Description:
## PR description
1. Introduce async compile mode from triton v3.5.1 (with minor modifications for compatibility with the current branch and torch-2.7.1)
2. Refactor the autotuner to compile Triton kernels in parallel
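
As a rough illustration of item 2, the sketch below compiles several configurations concurrently and collects the results in input order. It uses only the Python standard library; `compile_config` and `compile_all` are hypothetical names for illustration and are not the `AsyncCompileMode` API added by this PR.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_config(config):
    # Hypothetical stand-in for compiling one kernel configuration.
    return f"binary-for-{config}"


def compile_all(configs, max_workers=4):
    # Submit every configuration up front so compilations can overlap,
    # then collect the results in the same order as the input.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(compile_config, c) for c in configs]
        return [f.result() for f in futures]


compiled = compile_all(["BLOCK=64", "BLOCK=128"])
```

Collecting results in submission order keeps each compiled kernel paired with its configuration, which the benchmarking step of an autotuner would need.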
## Notice
1. Introduce the MLIR_DISABLE_MULTITHREADING environment variable early, backported from triton v3.5.1
2. Add TRITON_AUTOTUNE_PARALLEL_COMPILE to control whether the autotuner compiles kernels in parallel; defaults to '1'
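
A minimal sketch of how such a toggle could be read, assuming the variable is compared against the string '1' and that any other value disables parallel compilation. The helper name `parallel_compile_enabled` is hypothetical, and the actual parsing in this PR may differ.

```python
import os


def parallel_compile_enabled(environ=os.environ):
    # Assumption: unset means enabled (default '1'); any value other
    # than '1' turns parallel compilation off.
    return environ.get("TRITON_AUTOTUNE_PARALLEL_COMPILE", "1") == "1"
```

Under this assumption, exporting `TRITON_AUTOTUNE_PARALLEL_COMPILE=0` before launching the process would fall back to sequential compilation.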
See merge request: Ascend/triton-ascend!1268