torch.distributed

Note

If an API's "Supported" column is "Yes" and its "Restrictions and notes" column is "-", the API is supported to the same degree as the native API.

API name Supported Restrictions and notes
torch.distributed.is_available -
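torch.distributed.init_process_group When the pg_options parameter is passed an object of type torch_npu._C._distributed_c10d.ProcessGroupHCCL.Options(), setting that object's hccl_config attribute controls the buffer size of the HCCL communication domain. For an example, see the "hccl_buffer_size" section of the "PyTorch Training Model Migration and Tuning Guide". The group_name field of hccl_config sets a custom name for the communication group of the HCCL communication domain; its value is a string of at most 32 characters.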
torch.distributed.is_initialized -
torch.distributed.is_mpi_available -
torch.distributed.is_nccl_available -
torch.distributed.is_gloo_available -
torch.distributed.is_torchelastic_launched -
torch.distributed.Backend -
torch.distributed.Backend.register_backend -
torch.distributed.get_backend -
torch.distributed.get_rank -
torch.distributed.get_world_size -
torch.distributed.Store -
torch.distributed.TCPStore -
torch.distributed.HashStore -
torch.distributed.FileStore -
torch.distributed.PrefixStore -
torch.distributed.Store.set -
torch.distributed.Store.get -
torch.distributed.Store.add -
torch.distributed.Store.compare_set -
torch.distributed.Store.wait -
torch.distributed.Store.num_keys -
torch.distributed.Store.delete_key -
torch.distributed.Store.set_timeout -
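torch.distributed.new_group When the pg_options parameter is passed an object of type torch_npu._C._distributed_c10d.ProcessGroupHCCL.Options(), setting that object's hccl_config attribute controls the buffer size of the HCCL communication domain. For an example, see the "hccl_buffer_size" section of the "PyTorch Training Model Migration and Tuning Guide". The group_name field of hccl_config sets a custom name for the communication group of the HCCL communication domain; its value is a string of at most 32 characters.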
torch.distributed.get_group_rank -
torch.distributed.get_global_rank -
torch.distributed.get_process_group_ranks -
torch.distributed.device_mesh.DeviceMesh -
torch.distributed.send Supports bf16, fp16, fp32, fp64, uint8, int8, int16, int32, int64, bool
torch.distributed.recv Supports bf16, fp16, fp32, fp64, uint8, int8, int16, int32, int64, bool
torch.distributed.isend Supports bf16, fp16, fp32, fp64, uint8, int8, int16, int32, int64, bool
torch.distributed.irecv Supports bf16, fp16, fp32, fp64, uint8, int8, int16, int32, int64, bool
torch.distributed.batch_isend_irecv Supports bf16, fp16, fp32, fp64, uint8, int8, int16, int32, int64, bool
torch.distributed.P2POp Supports bf16, fp16, fp32, fp64, uint8, int8, int16, int32, int64, bool
torch.distributed.broadcast Supports bf16, fp16, fp32, fp64, uint8, int8, int16, int32, int64, bool
torch.distributed.broadcast_object_list -
torch.distributed.all_reduce Supports bf16, fp16, fp32, int32, int64, bool
torch.distributed.reduce Supports bf16, fp16, fp32, uint8, int8, int32, int64, bool
torch.distributed.all_gather Supports bf16, fp16, fp32, int8, int32, bool
torch.distributed.all_gather_into_tensor Supports bf16, fp16, fp32, int8, int32, bool
World sizes of 3, 5, 6, and 7 are not supported
torch.distributed.all_gather_object -
torch.distributed.gather Supports bf16, fp16, fp32, int8, int32, bool
Calling torch_npu.npu.use_compatible_impl(True) switches torch.distributed.gather to an implementation consistent with the native one
torch.distributed.gather_object Accepts Python objects as input
torch.distributed.scatter Supports bf16, fp16, fp32, fp64, uint8, int8, int16, int32, int64, bool
Calling torch_npu.npu.use_compatible_impl(True) switches torch.distributed.scatter to an implementation consistent with the native one
torch.distributed.scatter_object_list Does not involve a dtype parameter
torch.distributed.reduce_scatter Supports bf16, fp16, fp32, int8, int32
torch.distributed.reduce_scatter_tensor Supports bf16, fp16, fp32, int8, int32
World sizes of 3, 5, 6, and 7 are not supported
On Atlas A2 training series products, the "prod" operation does not support the int16 and bf16 data types in the current version
torch.distributed.all_to_all_single Supports fp32
torch.distributed.all_to_all Supports fp32
Calling torch_npu.npu.use_compatible_impl(True) switches torch.distributed.all_to_all to an implementation consistent with the native one
torch.distributed.barrier -
torch.distributed.monitored_barrier -
torch.distributed.ReduceOp Supports bf16, fp16, fp32, uint8, int8, int32, int64, bool
torch.distributed.reduce_op Supports bf16, fp16, fp32, uint8, int8, int32, int64
torch.distributed.DistBackendError -
torch.distributed.device_mesh.DeviceMesh.from_group -
torch.distributed.device_mesh.DeviceMesh.get_all_groups -
torch.distributed.device_mesh.DeviceMesh.get_coordinate -
torch.distributed.device_mesh.DeviceMesh.get_group -
torch.distributed.device_mesh.DeviceMesh.get_local_rank -
torch.distributed.device_mesh.DeviceMesh.get_rank -
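
The dtype and world-size restrictions listed in the table can be checked before launching a job. The following is a minimal sketch in plain Python, not part of torch_npu: the helper name check_collective and the short dtype strings are hypothetical, and the lookup data is transcribed by hand from the rows above, so it should be re-verified against the table for the version in use.

```python
from typing import List, Optional

# Dtype sets transcribed from the table above (hypothetical helper data).
_FULL = frozenset({"bf16", "fp16", "fp32", "fp64", "uint8",
                   "int8", "int16", "int32", "int64", "bool"})
_GATHER = frozenset({"bf16", "fp16", "fp32", "int8", "int32", "bool"})
_REDUCE_SCATTER = frozenset({"bf16", "fp16", "fp32", "int8", "int32"})

SUPPORTED_DTYPES = {
    "send": _FULL, "recv": _FULL, "isend": _FULL, "irecv": _FULL,
    "batch_isend_irecv": _FULL, "broadcast": _FULL, "scatter": _FULL,
    "all_reduce": frozenset({"bf16", "fp16", "fp32", "int32", "int64", "bool"}),
    "reduce": frozenset({"bf16", "fp16", "fp32", "uint8", "int8",
                         "int32", "int64", "bool"}),
    "all_gather": _GATHER,
    "all_gather_into_tensor": _GATHER,
    "gather": _GATHER,
    "reduce_scatter": _REDUCE_SCATTER,
    "reduce_scatter_tensor": _REDUCE_SCATTER,
    "all_to_all_single": frozenset({"fp32"}),
    "all_to_all": frozenset({"fp32"}),
}

# World sizes the table lists as unsupported for specific collectives.
UNSUPPORTED_WORLD_SIZES = {
    "all_gather_into_tensor": frozenset({3, 5, 6, 7}),
    "reduce_scatter_tensor": frozenset({3, 5, 6, 7}),
}


def check_collective(op: str, dtype: str,
                     world_size: Optional[int] = None) -> List[str]:
    """Return a list of restriction violations; an empty list means none found.

    Operations without a dtype list in the table (for example "barrier")
    are treated as unrestricted by this sketch.
    """
    problems = []
    allowed = SUPPORTED_DTYPES.get(op)
    if allowed is not None and dtype not in allowed:
        problems.append(f"{op} does not list dtype {dtype}")
    bad_sizes = UNSUPPORTED_WORLD_SIZES.get(op, frozenset())
    if world_size is not None and world_size in bad_sizes:
        problems.append(f"{op} does not support world size {world_size}")
    return problems
```

For example, check_collective("reduce_scatter_tensor", "int64", world_size=6) reports two problems, one for the dtype and one for the world size, while check_collective("all_reduce", "fp32") reports none.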
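
The hccl_config behavior described for init_process_group and new_group can be sketched as follows. This is an illustration under stated assumptions rather than a confirmed recipe: the dictionary key "hccl_buffer_size" is assumed from the section name cited in the table, the helpers build_hccl_config and init_hccl_process_group are hypothetical, and the second helper only runs on a machine where torch_npu is installed and the usual rendezvous environment variables are set.

```python
from typing import Optional

MAX_GROUP_NAME_LEN = 32  # limit stated in the table for group_name


def build_hccl_config(buffer_size: Optional[int] = None,
                      group_name: Optional[str] = None) -> dict:
    """Build a dict intended for ProcessGroupHCCL.Options().hccl_config."""
    config = {}
    if buffer_size is not None:
        if buffer_size <= 0:
            raise ValueError("buffer_size must be positive")
        # Assumed key name, taken from the "hccl_buffer_size" section title.
        config["hccl_buffer_size"] = buffer_size
    if group_name is not None:
        if len(group_name) > MAX_GROUP_NAME_LEN:
            raise ValueError(
                f"group_name must be at most {MAX_GROUP_NAME_LEN} characters")
        config["group_name"] = group_name
    return config


def init_hccl_process_group(config: dict) -> None:
    """Initialize the default process group with HCCL options.

    Imports are local so that build_hccl_config stays usable on machines
    without torch_npu.
    """
    import torch.distributed as dist
    import torch_npu

    options = torch_npu._C._distributed_c10d.ProcessGroupHCCL.Options()
    options.hccl_config = config
    dist.init_process_group(backend="hccl", pg_options=options)
```

Validating group_name before constructing the options object surfaces the 32-character limit as a plain ValueError on the host rather than as a failure during communicator setup.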