Enabling Tools for Common Frameworks
Overview
This document describes how to enable debugging tools in common frameworks, including the dump tool and monitor tool.
Tool Enablement
Reference
Method for Locating the Position to Add Tools
To locate the target position, print the call stack of any API that is invoked during training, then trace the callers back through the stack.
For example, you can print the call stack of linear in either of the following ways:
Method 1: Printing the call stack in torch

Method 2: Replacing the original linear in the startup script
import torch
import torch.nn.functional as F
import traceback

# Save the original function.
original_linear = F.linear

# Replace linear in functional and use *args and **kwargs to accommodate all parameters.
def custom_linear(*args, **kwargs):
    print("=" * 50)
    print("Call F.linear. The call stack is as follows:")
    traceback.print_stack()
    # Pass the received parameters to the original function without any modification.
    return original_linear(*args, **kwargs)

F.linear = custom_linear
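Note: Once the call stack is obtained, the frames that belong to the framework's own code (rather than to torch) show how the call was reached, which makes it straightforward to find the position to add tools.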
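Because linear is called many times per step, printing the full stack on every call quickly floods the log. The following is a minimal sketch of one way to limit the output, not part of any tool described here: it wraps the target temporarily, reports only the first call, filters out frames from installed packages, and restores the original afterwards. The names trace_calls, max_reports, skip_substrings, and fake_functional are hypothetical, and fake_functional is only a stand-in so that the sketch runs without torch. With torch installed, the assumption is that you would pass torch.nn.functional and "linear" instead.

```python
import contextlib
import traceback
import types


@contextlib.contextmanager
def trace_calls(owner, attr_name, max_reports=1, skip_substrings=("site-packages",)):
    """Temporarily wrap owner.attr_name so that its callers are reported."""
    original = getattr(owner, attr_name)
    reports = []  # one list of "file:line in func" strings per reported call

    def wrapper(*args, **kwargs):
        if len(reports) < max_reports:
            # Drop the last frame (this wrapper) and frames from installed packages.
            frames = traceback.extract_stack()[:-1]
            kept = [
                f"{f.filename}:{f.lineno} in {f.name}"
                for f in frames
                if not any(s in f.filename for s in skip_substrings)
            ]
            reports.append(kept)
            print(f"Call to {attr_name}; caller frames (outermost first):")
            for line in kept:
                print("  " + line)
        # Forward the arguments unchanged to the original function.
        return original(*args, **kwargs)

    setattr(owner, attr_name, wrapper)
    try:
        yield reports
    finally:
        # Always restore the original, even if the traced code raises.
        setattr(owner, attr_name, original)


# Hypothetical stand-in for torch.nn.functional so the sketch runs without torch.
fake_functional = types.SimpleNamespace(linear=lambda x, w: x * w)


def forward(x):
    # Looks up fake_functional.linear at call time, as nn.Linear does with F.linear.
    return fake_functional.linear(x, 2)


with trace_calls(fake_functional, "linear") as reports:
    result = forward(3)
    forward(4)  # not reported, because max_reports is 1
```

As with Method 2, this only intercepts callers that look up the attribute at call time. Code that ran `from torch.nn.functional import linear` before the patch keeps its own reference to the original function and is not traced.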

Positions for Adding Tools in Common Frameworks
MindSpeed-LLM

MindSpeed-MM

LLaMA-Factory

accelerate + DeepSpeed

TorchTitan (FSDP2)

verl (FSDP)
Positions where deterministic computing is enabled:

generate_sequences

Note: This enabling mode applies only to the vLLM eager backend. The position and enabling mode may vary depending on the configuration.
update_actor

compute_log_prob

compute_ref_log_prob

Note: The preceding enabling modes apply only to the vLLM eager backend. The positions and enabling modes may vary depending on the configuration.