Mojo struct
BlackwellWarpProfilingWorkspaceManager
struct BlackwellWarpProfilingWorkspaceManager[load_warps: UInt32, mma_warps: UInt32, scheduler_warps: UInt32, epilogue_warps: UInt32, max_entries_per_warp: UInt32]
This struct manages the profiling workspace. The workspace consists of equal-sized chunks, one per active SM. Each SM chunk consists of sequences of entries, one sequence per warp role, each holding at most a fixed maximum number of entries.
Template Parameters:
load_warps: Number of warps specialized for load operations.
mma_warps: Number of warps specialized for matrix multiply-accumulate operations.
scheduler_warps: Number of warps specialized for scheduling operations.
epilogue_warps: Number of warps specialized for epilogue operations.
max_entries_per_warp: Maximum number of entries per warp (common across all warp roles).
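The chunk layout described above implies a simple size calculation. The following is a minimal Python sketch of that arithmetic, not part of the Mojo API. It uses the values of total_warp_roles (4) and total_data_points (7) listed under comptime members on this page. The function names and the assumption that every entry occupies exactly total_data_points UInt64 slots are illustrative; the page does not state the in-memory layout.

```python
# Hypothetical host-side model of the workspace size; not the Mojo API.
TOTAL_WARP_ROLES = 4   # total_warp_roles on this page
TOTAL_DATA_POINTS = 7  # total_data_points on this page


def entries_per_sm(max_entries_per_warp: int) -> int:
    """Mirror entries_per_sm = max_entries_per_warp * 4."""
    return max_entries_per_warp * TOTAL_WARP_ROLES


def workspace_len(sm_count: int, max_entries_per_warp: int) -> int:
    """Assumed number of UInt64 slots: one equal-sized chunk per SM,
    TOTAL_DATA_POINTS values per entry."""
    return sm_count * entries_per_sm(max_entries_per_warp) * TOTAL_DATA_POINTS


# Small illustrative values; the struct itself takes sm_count from B200.sm_count.
print(workspace_len(sm_count=2, max_entries_per_warp=3))  # 168
```

Under these assumptions the workspace grows linearly in both the SM count and max_entries_per_warp, so the latter is the parameter to tune when a trace runs out of room.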
Implemented traits
AnyType,
Copyable,
ImplicitlyCopyable,
ImplicitlyDestructible,
Movable,
RegisterPassable,
TrivialRegisterPassable
comptime members
entries_per_sm
comptime entries_per_sm = max_entries_per_warp.__rmul__(UInt32(4))
header
comptime header = "time_start,time_end,sm_id,block_idx_x,block_idx_y,role,entry_idx\n"
sm_count
comptime sm_count = B200.sm_count
total_data_points
comptime total_data_points = 7
total_warp_roles
comptime total_warp_roles = 4
Methods
get_workspace
static get_workspace(ctx: DeviceContext) -> Span[UInt64, MutAnyOrigin]
Returns: Span[UInt64, MutAnyOrigin]
write_to_workspace
static write_to_workspace[warp_role: UInt32](sm_idx: UInt32, entry_idx: UInt32, workspace: Span[UInt64, MutAnyOrigin], timeline: Tuple[UInt64, UInt64])
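To illustrate how sm_idx, warp_role, and entry_idx could select a slot in the workspace, here is a hedged Python sketch of one plausible index calculation. The ordering (SM chunk first, then warp role, then entry) and the fixed stride of total_data_points values per entry are assumptions based on the struct description; the actual Mojo implementation may lay entries out differently. The function entry_offset is hypothetical.

```python
# Hypothetical model of slot addressing; the real layout is not documented here.
TOTAL_WARP_ROLES = 4   # total_warp_roles on this page
TOTAL_DATA_POINTS = 7  # total_data_points on this page


def entry_offset(sm_idx: int, warp_role: int, entry_idx: int,
                 max_entries_per_warp: int) -> int:
    """Return the assumed index of the first UInt64 of an entry."""
    if not 0 <= warp_role < TOTAL_WARP_ROLES:
        raise ValueError("warp_role out of range")
    if not 0 <= entry_idx < max_entries_per_warp:
        raise ValueError("entry_idx exceeds max_entries_per_warp")
    entries_per_sm = max_entries_per_warp * TOTAL_WARP_ROLES
    # Assumed order: SM chunk, then the role's sequence, then the entry.
    entry = sm_idx * entries_per_sm + warp_role * max_entries_per_warp + entry_idx
    return entry * TOTAL_DATA_POINTS


print(entry_offset(sm_idx=1, warp_role=2, entry_idx=1, max_entries_per_warp=3))  # 133
```

The bounds checks reflect the one constraint the page does state: each warp role has at most max_entries_per_warp entries per SM.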
dump_workspace_as_csv
static dump_workspace_as_csv(ctx: DeviceContext, workspace: Span[UInt64, MutAnyOrigin], filename: StringSlice[StaticConstantOrigin])
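Once a CSV file has been dumped, it can be post-processed with any CSV reader. The Python sketch below sums elapsed time per role, using the column names from the header member on this page. It assumes the values are written as decimal integers and treats the role column as an opaque label; the sample rows are invented for illustration, and the function durations_by_role is hypothetical.

```python
import csv
import io

# Column names taken from the header member documented on this page.
HEADER = "time_start,time_end,sm_id,block_idx_x,block_idx_y,role,entry_idx\n"


def durations_by_role(csv_text: str) -> dict:
    """Sum (time_end - time_start) for each distinct value of the role column."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        elapsed = int(row["time_end"]) - int(row["time_start"])
        totals[row["role"]] = totals.get(row["role"], 0) + elapsed
    return totals


# Invented sample rows; a real file would come from dump_workspace_as_csv.
sample = HEADER + "100,160,0,0,0,0,0\n" + "120,150,0,0,0,1,0\n"
print(durations_by_role(sample))  # {'0': 60, '1': 30}
```

For a file on disk, the same function can be fed the result of reading the file as text, for example `open(filename, newline="").read()`.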