Mojo struct

BlackwellWarpProfilingWorkspaceManager

struct BlackwellWarpProfilingWorkspaceManager[load_warps: UInt32, mma_warps: UInt32, scheduler_warps: UInt32, epilogue_warps: UInt32, max_entries_per_warp: UInt32]

This struct manages the profiling workspace. The workspace consists of equal-sized chunks, one per active SM. Each SM chunk holds a sequence of entries for each warp role, up to a fixed maximum number of entries per warp role.

Template Parameters:

load_warps: Number of warps specialized for load operations.

mma_warps: Number of warps specialized for matrix multiply-accumulate operations.

scheduler_warps: Number of warps specialized for scheduling operations.

epilogue_warps: Number of warps specialized for epilogue operations.

max_entries_per_warp: Maximum number of entries per warp (common across all warp roles).

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, RegisterPassable, TrivialRegisterPassable

comptime members

entries_per_sm

comptime entries_per_sm = max_entries_per_warp.__rmul__(UInt32(4))

header

comptime header = "time_start,time_end,sm_id,block_idx_x,block_idx_y,role,entry_idx\n"

sm_count

comptime sm_count = B200.sm_count

total_data_points

comptime total_data_points = 7

total_warp_roles

comptime total_warp_roles = 4
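
The comptime members above hint at how large the workspace is. The following is a minimal Python model of that arithmetic, not the Mojo implementation. It assumes that each entry occupies total_data_points 64-bit words (one per column in header) and that the workspace holds exactly one chunk per SM. The value 148 for B200.sm_count and the example max_entries_per_warp of 16 are assumptions that do not come from this page.

```python
# Python model of the sizing arithmetic implied by the comptime members.
TOTAL_WARP_ROLES = 4    # total_warp_roles
TOTAL_DATA_POINTS = 7   # total_data_points (columns in `header`)
SM_COUNT = 148          # assumed value of B200.sm_count


def workspace_words(max_entries_per_warp: int, sm_count: int = SM_COUNT) -> int:
    """Number of UInt64 words, assuming one chunk per SM and 7 words per entry."""
    # Mirrors: entries_per_sm = max_entries_per_warp * 4
    entries_per_sm = TOTAL_WARP_ROLES * max_entries_per_warp
    return sm_count * entries_per_sm * TOTAL_DATA_POINTS


words = workspace_words(max_entries_per_warp=16)
print(words, "UInt64 words,", words * 8, "bytes")
```

If the actual allocation in get_workspace differs (for example, by adding padding), only the final multiplication would change.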

Methods

get_workspace

static get_workspace(ctx: DeviceContext) -> Span[UInt64, MutAnyOrigin]

Returns:

Span[UInt64, MutAnyOrigin]

write_to_workspace

static write_to_workspace[warp_role: UInt32](sm_idx: UInt32, entry_idx: UInt32, workspace: Span[UInt64, MutAnyOrigin], timeline: Tuple[UInt64, UInt64])
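
The signature suggests that each call records one (start, end) timestamp pair for a given warp role, SM, and entry index. A hedged Python sketch of what such a write could look like follows. The role-major ordering inside each SM chunk and the seven-word record in header order are assumptions, and entry_offset, write_entry, and the explicit block_idx argument are hypothetical names used only for illustration (on the GPU the block indices would presumably come from hardware registers).

```python
TOTAL_WARP_ROLES = 4
TOTAL_DATA_POINTS = 7


def entry_offset(sm_idx, warp_role, entry_idx, max_entries_per_warp):
    """Word offset of an entry, assuming role-major order inside each SM chunk."""
    if not 0 <= warp_role < TOTAL_WARP_ROLES:
        raise IndexError("warp_role out of range")
    if not 0 <= entry_idx < max_entries_per_warp:
        raise IndexError("entry_idx out of range")
    entries_per_sm = TOTAL_WARP_ROLES * max_entries_per_warp
    entry = sm_idx * entries_per_sm + warp_role * max_entries_per_warp + entry_idx
    return entry * TOTAL_DATA_POINTS


def write_entry(workspace, sm_idx, warp_role, entry_idx, timeline, block_idx,
                max_entries_per_warp):
    """Store one record in the assumed column order of `header`."""
    base = entry_offset(sm_idx, warp_role, entry_idx, max_entries_per_warp)
    time_start, time_end = timeline
    workspace[base:base + TOTAL_DATA_POINTS] = [
        time_start, time_end, sm_idx, block_idx[0], block_idx[1],
        warp_role, entry_idx,
    ]


max_entries = 2
# Workspace for a single SM, zero-initialized.
workspace = [0] * (TOTAL_WARP_ROLES * max_entries * TOTAL_DATA_POINTS)
write_entry(workspace, sm_idx=0, warp_role=1, entry_idx=1,
            timeline=(100, 250), block_idx=(3, 0),
            max_entries_per_warp=max_entries)
```

Because every warp role gets its own fixed range of slots under this layout, warps with different roles never write to the same record.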

dump_workspace_as_csv

static dump_workspace_as_csv(ctx: DeviceContext, workspace: Span[UInt64, MutAnyOrigin], filename: StringSlice[StaticConstantOrigin])
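
To illustrate the CSV format implied by the header member, here is a small Python sketch that renders a flat list of 64-bit words as CSV text. It assumes the workspace has already been copied to the host and that each record is seven consecutive words in header order. workspace_to_csv is a hypothetical helper, not part of the Mojo API.

```python
import io

HEADER = "time_start,time_end,sm_id,block_idx_x,block_idx_y,role,entry_idx\n"
TOTAL_DATA_POINTS = 7


def workspace_to_csv(workspace):
    """Render a flat list of words as CSV text, seven words per row."""
    out = io.StringIO()
    out.write(HEADER)
    for base in range(0, len(workspace), TOTAL_DATA_POINTS):
        row = workspace[base:base + TOTAL_DATA_POINTS]
        out.write(",".join(str(value) for value in row) + "\n")
    return out.getvalue()


# Two example records (values are made up for illustration).
workspace = [100, 250, 0, 3, 0, 1, 1,
             300, 420, 0, 3, 0, 2, 0]
text = workspace_to_csv(workspace)
print(text, end="")
```

The resulting text can be loaded by any CSV reader to plot per-role timelines for each SM.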