
Mojo struct

BlackwellWarpProfilingWorkspaceManager

@register_passable(trivial) struct BlackwellWarpProfilingWorkspaceManager[load_warps: UInt32, mma_warps: UInt32, scheduler_warps: UInt32, epilogue_warps: UInt32, max_entries_per_warp: UInt32]

This struct manages the profiling workspace. The workspace consists of equal-sized chunks, one per active SM. Each SM chunk consists of sequences of entries, capped at a maximum number of entries per warp role.

Template Parameters:

load_warps: Number of warps specialized for load operations.
mma_warps: Number of warps specialized for matrix multiply-accumulate operations.
scheduler_warps: Number of warps specialized for scheduling operations.
epilogue_warps: Number of warps specialized for epilogue operations.
max_entries_per_warp: Maximum number of entries per warp (common across all warp roles).

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, Movable, UnknownDestructibility

Aliases

__copyinit__is_trivial

alias __copyinit__is_trivial = True

__del__is_trivial

alias __del__is_trivial = True

__moveinit__is_trivial

alias __moveinit__is_trivial = True

entries_per_sm

alias entries_per_sm = max_entries_per_warp.__rmul__[DType.uint32, 1](4)

header

alias header = "time_start,time_end,sm_id,block_idx_x,block_idx_y,role,entry_idx\n"

sm_count

alias sm_count = GPUInfo.from_family(AcceleratorArchitectureFamily(32, 2048, 233472, 65536, 1024), "B200", Vendor(2), "cuda", "blackwell", 10, "sm_100a", 148).sm_count

total_data_points

alias total_data_points = 7

total_warp_roles

alias total_warp_roles = 4
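
The aliases above suggest how the workspace size could be derived. The following Python sketch is illustrative only and is not the library's implementation. It assumes that `sm_count` evaluates to 148 (the final argument in the `GPUInfo.from_family` call), that the factor 4 in `entries_per_sm` corresponds to `total_warp_roles`, and that every entry stores `total_data_points` `UInt64` values. The helper names are hypothetical.

```python
# Sketch of the sizing arithmetic implied by the aliases (assumptions noted inline).
SM_COUNT = 148           # assumed value of the sm_count alias for B200
TOTAL_WARP_ROLES = 4     # alias total_warp_roles
TOTAL_DATA_POINTS = 7    # alias total_data_points (matches the 7 CSV header columns)
BYTES_PER_VALUE = 8      # the workspace is a Span of UInt64


def entries_per_sm(max_entries_per_warp: int) -> int:
    """Mirror of alias entries_per_sm = 4 * max_entries_per_warp."""
    return TOTAL_WARP_ROLES * max_entries_per_warp


def workspace_elements(max_entries_per_warp: int) -> int:
    """Assumed total UInt64 count: one chunk per SM, 7 values per entry."""
    return SM_COUNT * entries_per_sm(max_entries_per_warp) * TOTAL_DATA_POINTS


def workspace_bytes(max_entries_per_warp: int) -> int:
    """Assumed workspace size in bytes."""
    return workspace_elements(max_entries_per_warp) * BYTES_PER_VALUE


if __name__ == "__main__":
    print(entries_per_sm(16), workspace_elements(16), workspace_bytes(16))
```

Under these assumptions the workspace grows linearly with `max_entries_per_warp`, so the parameter is the main lever for trading profiling detail against memory.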

Methods

get_workspace

static get_workspace(ctx: DeviceContext) -> Span[UInt64, MutableAnyOrigin]

Returns:

Span

write_to_workspace

static write_to_workspace[warp_role: UInt32](sm_idx: UInt32, entry_idx: UInt32, workspace: Span[UInt64, MutableAnyOrigin], timeline: Tuple[UInt64, UInt64])
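
The page does not document how `write_to_workspace` locates an entry. One plausible layout, consistent with the struct description (one chunk per SM, one sequence of entries per warp role), can be sketched in Python as follows. The flat-offset formula, the role numbering, and the function name are assumptions made for illustration and are not taken from the Mojo source.

```python
# Hypothetical flat indexing for an (sm, role, entry) triple; not the actual implementation.
TOTAL_WARP_ROLES = 4     # alias total_warp_roles
TOTAL_DATA_POINTS = 7    # alias total_data_points


def flat_offset(sm_idx: int, warp_role: int, entry_idx: int,
                max_entries_per_warp: int) -> int:
    """Return the assumed index of the first UInt64 of an entry.

    Assumed layout: SM chunks are contiguous, each chunk holds one
    sequence per warp role, and each entry occupies 7 UInt64 values.
    """
    if not 0 <= warp_role < TOTAL_WARP_ROLES:
        raise ValueError("warp_role out of range")
    if not 0 <= entry_idx < max_entries_per_warp:
        raise ValueError("entry_idx exceeds max_entries_per_warp")
    entries_per_sm = TOTAL_WARP_ROLES * max_entries_per_warp
    entry_number = (sm_idx * entries_per_sm
                    + warp_role * max_entries_per_warp
                    + entry_idx)
    return entry_number * TOTAL_DATA_POINTS


if __name__ == "__main__":
    # Example: second SM, role 1, third entry, 16 entries per warp role.
    print(flat_offset(1, 1, 2, 16))
```

The bounds check on `entry_idx` reflects the stated cap of `max_entries_per_warp` entries per role. Whether the real method checks, clamps, or drops out-of-range entries is not stated on this page.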

dump_workspace_as_csv

static dump_workspace_as_csv(ctx: DeviceContext, workspace: Span[UInt64, MutableAnyOrigin], filename: StringSlice[StaticConstantOrigin])
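
Once a CSV file has been dumped, it can be post-processed with ordinary tooling. The Python sketch below assumes that the file starts with the `header` alias shown above and that `time_start` and `time_end` are integer timestamps. The sample rows are invented for illustration, and the encoding of the `role` column (numeric or textual) is not specified on this page, so it is treated as an opaque string.

```python
import csv
import io
from collections import defaultdict

# Column names copied from the `header` alias.
HEADER = "time_start,time_end,sm_id,block_idx_x,block_idx_y,role,entry_idx\n"


def durations_by_role(csv_text: str) -> dict:
    """Sum (time_end - time_start) per value of the role column."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["role"]] += int(row["time_end"]) - int(row["time_start"])
    return dict(totals)


if __name__ == "__main__":
    # Made-up rows; real files would come from dump_workspace_as_csv.
    sample = HEADER + "100,250,0,0,0,0,0\n" + "120,400,0,0,0,1,0\n"
    print(durations_by_role(sample))
```

To process a real file, read it with `open(filename).read()` and pass the text to `durations_by_role`.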
