Mojo module
device_query
Provides device query utilities for communication primitives.
comptime values
allreduce_table
comptime allreduce_table = Table(
    List(
        VariadicList(
            TuningConfigAllreduce(-1, -1, StringSlice("sm_90a"), 216),
            TuningConfigAllreduce(4, 134217728, StringSlice("sm_90a"), 232),
            TuningConfigAllreduce(-1, -1, StringSlice("sm_100a"), 512),
            TuningConfigAllreduce(2, 8388608, StringSlice("sm_100a"), 512),
            TuningConfigAllreduce(2, 16777216, StringSlice("sm_100a"), 512),
            TuningConfigAllreduce(2, 33554432, StringSlice("sm_100a"), 512),
            TuningConfigAllreduce(2, 67108864, StringSlice("sm_100a"), 512),
            TuningConfigAllreduce(2, 134217728, StringSlice("sm_100a"), 512),
            TuningConfigAllreduce(4, 8388608, StringSlice("sm_100a"), 512),
            TuningConfigAllreduce(4, 16777216, StringSlice("sm_100a"), 512),
            TuningConfigAllreduce(4, 33554432, StringSlice("sm_100a"), 512),
            TuningConfigAllreduce(4, 67108864, StringSlice("sm_100a"), 512),
            TuningConfigAllreduce(4, 134217728, StringSlice("sm_100a"), 512),
            TuningConfigAllreduce(-1, -1, StringSlice("CDNA3"), 32),
            TuningConfigAllreduce(-1, -1, StringSlice("CDNA4"), 64),
            TuningConfigAllreduce(8, 1048576, StringSlice("CDNA4"), 64),
            TuningConfigAllreduce(8, 2147483648, StringSlice("CDNA4"), 44),
        ),
        Tuple(),
    ),
    String("allreduce_table"),
)
Structs
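- TuningConfigAllreduce
  Parameters:
  - ngpus: Number of GPUs for running allreduce.
  - num_bytes: Total number of input bytes supported by the config.
  - sm_version: SM version (as string), for example "sm_90a". The entries in allreduce_table also use AMD architecture names such as "CDNA3" and "CDNA4" in this field.
  - num_blocks: Number of thread blocks for running allreduce.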
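To illustrate how the four TuningConfigAllreduce fields could drive a lookup, here is a minimal sketch in Python rather than Mojo, because this page does not show the query API of Table. The selection rule is an assumption, not the documented behavior: an entry with ngpus = -1 and num_bytes = -1 is treated as the per-architecture default, and otherwise the entry with the smallest num_bytes that still covers the request is chosen. The helper name pick_num_blocks and the ALLREDUCE_TABLE list are hypothetical; the entries are a subset copied from allreduce_table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TuningConfigAllreduce:
    # Python mirror of the documented struct fields.
    ngpus: int
    num_bytes: int
    sm_version: str
    num_blocks: int


# Subset of the entries listed in allreduce_table.
ALLREDUCE_TABLE: List[TuningConfigAllreduce] = [
    TuningConfigAllreduce(-1, -1, "sm_90a", 216),
    TuningConfigAllreduce(4, 134217728, "sm_90a", 232),
    TuningConfigAllreduce(-1, -1, "CDNA3", 32),
    TuningConfigAllreduce(-1, -1, "CDNA4", 64),
    TuningConfigAllreduce(8, 1048576, "CDNA4", 64),
    TuningConfigAllreduce(8, 2147483648, "CDNA4", 44),
]


def pick_num_blocks(arch: str, ngpus: int, num_bytes: int) -> Optional[int]:
    """Hypothetical lookup: ASSUMES -1/-1 marks the architecture default
    and that num_bytes is an upper bound on the supported input size."""
    same_arch = [c for c in ALLREDUCE_TABLE if c.sm_version == arch]

    # Entries tuned for this GPU count whose size bound covers the request.
    tuned = [c for c in same_arch if c.ngpus == ngpus and c.num_bytes >= num_bytes]
    if tuned:
        # Prefer the tightest size bound that still fits.
        return min(tuned, key=lambda c: c.num_bytes).num_blocks

    # Otherwise fall back to the per-architecture default, if one exists.
    defaults = [c for c in same_arch if c.ngpus == -1 and c.num_bytes == -1]
    return defaults[0].num_blocks if defaults else None
```

Under these assumptions, pick_num_blocks("CDNA4", 8, 2_000_000) selects the 2147483648-byte entry, while a request for an architecture that has no entries returns None.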
Functions