Mojo module

ring_buffer

Ring buffer implementation for producer-consumer synchronization in GPU kernels.

This module provides a ring buffer abstraction that enables efficient overlap of memory transfers and computation in matrix multiplication kernels. The pattern divides work between:

  • Producer: One warp group that loads tiles from global to shared memory
  • Consumers: Multiple warp groups that process tiles using tensor cores

The ring buffer uses barrier synchronization to coordinate access to a circular queue of tile buffers, allowing the producer to work ahead while consumers process previously loaded data.
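The coordination described above can be illustrated with a minimal CPU-side sketch in Python. This is an analogy only, not the Mojo API: the `RingBuffer` class, its `put`/`get` methods, and `run_pipeline` are hypothetical names, per-slot semaphores stand in for the full/empty barriers, and a single consumer thread stands in for the multiple consumer warp groups.

```python
import threading


class RingBuffer:
    """Hypothetical CPU analogy of a tile ring buffer (not the Mojo API)."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots
        # empty[i] starts at 1: every slot is initially free for the producer.
        self.empty = [threading.Semaphore(1) for _ in range(num_slots)]
        # full[i] starts at 0: no slot holds data yet.
        self.full = [threading.Semaphore(0) for _ in range(num_slots)]

    def put(self, index, tile):
        slot = index % len(self.slots)   # wrap around the circular queue
        self.empty[slot].acquire()       # wait until the consumer freed this slot
        self.slots[slot] = tile
        self.full[slot].release()        # signal that the slot now holds data

    def get(self, index):
        slot = index % len(self.slots)
        self.full[slot].acquire()        # wait until the producer filled this slot
        tile = self.slots[slot]
        self.empty[slot].release()       # hand the slot back to the producer
        return tile


def run_pipeline(num_tiles, num_slots):
    ring = RingBuffer(num_slots)
    results = []

    def producer():
        for i in range(num_tiles):
            ring.put(i, (i, i + 1))      # stand-in for loading a_tile and b_tile

    def consumer():
        for i in range(num_tiles):
            a, b = ring.get(i)
            results.append(a * b)        # stand-in for the tensor-core gemm

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With `num_slots` greater than 1, the producer can run up to `num_slots` tiles ahead of the consumer before it blocks, which is the overlap of loading and computation that the ring buffer is meant to provide.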

Usage Example:

# Create ring buffer during kernel initialization
var ring_buffer = RingBuffer[...](full_mbar, empty_mbar, ...)

# Producer pattern
with ring_buffer.producer() as producer:
    while has_work():
        with producer.get_tiles() as tiles:
            # Load data into tiles.a_tile and tiles.b_tile
            load_tile(tiles.a_tile, tiles.barrier)
            load_tile(tiles.b_tile, tiles.barrier)

# Consumer pattern
with ring_buffer.consumer() as consumer:
    while has_work():
        with consumer.get_tiles() as tiles:
            # Process tiles.a_tile and tiles.b_tile
            gemm(tiles.a_tile, tiles.b_tile, output)

Structs