Mojo module
ring_buffer
Ring buffer implementation for producer-consumer synchronization in GPU kernels.
This module provides a ring buffer abstraction that enables efficient overlap of memory transfers and computation in matrix multiplication kernels. The pattern divides work between:
- Producer: One warp group that loads tiles from global to shared memory
- Consumers: Multiple warp groups that process tiles using tensor cores
The ring buffer uses barrier synchronization to coordinate access to a circular queue of tile buffers, allowing the producer to work ahead while consumers process previously loaded data.
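The handshake described above can be illustrated with a minimal CPU-side analogy in Python. This is only a sketch under stated assumptions, not the module's implementation: `ToyRingBuffer`, `producer_slot`, `consumer_slot`, and `run_pipeline` are hypothetical names, semaphores stand in for the GPU memory barriers, and a single consumer thread replaces the multiple consumer warp groups.

```python
import threading
from contextlib import contextmanager


class ToyRingBuffer:
    """CPU-side analogy of a tile ring buffer (hypothetical, not the Mojo API)."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.slots = [None] * num_slots
        # Analogue of the "empty" barriers: every slot starts free for the producer.
        self.empty = [threading.Semaphore(1) for _ in range(num_slots)]
        # Analogue of the "full" barriers: no slot holds data yet.
        self.full = [threading.Semaphore(0) for _ in range(num_slots)]

    @contextmanager
    def producer_slot(self, i):
        slot = i % self.num_slots      # circular indexing into the queue
        self.empty[slot].acquire()     # wait until the consumer released this slot
        yield slot
        self.full[slot].release()      # signal that the slot now holds data

    @contextmanager
    def consumer_slot(self, i):
        slot = i % self.num_slots
        self.full[slot].acquire()      # wait until the producer filled this slot
        yield slot
        self.empty[slot].release()     # hand the slot back to the producer


def run_pipeline(num_tiles, num_slots=3):
    ring = ToyRingBuffer(num_slots)
    results = []

    def producer():
        for i in range(num_tiles):
            with ring.producer_slot(i) as slot:
                ring.slots[slot] = i * 10  # stands in for a global-to-shared load

    def consumer():
        for i in range(num_tiles):
            with ring.consumer_slot(i) as slot:
                results.append(ring.slots[slot] + 1)  # stands in for the compute step

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this sketch the producer can run up to `num_slots` tiles ahead of the consumer before it blocks on an "empty" semaphore, which is the work-ahead behavior the ring buffer is meant to provide. Calling `run_pipeline(5)` returns `[1, 11, 21, 31, 41]`, showing that tiles are consumed in the order they were produced.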
Usage Example:

    # Create ring buffer during kernel initialization
    var ring_buffer = RingBuffer[...](full_mbar, empty_mbar, ...)

    # Producer pattern
    with ring_buffer.producer() as producer:
        while has_work():
            with producer.get_tiles() as tiles:
                # Load data into tiles.a_tile and tiles.b_tile
                load_tile(tiles.a_tile, tiles.barrier)

    # Consumer pattern
    with ring_buffer.consumer() as consumer:
        while has_work():
            with consumer.get_tiles() as tiles:
                # Process tiles.a_tile and tiles.b_tile
                gemm(tiles.a_tile, tiles.b_tile, output)

Structs
- ConsumerTiles: Context manager for consumer access to ring buffer tiles.
- ProducerTiles: Context manager for producer access to ring buffer tiles.
- RingBuffer: Ring buffer for managing pipeline synchronization between producers and consumers.
- RingBufferConsumer: Consumer view of the ring buffer.
- RingBufferProducer: Producer view of the ring buffer.