IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Python class

PagedMemoryPlanner

class max.pipelines.kv_cache.PagedMemoryPlanner(config)

Bases: MemoryPlanner

Memory planner for models that use a paged KV cache.

This is the standard planner for autoregressive text-generation models. It delegates KV-parameter queries to the model config via the ModelConfigWithKVCache protocol.

For models that require a fixed activation-memory reservation (e.g. VLMs that need headroom for vision processing), use with_activation_reservation() to create a pre-configured subclass instead of writing a custom MemoryPlanner:

memory_planner=PagedMemoryPlanner.with_activation_reservation(
    15 * 1024**3  # 15 GiB
)
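The reservation is given in bytes, and `15 * 1024**3` is 15 GiB (binary gibibytes), not 15 GB. A minimal sketch of the conversion, where `gib_to_bytes` is a hypothetical helper and not part of MAX:

```python
def gib_to_bytes(gib: int) -> int:
    """Convert binary gibibytes (GiB) to bytes. Hypothetical helper, not a MAX API."""
    return gib * 1024**3


# 15 GiB, the value passed to with_activation_reservation() above.
activation_bytes = gib_to_bytes(15)
print(activation_bytes)  # 16106127360

# Decimal "15 GB" would be about 1.1 GB smaller.
print(activation_bytes - 15 * 10**9)  # 1106127360
```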

Parameters:

config (Any) – Model configuration that implements ModelConfigWithKVCache (i.e. exposes both devices and get_kv_params).

Raises:

TypeError – If config does not implement ModelConfigWithKVCache.
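One way to picture the TypeError check is a structural test with a runtime-checkable `typing.Protocol`. This is an illustrative assumption, not the MAX source: `KVConfigLike`, `check_config`, `GoodConfig`, and `BadConfig` are hypothetical, and only the member names `devices` and `get_kv_params` come from the description above.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class KVConfigLike(Protocol):
    """Hypothetical stand-in for ModelConfigWithKVCache."""

    devices: list[Any]

    def get_kv_params(self, *args: Any, **kwargs: Any) -> Any: ...


def check_config(config: Any) -> Any:
    # isinstance() on a runtime_checkable Protocol only checks that the
    # members exist; it does not validate their types or signatures.
    if not isinstance(config, KVConfigLike):
        raise TypeError(
            f"{type(config).__name__} does not implement ModelConfigWithKVCache"
        )
    return config


class GoodConfig:
    def __init__(self) -> None:
        self.devices = ["gpu:0"]

    def get_kv_params(self, *args: Any, **kwargs: Any) -> dict[str, Any]:
        return {}


class BadConfig:
    devices = ["gpu:0"]  # missing get_kv_params


check_config(GoodConfig())  # passes
try:
    check_config(BadConfig())
except TypeError as err:
    print(err)  # BadConfig does not implement ModelConfigWithKVCache
```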

estimate_activation_memory()

estimate_activation_memory(pipeline_config, huggingface_config)

Returns the fixed activation-memory reservation for this planner.

The default is 0. Subclasses created via with_activation_reservation() return the configured value.

Parameters:

  • pipeline_config (Any) – Unused by the default implementation.
  • huggingface_config (Any) – Unused by the default implementation.

Returns:

Activation memory reservation in bytes.

Return type:

int

with_activation_reservation()

classmethod with_activation_reservation(activation_bytes, always_signal_buffers=False)

Returns a PagedMemoryPlanner subclass with a fixed activation-memory reservation.

Use this instead of writing a custom MemoryPlanner subclass for architectures that only need to reserve a fixed amount of GPU memory before the KV cache is allocated (e.g. for vision processing headroom):

memory_planner=PagedMemoryPlanner.with_activation_reservation(
    15 * 1024**3  # 15 GiB
)
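Conceptually, a fixed reservation is taken out of device memory before the KV cache is sized. The sketch below shows only that arithmetic; `kv_cache_budget` and the 60 GiB free-memory figure are hypothetical, and the real planner may also account for weights, signal buffers, and other overheads.

```python
GiB = 1024**3


def kv_cache_budget(free_bytes: int, activation_bytes: int) -> int:
    """Hypothetical: bytes left for the KV cache after a fixed reservation."""
    if activation_bytes < 0:
        raise ValueError("activation_bytes must be non-negative")
    # Clamp at zero: a reservation larger than free memory leaves no room.
    return max(free_bytes - activation_bytes, 0)


budget = kv_cache_budget(free_bytes=60 * GiB, activation_bytes=15 * GiB)
print(budget // GiB)  # 45
```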

For models that perform allreduce unconditionally (e.g. VLMs using VocabParallelEmbedding), pass always_signal_buffers=True so signal-buffer memory is reserved even on single-GPU pipelines:

memory_planner=PagedMemoryPlanner.with_activation_reservation(
    15 * 1024**3, always_signal_buffers=True
)
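The effect of `always_signal_buffers` can be read as a simple boolean rule: signal buffers are normally reserved only when several devices take part in allreduce, and the flag forces the reservation on a single device too. The helper below is a hypothetical illustration of that rule as described here, not the MAX implementation.

```python
def reserve_signal_buffers(num_devices: int, always_signal_buffers: bool = False) -> bool:
    """Hypothetical: decide whether signal-buffer memory must be reserved."""
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    # Assumption: multi-device pipelines run allreduce, so they need signal buffers.
    # Single-device pipelines need them only if the model calls allreduce
    # unconditionally (for example, through VocabParallelEmbedding).
    return num_devices > 1 or always_signal_buffers


print(reserve_signal_buffers(1))                              # False
print(reserve_signal_buffers(1, always_signal_buffers=True))  # True
print(reserve_signal_buffers(4))                              # True
```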

Parameters:

  • activation_bytes (int) – Activation memory to reserve in bytes.
  • always_signal_buffers (bool) – When True, reserve signal-buffer memory even on single-device pipelines.

Returns:

A new PagedMemoryPlanner subclass whose estimate_activation_memory() returns activation_bytes.

Return type:

type[PagedMemoryPlanner]
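A classmethod factory with this contract can be built with the three-argument form of `type()`. The sketch below uses a hypothetical `ToyPlanner` base instead of the real PagedMemoryPlanner, and storing `always_signal_buffers` as a class attribute is an assumption about how such a flag might be carried.

```python
from typing import Any


class ToyPlanner:
    """Hypothetical stand-in for PagedMemoryPlanner."""

    always_signal_buffers: bool = False

    def estimate_activation_memory(
        self, pipeline_config: Any, huggingface_config: Any
    ) -> int:
        # Default: no fixed activation reservation.
        return 0

    @classmethod
    def with_activation_reservation(
        cls, activation_bytes: int, always_signal_buffers: bool = False
    ) -> type["ToyPlanner"]:
        def estimate_activation_memory(
            self: "ToyPlanner", pipeline_config: Any, huggingface_config: Any
        ) -> int:
            # Both configs are ignored; the value is fixed when the class is made.
            return activation_bytes

        # type(name, bases, namespace) creates a new subclass of cls.
        return type(
            f"{cls.__name__}WithReservation",
            (cls,),
            {
                "estimate_activation_memory": estimate_activation_memory,
                "always_signal_buffers": always_signal_buffers,
            },
        )


VisionPlanner = ToyPlanner.with_activation_reservation(15 * 1024**3)
print(issubclass(VisionPlanner, ToyPlanner))                   # True
print(VisionPlanner().estimate_activation_memory(None, None))  # 16106127360
print(ToyPlanner().estimate_activation_memory(None, None))     # 0
```

Returning a class rather than an instance matches the documented return type, so a call site such as `memory_planner=...` presumably instantiates the planner later with the model config.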