Python class
PagedMemoryPlanner
class max.pipelines.kv_cache.PagedMemoryPlanner(config)
Bases: MemoryPlanner
Memory planner for models that use a paged KV cache.
This is the standard planner for autoregressive text-generation models.
It delegates KV-parameter queries to the model config via the
ModelConfigWithKVCache protocol.
For models that require a fixed activation-memory reservation (e.g. VLMs
that need headroom for vision processing), use
with_activation_reservation() to create a pre-configured subclass
instead of writing a custom MemoryPlanner:
    memory_planner=PagedMemoryPlanner.with_activation_reservation(
        15 * 1024**3
    )
Parameters:
    config (Any) - Model configuration that implements
    ModelConfigWithKVCache (i.e. exposes both devices and get_kv_params).
Raises:
    TypeError - If config does not implement ModelConfigWithKVCache.
Initializes the paged memory planner.
Parameters:
    config (Any) - Must implement ModelConfigWithKVCache.
Raises:
    TypeError - If config does not satisfy ModelConfigWithKVCache.
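The construction-time check described above can be illustrated with a structural protocol. This is a minimal sketch, not the MAX implementation: HasKVParams, ValidatingPlanner, GoodConfig, and BadConfig are hypothetical stand-ins, and the real ModelConfigWithKVCache may declare devices and get_kv_params with different types or signatures.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class HasKVParams(Protocol):
    """Hypothetical stand-in for ModelConfigWithKVCache."""

    devices: Any

    def get_kv_params(self) -> Any: ...


class ValidatingPlanner:
    """Hypothetical planner that validates its config on construction."""

    def __init__(self, config: Any) -> None:
        # isinstance() against a runtime-checkable protocol only verifies
        # that the named members exist, not their types or signatures.
        if not isinstance(config, HasKVParams):
            raise TypeError(
                f"{type(config).__name__} does not implement HasKVParams"
            )
        self.config = config


class GoodConfig:
    """Exposes both members, so it satisfies the protocol."""

    devices = ["gpu:0"]

    def get_kv_params(self) -> dict:
        return {"n_kv_heads": 8, "head_dim": 128}


class BadConfig:
    """Lacks get_kv_params, so construction raises TypeError."""

    devices = ["gpu:0"]


planner = ValidatingPlanner(GoodConfig())

try:
    ValidatingPlanner(BadConfig())
except TypeError as exc:
    print(f"rejected: {exc}")
```

Because the check is structural, a config class does not need to inherit from anything in this sketch; it only needs to expose the two members.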
estimate_activation_memory()
estimate_activation_memory(pipeline_config, huggingface_config)
Returns the fixed activation-memory reservation for this planner.
The default is 0. Subclasses created via
with_activation_reservation() return the configured value.
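One way a classmethod can hand back a pre-configured subclass whose estimate differs from the default of 0 is sketched below. ReservablePlanner and its private attributes are hypothetical stand-ins for illustration only; the sketch assumes nothing about how PagedMemoryPlanner stores the reservation internally or how it uses always_signal_buffers.

```python
from typing import Any


class ReservablePlanner:
    """Hypothetical stand-in for PagedMemoryPlanner."""

    # Class-level defaults; generated subclasses override them.
    _activation_bytes: int = 0
    _always_signal_buffers: bool = False

    def estimate_activation_memory(
        self, pipeline_config: Any, huggingface_config: Any
    ) -> int:
        # The estimate is a fixed value and ignores both configs.
        return self._activation_bytes

    @classmethod
    def with_activation_reservation(
        cls, activation_bytes: int, always_signal_buffers: bool = False
    ) -> type:
        # Build a new subclass of cls with the reservation baked in
        # as class attributes.
        return type(
            f"{cls.__name__}WithReservation",
            (cls,),
            {
                "_activation_bytes": activation_bytes,
                "_always_signal_buffers": always_signal_buffers,
            },
        )


# 15 GiB expressed in bytes.
Reserved = ReservablePlanner.with_activation_reservation(15 * 1024**3)

print(ReservablePlanner().estimate_activation_memory(None, None))
print(Reserved().estimate_activation_memory(None, None))
```

Returning a class rather than an instance lets the caller pass the result wherever a planner class is expected, with construction deferred until a config is available.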
with_activation_reservation()
classmethod with_activation_reservation(activation_bytes, always_signal_buffers=False)
Returns a PagedMemoryPlanner subclass with a fixed activation-memory reservation.
Use this instead of writing a custom MemoryPlanner subclass for
architectures that simply need to reserve a fixed chunk of GPU memory
before KV cache allocation (e.g. for vision processing headroom):
    memory_planner=PagedMemoryPlanner.with_activation_reservation(
        15 * 1024**3  # 15 GiB
    )

For models that perform allreduce unconditionally (e.g. VLMs using
VocabParallelEmbedding), pass always_signal_buffers=True so
signal-buffer memory is reserved even on single-GPU:

    memory_planner=PagedMemoryPlanner.with_activation_reservation(
        15 * 1024**3, always_signal_buffers=True
    )
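Parameters:
    activation_bytes - Size of the fixed activation-memory reservation, in
    bytes (e.g. 15 * 1024**3 for 15 GiB).
    always_signal_buffers - If True, signal-buffer memory is reserved even
    on single-GPU. Defaults to False.
Returns:
    A new PagedMemoryPlanner subclass whose estimate_activation_memory()
    returns activation_bytes.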
Return type: