Python class
MemoryPlanner
class max.pipelines.kv_cache.MemoryPlanner(config)
Bases: object
Base class for pipeline model memory planning.
Provides default implementations for all estimation methods. Subclasses override the methods that require architecture-specific logic:
- Estimating KV cache memory requirements.
- Estimating activation, weight, signal-buffer, and vision-cache memory overheads specific to the model.
A MemoryPlanner is constructed from a ModelConfig alone (not from a
full PipelineConfig) so that it can be used independently of the
pipeline stack.
Initializes the memory planner with the model config.
Parameters:
config (Any) - Model configuration.
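The override pattern described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `MemoryPlannerStandIn` is a local stand-in that only mirrors the documented zero-returning defaults so the snippet runs without MAX installed, and `MyModelPlanner`, the dictionary config, and the 64 MiB figure are all hypothetical.

```python
from typing import Any


class MemoryPlannerStandIn:
    """Local stand-in mirroring the documented defaults (not the real class)."""

    def __init__(self, config: Any) -> None:
        # The documented constructor takes only the model config.
        self.config = config

    def estimate_activation_memory(
        self, pipeline_config: Any, huggingface_config: Any
    ) -> int:
        return 0  # documented default

    def estimate_vision_cache_entry_bytes(self, huggingface_config: Any) -> int:
        return 0  # documented default


class MyModelPlanner(MemoryPlannerStandIn):
    """Hypothetical planner that overrides a single estimation method."""

    def estimate_activation_memory(
        self, pipeline_config: Any, huggingface_config: Any
    ) -> int:
        # Hypothetical fixed scratch buffer of 64 MiB.
        return 64 * 1024 * 1024


# Hypothetical config object; the real planner expects a ModelConfig.
planner = MyModelPlanner(config={"hidden_size": 4096})
activation_bytes = planner.estimate_activation_memory(None, None)
vision_bytes = planner.estimate_vision_cache_entry_bytes(None)
```

Only the overridden method changes behavior; every other estimate keeps its default.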
estimate_activation_memory()
estimate_activation_memory(pipeline_config, huggingface_config)
Estimates activation memory beyond model weights.
The default implementation returns 0. Override in subclasses that
require temporary buffers for large intermediate tensors (e.g. MLA
up-projection during prefill, expert-parallel routing buffers).
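As a rough illustration of the arithmetic an override might perform, the sketch below sizes one temporary buffer for an up-projection during prefill. The function name, its parameters, and the sizing formula are assumptions made for this example; the source does not specify how any particular model computes this value.

```python
def sketch_prefill_buffer_bytes(
    max_prefill_tokens: int,
    num_heads: int,
    head_dim: int,
    dtype_bytes: int,
) -> int:
    """Hypothetical size of one [tokens, heads, head_dim] intermediate tensor."""
    return max_prefill_tokens * num_heads * head_dim * dtype_bytes


# Example values are illustrative only: 8192 tokens, 128 heads,
# head dimension 128, 2 bytes per element (e.g. bfloat16).
buffer_bytes = sketch_prefill_buffer_bytes(8192, 128, 128, 2)
buffer_mib = buffer_bytes / (1024 * 1024)
```

A subclass could return a value like `buffer_bytes` from estimate_activation_memory, reading the dimensions from its own configuration.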
estimate_signal_buffer_memory()
estimate_signal_buffer_memory(pipeline_config, arch_config=None)
Estimates signal-buffer memory in bytes across all devices.
Signal buffers are fixed-size per-GPU allocations used by P2P
collectives. The default returns 0 for single-device pipelines and
delegates to pipeline_config.estimate_signal_buffer_memory for
multi-device.
Models that perform allreduce unconditionally (e.g. via
VocabParallelEmbedding) need signal buffers even on a single device.
Set always_signal_buffers=True on the planner class to enable this.
Returns:
Estimated signal-buffer memory in bytes across all devices.
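The decision logic described above can be sketched as a standalone function. This is an assumption-laden illustration: the real method takes pipeline_config and arch_config and delegates to pipeline_config.estimate_signal_buffer_memory, whose internals are not documented here. The names `num_devices` and `bytes_per_device`, and the multiplication standing in for that delegate, are hypothetical.

```python
def sketch_signal_buffer_bytes(
    num_devices: int,
    bytes_per_device: int,
    always_signal_buffers: bool = False,
) -> int:
    """Hypothetical sketch of the documented default behavior."""
    # Single-device pipelines need no signal buffers unless the model
    # performs allreduce unconditionally.
    if num_devices <= 1 and not always_signal_buffers:
        return 0
    # Assumption: one fixed-size allocation per device, summed over devices.
    return num_devices * bytes_per_device


single = sketch_signal_buffer_bytes(1, 1024)
single_forced = sketch_signal_buffer_bytes(1, 1024, always_signal_buffers=True)
multi = sketch_signal_buffer_bytes(4, 1024)
```

The `always_signal_buffers` argument plays the role of the class attribute of the same name on the planner.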
estimate_vision_cache_entry_bytes()
estimate_vision_cache_entry_bytes(huggingface_config)
Estimates bytes for one vision encoder cache entry.
The default implementation returns 0. Override in VLM planners to
return the worst-case memory for a single max-resolution image after the
vision encoder's spatial merge / patch merge step.
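One way a VLM planner might arrive at a worst-case figure is sketched below. The formula and every parameter name (`max_image_size`, `patch_size`, `merge_size`, `hidden_size`, `dtype_bytes`) are assumptions for illustration; actual models may read differently named fields from their Hugging Face config and may merge patches differently.

```python
def sketch_vision_entry_bytes(
    max_image_size: int,
    patch_size: int,
    merge_size: int,
    hidden_size: int,
    dtype_bytes: int,
) -> int:
    """Hypothetical bytes for one square max-resolution image after merging."""
    patches_per_side = max_image_size // patch_size
    num_patches = patches_per_side * patches_per_side
    # A merge_size x merge_size spatial merge folds that many patches
    # into a single output token.
    merged_tokens = num_patches // (merge_size * merge_size)
    return merged_tokens * hidden_size * dtype_bytes


# Illustrative values: 1344x1344 image, 14-pixel patches, 2x2 merge,
# hidden size 4096, 2 bytes per element.
entry_bytes = sketch_vision_entry_bytes(1344, 14, 2, 4096, 2)
```

With these example values, the image yields 96 x 96 = 9216 patches, which merge into 2304 tokens of 4096 elements each.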
estimate_weights_size()
estimate_weights_size(pipeline_config)
Estimates the memory consumed by model weights in bytes.
The default implementation delegates to
pipeline_config.model.weights_size(). Override in subclasses that
need architecture-specific weight accounting (e.g. expert-parallel
sharding adjustments).
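The delegate-then-adjust pattern can be sketched as follows. `MemoryPlannerStandIn` is a local stand-in that reproduces only the documented delegation to pipeline_config.model.weights_size(). The subclass, the `replicated_bytes` key, and the additive adjustment are hypothetical; the source does not say what a real expert-parallel adjustment looks like.

```python
from types import SimpleNamespace
from typing import Any


class MemoryPlannerStandIn:
    """Local stand-in for the documented default (not the real class)."""

    def __init__(self, config: Any) -> None:
        self.config = config

    def estimate_weights_size(self, pipeline_config: Any) -> int:
        # Documented default: delegate to pipeline_config.model.weights_size().
        return pipeline_config.model.weights_size()


class AdjustedWeightsPlanner(MemoryPlannerStandIn):
    """Hypothetical planner that adjusts the default estimate."""

    def estimate_weights_size(self, pipeline_config: Any) -> int:
        base = super().estimate_weights_size(pipeline_config)
        # Hypothetical adjustment: extra bytes for weights that are
        # replicated rather than sharded.
        return base + self.config["replicated_bytes"]


# Fake pipeline config exposing only what the default needs.
fake_pipeline_config = SimpleNamespace(
    model=SimpleNamespace(weights_size=lambda: 8 * 1024**3)
)
planner = AdjustedWeightsPlanner({"replicated_bytes": 512 * 1024**2})
total_bytes = planner.estimate_weights_size(fake_pipeline_config)
```

Calling `super()` first keeps the override aligned with the default accounting and limits the subclass to the architecture-specific difference.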