MemoryEstimator
class max.pipelines.MemoryEstimator
Bases: object
Estimates available memory for pipeline model allocation.
available_kv_cache_memory()
classmethod available_kv_cache_memory(model_weights_size, activation_memory_size, model_config, devices)
Estimates the KV cache memory available after accounting for model weights and activations.
Parameters:
- model_weights_size (int) – Size of the model weights.
- activation_memory_size (int) – Size of the activation memory.
- model_config (MAXModelConfig) – The model configuration.
- devices (list[Device]) – The list of devices on which the model will run.
Returns:
Available KV cache memory in bytes.
Return type:
int
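The computation can be sketched as follows. This is a hypothetical, self-contained reimplementation, not the actual MAX source: `free_memory` stands in for `MemoryEstimator.free_memory(devices)`, and the `device_memory_utilization` safety fraction is an assumed default.

```python
def available_kv_cache_memory(
    model_weights_size: int,
    activation_memory_size: int,
    free_memory: int,
    device_memory_utilization: float = 0.9,
) -> int:
    """Estimate bytes left for the KV cache after static allocations.

    `free_memory` is the total free bytes across devices; the utilization
    fraction (an assumption here) leaves headroom for fragmentation.
    """
    budget = int(free_memory * device_memory_utilization)
    available = budget - model_weights_size - activation_memory_size
    # Clamp to zero so callers never see a negative byte count.
    return max(available, 0)
```

For example, with 16 GB free, 7 GB of weights, and 0.5 GB of activations, roughly 6.9 GB would remain for the KV cache.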
estimate_memory_footprint()
classmethod estimate_memory_footprint(pipeline_config, model_config, arch_config, devices, model_weights_size, activation_memory_size)
Estimates the memory footprint and validates that the configured max_length and max_batch_size fit in available memory.
Parameters:
- pipeline_config (PipelineConfig)
- model_config (MAXModelConfig)
- arch_config (ArchConfig)
- devices (list[Device])
- model_weights_size (int)
- activation_memory_size (int)
Return type:
None
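The core of the validation can be sketched as the following check. This is a hypothetical simplification: `bytes_per_token` (the per-token KV cache cost, roughly 2 × num_layers × num_kv_heads × head_dim × dtype size) and the function name are assumptions, not the actual MAX API.

```python
def validate_memory_footprint(
    available_kv_cache_memory: int,
    max_batch_size: int,
    max_length: int,
    bytes_per_token: int,
) -> None:
    """Raise if the configured batch size and sequence length cannot fit
    in the estimated KV cache budget."""
    required = max_batch_size * max_length * bytes_per_token
    if required > available_kv_cache_memory:
        raise RuntimeError(
            f"KV cache needs {required} bytes for max_batch_size="
            f"{max_batch_size} and max_length={max_length}, but only "
            f"{available_kv_cache_memory} bytes are available."
        )
```

Returning `None` on success mirrors the documented return type; failures surface as an exception at pipeline construction rather than an out-of-memory error at runtime.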
free_memory()
classmethod free_memory(devices)
Returns the total free memory available across all provided devices.
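Conceptually this is a simple aggregation, sketched below with a stand-in device type; the real `Device` API in MAX differs, and `FakeDevice` exists only to illustrate the summation.

```python
from dataclasses import dataclass


@dataclass
class FakeDevice:
    """Minimal stand-in for a device exposing its free memory in bytes."""
    free_bytes: int


def free_memory(devices: list[FakeDevice]) -> int:
    """Total free memory across all provided devices, in bytes."""
    return sum(d.free_bytes for d in devices)
```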
max_supported_sequence_length()
classmethod max_supported_sequence_length(model_weights_size, activation_memory_size, model_config, devices, arch_config)
Computes the hard upper bound on tokens for a single request.
Mirrors the paged KV cache constraint: per replica, a request cannot exceed total pages per device times page size.
Parameters:
- model_weights_size (int)
- activation_memory_size (int)
- model_config (MAXModelConfig)
- devices (list[Device])
- arch_config (ArchConfig)
Return type:
int | None
static_memory_size()
classmethod static_memory_size(model_weights_size, activation_memory_size)
Calculates static memory usage: model weights plus activations.
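As described, this is a straightforward sum; a minimal sketch, not the MAX source:

```python
def static_memory_size(model_weights_size: int, activation_memory_size: int) -> int:
    """Static memory usage in bytes: model weights plus activations."""
    return model_weights_size + activation_memory_size
```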