
MemoryEstimator

class max.pipelines.MemoryEstimator


Bases: object

Estimates available memory for pipeline model allocation.

available_kv_cache_memory()

classmethod available_kv_cache_memory(model_weights_size, activation_memory_size, model_config, devices)


Estimates available KV cache memory after model weights and activations.

Parameters:

  • model_weights_size (int) – Size of the model weights in bytes.
  • activation_memory_size (int) – Size of activation memory in bytes.
  • model_config (MAXModelConfig) – The model configuration.
  • devices (list[Device]) – The list of devices on which the model will run.

Returns:

Available KV cache memory in bytes.

Return type:

int
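The calculation this method describes can be sketched as simple arithmetic. This is an illustrative sketch, not the library's actual implementation; the function name, the `total_free_memory` argument, and the clamping to zero are assumptions:

```python
def available_kv_cache_memory_sketch(
    total_free_memory: int,
    model_weights_size: int,
    activation_memory_size: int,
) -> int:
    """Sketch: the KV cache gets whatever memory remains after the
    static allocations (model weights plus activations)."""
    remaining = total_free_memory - model_weights_size - activation_memory_size
    # Never report a negative budget if the model itself does not fit.
    return max(remaining, 0)
```

For example, with 80 GiB free, 40 GiB of weights, and 8 GiB of activations, the sketch leaves 32 GiB for the KV cache.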

estimate_memory_footprint()

classmethod estimate_memory_footprint(pipeline_config, model_config, arch_config, devices, model_weights_size, activation_memory_size)


Estimates the total memory footprint and validates that the configured max_length and max_batch_size fit in available memory.

Parameters:

  • pipeline_config – The pipeline configuration.
  • model_config (MAXModelConfig) – The model configuration.
  • arch_config – The architecture configuration.
  • devices (list[Device]) – The list of devices on which the model will run.
  • model_weights_size (int) – Size of the model weights in bytes.
  • activation_memory_size (int) – Size of activation memory in bytes.
Return type:

None
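The validation step can be sketched as a single fit check. The function name, arguments, and error message below are hypothetical; in the real estimator the KV cache requirement would be derived from max_batch_size, max_length, and the per-token KV size:

```python
def validate_memory_footprint_sketch(
    free_memory: int,
    static_memory: int,
    kv_cache_bytes_needed: int,
) -> None:
    """Sketch: raise if static allocations plus the KV cache required
    by the configured max_length/max_batch_size exceed free memory."""
    if static_memory + kv_cache_bytes_needed > free_memory:
        raise RuntimeError(
            "Model and KV cache exceed available device memory; "
            "reduce max_length or max_batch_size."
        )
```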

free_memory()

classmethod free_memory(devices)


Returns the total free memory available across all provided devices.

Parameters:

devices (list[Device]) – The devices whose free memory to sum.

Return type:

int
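The aggregation is a straightforward sum over devices. A minimal sketch, assuming each device can report its free bytes (the function name and argument are hypothetical, not the library's API):

```python
def free_memory_sketch(free_bytes_per_device: list[int]) -> int:
    """Sketch: total free memory is the sum of the free bytes
    reported by each provided device."""
    return sum(free_bytes_per_device)
```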

max_supported_sequence_length()

classmethod max_supported_sequence_length(model_weights_size, activation_memory_size, model_config, devices, arch_config)


Computes the hard upper bound on tokens for a single request.

Mirrors the paged KV cache constraint: per replica, a single request cannot exceed the total number of pages per device times the page size.

Parameters:

  • model_weights_size (int) – Size of the model weights in bytes.
  • activation_memory_size (int) – Size of activation memory in bytes.
  • model_config (MAXModelConfig) – The model configuration.
  • devices (list[Device]) – The list of devices on which the model will run.
  • arch_config – The architecture configuration.
Return type:

int | None
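The paged KV cache constraint stated above reduces to one multiplication. A sketch under that assumption (names are hypothetical; the real method derives the page count from the configs and devices):

```python
def max_supported_sequence_length_sketch(
    pages_per_device: int,
    page_size: int,
) -> int:
    """Sketch: the hard per-request token cap is the total number of
    KV cache pages on a device times the tokens stored per page."""
    return pages_per_device * page_size
```

For example, 1000 pages of 128 tokens each bound any single request to 128000 tokens.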

static_memory_size()

classmethod static_memory_size(model_weights_size, activation_memory_size)


Calculates static memory usage: model weights plus activations.

Parameters:

  • model_weights_size (int) – Size of the model weights in bytes.
  • activation_memory_size (int) – Size of activation memory in bytes.

Returns:

Total static memory usage in bytes.

Return type:

int
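As described, static memory is just the sum of the two inputs. A trivial sketch (the function name is hypothetical):

```python
def static_memory_size_sketch(
    model_weights_size: int,
    activation_memory_size: int,
) -> int:
    """Sketch: static memory usage is model weights plus activations,
    in bytes."""
    return model_weights_size + activation_memory_size
```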