
MemoryEstimator

class max.pipelines.MemoryEstimator


Bases: object

Estimates available memory for pipeline model allocation.

available_kv_cache_memory()

classmethod available_kv_cache_memory(model_weights_size, activation_memory_size, model_config, devices)


Estimates available KV cache memory after model weights and activations.

Parameters:

  • model_weights_size (int) – Size of the model weights in bytes.
  • activation_memory_size (int) – Size of activation memory in bytes.
  • model_config (MAXModelConfig) – The model configuration.
  • devices (list[Device]) – The list of devices on which the model will run.

Returns:

Available KV cache memory in bytes.

Return type:

int
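The calculation this method describes can be sketched as simple arithmetic. This is an illustrative sketch, not the library's actual implementation; the function name, the `total_free_memory` argument, and the clamping to zero are assumptions:

```python
def available_kv_cache_memory_sketch(
    total_free_memory: int,
    model_weights_size: int,
    activation_memory_size: int,
) -> int:
    """Sketch: the KV cache gets whatever memory remains after the
    static allocations (model weights plus activations)."""
    remaining = total_free_memory - model_weights_size - activation_memory_size
    # Never report a negative budget if the model itself does not fit.
    return max(remaining, 0)
```

For example, with 80 GiB free, 40 GiB of weights, and 8 GiB of activations, the sketch leaves 32 GiB for the KV cache.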

estimate_memory_footprint()

classmethod estimate_memory_footprint(pipeline_config, model_config, arch_config, devices, model_weights_size, activation_memory_size)


Estimates the total memory footprint and validates that the configured max_length and max_batch_size fit in available memory.

Parameters:

  • pipeline_config – The pipeline configuration.
  • model_config (MAXModelConfig) – The model configuration.
  • arch_config – The architecture configuration.
  • devices (list[Device]) – The list of devices on which the model will run.
  • model_weights_size (int) – Size of the model weights in bytes.
  • activation_memory_size (int) – Size of activation memory in bytes.
Return type:

None
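The validation step can be sketched as a single fit check. The function name, arguments, and error message below are hypothetical; in the real estimator the KV cache requirement would be derived from max_batch_size, max_length, and the per-token KV size:

```python
def validate_memory_footprint_sketch(
    free_memory: int,
    static_memory: int,
    kv_cache_bytes_needed: int,
) -> None:
    """Sketch: raise if static allocations plus the KV cache required
    by the configured max_length/max_batch_size exceed free memory."""
    if static_memory + kv_cache_bytes_needed > free_memory:
        raise RuntimeError(
            "Model and KV cache exceed available device memory; "
            "reduce max_length or max_batch_size."
        )
```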

free_memory()

classmethod free_memory(devices)


Returns the total free memory available across all provided devices.

Parameters:

devices (list[Device]) – The devices whose free memory to sum.

Return type:

int
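The aggregation is a straightforward sum over devices. A minimal sketch, assuming each device can report its free bytes (the function name and argument are hypothetical, not the library's API):

```python
def free_memory_sketch(free_bytes_per_device: list[int]) -> int:
    """Sketch: total free memory is the sum of the free bytes
    reported by each provided device."""
    return sum(free_bytes_per_device)
```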

max_supported_sequence_length()

classmethod max_supported_sequence_length(model_weights_size, activation_memory_size, model_config, devices, arch_config)


Computes the hard upper bound on tokens for a single request.

Mirrors the paged KV cache constraint: per replica, a single request cannot exceed the total number of pages per device times the page size.

Parameters:

  • model_weights_size (int) – Size of the model weights in bytes.
  • activation_memory_size (int) – Size of activation memory in bytes.
  • model_config (MAXModelConfig) – The model configuration.
  • devices (list[Device]) – The list of devices on which the model will run.
  • arch_config – The architecture configuration.
Return type:

int | None
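The paged KV cache constraint stated above reduces to one multiplication. A sketch under that assumption (names are hypothetical; the real method derives the page count from the configs and devices):

```python
def max_supported_sequence_length_sketch(
    pages_per_device: int,
    page_size: int,
) -> int:
    """Sketch: the hard per-request token cap is the total number of
    KV cache pages on a device times the tokens stored per page."""
    return pages_per_device * page_size
```

For example, 1000 pages of 128 tokens each bound any single request to 128000 tokens.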

static_memory_size()

classmethod static_memory_size(model_weights_size, activation_memory_size)


Calculates static memory usage: model weights plus activations.

Parameters:

  • model_weights_size (int) – Size of the model weights in bytes.
  • activation_memory_size (int) – Size of activation memory in bytes.

Returns:

Total static memory usage in bytes.

Return type:

int
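As described, static memory is just the sum of the two inputs. A trivial sketch (the function name is hypothetical):

```python
def static_memory_size_sketch(
    model_weights_size: int,
    activation_memory_size: int,
) -> int:
    """Sketch: static memory usage is model weights plus activations,
    in bytes."""
    return model_weights_size + activation_memory_size
```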