# Classes

## Graph Context

- *class* qairt.optimizer.onnx.GraphContext(*model\_ir: Model*, *named\_encodings: Optional[dict[str, dict]] = None*, *named\_safetensors: Optional[dict[str, dict]] = None*, *updatable\_tensors: Optional[list[str]] = None*, *naming\_prefix: str = 'opt'*, *\_skip\_shape\_infer: bool = False*)

    - Bases: `object`

Central data structure for ONNX graph optimization operations

This class encapsulates an ONNX model along with its metadata like encodings,
safetensors, and updatable tensors. It provides methods for embedding this
metadata into the graph, extracting it, and serializing/deserializing the model

All graph passes operate on this context object rather than directly on the graph

- embed\_encodings\_into\_graph(*named\_encodings: dict[str, qairt.optimizer.onnx.utils.encodings.GraphEncodingInfo]*)

    - Embed encodings into the IR graph

- Parameters

    - **named\_encodings** – a dictionary containing all the encoding information of the graph;
the key is the enc\_set name and the value is the corresponding graph encodings

- embed\_safetensors\_into\_graph(*named\_safetensors: dict[str, dict]*)

    - Embed safetensors into the IR graph

- Parameters

    - **named\_safetensors** – a dictionary containing all the LoRA safetensors information of the graph;
the key is the enc\_set name and the value is the corresponding safetensors

- embed\_updatable\_tensors\_into\_graph(*updatable\_tensors: list[str]*)

    - Embed updatable tensors into the IR graph

- Parameters

    - **updatable\_tensors** – The updatable tensor names of the model.

- export(*path: str | os.PathLike*, *prefix: str = 'model'*) → ExportedFiles

    - Export model artifacts

- Parameters

    - - **path** – Directory where the artifacts are to be saved
- **prefix** – Prefix for model and artifact file names. Defaults to “model”

- *classmethod* from\_files(*model\_path: str | os.PathLike*, *encodings\_path: Optional[Union[str, PathLike]] = None*, *lora\_adapters\_path: Optional[Union[str, PathLike]] = None*, *lora\_tensor\_names\_path: Optional[Union[str, PathLike]] = None*, *naming\_prefix: str = 'opt'*, *\*\*kwargs*)

    - Load the model and encodings from file and initialize a GraphContext.

- Parameters

    - - **model\_path** – Path to ONNX model.
- **encodings\_path** – Path to AIMET encodings file. Supported versions are v0.6.1 and v1.0.0.
- **lora\_adapters\_path** –

    Path to LoRA adapters YAML file (lora\_importer\_config).

    YAML schema:

    ```yaml
    use_case:
      - name: <usecase_1/adapter_1 name>
        lora_weights: <path to safetensor file for adapter_1>
        quant_overrides: <path to AIMET encodings file for adapter_1>
      - name: <usecase_2/adapter_2 name>
        lora_weights: <path to safetensor file for adapter_2>
        quant_overrides: <path to AIMET encodings file for adapter_2>
      ...
    ```
- **lora\_tensor\_names\_path** – Path to .txt file with updatable LoRA tensor names.

- Returns

    - Initialized graph context.

- Return type

    - GraphContext

- get\_encodings()

    - Extract encodings from the extra\_info of each tensor in the IR graph

- get\_onnx\_proto()

    - Serialize the model into onnx.ModelProto

- get\_safetensors()

    - Extract safetensors from the extra\_info of each tensor in the IR graph

- get\_tracing\_info(*merged=False*)

    - Get tracing information of all transformations recorded

- Parameters

    - **merged** – whether to merge the chainable transformations into one

- get\_updatable\_tensor\_names()

    - Extract updatable tensor names from the extra\_info of each tensor in the IR graph

- save\_onnx(*path: str*, *external\_data: Optional[str] = None*)

    - Save the IR graph to an ONNX file.
This function is much more memory-efficient than onnx.save(get\_onnx\_proto(load\_weights=True))

- Parameters

    - - **path** – the path to save the onnx file
- **external\_data** – the filename for external data. If None, defaults to basename[:-5] + “.data”
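A quick sketch of the documented default naming rule for external data (`default_external_data_name` is a hypothetical helper, not part of the API):

```python
import os

def default_external_data_name(onnx_path: str) -> str:
    # Mirrors the documented default: basename[:-5] + ".data"
    # (len(".onnx") == 5, so the slice strips the extension)
    return os.path.basename(onnx_path)[:-5] + ".data"

print(default_external_data_name("/tmp/exported/model.onnx"))  # model.data
```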

- save\_tracing\_info(*path: str*, *merged=False*)

    - Save tracing information to a file

- Parameters

    - - **path** – the path to save the tracing information
- **merged** – whether to merge the chainable transformations into one

## Axis Denotations

- *class* qairt.optimizer.onnx.AxisDenotationConfig(*input\_ids\_name\_pattern: str = 'input\_ids'*, *inputs\_embeds\_name\_pattern: str = '(input|inputs)\_embeds'*, *hidden\_states\_name\_pattern: str = 'hidden\_states'*, *layer\_output\_name\_pattern: str = '/?model\_(layers\_\\d+\_Add/Add|embed\_tokens/Gather)\_output\_0'*, *position\_ids\_name\_pattern: str = '(swa\_)?position\_ids(\_sin|\_cos)?'*, *key\_cache\_name\_pattern: str = 'past\_key\_(\\d)+\_in'*, *value\_cache\_name\_pattern: str = 'past\_value\_(\\d)+\_in'*, *attention\_mask\_name\_pattern: str = 'attention\_mask'*, *cache\_index\_name\_pattern: str = '(swa\_)?cache\_index'*, *swa\_mask\_name\_pattern: str = 'swa\_attention\_mask'*, *swa\_key\_name\_pattern: str = 'swa\_key\_(\\d)+\_in'*, *swa\_value\_name\_pattern: str = 'swa\_value\_(\\d)+\_in'*, *recurrent\_state\_name\_pattern: str = 'recurrent\_state\_(\\d)+\_in'*, *conv\_state\_name\_pattern: str = 'conv\_state\_(\\d)+\_in'*, *linear\_attn\_mask\_name\_pattern: str = 'linear\_attn\_mask'*, *transposed\_key\_cache: bool = True*, *lora\_alpha\_name\_pattern: str = 'lora\_alpha'*, *custom\_seed\_rules: list[qairt.optimizer.onnx.passes.axis\_denotation\_infer.config.AxisDenotationSeedRule] = \<factory\>*)

    - Bases: `PassConfig`

Configuration for axis denotation inference passes

This config specifies regex patterns to match ONNX graph input names and determine
what each dimension represents. It’s used to bootstrap the denotation inference
process by identifying known tensor patterns in LLM models

### How it works:

1. The name\_pattern is compiled as a Python regex pattern
2. If a graph input name matches the name pattern, the tensor is initialized with the denotations
3. These initial denotations propagate through the graph via inference passes

The config includes built-in patterns for common LLM inputs (input\_ids,
attention\_mask, KV caches, etc.) and supports custom patterns for non-standard
models via custom\_seed\_rules

### Pattern Matching Priority:

- Custom seed rules (custom\_seed\_rules) are checked FIRST
- Rules are evaluated in the order they appear in the list
- The FIRST matching rule is used (subsequent rules are skipped)
- If no custom rule matches, built-in patterns are tried
- This allows overriding built-in patterns or handling non-standard naming
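The priority order above can be sketched in plain Python; `SeedRule` and `seed_denotations` here are simplified stand-ins for illustration, not the library's implementation:

```python
import re
from dataclasses import dataclass

@dataclass
class SeedRule:  # simplified stand-in for AxisDenotationSeedRule
    name_pattern: str
    denotations: list

# A subset of the built-in default patterns from the config
BUILTIN_RULES = [
    SeedRule(r"input_ids", ["BATCH", "SEQ_LENGTH"]),
    SeedRule(r"past_key_(\d)+_in", ["BATCH", "UNKNOWN", "PAST_SEQ_LENGTH", "UNKNOWN"]),
]

def seed_denotations(input_name, custom_rules, builtin_rules=BUILTIN_RULES):
    # Custom rules are checked first, in list order; the first full,
    # case-insensitive match wins. Built-ins are only a fallback.
    for rule in list(custom_rules) + list(builtin_rules):
        if re.fullmatch(rule.name_pattern, input_name, re.IGNORECASE):
            return rule.denotations
    return None

custom = [SeedRule(r"input_ids", ["BATCH", "UNKNOWN"])]  # overrides the built-in
print(seed_denotations("input_ids", custom))      # ['BATCH', 'UNKNOWN']
print(seed_denotations("past_key_3_in", custom))  # falls through to the built-in rule
```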

Example:

```python
# Basic usage with defaults
config = AxisDenotationConfig()

# Custom model with non-standard names
config = AxisDenotationConfig(
    custom_seed_rules=[
        AxisDenotationSeedRule(
            name_pattern=r"my_custom_input",
            denotations=[AxisDenotation.BATCH, AxisDenotation.SEQ_LENGTH],
        ),
        AxisDenotationSeedRule(
            name_pattern=r"my_cache_\d+",
            denotations=[
                AxisDenotation.BATCH,
                AxisDenotation.UNKNOWN,
                AxisDenotation.PAST_SEQ_LENGTH,
                AxisDenotation.UNKNOWN,
            ],
        ),
    ],
)
```

- All pattern attributes use Python regex syntax and are case-insensitive

- Patterns must match the entire input name (`fullmatch`, not `search`)
- attention\_mask\_name\_pattern*: str*  *= 'attention\_mask'*

    - Naming pattern for attention mask

- cache\_index\_name\_pattern*: str*  *= '(swa\_)?cache\_index'*

    - Naming pattern for cache\_index, including for Sliding Window Attention (SWA) models

- conv\_state\_name\_pattern*: str*  *= 'conv\_state\_(\\d)+\_in'*

    - Naming pattern for convolution (local) state inputs in linear attention models.

These inputs have shape [BATCH, UNKNOWN, UNKNOWN, UNKNOWN]

- custom\_seed\_rules*: list[qairt.optimizer.onnx.passes.axis\_denotation\_infer.config.AxisDenotationSeedRule]*

    - Custom seed rules for models with non-standard tensor names
Checked first before built-in patterns

- hidden\_states\_name\_pattern*: str*  *= 'hidden\_states'*

    - Naming pattern for hidden\_states input used in speculative decoding techniques like EAGLE/EAGLET

EAGLE/EAGLET (Extrapolation Algorithm for Greater Language-model Efficiency) is a speculative
decoding technique that speeds up LLM inference by predicting future tokens directly from
the “hidden states” of the target model. Instead of using a separate draft model, EAGLE uses
a lightweight “EAGLE head” (small transformer layer) that takes hidden states from the main
LLM to generate candidate tokens.

This input has shape [BATCH, SEQ\_LENGTH, UNKNOWN]

- input\_ids\_name\_pattern*: str*  *= 'input\_ids'*

    - input\_ids\_name\_pattern applies only to the first split of an LLM.
This input has shape [BATCH, SEQ\_LENGTH]

- inputs\_embeds\_name\_pattern*: str*  *= '(input|inputs)\_embeds'*

    - Naming pattern for pre-computed embedding inputs.

Some LLMs accept pre-computed embeddings instead of token IDs, bypassing the
embedding lookup (Gather) layer entirely. Both `inputs_embeds` and
`input_embeds` naming conventions are found in practice.

This input has shape [BATCH, SEQ\_LENGTH, UNKNOWN] where the last dimension is
the hidden/embedding dimension.

- key\_cache\_name\_pattern*: str*  *= 'past\_key\_(\\d)+\_in'*

    - Naming pattern for the past key cache input

- layer\_output\_name\_pattern*: str*  *= '/?model\_(layers\_\\d+\_Add/Add|embed\_tokens/Gather)\_output\_0'*

    - layer\_output\_name\_pattern applies only to splits other than the first.
Matches:
- Residual Add outputs: /model\_layers\_0\_Add/Add\_output\_0
- Embedding Gather outputs: /model/embed\_tokens/Gather\_output\_0

These inputs have shape [BATCH, SEQ\_LENGTH, UNKNOWN]

- linear\_attn\_mask\_name\_pattern*: str*  *= 'linear\_attn\_mask'*

    - Naming pattern for the linear attention mask input.

This input has shape [BATCH, SEQ\_LENGTH, UNKNOWN]

- lora\_alpha\_name\_pattern*: str*  *= 'lora\_alpha'*

    - Naming pattern for the LoRA alpha scaling tensor.

This input is created dynamically by the LoraAlphaExtractor or LoRA model creator,
which promotes the baked-in alpha constant from the LoRA `Mul` node into a named graph
input so that it can be updated at runtime without recompiling the model.

The tensor carries no sequence or batch semantics — all axes are `UNKNOWN`.

- position\_ids\_name\_pattern*: str*  *= '(swa\_)?position\_ids(\_sin|\_cos)?'*

    - This should cover both RoPE and absolute position encodings

- recurrent\_state\_name\_pattern*: str*  *= 'recurrent\_state\_(\\d)+\_in'*

    - Naming pattern for recurrent (global) state inputs in linear attention models.

These inputs have shape [BATCH, UNKNOWN, UNKNOWN, UNKNOWN]

- swa\_key\_name\_pattern*: str*  *= 'swa\_key\_(\\d)+\_in'*

    - Naming pattern for the SWA key cache input

- swa\_mask\_name\_pattern*: str*  *= 'swa\_attention\_mask'*

    - Naming pattern for the attention mask in Sliding Window Attention (SWA) models

- swa\_value\_name\_pattern*: str*  *= 'swa\_value\_(\\d)+\_in'*

    - Naming pattern for the SWA value cache input

- transposed\_key\_cache*: bool*  *= True*

    - 

- value\_cache\_name\_pattern*: str*  *= 'past\_value\_(\\d)+\_in'*

    - Naming pattern for the past value cache input

- *class* qairt.optimizer.onnx.AxisDenotationSeedRule(*name\_pattern: str*, *denotations: list[qairt.optimizer.onnx.utils.ir\_extra\_info.AxisDenotation]*)

    - Bases: `object`

Seed rule for bootstrapping axis denotations based on ONNX graph input name patterns

This rule maps tensor name patterns to lists of axis denotations. When an ONNX graph
input’s name matches the pattern (using regex fullmatch), the specified denotations
are assigned to its axes. This bootstraps the denotation inference process.

### How seed rules work:

1. The name\_pattern is compiled as a Python regex pattern
2. If a graph input name matches the name pattern, the denotations are assigned to the tensor
3. The length of denotations must match the tensor’s rank
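The steps above can be sketched as a minimal seeding check; `apply_seed_rule` is a hypothetical helper for illustration, not the library's API:

```python
import re

def apply_seed_rule(name_pattern, denotations, input_name, tensor_rank):
    # A seed rule applies only when the name fully matches (case-insensitive)
    if not re.fullmatch(name_pattern, input_name, re.IGNORECASE):
        return None
    # ...and the denotation list length must equal the tensor's rank
    if len(denotations) != tensor_rank:
        raise ValueError("denotation list length must equal the tensor's rank")
    return list(denotations)

# A rank-2 input seeded with two denotations:
print(apply_seed_rule(r"my_input_\d+", ["BATCH", "SEQ_LENGTH"], "my_input_0", 2))

# A rank mismatch is an error rather than a silent partial seed:
try:
    apply_seed_rule(r"my_input_\d+", ["BATCH", "SEQ_LENGTH"], "my_input_0", 3)
except ValueError as exc:
    print("rejected:", exc)
```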

- param name\_pattern

    - Regex pattern to match ONNX graph input names (case-insensitive)
Uses Python regex syntax. The pattern must match the entire name

- param denotations

    - List of AxisDenotation values to assign to matching tensors
Length must equal the tensor’s rank

Example:

```python
# For a custom input named "my_input_0" with shape [batch, seq_len]:
AxisDenotationSeedRule(
    name_pattern=r"my_input_\d+",
    denotations=[AxisDenotation.BATCH, AxisDenotation.SEQ_LENGTH],
)

# For KV cache inputs with shape [batch, heads, past_seq, head_dim]:
AxisDenotationSeedRule(
    name_pattern=r"past_key_\d+_in",
    denotations=[
        AxisDenotation.BATCH,
        AxisDenotation.UNKNOWN,
        AxisDenotation.PAST_SEQ_LENGTH,
        AxisDenotation.UNKNOWN,
    ],
)
```

- denotations*: list[qairt.optimizer.onnx.utils.ir\_extra\_info.AxisDenotation]*

    - 

- name\_pattern*: str*

    -

- *class* qairt.optimizer.onnx.AxisDenotation(*value*)

    - Bases: `str`, `Enum`

Axis denotation values for tensor dimensions

Denotation describes what each dimension represents in the context of LLM models.
This aligns with ONNX’s dimension denotation concept, specialized for LLMs.

- Values:
    - BATCH: Batch size dimension
    - SEQ\_LENGTH: Current sequence length (autoregressive length)
    - PAST\_SEQ\_LENGTH: Past sequence length in the KV cache
    - CONTEXT\_LENGTH: Global context length (PAST\_SEQ\_LENGTH + SEQ\_LENGTH)
    - SLIDING\_CONTEXT\_LENGTH: Sliding-window context length for SWA models
    - UNKNOWN: Dimension meaning is unknown or not yet inferred
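Because `AxisDenotation` subclasses `str`, members compare equal to plain strings, which makes denotation lists easy to serialize and compare. A local mirror of the documented values illustrates this (for illustration only, not an import from qairt):

```python
from enum import Enum

class AxisDenotation(str, Enum):
    # Local mirror of the documented values, for illustration only
    BATCH = "BATCH"
    SEQ_LENGTH = "SEQ_LENGTH"
    PAST_SEQ_LENGTH = "PAST_SEQ_LENGTH"
    CONTEXT_LENGTH = "CONTEXT_LENGTH"
    SLIDING_CONTEXT_LENGTH = "SLIDING_CONTEXT_LENGTH"
    UNKNOWN = "UNKNOWN"

# str subclassing means members compare equal to their string values
print(AxisDenotation.BATCH == "BATCH")  # True

denotations = [AxisDenotation.BATCH, AxisDenotation.SEQ_LENGTH]
print([d.value for d in denotations])   # ['BATCH', 'SEQ_LENGTH']
```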

- BATCH *= 'BATCH'*

    - 

- CONTEXT\_LENGTH *= 'CONTEXT\_LENGTH'*

    - 

- PAST\_SEQ\_LENGTH *= 'PAST\_SEQ\_LENGTH'*

    - 

- SEQ\_LENGTH *= 'SEQ\_LENGTH'*

    - 

- SLIDING\_CONTEXT\_LENGTH *= 'SLIDING\_CONTEXT\_LENGTH'*

    - 

- UNKNOWN *= 'UNKNOWN'*

    -

Last Published: May 08, 2026
