# Advanced Features

This page covers Gen AI Builder features that go beyond the core build-and-deploy workflow.

On this page

- LoRA Adapters
- Speculative Decoding
- Attaching Models for Specific AR Values

## LoRA Adapters

LoRA (Low-Rank Adaptation) adapters let you fine-tune a deployed model without recompiling the
full context binary. Setting `lora_config` instructs the builder to construct the LoRA graph
and import adapter weights during the build.

    from qairt.modules.lora.lora_config import LoraBuilderInputConfig

    builder.lora_config = LoraBuilderInputConfig(
        lora_config_path="path/to/lora_config.yaml",
        create_lora_graph=True,
        quant_updatable_mode="adapter_only",
        alpha_tensor_name="lora_alpha",
    )

When `lora_config` is set, the builder automatically enables `split_embedding` and
`split_lm_head` regardless of the model defaults.
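
For readers new to the technique, the arithmetic that a LoRA adapter contributes can be sketched in plain Python. This is the generic LoRA update (a frozen base weight plus a scaled low-rank product), not the builder's implementation. The names `matmul` and `apply_lora` are illustrative, and treating the tensor named by `alpha_tensor_name` as the `alpha` scale below is an assumption.

```python
# Illustrative only: generic LoRA arithmetic, not the builder's implementation.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_b, lora_a, alpha):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    rank = len(lora_a)              # lora_a has shape (rank, in_features)
    scale = alpha / rank            # assumption: alpha is the usual LoRA scale
    delta = matmul(lora_b, lora_a)  # shape (out_features, in_features)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# 2x2 base weight with a rank-1 adapter.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # (out_features=2, rank=1)
lora_a = [[0.5, 0.5]]     # (rank=1, in_features=2)

adapted = apply_lora(weight, lora_b, lora_a, alpha=2.0)
```

The base `weight` is never modified; only the small `lora_b` and `lora_a` factors change between adapters, which is why adapter weights can be handled separately from the full model.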

See also: Low-Rank Adaptation (LoRA) Tutorial for a full end-to-end walkthrough.

## Speculative Decoding

Speculative decoding accelerates token generation by predicting ahead and verifying multiple
tokens in a single forward pass. Set `speculative_config` to one of the three supported
methods before calling `build()`.
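
The draft-and-verify idea can be sketched with two toy "models" in plain Python. This is a generic greedy illustration, not the algorithm used by LADE, SSD, or Eaglet, which differ in how candidate tokens are produced. The names `speculative_step`, `target_model`, and `draft_model` are hypothetical and are not part of the builder API.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     tokens: List[int], draft_len: int) -> List[int]:
    """Draft `draft_len` tokens cheaply, then keep what the target agrees with."""
    # 1. Draft phase: the cheap predictor guesses several tokens ahead.
    proposal: List[int] = []
    for _ in range(draft_len):
        proposal.append(draft(tokens + proposal))

    # 2. Verify phase: a real runtime checks all positions in one batched
    #    forward pass; this toy version queries the target per position.
    accepted: List[int] = []
    for guess in proposal:
        expected = target(tokens + accepted)
        if guess != expected:
            accepted.append(expected)  # the target's token replaces the bad guess
            return accepted
        accepted.append(guess)

    # Every guess matched, so the verify pass also yields one extra token.
    accepted.append(target(tokens + accepted))
    return accepted


def target_model(tokens: List[int]) -> int:
    """Toy target: always counts upward by one."""
    return tokens[-1] + 1


def draft_model(tokens: List[int]) -> int:
    """Toy draft: agrees with the target except that it never predicts 4."""
    nxt = tokens[-1] + 1
    return 0 if nxt == 4 else nxt


out = speculative_step(target_model, draft_model, [0], draft_len=4)  # → [1, 2, 3, 4]
```

In this run three drafted tokens are accepted and the fourth is corrected by the target, so one verification step yields four tokens instead of one. The output always matches what the target alone would have produced; only the number of target passes changes.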

### LADE (Look-Ahead Decoding)

    from qairt.gen_ai_api.configs.lade_config import LadeBuilderConfig

    builder.speculative_config = LadeBuilderConfig(window=8, ngram=5, gcap=8)

### SSD (Self-Speculative Decoding)

    from qairt.gen_ai_api.configs.ssd_config import SsdBuilderConfig

    builder.speculative_config = SsdBuilderConfig(
        forecast_token_count=4,
        forecast_prefix=16,
        branches=[4, 4],
        ssd_tensor_file="./ssd_tensor.pt",
    )

### Eaglet (EAGLE-Based Decoding)

    from qairt.gen_ai_api.configs.eaglet_config import EagletBuilderConfig

    builder.speculative_config = EagletBuilderConfig(
        draft_len=6,
        n_branches=6,
        draft_model_path="./draft_model.onnx",
        draft_token_map="./draft_token_map.json",
    )

See also: Speculative Decoding Tutorial for a full end-to-end walkthrough of each method.

## Attaching Models for Specific AR Values

By default, the builder generates AR variants automatically via AR/CL conversion. Use
`attach_model_for_arn()` to pin a pre-converted ONNX model to a specific auto-regression
number, bypassing the conversion step for that AR value:

    builder.attach_model_for_arn(
        arn=1,
        model_path="path/to/ar1_model.onnx",
        encodings_path="path/to/ar1_model.encodings",  # optional
    )

This is useful when you have an externally prepared decode-phase model (AR=1) and want the
builder to handle the remaining AR values normally.
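
One way to make this fallback explicit is a small wrapper that attaches a model only when its files exist, and otherwise leaves that AR value to the builder. The sketch below is an illustration under stated assumptions: `attach_if_present` is a hypothetical helper, `RecordingBuilder` is a stand-in so the example runs without the SDK, and the only builder call assumed is `attach_model_for_arn()` with the keyword arguments shown above.

```python
import tempfile
from pathlib import Path
from typing import Optional


class RecordingBuilder:
    """Stand-in for the real builder; it only records what was attached."""

    def __init__(self):
        self.attached = {}

    def attach_model_for_arn(self, arn, model_path, encodings_path=None):
        self.attached[arn] = (model_path, encodings_path)


def attach_if_present(builder, arn: int, model_path: str,
                      encodings_path: Optional[str] = None) -> bool:
    """Attach a pre-converted model for `arn` only if it exists on disk.

    Returns True when the model was attached, and False when the builder
    is left to generate this AR value through its normal conversion.
    """
    if not Path(model_path).is_file():
        return False
    kwargs = {"arn": arn, "model_path": model_path}
    if encodings_path is not None:
        if not Path(encodings_path).is_file():
            raise FileNotFoundError(f"encodings file not found: {encodings_path}")
        kwargs["encodings_path"] = encodings_path
    builder.attach_model_for_arn(**kwargs)
    return True


with tempfile.TemporaryDirectory() as tmp:
    ar1 = Path(tmp) / "ar1_model.onnx"
    ar1.write_bytes(b"")  # empty placeholder file, not a real ONNX model
    builder = RecordingBuilder()
    attached_ar1 = attach_if_present(builder, 1, str(ar1))
    # AR=128 is an arbitrary example value with no file behind it.
    attached_ar128 = attach_if_present(builder, 128, str(Path(tmp) / "missing.onnx"))
```

With the real builder in place of `RecordingBuilder`, only AR=1 would be pinned to the external model here, and every other AR value would go through the default conversion.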

Last Published: May 08, 2026
