# Advanced Features

This page covers Gen AI Builder features that go beyond the core build-and-deploy workflow.

On this page

- [LoRA Adapters](https://docs.qualcomm.com/doc/80-87189-2/topic/genai_advanced_features.html#lora-adapters)
- [Speculative Decoding](https://docs.qualcomm.com/doc/80-87189-2/topic/genai_advanced_features.html#speculative-decoding)
- [Attaching Models for Specific AR Values](https://docs.qualcomm.com/doc/80-87189-2/topic/genai_advanced_features.html#attaching-models-for-specific-ar-values)

## LoRA Adapters

LoRA (Low-Rank Adaptation) adapters let you fine-tune a deployed model without recompiling the
full context binary. Setting `lora_config` instructs the builder to construct the LoRA graph
and import adapter weights during the build.

    from qairt.modules.lora.lora_config import LoraBuilderInputConfig

    builder.lora_config = LoraBuilderInputConfig(
        lora_config_path="path/to/lora_config.yaml",
        create_lora_graph=True,
        quant_updatable_mode="adapter_only",
        alpha_tensor_name="lora_alpha",
    )

When `lora_config` is set, the builder automatically enables `split_embedding` and
`split_lm_head` regardless of the model defaults.
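
As background on what adapter weights represent, the sketch below shows the standard LoRA update, in which a low-rank product is scaled and added to a frozen base weight. It is a minimal pure-Python illustration and does not use the Gen AI Builder API. The function names, the toy values, and the `alpha / rank` scaling convention (taken from the original LoRA formulation) are assumptions for illustration, not confirmed details of how the builder applies the tensor named by `alpha_tensor_name`.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_a, lora_b, alpha):
    """Return weight + (alpha / rank) * (lora_b @ lora_a).

    Assumed convention: lora_a has shape (rank, in_features) and
    lora_b has shape (out_features, rank).
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape (out_features, in_features)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x3 base weight with a rank-1 adapter (hypothetical values).
base_weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lora_a = [[1.0, 2.0, 3.0]]  # (rank=1, in_features=3)
lora_b = [[0.5], [-0.5]]    # (out_features=2, rank=1)

adapted = apply_lora(base_weight, lora_a, lora_b, alpha=2.0)
print(adapted)
```

Because only the small `lora_a` and `lora_b` matrices change between adapters, the base weights can stay fixed, which is consistent with the adapter being applied without recompiling the full context binary.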

See also

[Low-Rank Adaptation (LoRA) Tutorial](https://docs.qualcomm.com/doc/80-87189-2/topic/lora_tutorial.html) for a full end-to-end walkthrough.

## Speculative Decoding

Speculative decoding accelerates token generation by predicting ahead and verifying multiple
tokens in a single forward pass. Set `speculative_config` to one of the three supported
methods before calling `build()`.
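
To make the predict-then-verify idea concrete, here is a toy sketch of a generic greedy draft-and-verify loop. It is not an implementation of LADE, SSD, or Eaglet and does not use the Gen AI Builder API. Both "models" are hypothetical stand-in functions, and the verification loop runs token by token here, whereas a real runtime would batch those checks into one forward pass of the target model.

```python
def target_next(tokens):
    """Hypothetical target model: the next token is (last + 1) % 10."""
    return (tokens[-1] + 1) % 10


def draft_next(tokens):
    """Hypothetical cheap draft: agrees with the target except after a 4."""
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


def speculative_step(tokens, draft_len):
    """Draft draft_len tokens, then keep the longest prefix the target agrees with."""
    draft = []
    for _ in range(draft_len):
        draft.append(draft_next(tokens + draft))

    accepted = []
    # A real system would perform these checks in a single target forward pass.
    for proposed in draft:
        expected = target_next(tokens + accepted)
        if proposed != expected:
            accepted.append(expected)  # take the target's token and stop
            return accepted
        accepted.append(proposed)
    # Every draft token matched: the same pass also yields one extra token.
    accepted.append(target_next(tokens + accepted))
    return accepted


def generate(prompt, num_tokens, draft_len=4):
    """Generate num_tokens tokens and report how many verify steps were needed."""
    tokens = list(prompt)
    steps = 0
    while len(tokens) - len(prompt) < num_tokens:
        tokens.extend(speculative_step(tokens, draft_len))
        steps += 1
    return tokens[: len(prompt) + num_tokens], steps


output, steps = generate([0], num_tokens=8)
print(output, steps)
```

The output is identical to what the target function would produce one token at a time. The speedup comes from needing fewer verification steps than generated tokens whenever the draft guesses correctly.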

### LADE (Look-Ahead Decoding)

    from qairt.gen_ai_api.configs.lade_config import LadeBuilderConfig

    builder.speculative_config = LadeBuilderConfig(window=8, ngram=5, gcap=8)

### SSD (Self-Speculative Decoding)

    from qairt.gen_ai_api.configs.ssd_config import SsdBuilderConfig

    builder.speculative_config = SsdBuilderConfig(
        forecast_token_count=4,
        forecast_prefix=16,
        branches=[4, 4],
        ssd_tensor_file="./ssd_tensor.pt",
    )

### Eaglet (EAGLE-Based Decoding)

    from qairt.gen_ai_api.configs.eaglet_config import EagletBuilderConfig

    builder.speculative_config = EagletBuilderConfig(
        draft_len=6,
        n_branches=6,
        draft_model_path="./draft_model.onnx",
        draft_token_map="./draft_token_map.json",
    )

See also

[Speculative Decoding Tutorial](https://docs.qualcomm.com/doc/80-87189-2/topic/speculative_decoding_tutorial.html) for a full end-to-end
walkthrough of each method.

## Attaching Models for Specific AR Values

By default, the builder generates auto-regression (AR) variants automatically via AR/CL
conversion. Use `attach_model_for_arn()` to pin a pre-converted ONNX model to a specific AR
value, bypassing the conversion step for that value only:

    builder.attach_model_for_arn(
        arn=1,
        model_path="path/to/ar1_model.onnx",
        encodings_path="path/to/ar1_model.encodings",  # optional
    )

This is useful when you have an externally prepared decode-phase model (AR=1) and want the
builder to handle the remaining AR values normally.
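
The mixed behavior described above can be pictured with a small stand-alone sketch: attached AR values use the pinned model, and every other AR value falls back to automatic conversion. This is only a conceptual model, not the builder's internal logic. The function `plan_ar_sources` and the AR values 32 and 128 are hypothetical and chosen purely for illustration.

```python
def plan_ar_sources(required_arns, attached_models):
    """Map each AR value to its source.

    Returns ("attached", path) for AR values with a pinned model and
    ("converted", None) for AR values left to automatic conversion.
    """
    plan = {}
    for arn in required_arns:
        if arn in attached_models:
            plan[arn] = ("attached", attached_models[arn])
        else:
            plan[arn] = ("converted", None)
    return plan


# Hypothetical setup: only the decode-phase model (AR=1) is supplied externally.
attached = {1: "path/to/ar1_model.onnx"}
plan = plan_ar_sources([1, 32, 128], attached)

for arn, (source, path) in plan.items():
    print(arn, source, path)
```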

Last Published: May 26, 2026
