# Advanced Features

This page covers Gen AI Builder features that go beyond the core build-and-deploy workflow.

On this page

- [LoRA Adapters](https://docs.qualcomm.com/doc/80-87189-2/topic/genai_advanced_features.html#lora-adapters)
- [Speculative Decoding](https://docs.qualcomm.com/doc/80-87189-2/topic/genai_advanced_features.html#speculative-decoding)
- [Attaching Child Builders for Prefill/Decode Graphs](https://docs.qualcomm.com/doc/80-87189-2/topic/genai_advanced_features.html#attaching-child-builders-for-prefill-decode-graphs)

## LoRA Adapters

LoRA (Low-Rank Adaptation) adapters let you fine-tune a deployed model without recompiling the
full context binary. Setting `lora_config` instructs the builder to construct the LoRA graph
and import adapter weights during the build.

    from qairt.modules.lora.lora_config import LoraBuilderInputConfig

    builder.lora_config = LoraBuilderInputConfig(
        lora_config_path="path/to/lora_config.yaml",
        create_lora_graph=True,
        quant_updatable_mode="adapter_only",
        alpha_tensor_name="lora_alpha",
    )

When `lora_config` is set, the builder automatically enables `split_embedding` and
`split_lm_head` regardless of the model defaults.
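As background on why an adapter can be swapped in without rebuilding the base weights, the general LoRA formulation adds a scaled low-rank product to a frozen weight matrix: `W_eff = W + (alpha / r) * B @ A`. The following is a minimal pure-Python sketch of that arithmetic only. It does not show how the builder constructs the LoRA graph, and every name in it (`matmul`, `lora_effective_weight`, the toy matrices) is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product (rows of a times columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(
    base: Matrix, lora_b: Matrix, lora_a: Matrix, alpha: float
) -> Matrix:
    """Return base + (alpha / rank) * (lora_b @ lora_a) without mutating base."""
    rank = len(lora_a)  # lora_a has shape (rank, in_features)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out, rank) @ (rank, in) -> (out, in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


# Toy example: a 2x2 base weight and a rank-1 adapter (illustrative values).
base = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # shape (out=2, rank=1)
lora_a = [[0.5, 0.5]]     # shape (rank=1, in=2)

effective = lora_effective_weight(base, lora_b, lora_a, alpha=2.0)
print(effective)  # [[2.0, 1.0], [2.0, 3.0]]
```

The base matrix is left untouched, so only the small `A`, `B`, and `alpha` tensors change from one adapter to the next.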

See also

[Low-Rank Adaptation (LoRA) Tutorial](https://docs.qualcomm.com/doc/80-87189-2/topic/lora_tutorial.html) for a full end-to-end walkthrough.

## Speculative Decoding

Speculative decoding accelerates token generation by drafting several tokens ahead and
verifying them in a single forward pass. Set `speculative_config` to one of the three
supported methods before calling `build()`.
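The draft-then-verify idea can be illustrated with a toy greedy loop. This sketch is generic: it is not the LADE, SSD, or Eaglet algorithm and does not use any Gen AI Builder API. The "models" are hypothetical plain functions from a token list to the next token, and the verification loop stands in for what a real runtime would do in one batched forward pass.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_step(
    prefix: List[int], draft_next: NextToken, target_next: NextToken, draft_len: int
) -> List[int]:
    """Return the tokens accepted in one draft-and-verify round."""
    # 1. Draft: the cheap model proposes draft_len tokens autoregressively.
    draft: List[int] = []
    for _ in range(draft_len):
        draft.append(draft_next(prefix + draft))

    # 2. Verify: keep drafted tokens while they match the target's greedy choice.
    accepted: List[int] = []
    for token in draft:
        expected = target_next(prefix + accepted)
        if token != expected:
            accepted.append(expected)  # first mismatch: take the target's token
            return accepted
        accepted.append(token)

    # 3. Every draft matched, so the target pass also yields one extra token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def target_next(seq: List[int]) -> int:
    """Toy target model: counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def draft_next(seq: List[int]) -> int:
    """Toy draft model: agrees with the target except right after token 3."""
    return 0 if seq[-1] == 3 else (seq[-1] + 1) % 10


step_with_miss = speculative_step([0], draft_next, target_next, draft_len=4)
step_all_hit = speculative_step([5], draft_next, target_next, draft_len=2)
print(step_with_miss)  # [1, 2, 3, 4]
print(step_all_hit)    # [6, 7, 8]
```

In both rounds the output matches what the target model alone would have produced; the gain comes from emitting several tokens per target pass.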

### LADE (Look-Ahead Decoding)

    from qairt.gen_ai_api.configs.lade_config import LadeBuilderConfig

    builder.speculative_config = LadeBuilderConfig(window=8, ngram=5, gcap=8)

### SSD (Self-Speculative Decoding)

    from qairt.gen_ai_api.configs.ssd_config import SsdBuilderConfig

    builder.speculative_config = SsdBuilderConfig(
        forecast_token_count=4,
        forecast_prefix=16,
        branches=[4, 4],
        ssd_tensor_file="./ssd_tensor.pt",
    )

### Eaglet (EAGLE-Based Decoding)

    from qairt.gen_ai_api.configs.eaglet_config import EagletBuilderConfig

    builder.speculative_config = EagletBuilderConfig(
        draft_len=6,
        n_branches=6,
        draft_model_path="./draft_model.onnx",
        draft_token_map="./draft_token_map.json",
    )

See also

[Speculative Decoding Tutorial](https://docs.qualcomm.com/doc/80-87189-2/topic/speculative_decoding_tutorial.html) for a full end-to-end
walkthrough of each method.

## Attaching Child Builders for Prefill/Decode Graphs

The parent builder represents one graph (typically prefill). Use
`attach_model_for_arn()` to attach a child builder for a second graph (typically
decode). The child inherits all parent configuration (compilation, conversion,
calibration, and transformation) and is compiled with its weights shared with the parent.
These settings can then be overridden independently on the returned child, subject to
the compile-option restriction in the note at the end of this section.

    child = builder.attach_model_for_arn(
        arn=1,
        model_path="path/to/ar1_model.onnx",
        encodings_path="path/to/ar1_model.encodings",  # optional
        name="decode",  # optional; defaults to "attached_ar1", used as the cache subdirectory
    )

To customise the child’s transformation options (e.g. different MHA2SHA start points
for the decode graph):

    child.set_transformation_options(options={
        "mha2sha.m2s_additional_start_points": [M2sStartPoint(...)],
    })

To override per-graph compilation options on the child (e.g. a different VTCM size
for the decode graph only):

    child.set_compilation_options(options={"graphs.vtcm_size_in_mb": 8})
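The inherit-then-override behaviour described in this section can be pictured with plain dictionaries. This is a conceptual sketch only: it does not reflect how the builder stores options internally, `derive_child_options` is a hypothetical helper, the parent VTCM value of 4 is made up, and `hypothetical.shared_setting` is an invented key.

```python
import copy
from typing import Any, Dict, Optional


def derive_child_options(
    parent: Dict[str, Any], overrides: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Copy the parent's options, then apply child-only overrides on the copy."""
    child = copy.deepcopy(parent)  # the child starts as an independent copy
    child.update(overrides or {})  # overrides never touch the parent
    return child


parent_options = {
    "graphs.vtcm_size_in_mb": 4,                    # illustrative parent value
    "hypothetical.shared_setting": "same-for-both",  # invented key
}
child_options = derive_child_options(parent_options, {"graphs.vtcm_size_in_mb": 8})

print(child_options["graphs.vtcm_size_in_mb"])   # 8
print(parent_options["graphs.vtcm_size_in_mb"])  # 4
```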

Note

Only graph-level compile options (`graphs.*`) may differ between parent and child.
Because parent and child share weights, all other compile configuration (context,
device, compiler, backend) must be identical — `sync_attached_builder_options()`
enforces this before the build begins.
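The rule in this note can be expressed as a small check over flat, dotted option keys. The sketch below is not the implementation of `sync_attached_builder_options()`; it is a hypothetical helper showing one way to list the non-`graphs.*` keys on which a parent and child disagree. The key `device.soc_model` and all values are invented for illustration.

```python
from typing import Any, Dict, List

GRAPH_PREFIX = "graphs."
_MISSING = object()  # sentinel so a key absent on one side counts as a difference


def conflicting_shared_options(
    parent: Dict[str, Any], child: Dict[str, Any]
) -> List[str]:
    """Return sorted non-graph keys whose values differ between parent and child."""
    all_keys = set(parent) | set(child)
    return sorted(
        key
        for key in all_keys
        if not key.startswith(GRAPH_PREFIX)
        and parent.get(key, _MISSING) != child.get(key, _MISSING)
    )


parent_opts = {"graphs.vtcm_size_in_mb": 4, "device.soc_model": "A"}
ok_child = {**parent_opts, "graphs.vtcm_size_in_mb": 8}  # graph-level change only
bad_child = {**parent_opts, "device.soc_model": "B"}     # shared option changed

print(conflicting_shared_options(parent_opts, ok_child))   # []
print(conflicting_shared_options(parent_opts, bad_child))  # ['device.soc_model']
```

Running a check like this before a build surfaces mismatches early, which is the role the note attributes to `sync_attached_builder_options()`.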

Last Published: Jun 19, 2026
