# BuilderTransformerConfig

- *class* qairt.gen\_ai\_api.configs.builder\_transformer\_config.BuilderTransformerConfig(*\*args: Any*, *\*\*kwargs: Any*)

    - Bases: `AISWBaseModel`

- backend*: [BackendType](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-api-configs.html#qairt.api.configs.common.BackendType)*  *= 'HTP'*

    - Backend to use for the model.

- model\_transformer\_config*: ModelTransformerConfig*  *= ModelTransformerConfig(arn\_cl\_options=ARn\_ContextLengthConfig(context\_length=[4096], auto\_regression\_number=[1, 128], skip\_ar\_cl\_conversion=False, axis\_denotation\_config=None), split\_model=SplitModelConfig(num\_splits=1, split\_embedding=False, split\_lm\_head=False, skip\_verification=False, log\_level='info', input\_ids\_name='input\_ids', input\_embeds\_name='inputs\_embeds'), mha\_config=None, adapt\_moe=None)*

    - Additional transformation-specific configuration.

## GenAIConfig

- *class* qairt.gen\_ai\_api.configs.gen\_ai\_config.EmbeddingConfig(*\*args: Any*, *\*\*kwargs: Any*)

    - Bases: `AISWBaseModel`

EmbeddingConfig holds configuration information for the embedding table LUT.

- embed\_datatype*: str*

    - Embedding datatype.

- embed\_length*: int*

    - Embedding length.

- embed\_path*: str | os.PathLike*

    - Path to embedding table LUT.

- embed\_quant\_offset*: Optional[int]*  *= None*

    - Quantization offset of the embedding table.

- embed\_quant\_scale*: Optional[float]*  *= None*

    - Quantization scale of the embedding table.
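As a rough illustration of what the LUT fields describe, the sketch below looks up rows of a quantized embedding table and dequantizes them. It relies on assumptions this reference does not state: the table is a flat, row-major array with `embed_length` values per token, and dequantization follows the QNN-style convention `real = (q + offset) * scale`. The helper `lookup_embeddings` is hypothetical.

```python
from typing import List, Optional, Sequence


def lookup_embeddings(
    lut: Sequence[int],
    token_ids: Sequence[int],
    embed_length: int,
    quant_scale: Optional[float] = None,
    quant_offset: Optional[int] = None,
) -> List[List[float]]:
    """Hypothetical helper: gather one row per token and dequantize it.

    Assumes a flat, row-major LUT with `embed_length` values per token and the
    QNN-style convention real = (q + offset) * scale. When no scale is given,
    values are returned unchanged (as floats).
    """
    rows = []
    for tok in token_ids:
        start = tok * embed_length
        row = lut[start:start + embed_length]
        if len(row) != embed_length:
            raise IndexError(f"token id {tok} is outside the LUT")
        if quant_scale is None:
            rows.append([float(q) for q in row])
        else:
            offset = quant_offset or 0
            rows.append([(q + offset) * quant_scale for q in row])
    return rows


# Toy 3-token vocabulary with 2 values per embedding (uint8-style codes).
toy_lut = [128, 130, 100, 156, 0, 255]
print(lookup_embeddings(toy_lut, [1, 2], embed_length=2, quant_scale=0.5, quant_offset=-128))
# [[-14.0, 14.0], [-64.0, 63.5]]
```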

- *class* qairt.gen\_ai\_api.configs.gen\_ai\_config.ExpertConfig(*\*args: Any*, *\*\*kwargs: Any*)

    - Bases: `AISWBaseModel`

Configuration for Mixture-of-Experts inference behaviour.

- enable\_expert\_subselection*: bool*  *= False*

    - Enable expert subselection optimisation.

- enable\_op\_predication*: bool*  *= False*

    - Enable operation predication for expert routing.

- *classmethod* from\_pretrained\_config(*config: PretrainedConfig*) → [ExpertConfig](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-api-configs.html#qairt.gen_ai_api.configs.gen_ai_config.ExpertConfig)

    - Create an `ExpertConfig` from a Hugging Face `PretrainedConfig`. Fields absent from the HF config default to `False`.
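The documented defaulting rule (a field missing from the Hugging Face config becomes `False`) amounts to an attribute lookup with a fallback. The sketch below mimics it with a plain dataclass, using `types.SimpleNamespace` in place of `PretrainedConfig`. It assumes, purely for illustration, that the HF config uses attribute names identical to the `ExpertConfig` fields, which this reference does not specify; `ExpertConfigSketch` is a hypothetical stand-in, not the real model.

```python
from dataclasses import dataclass, fields
from types import SimpleNamespace


@dataclass
class ExpertConfigSketch:
    """Stand-in for ExpertConfig; not the real pydantic model."""
    enable_expert_subselection: bool = False
    enable_op_predication: bool = False

    @classmethod
    def from_pretrained_config(cls, config) -> "ExpertConfigSketch":
        # Assumption: the HF config exposes attributes with the same names.
        # Any attribute that is absent falls back to False.
        values = {f.name: bool(getattr(config, f.name, False)) for f in fields(cls)}
        return cls(**values)


hf_config = SimpleNamespace(model_type="some_moe_model", enable_op_predication=True)
print(ExpertConfigSketch.from_pretrained_config(hf_config))
# ExpertConfigSketch(enable_expert_subselection=False, enable_op_predication=True)
```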

- *class* qairt.gen\_ai\_api.configs.gen\_ai\_config.GenAIConfig(*\*args: Any*, *\*\*kwargs: Any*)

    - Bases: `AISWBaseModel`

`GenAIConfig` holds the common configuration of a generative AI model that Genie needs for execution. The following attributes are common to all subclasses:

- adapter\_count\_by\_use\_case*: Optional[Dict[str, int]]*  *= {}*

    - Mapping from each use case to its number of adapters.

- allow\_async\_init*: Optional[bool]*  *= None*

    - Allow context binaries to be initialized asynchronously. Deprecated: set this via `EngineConfig.htp.allow_async_init` instead and pass it to `get_executor()` / `T2TExecutor`. Support here will be removed in a future release.

- alpha\_tensor\_name*: Optional[str]*  *= ''*

    - Name of the tensor where the LoRA adapter is applied.

- bos\_token*: int*

    - ID of the beginning-of-stream (BOS) token.

- chat\_template*: Union[NullChatTemplate, HFChatTemplate, CustomChatTemplate]*  *= NullChatTemplate()*

    - Chat template used to format messages. Accepts a `NullChatTemplate`, `HFChatTemplate`, or `CustomChatTemplate` instance, discriminated by its `type` field; defaults to a new `NullChatTemplate`.

- context\_length*: int*

    - Context length of the model, in tokens.

- embedding\_config*: Optional[[EmbeddingConfig](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-api-configs.html#qairt.gen_ai_api.configs.gen_ai_config.EmbeddingConfig)]*  *= None*

    - Configuration of the embedding table LUT (see `EmbeddingConfig`).

- enable\_graph\_switching*: Optional[bool]*  *= None*

    - Enable graph switching for graphs within each context binary. Deprecated: set this via `EngineConfig.htp.enable_graph_switching` instead and pass it to `get_executor()` / `T2TExecutor`. Support here will be removed in a future release.

- eos\_token*: int | list[int]*

    - ID of the end-of-stream (EOS) token, or a list of IDs when the model has several.

- eot\_token*: Optional[int]*  *= None*

    - ID of the end-of-turn (EOT) token.

- expert\_config*: Optional[[ExpertConfig](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-api-configs.html#qairt.gen_ai_api.configs.gen_ai_config.ExpertConfig)]*  *= None*

    - MoE expert configuration. Populated automatically for MoE architectures.

- kv\_dim*: Optional[int]*  *= None*

    - Dimension of the KV cache.

- linear\_attention*: Optional[bool]*  *= None*

    - Enable linear/hybrid attention handling (for example, Qwen3.5 SSM layers).

- n\_embd*: Optional[int]*  *= None*

    - Hidden size of the model.

- n\_heads*: Optional[int]*  *= None*

    - Number of attention heads in the model's multi-head attention layers.

- n\_layer*: Optional[int]*  *= None*

    - Number of transformer blocks (layers) in the model.

- n\_vocab*: int*

    - Number of tokens in the vocabulary; this is also the first dimension of the embedding matrix.

- positional\_encoding*: Optional[[PositionalEncoding](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-modules-genie-execution.html#qairt.modules.genie_execution.genie_config.PositionalEncoding)]*  *= None*

    - Description of the model's positional encoding.

- rope\_scaling*: Optional[[RopeScaling](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-modules-genie-execution.html#qairt.modules.genie_execution.genie_config.RopeScaling)]*  *= None*

    - RoPE scaling configuration for extended context.

- rope\_theta*: Optional[float]*  *= None*

    - Theta value for rotary positional encoding (RoPE).

- sampler\_params*: Optional[[Sampler](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-modules-genie-execution.html#qairt.modules.genie_execution.genie_config.Sampler)]*  *= None*

    - Model-specific sampler parameters for optimal model performance. Applied automatically by the executor.

- speculative\_run\_config*: Optional[Union[LadeRunConfig, EagletRunConfig, [SsdRunConfig](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-modules-genie-execution.html#qairt.modules.genie_execution.genie_config.SsdRunConfig)]]*  *= None*

    - Speculative decoding configuration: a `LadeRunConfig`, `EagletRunConfig`, or `SsdRunConfig`.

- tokenizer\_path*: str | os.PathLike*

    - The path to the tokenizer.  Must point to an existing file.

- validate\_tokenizer\_path(*v*)

    - Validator for `tokenizer_path`; checks that the path points to an existing file.
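Two field conventions in `GenAIConfig` are easy to get wrong: `tokenizer_path` must name an existing file, and `eos_token` may be a single ID or a list of IDs. The sketch below shows one way to check both with the standard library. `check_tokenizer_path` and `is_eos` are hypothetical helpers, not part of `qairt.gen_ai_api`, and the real `validate_tokenizer_path` may behave differently (for example, in the exception it raises).

```python
import os
from pathlib import Path
from typing import List, Union


def check_tokenizer_path(v: Union[str, os.PathLike]) -> Path:
    """Reject tokenizer paths that do not point to an existing regular file."""
    path = Path(v)
    if not path.is_file():
        raise ValueError(f"tokenizer_path must point to an existing file: {path}")
    return path


def is_eos(token_id: int, eos_token: Union[int, List[int]]) -> bool:
    """True if token_id matches eos_token, which may be one ID or a list of IDs."""
    eos_ids = {eos_token} if isinstance(eos_token, int) else set(eos_token)
    return token_id in eos_ids


print(is_eos(2, 2), is_eos(7, [2, 7]), is_eos(5, [2, 7]))  # True True False
```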

## EngineConfig

`EngineConfig` carries runtime engine and deployment
settings (such as thread count and HTP tuning via
`HTPEngineConfig`) separately from build-time
configuration. Pass it to
[`export()`](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-api-containers.html#qairt.gen_ai_api.containers.llm_container.LLMContainer.export),
[`get_executor()`](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-api-containers.html#qairt.gen_ai_api.containers.llm_container.LLMContainer.get_executor), or
`T2TExecutor` to control execution without rebuilding the model.

- *class* qairt.gen\_ai\_api.configs.engine\_config.EngineConfig(*\*args: Any*, *\*\*kwargs: Any*)

    - Bases: `AISWBaseModel`

Engine-level execution parameters supplied at deployment time.

Mirrors the `engine` section of the Genie standalone-engine configuration.
Independent of both model architecture
([`GenAIConfig`](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-api-configs.html#qairt.gen_ai_api.configs.gen_ai_config.GenAIConfig)) and
per-call generation settings
(`GenerationConfig`).

This config is applied uniformly to all engines in a pipeline.  When
per-engine configuration is needed (e.g. Eaglet target vs. draft), add an
`engine_configs: Optional[List[EngineConfig]]` parameter *alongside*
`engine_config` rather than replacing it to preserve backward compatibility.

- htp*: Optional[HTPEngineConfig]*  *= None*

    - HTP backend parameters. Only valid when the executor backend is [`HTP`](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-api-configs.html#qairt.api.configs.common.BackendType.HTP).

- n\_threads*: Optional[int]*  *= None* (must be `>= 0`)

    - Number of threads for the Genie dialog engine. `0` lets the backend choose the thread count. Defaults to `6` when not set.
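Two rules in `EngineConfig` are worth seeing together: `n_threads` falls back to `6` when unset (with `0` deferring to the backend), and `htp` settings are only accepted for the HTP backend, with a `ValueError` otherwise (raised at `prepare_environment()` time in the real API, per `HTPEngineConfig` below). The sketch models both with a plain dataclass. `EngineConfigSketch`, `effective_n_threads`, and `check_backend` are hypothetical names, and the backend is a plain string rather than `BackendType`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

DEFAULT_N_THREADS = 6  # documented default when n_threads is not set


@dataclass
class EngineConfigSketch:
    """Simplified stand-in for EngineConfig (not the real pydantic model)."""
    n_threads: Optional[int] = None
    htp: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        if self.n_threads is not None and self.n_threads < 0:
            raise ValueError("n_threads must be >= 0")


def effective_n_threads(cfg: EngineConfigSketch) -> int:
    # None -> documented default of 6; 0 is passed through so the backend picks.
    return DEFAULT_N_THREADS if cfg.n_threads is None else cfg.n_threads


def check_backend(cfg: EngineConfigSketch, backend: str) -> None:
    # HTP-specific settings are rejected for any other backend.
    if cfg.htp is not None and backend != "HTP":
        raise ValueError(f"EngineConfig.htp is only valid for the HTP backend, got {backend!r}")


cfg = EngineConfigSketch(htp={"poll": True})
print(effective_n_threads(cfg))  # 6
check_backend(cfg, "HTP")        # passes silently
```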

- *class* qairt.gen\_ai\_api.configs.engine\_config.HTPEngineConfig(*\*args: Any*, *\*\*kwargs: Any*)

    - Bases: `AISWBaseModel`

HTP backend parameters supplied at deployment time.

Mirrors the `engine.backend.QnnHtp` section of the Genie standalone-engine
configuration.  All fields are optional; omitted fields fall back to
`EngineConfig` / factory defaults.

Passing an `EngineConfig` with this field set to an executor
targeting a non-HTP backend raises `ValueError` at
[`prepare_environment()`](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-api-executors.html#qairt.gen_ai_api.executors.gen_ai_executable.GenAIExecutable.prepare_environment)
time.

Extra fields are forwarded verbatim to the underlying Genie QnnHtp backend configuration, allowing callers to pass parameters not yet explicitly modeled here without waiting for an API update.

- allow\_async\_init*: Optional[bool]*  *= None*

    - Allow context binaries to be initialized asynchronously. Preferred over the deprecated `GenAIConfig.allow_async_init` when both are set.

- cpu\_mask*: Optional[str]*  *= None*

    - CPU core affinity mask for the HTP runtime. For example, `"0x3"` restricts execution to cores 0 and 1. Defaults to `"0x00"` (all cores) when not set.

- enable\_graph\_switching*: Optional[bool]*  *= None*

    - Enable graph switching for graphs within each context binary. Preferred over the deprecated `GenAIConfig.enable_graph_switching` when both are set.

- kv\_update\_method*: Optional[str]*  *= None*

    - KV-cache update method override.

- mmap\_budget*: Optional[int]*  *= None* (must be `>= 0`)

    - Memory-mapped budget in MB. `0` means fully memory-mapped (no budget cap). Defaults to `40` when not set.

- model\_config *= {'extra': 'allow', 'populate\_by\_name': True, 'protected\_namespaces': (), 'validate\_assignment': True}*

    - Pydantic model configuration. `'extra': 'allow'` is what lets the extra fields described above through to the backend configuration.

- poll*: Optional[bool]*  *= None*

    - Whether to enable polling mode in the HTP backend. Defaults to `True` when not set.

- pos\_id\_dim*: Optional[int]*  *= None* (must be `> 0`)

    - Positional ID dimension override.

- spill\_fill\_bufsize*: Optional[int]*  *= None* (must be `>= 0`)

    - Spill/fill buffer size in bytes. Defaults to `0` when not set.

- use\_mmap*: Optional[bool]*  *= None*

    - Use memory-mapped file access for model weights. When not set, defaults to `True` for device (non-native) execution.
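The defaults and precedence rules documented for `HTPEngineConfig` can be summarized as a small resolution step: an explicitly set field wins; for `allow_async_init` and `enable_graph_switching` the deprecated `GenAIConfig` value is used next; otherwise the documented default applies; and unknown keys pass through unchanged. The sketch below illustrates those rules only. `resolve_htp_settings` and `cpu_mask_to_cores` are hypothetical, the output keeps the Python field names rather than any Genie JSON spelling, and the native-execution default of `use_mmap` is left unset because this reference does not state it.

```python
from typing import Any, Dict, List

# Defaults documented for fields left unset (use_mmap handled separately).
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "cpu_mask": "0x00",
    "poll": True,
    "spill_fill_bufsize": 0,
    "mmap_budget": 40,
}
DEPRECATED_FALLBACKS = ("allow_async_init", "enable_graph_switching")


def resolve_htp_settings(
    htp: Dict[str, Any],
    legacy_genai: Dict[str, Any],
    native: bool = False,
) -> Dict[str, Any]:
    """Hypothetical resolution of effective HTP settings (illustration only)."""
    resolved: Dict[str, Any] = dict(DOCUMENTED_DEFAULTS)
    if not native:
        resolved["use_mmap"] = True  # documented default for device execution
    # Deprecated GenAIConfig values apply only if HTPEngineConfig leaves them unset.
    for name in DEPRECATED_FALLBACKS:
        if legacy_genai.get(name) is not None:
            resolved[name] = legacy_genai[name]
    # Explicitly set fields (including unknown "extra" keys) override everything.
    resolved.update({k: v for k, v in htp.items() if v is not None})
    return resolved


def cpu_mask_to_cores(mask: str, n_cores: int) -> List[int]:
    """Decode a hex affinity mask; "0x00" is treated as "all cores" as documented."""
    bits = int(mask, 16)
    if bits == 0:
        return list(range(n_cores))
    return [core for core in range(n_cores) if (bits >> core) & 1]


settings = resolve_htp_settings(
    htp={"cpu_mask": "0x3", "allow_async_init": True, "my_new_flag": 1},  # my_new_flag: hypothetical extra key
    legacy_genai={"allow_async_init": False, "enable_graph_switching": True},
)
print(settings["allow_async_init"], settings["enable_graph_switching"])  # True True
print(cpu_mask_to_cores(settings["cpu_mask"], n_cores=8))  # [0, 1]
```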

## GenerationConfig

`GenerationConfig` holds per-call (runtime)
generation parameters, for example `temperature`, `top_k`, `top_p`, `seed`, `greedy`,
and `max_num_tokens`. They are kept separate from build-time configuration so that they can vary from one inference
request to the next.

- *class* qairt.gen\_ai\_api.configs.generation\_config.GenerationConfig(*\*args: Any*, *\*\*kwargs: Any*)

    - Bases: `AISWBaseModel`

Per-call generation parameters that override the static dialog defaults.

All fields are optional; omitted fields fall back to the value baked into
the base [`GenieConfig`](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-modules-genie-execution.html#qairt.modules.genie_execution.genie_config.GenieConfig)
at [`prepare_environment()`](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-api-executors.html#qairt.gen_ai_api.executors.gen_ai_executable.GenAIExecutable.prepare_environment)
time.

This config is intentionally free of genie-internal types so that callers
only need to depend on the public `qairt.gen_ai_api` namespace.  Runner
implementations are responsible for mapping these fields to their internal
representation (e.g. `temperature` → `Sampler.temp`).
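The fallback behavior described above (unset fields keep the value baked into the base `GenieConfig`, and runners translate public field names to internal ones) can be pictured as a dictionary overlay. In the sketch below the base sampler is a plain dict with made-up values, `temperature` is renamed to `temp` as in the documented example, and every other key keeps its field name. Those other key names and the `overlay_generation` helper are assumptions for illustration, not the actual `Sampler` schema.

```python
from typing import Any, Dict

# Documented example mapping; other fields are assumed to keep their names.
FIELD_TO_INTERNAL = {"temperature": "temp"}


def overlay_generation(base_sampler: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return a copy of base_sampler with every non-None override applied."""
    merged = dict(base_sampler)  # leave the base configuration untouched
    for name, value in overrides.items():
        if value is None:
            continue  # omitted field: keep the value baked into the base config
        merged[FIELD_TO_INTERNAL.get(name, name)] = value
    return merged


base = {"temp": 0.8, "top_k": 40, "top_p": 0.95, "seed": 42}  # hypothetical base values
per_call = overlay_generation(base, temperature=0.2, top_k=None, seed=7)
print(per_call)  # {'temp': 0.2, 'top_k': 40, 'top_p': 0.95, 'seed': 7}
```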

- greedy*: Optional[bool]*  *= None*

    - When `True`, use greedy (argmax) decoding regardless of other sampler settings.

- lora\_config*: Optional[[UseCaseRunConfig](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-modules-lora.html#qairt.modules.lora.lora_config.UseCaseRunConfig)]*  *= None*

    - LoRA adapter selection and per-adapter alpha scaling for this call. Selects which use case (adapter set) is active and lets the caller override the alpha scaling factor of each adapter within that use case.

- max\_num\_tokens*: Optional[int]*  *= None* (must be `> 0`)

    - Maximum number of tokens to generate.

- seed*: Optional[int]*  *= None*

    - Random seed for the sampler.

- temperature*: Optional[float]*  *= None* (must be `>= 0`)

    - Sampling temperature. Higher values increase randomness; lower values make the output more deterministic.

- top\_k*: Optional[int]*  *= None* (must be `> 0`)

    - Limits sampling to the *k* most likely tokens.

- top\_p*: Optional[float]*  *= None* (must be in `(0, 1]`)

    - Nucleus-sampling probability threshold.
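The range constraints on `GenerationConfig` (`temperature >= 0`, `top_k > 0`, `0 < top_p <= 1`, `max_num_tokens > 0`) are declared as field constraints on the model. The sketch below reproduces the same checks with a standard-library dataclass so the rules can be seen in one place. `GenerationConfigSketch` is a hypothetical stand-in: it raises `ValueError` where the real pydantic model raises its own validation error, and it omits `lora_config` because that type is defined elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationConfigSketch:
    """Stand-in mirroring GenerationConfig's documented constraints."""
    temperature: Optional[float] = None
    top_k: Optional[int] = None
    top_p: Optional[float] = None
    seed: Optional[int] = None
    greedy: Optional[bool] = None
    max_num_tokens: Optional[int] = None

    def __post_init__(self) -> None:
        # None always means "not set", so constraints apply only to given values.
        if self.temperature is not None and self.temperature < 0:
            raise ValueError("temperature must be >= 0")
        if self.top_k is not None and self.top_k <= 0:
            raise ValueError("top_k must be > 0")
        if self.top_p is not None and not (0 < self.top_p <= 1):
            raise ValueError("top_p must be in (0, 1]")
        if self.max_num_tokens is not None and self.max_num_tokens <= 0:
            raise ValueError("max_num_tokens must be > 0")


ok = GenerationConfigSketch(temperature=0.0, top_p=1.0, max_num_tokens=128)
try:
    GenerationConfigSketch(top_p=0.0)
except ValueError as err:
    print(err)  # top_p must be in (0, 1]
```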

## WorkflowConfig

See [Workflow Configuration](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-api-configs-workflow.html#workflow-configs) for `WorkflowNode`, `WorkflowGraph`, and `WorkflowNodeRole`.

## VisionEncoderConfig

`VisionEncoderConfig` configures a
single-pass vision encoder model built by
[`VisionEncoderBuilderHTP`](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-api-builders.html#qairt.gen_ai_api.builders.vision_encoder_builder_htp.VisionEncoderBuilderHTP).


- *class* qairt.gen\_ai\_api.configs.vision\_encoder\_config.VisionEncoderConfig(*\*args: Any*, *\*\*kwargs: Any*)

    - Bases: `AISWBaseModel`

Configuration for a single-pass vision encoder model.

This config is populated from the `vision_config` sub-section of a
Qwen2-VL (or similar) `config.json`.  It intentionally omits all
LLM-specific fields (tokenizer path, vocabulary size, context length,
KV-cache dimension, positional encoding type, etc.).
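Since most fields are read from the `vision_config` sub-section of `config.json`, and the boundary token IDs from its top level (as in Qwen2-VL configs), populating the config is mostly a matter of picking keys. The sketch below does that on an inline dict and also shows where a `positional_encoding` dict lands in the Genie node config JSON. The `config.json` values are illustrative, `vision_encoder_fields` and `place_positional_encoding` are hypothetical helpers, and reading `image-encoder/engine/model/positional-encoding` as a path of nested JSON objects is an assumption.

```python
import json
from typing import Any, Dict


def vision_encoder_fields(config: Dict[str, Any]) -> Dict[str, Any]:
    """Collect VisionEncoderConfig-style fields from a parsed config.json."""
    vision = config["vision_config"]
    return {
        "hidden_size": vision["hidden_size"],  # required field
        "num_heads": vision.get("num_heads"),  # optional fields default to None
        "depth": vision.get("depth"),
        "vision_start_token_id": config.get("vision_start_token_id"),
        "vision_end_token_id": config.get("vision_end_token_id"),
    }


def place_positional_encoding(node_config: Dict[str, Any], pos_enc: Dict[str, Any]) -> None:
    """Write pos_enc verbatim under image-encoder/engine/model/positional-encoding."""
    target = node_config
    for key in ("image-encoder", "engine", "model"):
        target = target.setdefault(key, {})
    target["positional-encoding"] = pos_enc


# Illustrative values only.
raw = json.loads("""{
  "vision_start_token_id": 151652,
  "vision_end_token_id": 151653,
  "vision_config": {"hidden_size": 1280, "num_heads": 16, "depth": 32}
}""")
print(vision_encoder_fields(raw))

node: Dict[str, Any] = {}
place_positional_encoding(node, {"type": "rope", "rope-theta": 10000})  # hypothetical values
print(json.dumps(node))
```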

- depth*: Optional[int]*  *= None*

    - Number of transformer layers (blocks) in the encoder (`vision_config.depth`).

- grid\_height*: Optional[int]*  *= None*

- grid\_width*: Optional[int]*  *= None*

- hidden\_size*: int*

    - Dimensionality of the output patch-embedding vectors (for example, 1280 for the Qwen2.5-VL ViT, read from `vision_config.hidden_size`).

- num\_heads*: Optional[int]*  *= None*

    - Number of attention heads in the vision transformer (`vision_config.num_heads`).

- num\_patches*: Optional[int]*  *= None*

    - Number of image patches produced by the patch-embedding layer (for example, 480 for `qwen_vit_base` at a 1920-patch input). Derived automatically from the ONNX output shape when not supplied.

- positional\_encoding*: Optional[Dict[str, Any]]*  *= None*

    - Optional dict describing the positional encoding used by the vision encoder. When present, it is written verbatim into the Genie node config JSON under `image-encoder/engine/model/positional-encoding`. Expected keys (all optional): `type`, `rope-dim`, `rope-theta`, `rope-scaling`.

- vision\_end\_token\_id*: Optional[int]*  *= None*

    - Token ID of the vision-end boundary token (`vision_end_token_id` in `config.json`). See `vision_start_token_id`.

- vision\_start\_token\_id*: Optional[int]*  *= None*

    - Token ID of the vision-start boundary token (`vision_start_token_id` in `config.json`). Used by [`ImageT2TExecutor`](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-api-executors.html#qairt.gen_ai_api.executors.image_t2t_executor.ImageT2TExecutor) to split the formatted prompt into a prefix (up to and including this token) and a suffix (from `vision_end_token_id` onwards).
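`ImageT2TExecutor` is documented to split the formatted prompt around the vision boundary tokens: the prefix runs up to and including `vision_start_token_id`, and the suffix starts at `vision_end_token_id`. The sketch below reproduces that split on a list of token IDs. `split_prompt` is hypothetical; it assumes each boundary token appears once with the start token first, and the tokens strictly between the two boundaries belong to neither piece, matching the description.

```python
from typing import List, Tuple


def split_prompt(tokens: List[int], start_id: int, end_id: int) -> Tuple[List[int], List[int]]:
    """Split around the vision boundary tokens (hypothetical helper).

    prefix: everything up to and including the first start_id
    suffix: everything from the first end_id after it onwards
    """
    start = tokens.index(start_id)         # ValueError if absent
    end = tokens.index(end_id, start + 1)  # search only after the start token
    return tokens[: start + 1], tokens[end:]


VISION_START, VISION_END = 900, 901  # illustrative IDs, not real vocabulary entries
prompt = [1, 10, 11, VISION_START, 0, 0, VISION_END, 12, 2]
prefix, suffix = split_prompt(prompt, VISION_START, VISION_END)
print(prefix, suffix)  # [1, 10, 11, 900] [901, 12, 2]
```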

Last Published: Jul 08, 2026
