# BuilderTransformerConfig

- *class* qairt.gen\_ai\_api.configs.builder\_transformer\_config.BuilderTransformerConfig(*\*args: Any*, *\*\*kwargs: Any*)

    - Bases: `AISWBaseModel`

- backend*: [BackendType](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-api-configs.html#qairt.api.configs.common.BackendType)*  *= 'HTP'*

    - Backend to use for the model.

- model\_transformer\_config*: ModelTransformerConfig*  *= ModelTransformerConfig(arn\_cl\_options=ARn\_ContextLengthConfig(context\_length=[4096], auto\_regression\_number=[1, 128], arn\_model\_paths={}, arn\_encodings\_paths={}, skip\_ar\_cl\_conversion=False), split\_model=SplitModelConfig(num\_splits=1, split\_embedding=False, split\_lm\_head=False, skip\_verification=False, log\_level='info'), mha\_config=None, adapt\_moe=None)*

    - Additional transformation-specific configuration.
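The default value of `model_transformer_config` is hard to read as a single-line repr. The sketch below only transcribes that default into nested Python dicts so the structure is visible; it does not call the qairt API, and the two dict names are hypothetical.

```python
# Readable transcription of the default repr shown in the signature above.
# Plain dicts are used for illustration only; the real values are config objects
# (ModelTransformerConfig, ARn_ContextLengthConfig, SplitModelConfig).
DEFAULT_MODEL_TRANSFORMER_CONFIG = {
    "arn_cl_options": {
        "context_length": [4096],
        "auto_regression_number": [1, 128],
        "arn_model_paths": {},
        "arn_encodings_paths": {},
        "skip_ar_cl_conversion": False,
    },
    "split_model": {
        "num_splits": 1,
        "split_embedding": False,
        "split_lm_head": False,
        "skip_verification": False,
        "log_level": "info",
    },
    "mha_config": None,
    "adapt_moe": None,
}

# Hypothetical name: the two documented fields of BuilderTransformerConfig.
DEFAULT_BUILDER_TRANSFORMER_CONFIG = {
    "backend": "HTP",
    "model_transformer_config": DEFAULT_MODEL_TRANSFORMER_CONFIG,
}
```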

## GenAIConfig

- *class* qairt.gen\_ai\_api.configs.gen\_ai\_config.EmbeddingConfig(*\*args: Any*, *\*\*kwargs: Any*)

    - Bases: `AISWBaseModel`

EmbeddingConfig holds configuration information for the embedding table LUT.

- embed\_datatype*: str*

    - Embedding datatype.

- embed\_length*: int*

    - Embedding length.

- embed\_path*: str | os.PathLike*

    - Path to embedding table LUT.

- embed\_quant\_offset*: Optional[int]*  *= None*

    - Quantization offset for the embedding table.

- embed\_quant\_scale*: Optional[float]*  *= None*

    - Quantization scale for the embedding table.
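The reference does not say how `embed_quant_scale` and `embed_quant_offset` are applied. As a minimal sketch, assuming the common affine convention `real = scale * (quantized + offset)`, one row of a quantized lookup table could be converted back to floats as follows. The helper name `dequantize_row` and the formula itself are assumptions, not part of the documented API.

```python
from typing import List, Optional, Sequence


def dequantize_row(
    row: Sequence[int],
    scale: Optional[float],
    offset: Optional[int],
) -> List[float]:
    """Hypothetical helper: convert one quantized embedding row to floats.

    Assumes real = scale * (q + offset). If either parameter is None
    (the documented default), the row is treated as unquantized.
    """
    if scale is None or offset is None:
        return [float(q) for q in row]
    return [scale * (q + offset) for q in row]


# With scale=0.5 and offset=-128, the stored value 128 maps to 0.0.
values = dequantize_row([128, 130], scale=0.5, offset=-128)
```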

- *class* qairt.gen\_ai\_api.configs.gen\_ai\_config.ExpertConfig(*\*args: Any*, *\*\*kwargs: Any*)

    - Bases: `AISWBaseModel`

Configuration for Mixture-of-Experts inference behaviour.

- enable\_expert\_subselection*: bool*  *= False*

    - Enable expert subselection optimisation.

- enable\_op\_predication*: bool*  *= False*

    - Enable operation predication for expert routing.

- *classmethod* from\_pretrained\_config(*config: PretrainedConfig*) → [ExpertConfig](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-api-configs.html#qairt.gen_ai_api.configs.gen_ai_config.ExpertConfig)

    - Create an ExpertConfig from a HuggingFace PretrainedConfig. Fields default to False when absent from the HF config.
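The "default to False when absent" behaviour can be sketched with `getattr` and a fallback. The class below is a stand-in dataclass, not the qairt `ExpertConfig`. It assumes the HuggingFace config exposes the flags under the same attribute names, which the reference does not confirm, and `SimpleNamespace` stands in for a real `PretrainedConfig`.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass
class ExpertConfigSketch:
    """Stand-in for ExpertConfig (illustration only)."""

    enable_expert_subselection: bool = False
    enable_op_predication: bool = False

    @classmethod
    def from_pretrained_config(cls, config: Any) -> "ExpertConfigSketch":
        # Assumption: the flags live on the HF config under the same names.
        # getattr(..., False) yields False when an attribute is absent.
        return cls(
            enable_expert_subselection=bool(
                getattr(config, "enable_expert_subselection", False)
            ),
            enable_op_predication=bool(
                getattr(config, "enable_op_predication", False)
            ),
        )


# Only one flag is present on this stand-in config; the other falls back to False.
hf_config = SimpleNamespace(enable_op_predication=True)
expert = ExpertConfigSketch.from_pretrained_config(hf_config)
```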

- *class* qairt.gen\_ai\_api.configs.gen\_ai\_config.GenAIConfig(*\*args: Any*, *\*\*kwargs: Any*)

    - Bases: `AISWBaseModel`

GenAIConfig holds the common configuration for a generative AI model that Genie execution requires. Common attributes (present in all subclasses):

- adapter\_count\_by\_use\_case*: Optional[Dict[str, int]]*  *= {}*

    - Number of adapters per use case, keyed by use-case name.

- allow\_async\_init*: Optional[bool]*  *= None*

    - Allow context binaries to be initialized asynchronously if the backend supports it.

- alpha\_tensor\_name*: Optional[str]*  *= ''*

    - Name of the tensor to which the LoRA adapter is applied.

- bos\_token*: int*

    - The ID of the beginning-of-stream (BOS) token.

- chat\_template*: Union[NullChatTemplate, HFChatTemplate, CustomChatTemplate]*  *= FieldInfo(annotation=NoneType, required=False, default\_factory=NullChatTemplate, discriminator='type')*

    - Chat template for message formatting. Supports NullChatTemplate, HFChatTemplate, or CustomChatTemplate instances.

- context\_length*: int*

    - Context length of the model, in tokens.

- embedding\_config*: Optional[[EmbeddingConfig](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-api-configs.html#qairt.gen_ai_api.configs.gen_ai_config.EmbeddingConfig)]*  *= None*

    - Embedding config.

- enable\_graph\_switching*: Optional[bool]*  *= None*

    - Enable graph switching for graphs within each context binary.

- eos\_token*: int | list[int]*

    - The ID of the end-of-stream (EOS) token, or a list of such IDs.

- eot\_token*: Optional[int]*  *= None*

    - The ID of the end-of-turn (EOT) token.

- expert\_config*: Optional[[ExpertConfig](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-api-configs.html#qairt.gen_ai_api.configs.gen_ai_config.ExpertConfig)]*  *= None*

    - MoE expert configuration. Populated automatically for MoE architectures.

- kv\_dim*: Optional[int]*  *= None*

    - Dimension of the KV cache.

- n\_embd*: Optional[int]*  *= None*

    - The hidden size of the model.

- n\_heads*: Optional[int]*  *= None*

    - The number of attention heads used in the multi-head attention layers of the model.

- n\_layer*: Optional[int]*  *= None*

    - The number of blocks in the model.

- n\_vocab*: int*

    - The number of tokens in the vocabulary, which is also the first dimension of the embeddings matrix.

- positional\_encoding*: Optional[[PositionalEncoding](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-modules-genie-execution.html#qairt.modules.genie_execution.genie_config.PositionalEncoding)]*  *= None*

    - An object describing the positional encodings.

- rope\_scaling*: Optional[[RopeScaling](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-modules-genie-execution.html#qairt.modules.genie_execution.genie_config.RopeScaling)]*  *= None*

    - RoPE scaling configuration for extended context.

- rope\_theta*: Optional[float]*  *= None*

    - Theta value for rotary positional encoding (RoPE).

- sampler\_params*: Optional[[Sampler](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-modules-genie-execution.html#qairt.modules.genie_execution.genie_config.Sampler)]*  *= None*

    - Model-specific sampler parameters for optimal model performance. Applied automatically by the executor.

- speculative\_run\_config*: Optional[Union[LadeRunConfig, EagletRunConfig, [SsdRunConfig](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-modules-genie-execution.html#qairt.modules.genie_execution.genie_config.SsdRunConfig)]]*  *= None*

    - Speculative decoding config.

- tokenizer\_path*: str | os.PathLike*

    - The path to the tokenizer. Must point to an existing file.

- validate\_tokenizer\_path(*v*)

    - Validator for `tokenizer_path`, which must point to an existing file.
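Two of the attributes above carry constraints that are easy to get wrong: `tokenizer_path` must point to an existing file, and `eos_token` may be a single ID or a list of IDs. The following is a minimal sketch of how calling code might handle both. The helpers `check_tokenizer_path` and `eos_token_ids` are hypothetical names, and the check only approximates what `validate_tokenizer_path` is documented to enforce; it is not the library's implementation.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Set, Union


def check_tokenizer_path(v: Union[str, os.PathLike]) -> Path:
    """Hypothetical check: reject paths that are not existing files."""
    path = Path(v)
    if not path.is_file():
        raise ValueError(f"tokenizer_path must point to an existing file: {path}")
    return path


def eos_token_ids(eos_token: Union[int, List[int]]) -> Set[int]:
    """Hypothetical helper: normalize `int | list[int]` to a set of IDs."""
    if isinstance(eos_token, int):
        return {eos_token}
    return set(eos_token)


# Usage with a temporary file standing in for a real tokenizer file.
with tempfile.TemporaryDirectory() as tmp:
    tokenizer_file = Path(tmp) / "tokenizer.json"
    tokenizer_file.write_text("{}")
    checked = check_tokenizer_path(tokenizer_file)
    checked_name = checked.name
```

Normalizing `eos_token` to a set lets a generation loop test `token_id in stop_ids` without branching on the field's type.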

Last Published: May 26, 2026
