# Genie Engine JSON configuration string

The following sections contain information that pertain to the format of the JSON configuration string that is supplied
to [GenieEngineConfig\_createFromJson](https://docs.qualcomm.com/doc/80-63442-100/topic/function_GenieEngine_8h_1a14bb1c0eba84e32550d5031244ef0432.html#exhale-function-genieengine-8h-1a14bb1c0eba84e32550d5031244ef0432). This JSON
configuration can be supplied to genie-t2t-run tool for switching the engine.

Note

Please refer to the example configs contained in the SDK at ${SDK\_ROOT}/examples/Genie/configs/.

## General configuration schema

The following provides the schema of the JSON configuration format that is provided to
[GenieEngineConfig\_createFromJson](https://docs.qualcomm.com/doc/80-63442-100/topic/function_GenieEngine_8h_1a14bb1c0eba84e32550d5031244ef0432.html#exhale-function-genieengine-8h-1a14bb1c0eba84e32550d5031244ef0432). Note that
dependencies are not specified in the schema, but are discussed in the following per-backend sections.

{
      "standalone-engine" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "context" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "size": {"type": "integer"},
              "n-vocab": {"type": "integer"},
              "draft-n-vocab": {"type": "integer"},
              "bos-token": {"type": "integer"},
              "eos-token": {"type": "array", "items": {"type": "integer"}},
              "pad-token": {"type": "integer"},
              "n-embd": {"type": "integer"}
            }
          },
          "embedding": {
            "type" : "object",
            "properties": {
              "version": {"type": "integer"},
              "type": {"type": "string", "enum" : ["lut", "callback"]},
              "lut-path": {"type": "string"},
              "size": {"type": "integer"},
              "datatype": {"type": "string"},
              "quant-param": {
                "type": "object",
                "properties": {
                  "scale": {"type": "float"},
                  "offset": {"type": "float"}
                }
              }
            }
          }
          "engine" : {
            "type": "array",
            "items": {
              "type" : "object",
              "properties": {
                "version" : {"type": "integer"},
                "n-threads" : {"type": "integer"},
                "backend" : {
                  "type": "object",
                  "properties": {
                    "version" : {"type": "integer"},
                    "type" : {"type": "string","enum" : ["QnnHtp", "QnnGenAiTransformer"]},
                    "QnnHtp" : {
                      "type": "object",
                      "properties": {
                        "version" : {"type": "integer"},
                        "spill-fill-bufsize" : {"type": "integer"},
                        "data-alignment-size" : {"type": "integer"},
                        "use-mmap" : {"type": "boolean"},
                        "mmap-budget" : {"type": "integer"},
                        "poll" : {"type": "boolean"},
                        "pos-id-dim" : {"type": "integer"},
                        "cpu-mask" : {"type": "string"},
                        "kv-dim" : {"type": "integer"},
                        "allow-async-init" : {"type": "boolean"},
                        "enable-graph-switching" : {"type": "boolean"},
                        "skip-lora-validation" : {"type" : "boolean"},
                        "rope-theta" : {"type": "double"}
                      }
                    },
                    "QnnGenAiTransformer" : {
                      "type": "object",
                      "properties": {
                        "version" : {"type": "integer"},
                        "n-logits" : {"type": "integer"},
                        "n-layer" : {"type": "integer"},
                        "n-embd" : {"type": "integer"},
                        "n-heads" : {"type": "integer"},
                        "kv-quantization" : {"type": "boolean"}
                      }
                    },
                    "extensions" : {"type": "string"}
                  }
                },
                "model" : {
                  "type": "object",
                  "properties": {
                    "version" : {"type": "integer"},
                    "type" : {"type": "string","enum":["binary", "library"]},
                    "binary" : {
                      "type": "object",
                      "properties": {
                        "version" : {"type": "integer"},
                        "ctx-bins" : {"type": "array", "items": {"type": "string"}}
                      }
                    },
                    "library" : {
                      "type": "object",
                      "properties": {
                        "version" : {"type": "integer"},
                        "model-bin" : {"type": "string"}
                      }
                    },
                    "positional-encoding": {
                      "type" : "object",
                      "properties" :{
                        "type": {"type" : "string", "enum" : ["rope", "absolute", "alibi"]},
                        "rope-dim": {"type" : "integer"},
                        "rope-theta": {"type" : "double"},
                        "rope-scaling": {
                          "type" : "object",
                          "properties" :{
                            "rope-type": {"type" : "string", "enum" : ["llama3", "longrope"]},
                            "factor": {"type" : "integer"},
                            "high-freq-factor": {"type" : "integer"},
                            "low-freq-factor": {"type" : "integer"},
                            "original-max-position-embeddings": {"type" : "integer"}
                          }
                        }
                      }
                    },
                    "draft-token-map" : {"type" : "string"}
                  }
                }
              }
            }
          }
        }
      }
    }
    Copy to clipboard

| Option | Applicability | Description |
| --- | --- | --- |
| standalone-engine::version | all backends | Version of standalone-engine object that is supported by<br>APIs.(1) |
| context::version | all backends | Version of context object that is supported by APIs. (1) |
| context::size | all backends | Context length. Maximum number of tokens to store. |
| context::n-vocab | all backends | Model vocabulary size. |
| context::draft-n-vocab | all backends | Draft model vocabulary size. |
| context::bos-token | all backends | Beginning of sentence token. |
| context::eos-token | all backends | End of sentence token.<br>Argument passed in as an integer or array of integers |
| context::eot-token | all backends | End of turn token. |
| context::n-embd | all backends | Embedding size of the input. |
| embedding::version | all backends | Version of embedding object that is supported by APIs. (1) |
| embedding::type | all backends | Type of embedding to use. Supported options: lut, callback |
| embedding::lut-path | all backends | Path to the look up table for embeddings. |
| embedding::size | all backends | Size of each token embedding for the model. |
| embedding::datatype | all backends | Datatype of lut. |
| quant-param::scale | all backends | Quantization scale of the lut. |
| quant-param::offset | all backends | Quantization offset of the lut. |
| engine::version | all backends | Version of engine object that is supported by APIs. (1) |
| engine::n-threads | all backends | Number of threads to use for KV-cache updates. |
| backend::version | all backends | Version of backend object that is supported by APIs. (1) |
| backend::type | all backends | Type of engine like “QnnHtp” for QNN HTP,<br>“QnnGenAiTransformer” for QNN GenAITransformer backend and<br>“QnnGpu” for QNN GPU. |
| backend::extensions | QNN HTP | Path to backend extensions configuration file. |
| QnnHtp::version | QNN HTP | Version of QnnHtp object that is supported by APIs. (1) |
| QnnHtp::spill-fill-bufsize | QNN HTP | Buffer size to pre-allocate for the QNN HTP spill fill.<br>This field depends upon the HTP VTCM memory size. It should<br>be set greater than the spill-fill required by each context<br>binary in the model. Consult the QNN HTP backend<br>documentation in the QAIRT SDK for more details. |
| QnnHtp::data-alignment-size | QNN HTP | Data will be aligned by rounding up the size to the nearest<br>multiple of alignment number. Typically should be zero. |
| QnnHtp::use-mmap | QNN HTP | Memory map the context binary files. Typically should be<br>turned on. |
| QnnHtp::mmap-budget | QNN HTP | Memory map the context binary files in chunks of the given<br>size. Typically should be 25MB. |
| QnnHtp::poll | QNN HTP | Specify whether to busy-wait on threads. |
| QnnHtp::pos-id-dim | QNN HTP | Dimension of positional embeddings, usually (kv-dim) / 2. |
| QnnHtp::cpumask | QNN HTP | CPU affinity mask. |
| QnnHtp::kv-dim | QNN HTP | Dimension of the KV-cache embedding. |
| QnnHtp::allow-async-init | QNN HTP | Allow context binaries to be initialized asynchronously<br>if the backend supports it. |
| QnnHtp::enable-graph-switching | QNN HTP | Enables graph switching for graphs within each context<br>binary. |
| QnnHtp::skip-lora-validation | QNN HTP | Skips CRC validation when LoRA binary sections are applied.<br>Please refer to QNN HTP documentation for more information. |
| QnnHtp::rope-theta | QNN HTP | Used to calculate rotary positional encodings. |
| QnnGenAiTransformer::version | QNN GenAiTransformer | Version of QnnGenAiTransformer object that is supported<br>by APIs. (1) |
| QnnGenAiTransformer::n-logits | QNN GenAiTransformer | Number of logit vectors that result will have for sampling. |
| QnnGenAiTransformer::n-layer | QNN GenAiTransformer | Number of decoder layers model is having. |
| QnnGenAiTransformer::n-embd | QNN GenAiTransformer | Size of embedding vector for each token. |
| QnnGenAiTransformer::n-heads | QNN GenAiTransformer | Number of heads model is having. |
| QnnGenAiTransformer::kv-quantization | QNN GenAiTransformer | Quantize KV Cache to Q8\_0\_32. |
| model::version | all backends | Version of model object that is supported by APIs. (1) |
| model::type | all backends | Type of model object “binary” for QNN HTP and “library”<br>for QNN GenAiTransformer. |
| model::positional-encoding | all backends | Captures positional encoding parameters for a model. |
| positional-encoding::type | all backends | Type of positional encoding. Supported types are rope,<br>alibi and absolute |
| positional-encoding::rope-dim | all backends | Dimension of Rope positional embeddings, usually<br>(kv-dim) / 2. |
| positional-encoding::rope-theta | all backends | Used to calculate rotary position encodings for type rope |
| rope-scaling::rope-type | all backends | Type of rope scaling. Supported types are llama3 and<br>longrope. |
| rope-scaling::factor | all backends | Rope scaling factor |
| rope-scaling::high-freq-factor | all backends | High frequency rope scaling factor. |
| rope-scaling::low-freq-factor | all backends | Low frequency rope scaling factor. |
| rope-scaling::original-max-<br>position-embeddings | all backends | Original maximum positional embedding length. |
| model::draft-token-map | all backends | Map for draft-token-id to target-token-id. |
| binary::version | QNN HTP | Version of binary object that is supported by APIs. (1) |
| binary::ctx-bins | QNN HTP | List of serialized model files. |
| library::version | QNN GenAiTransformer | Version of library object that is supported by APIs. (1) |
| library::model-bin | QNN GenAiTransformer | Path to model.bin file. |

## Standalone Engine configuration example for QNN HTP

The following is an example configuration for the standalone-engine for HTP backend.

{
      "standalone-engine" : {
        "version" : 1,
        "embedding": {
          "version": 1,
          "type": "lut",
          "lut-path": "uint8_lut.bin",
          "size": 3072,
          "datatype": "ufixed8",
          "quant-param": {
            "scale":  0.0022299130757649738,
            "offset":   -107.10983274409456
          }
        },
        "context": {
          "version": 1,
          "n-vocab": 128256,
          "draft-n-vocab": 3200,
          "size": 2048,
          "bos-token": 128000,
          "eos-token": [
            128001,
            128008,
            128009
          ],
          "n-embd": 3072,
          "pad-token": 128004
        },
        "engine" :       {
          "version": 1,
          "role": "draft",
          "n-threads": 3,
          "backend": {
            "version": 1,
            "type": "QnnHtp",
            "QnnHtp": {
              "version": 1,
              "spill-fill-bufsize": 0,
              "use-mmap": true,
              "mmap-budget": 0,
              "poll": true,
              "cpu-mask": "0xe0",
              "kv-dim": 128,
              "allow-async-init": false,
              "enable-graph-switching": false
            },
            "extensions": "htp_backend_ext_config.json"
          },
          "model": {
            "version": 1,
            "type": "binary",
            "binary": {
              "version": 1,
              "ctx-bins": [
                "weight_sharing_model_2_of_3.serialized.bin",
                "weight_sharing_model_3_of_3.serialized.bin"
              ]
            },
            "positional-encoding": {
              "type": "rope",
              "rope-dim": 64,
              "rope-theta": 500000,
              "rope-scaling": {
                "factor": 32,
                "high-freq-factor": 4,
                "low-freq-factor": 1,
                "original-max-position-embeddings": 8192,
                "rope-type": "llama3"
              }
            },
            "draft-token-map" : "vocab_trim_elementary16181.json"
          }
        }
      }
    }
    Copy to clipboard

An example of a standalone-engine configuration for HTP can be found at ${SDK\_ROOT}/examples/Genie/configs/standalone-engine.json.

Last Published: Oct 02, 2025

[Previous Topic
GenieEngine](https://docs.qualcomm.com/bundle/publicresource/80-63442-100/topics/engine.md) [Next Topic
GenieTokenizer](https://docs.qualcomm.com/bundle/publicresource/80-63442-100/topics/tokenizer.md)