# Genie Embedding JSON configuration string

The following sections describe the format of the JSON configuration string that is supplied
to [GenieEmbeddingConfig\_createFromJson](https://docs.qualcomm.com/doc/80-63442-100/topic/function_GenieEmbedding_8h_1a3d4da392995015a54825196b20318a26.html#exhale-function-genieembedding-8h-1a3d4da392995015a54825196b20318a26). This JSON
configuration can also be supplied to the genie-t2e-run tool.

Note

Please refer to the example configs contained in the SDK at ${SDK\_ROOT}/examples/Genie/configs/.

## General configuration schema

The following is the schema of the JSON configuration that is passed to
[GenieEmbeddingConfig\_createFromJson](https://docs.qualcomm.com/doc/80-63442-100/topic/function_GenieEmbedding_8h_1a3d4da392995015a54825196b20318a26.html#exhale-function-genieembedding-8h-1a3d4da392995015a54825196b20318a26). Note that
dependencies between options are not specified in the schema, but are discussed in the following per-backend sections.

    {
      "embedding" : {
        "type": "object",
        "properties": {
          "version" : {"type": "integer"},
          "context" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "ctx-size": {"type": "integer"},
              "n-vocab": {"type": "integer"},
              "embed-size": {"type": "integer"},
              "pad-token": {"type": "integer"}
            }
          },
          "prompt" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "prompt-template" : {"type": "array", "items": {"type": "string"}}
            }
          },
          "tokenizer" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "path" : {"type": "string"}
            }
          },
          "truncate-input" : {"type" : "boolean"},
          "engine" : {
            "type": "object",
            "properties": {
              "version" : {"type": "integer"},
              "n-threads" : {"type": "integer"},
              "backend" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "type" : {"type": "string","enum" : ["QnnHtp", "QnnGenAiTransformer"]},
                  "QnnHtp" : {
                    "type": "object",
                    "properties": {
                      "version" : {"type": "integer"},
                      "spill-fill-bufsize" : {"type": "integer"},
                      "data-alignment-size" : {"type": "integer"},
                      "use-mmap" : {"type": "boolean"},
                      "allow-async-init" : {"type": "boolean"},
                      "pooled-output" : {"type": "boolean"},
                      "disable-kv-cache" : {"type": "boolean"}
                    }
                  },
                  "QnnGenAiTransformer" : {
                    "type": "object",
                    "properties": {
                      "version" : {"type": "integer"},
                      "n-layer" : {"type": "integer"},
                      "n-embd" : {"type": "integer"},
                      "n-heads" : {"type": "integer"}
                    }
                  },
                  "extensions" : {"type": "string"}
                }
              },
              "model" : {
                "type": "object",
                "properties": {
                  "version" : {"type": "integer"},
                  "type" : {"type": "string","enum":["binary", "library"]},
                  "binary" : {
                    "type": "object",
                    "properties": {
                      "version" : {"type": "integer"},
                      "ctx-bins" : {"type": "array", "items": {"type": "string"}}
                    }
                  },
                  "library" : {
                    "type": "object",
                    "properties": {
                      "version" : {"type": "integer"},
                      "model-bin" : {"type": "string"}
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
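
The type constraints in the schema above can be checked before a configuration is handed to the API. The following is a minimal sketch, not part of the SDK: `check` and `EMBEDDING_SCHEMA` are hypothetical names, the schema is only an excerpt of the `embedding` node, and reporting keys that are absent from the schema is an assumption (the schema does not say whether extra keys are rejected).

```python
# Excerpt of the "embedding" node from the schema above (not the full schema).
EMBEDDING_SCHEMA = {
    "type": "object",
    "properties": {
        "version": {"type": "integer"},
        "truncate-input": {"type": "boolean"},
        "prompt": {
            "type": "object",
            "properties": {
                "version": {"type": "integer"},
                "prompt-template": {"type": "array", "items": {"type": "string"}},
            },
        },
        "engine": {
            "type": "object",
            "properties": {
                "version": {"type": "integer"},
                "backend": {
                    "type": "object",
                    "properties": {
                        "version": {"type": "integer"},
                        "type": {"type": "string",
                                 "enum": ["QnnHtp", "QnnGenAiTransformer"]},
                    },
                },
            },
        },
    },
}


def check(value, schema, path="embedding"):
    """Return a list of error strings for `value` checked against a schema node."""
    kind = schema.get("type")
    if kind == "object":
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        errors = []
        props = schema.get("properties", {})
        for key, item in value.items():
            if key in props:
                errors.extend(check(item, props[key], f"{path}::{key}"))
            else:
                # Assumption: keys that the schema does not list are reported.
                errors.append(f"{path}::{key}: not in schema")
        return errors
    if kind == "array":
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        errors = []
        for index, item in enumerate(value):
            errors.extend(check(item, schema.get("items", {}), f"{path}[{index}]"))
        return errors
    if kind == "integer":
        # bool is a subclass of int in Python, so reject it explicitly.
        ok = isinstance(value, int) and not isinstance(value, bool)
    elif kind == "boolean":
        ok = isinstance(value, bool)
    elif kind == "string":
        ok = isinstance(value, str)
    else:
        ok = True  # no type constraint on this node
    if not ok:
        return [f"{path}: expected {kind}"]
    if "enum" in schema and value not in schema["enum"]:
        return [f"{path}: {value!r} is not one of {schema['enum']}"]
    return []


# A deliberately broken configuration: wrong boolean, wrong item type, bad enum.
bad = {
    "version": 1,
    "truncate-input": "yes",
    "prompt": {"version": 1, "prompt-template": ["[CLS]", 7]},
    "engine": {"backend": {"type": "QnnCpu"}},
}
for problem in check(bad, EMBEDDING_SCHEMA):
    print(problem)
```

The option paths in the messages use the same `object::option` notation as the option table, which makes a reported problem easy to look up there.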

| Option | Applicability | Description |
| --- | --- | --- |
| embedding::version | all backends | Version of embedding object that is supported by APIs. (1) |
| embedding::truncate-input | all backends | Allows the input to be truncated when it exceeds the context length. |
| context::version | all backends | Version of context object that is supported by APIs. (1) |
| context::ctx-size | all backends | Context length. Maximum number of tokens to process. |
| context::n-vocab | all backends | Model vocabulary size. |
| context::embed-size | all backends | Embedding length. Embedding vector length for each token. |
| context::pad-token | all backends | Token id for pad token. |
| prompt::version | all backends | Version of prompt object that is supported by APIs. (1) |
| prompt::prompt-template | all backends | Prefix and suffix strings that are added to each prompt. |
| tokenizer::version | all backends | Version of tokenizer object that is supported by APIs. (1) |
| tokenizer::path | all backends | Path to tokenizer file. |
| engine::version | all backends | Version of engine object that is supported by APIs. (1) |
| engine::n-threads | all backends | Number of threads to use for KV-cache updates. |
| backend::version | all backends | Version of backend object that is supported by APIs. (1) |
| backend::type | all backends | Type of backend: “QnnHtp” for the QNN HTP backend or “QnnGenAiTransformer” for the QNN GenAITransformer backend. |
| backend::extensions | QNN HTP | Path to backend extensions configuration file. |
| QnnHtp::version | QNN HTP | Version of QnnHtp object that is supported by APIs. (1) |
| QnnHtp::spill-fill-bufsize | QNN HTP | Buffer size to pre-allocate for the QNN HTP spill fill.<br>This field depends upon the HTP VTCM memory size. It should<br>be set greater than the spill-fill required by each context<br>binary in the model. Consult the QNN HTP backend<br>documentation in the QAIRT SDK for more details. |
| QnnHtp::use-mmap | QNN HTP | Memory map the context binary files. Typically should be<br>turned on. |
| QnnHtp::data-alignment-size | QNN HTP | Data sizes are rounded up to the nearest multiple of this alignment value. Typically should be zero. |
| QnnHtp::allow-async-init | QNN HTP | Allow context binaries to be initialized asynchronously<br>if the backend supports it. |
| QnnHtp::pooled-output | QNN HTP | Selects between a pooled embedding and per-token embeddings as the generation result. |
| QnnHtp::disable-kv-cache | QNN HTP | Disables the KV cache manager, for models that do not have a KV cache. |
| QnnGenAiTransformer::version | QNN GenAiTransformer | Version of QnnGenAiTransformer object that is supported<br>by APIs. (1) |
| QnnGenAiTransformer::n-layer | QNN GenAiTransformer | Number of decoder layers in the model. |
| QnnGenAiTransformer::n-embd | QNN GenAiTransformer | Size of embedding vector for each token. |
| QnnGenAiTransformer::n-heads | QNN GenAiTransformer | Number of heads in the model. |
| model::version | all backends | Version of model object that is supported by APIs. (1) |
| model::type | all backends | Type of model object: “binary” for QNN HTP and “library” for QNN GenAiTransformer. |
| binary::version | QNN HTP | Version of binary object that is supported by APIs. (1) |
| binary::ctx-bins | QNN HTP | List of serialized model files. |
| library::version | QNN GenAiTransformer | Version of library object that is supported by APIs. (1) |
| library::model-bin | QNN GenAiTransformer | Path to model.bin file. |
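
The Applicability column above implies pairings that the schema itself does not express, for example that `model::type` is “binary” for QNN HTP and “library” for QNN GenAiTransformer. The sketch below checks those pairings on a parsed configuration. It is illustrative only: `check_dependencies` is a hypothetical helper, and requiring the object named by each `type` field to be present is an assumption drawn from the examples, not a documented rule.

```python
# Pairing taken from the model::type row of the option table.
MODEL_TYPE_FOR_BACKEND = {"QnnHtp": "binary", "QnnGenAiTransformer": "library"}


def check_dependencies(config):
    """Return a list of backend/model pairing problems in a parsed config dict."""
    engine = config.get("embedding", {}).get("engine", {})
    backend = engine.get("backend", {})
    model = engine.get("model", {})
    backend_type = backend.get("type")
    model_type = model.get("type")

    if backend_type not in MODEL_TYPE_FOR_BACKEND:
        return [f"backend::type {backend_type!r} is not a known backend"]

    problems = []
    # Assumption: the object named by backend::type must be present.
    if backend_type not in backend:
        problems.append(f"backend::{backend_type} object is missing")

    expected = MODEL_TYPE_FOR_BACKEND[backend_type]
    if model_type != expected:
        problems.append(
            f"model::type is {model_type!r}, expected {expected!r} for {backend_type}"
        )
    elif model_type not in model:
        # Assumption: the object named by model::type must be present.
        problems.append(f"model::{model_type} object is missing")

    # The table lists backend::extensions as applicable to QNN HTP only.
    if "extensions" in backend and backend_type != "QnnHtp":
        problems.append("backend::extensions applies to QNN HTP only")
    return problems


# Example: an HTP backend paired with a "library" model is flagged.
mismatched = {
    "embedding": {
        "engine": {
            "backend": {"type": "QnnHtp", "QnnHtp": {"version": 1}},
            "model": {"type": "library", "library": {"model-bin": "model.bin"}},
        }
    }
}
print(check_dependencies(mismatched))
```

Whether the runtime rejects a mismatched configuration, or how it reports one, is not stated in this section, so a check like this is best treated as an early sanity test rather than a substitute for the API's own validation.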

## QNN GenAITransformer backend configuration example

The following is an example configuration for the QNN GenAITransformer backend.

    {
      "embedding" : {
        "version" : 1,
        "context": {
          "version": 1,
          "n-vocab": 30522,
          "ctx-size": 512,
          "embed-size" : 1024,
          "pad-token" : 0
        },
        "prompt": {
          "version" : 1,
          "prompt-template": ["[CLS]","[SEP]"]
        },
        "tokenizer" : {
          "version" : 1,
          "path" : "test_path"
        },
        "truncate-input" : true,
        "engine": {
          "version": 1,
          "n-threads" : 10,
          "backend" : {
            "version" : 1,
            "type" : "QnnGenAiTransformer",
            "QnnGenAiTransformer" : {
              "version" : 1,
              "n-layer": 24,
              "n-embd": 1024,
              "n-heads": 16
            }
          },
          "model" : {
            "version" : 1,
            "type" : "library",
            "library" : {
              "version" : 1,
              "model-bin" : "path_to_model_binary_file"
            }
          }
        }
      }
    }
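
Because the API takes the configuration as a JSON string, it can be convenient to build that string programmatically instead of editing a file by hand. The sketch below reproduces the example above with the two paths as parameters. `make_genai_transformer_config` is a hypothetical helper, the default values are simply those of the example, and setting `embed-size` and `n-embd` from one argument is an assumption based on the example using 1024 for both.

```python
import json


def make_genai_transformer_config(tokenizer_path, model_bin, embed_size=1024,
                                  n_layer=24, n_heads=16, n_threads=10):
    """Return a JSON configuration string for the QNN GenAiTransformer backend."""
    config = {
        "embedding": {
            "version": 1,
            "context": {
                "version": 1,
                "n-vocab": 30522,
                "ctx-size": 512,
                "embed-size": embed_size,
                "pad-token": 0,
            },
            "prompt": {"version": 1, "prompt-template": ["[CLS]", "[SEP]"]},
            "tokenizer": {"version": 1, "path": tokenizer_path},
            "truncate-input": True,
            "engine": {
                "version": 1,
                "n-threads": n_threads,
                "backend": {
                    "version": 1,
                    "type": "QnnGenAiTransformer",
                    "QnnGenAiTransformer": {
                        "version": 1,
                        "n-layer": n_layer,
                        # Assumption: kept equal to context::embed-size.
                        "n-embd": embed_size,
                        "n-heads": n_heads,
                    },
                },
                "model": {
                    "version": 1,
                    "type": "library",
                    "library": {"version": 1, "model-bin": model_bin},
                },
            },
        }
    }
    return json.dumps(config, indent=2)


# The returned string is what would be supplied as the JSON configuration.
config_json = make_genai_transformer_config("test_path", "path_to_model_binary_file")
print(config_json)
```

Building the string with `json.dumps` ensures that Python's `True` is serialized as JSON `true` and that quoting is always valid.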

## QNN HTP backend configuration example

The following is an example configuration for the QNN HTP backend.

    {
      "embedding" : {
        "version" : 1,
        "context": {
          "version": 1,
          "n-vocab": 30522,
          "ctx-size": 512,
          "embed-size" : 1024,
          "pad-token" : 0
        },
        "prompt": {
          "version" : 1,
          "prompt-template": ["[CLS]","[SEP]"]
        },
        "tokenizer" : {
          "version" : 1,
          "path" : "test_path"
        },
        "truncate-input" : true,
        "engine" : {
          "version" : 1,
          "backend" : {
            "version" : 1,
            "type" : "QnnHtp",
            "QnnHtp" : {
              "version" : 1,
              "spill-fill-bufsize" : 0,
              "use-mmap" : true,
              "pooled-output" : true,
              "allow-async-init": false,
              "disable-kv-cache": true
            },
            "extensions" : "htp_backend_ext_config.json"
          },
          "model" : {
            "version" : 1,
            "type" : "binary",
            "binary" : {
              "version" : 1,
              "ctx-bins" : [
                "file_1_of_1.bin"
              ]
            }
          }
        }
      }
    }
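
The HTP example refers to several files: the tokenizer, the backend extensions configuration, and one or more context binaries. A quick pre-flight check that they exist can catch a wrong path early. The following is a sketch under stated assumptions: `referenced_files` and `missing_files` are hypothetical helpers, and resolving relative paths against a caller-supplied base directory is an assumption, since this section does not say how relative paths are resolved.

```python
from pathlib import Path

# Only the path-bearing parts of the HTP example above.
EXAMPLE = {
    "embedding": {
        "tokenizer": {"version": 1, "path": "test_path"},
        "engine": {
            "backend": {
                "type": "QnnHtp",
                "extensions": "htp_backend_ext_config.json",
            },
            "model": {
                "type": "binary",
                "binary": {"version": 1, "ctx-bins": ["file_1_of_1.bin"]},
            },
        },
    }
}


def referenced_files(config):
    """Collect the file paths that an HTP-style configuration refers to."""
    embedding = config["embedding"]
    engine = embedding["engine"]
    paths = [embedding["tokenizer"]["path"]]
    if "extensions" in engine["backend"]:
        paths.append(engine["backend"]["extensions"])
    paths.extend(engine["model"]["binary"]["ctx-bins"])
    return paths


def missing_files(config, base_dir="."):
    """Return the referenced paths that do not exist as files under base_dir."""
    base = Path(base_dir)
    # Joining with an absolute path yields that absolute path unchanged.
    return [p for p in referenced_files(config) if not (base / p).is_file()]


print(referenced_files(EXAMPLE))
print(missing_files(EXAMPLE))
```

With the placeholder names from the example, every path is expected to be reported as missing until real files are substituted.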

Last Published: Oct 02, 2025
