# prefix\_quant

The **prefix\_quant** feature is a novel quantization technique for large language models (LLMs). By pre-inserting “prefixed outlier tokens” into the KV cache, it significantly reduces activation quantization error, resulting in more stable activation distributions and higher quantized model accuracy, making it more suitable for engineering-grade deployment of large models.

## Tutorial on how to use prefix\_quant cache in genie-t2t-run

Note

To use prefix\_quant cache in genie-t2t-run, we need to do the following two steps: 1. set “bos-token” as -1 in genie config; 2.remove the prefix tokens from the prompt(the prefix cache will provide these tokens automatically).

In genie-t2t-run, the –restore (or -r) option can be used to load the Prefix Quant cache prior to handling a query when running in basic dialog mode.

For example:

./genie-t2t-run -c llama3-3b-htp.json
                    -p "Tell me about Qualcomm."
                    -r /data/local/tmp/llama3-3b/prefix_cache
    Copy to clipboard

## Tutorial on how to use prefix\_quant cache in genie-app

Note

To use prefix\_quant cache in genie-app, we need to do the following two steps: 1. set “bos-token” as -1 in genie config; 2.remove the prefix tokens from the prompt(the prefix cache will provide these tokens automatically).

In genie-app, the “dialog restore DIALOG\_NAME PATH” can be used in genie-app script to load the Prefix Quant cache prior to handling a query when running in basic dialog mode.

Script for genie-app

profile create profile1
    log create log1 verbose log.txt
    dialog config create config1 llama3-3b-htp.json
    
    dialog config bind profile config1 profile1
    dialog config bind log config1 log1
    
    dialog create dialog1 config1
    
    dialog restore dialog1 /data/local/tmp/llama3-3b/prefix_cache
    
    dialog query dialog1 "Tell me about qualcomm."
    
    profile save profile1 profile1.json
    
    dialog config free config1
    
    dialog free dialog1
    
    profile free profile1
    
    log free log1
    Copy to clipboard

## Tutorial on how to use prefix\_quant cache in ssdDialog

Note

To use prefix\_quant cache in ssdDialog, we need to do the following two steps:
1. The SSD prefix cache and the prefix quant cache need to be merged into the kv-cache.primary.qnn-htp file.
2. set “bos-token” as -1 in genie config;
3. remove the prefix tokens from the prompt(the prefix cache will provide these tokens automatically).

For example:

./genie-t2t-run -c qwen3_ssd_prefix_quant.json
                    --prompt_file  prompt.txt
    Copy to clipboard

Last Published: Jun 04, 2026

[Previous Topic
Engine Sharing](https://docs.qualcomm.com/bundle/publicresource/80-63442-10/topics/engine_sharing.md) [Next Topic
batchQuery](https://docs.qualcomm.com/bundle/publicresource/80-63442-10/topics/batchQuery.md)