# KV$ Rewind

The **KV$ Rewind** feature (also called **KV$ Prefix Match**) speeds up query processing by reusing cached KV values. When a new query shares a common prefix with the previous query, Genie can reuse the KV cache values from that previous query to speed up processing of the new one.
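To make the idea of a shared prefix concrete, the following is a minimal sketch of measuring how many leading tokens two queries have in common. It is only an illustration: the helper name `common_prefix_len` is hypothetical, and comparing token IDs is an assumption, since this page does not describe how Genie performs the match internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Genie API.
 * Returns the number of leading token IDs that two sequences share.
 * Under the assumption that one KV cache entry exists per token position,
 * this is the number of cached positions a new query could keep. */
static size_t common_prefix_len(const int32_t* prev, size_t prev_len,
                                const int32_t* next, size_t next_len) {
  size_t n = 0;
  /* Stop at the first mismatch or at the end of the shorter sequence. */
  while (n < prev_len && n < next_len && prev[n] == next[n]) {
    ++n;
  }
  return n;
}
```

In this simplified model, two queries that differ only in their final token share all but one position, so only the differing tail would need to be processed again.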

## Using KV$ Rewind between queries

    typedef enum {
      /// The string is the entire query/response.
      GENIE_DIALOG_SENTENCE_COMPLETE = 0,
      /// The string is the beginning of the query/response.
      GENIE_DIALOG_SENTENCE_BEGIN = 1,
      /// The string is a part of the query/response and not the beginning or end.
      GENIE_DIALOG_SENTENCE_CONTINUE = 2,
      /// The string is the end of the query/response.
      GENIE_DIALOG_SENTENCE_END = 3,
      /// The query has been aborted.
      GENIE_DIALOG_SENTENCE_ABORT = 4,
      /// Rewind the KV cache as per prefix query match before processing the query.
      GENIE_DIALOG_SENTENCE_REWIND = 5,
    } GenieDialog_SentenceCode_t;

    GENIE_API
    Genie_Status_t GenieDialog_query(const GenieDialog_Handle_t dialogHandle,
                                     const char* queryStr,
                                     const GenieDialog_SentenceCode_t sentenceCode,
                                     const GenieDialog_QueryCallback_t callback,
                                     const void* userData);

To rewind, call GenieDialog\_query with the sentence code **GENIE\_DIALOG\_SENTENCE\_REWIND** and pass the query string as you would for a normal query. The API handles prefix matching and the KV rewind internally.
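The call sequence can be sketched as follows: a first query sent with GENIE\_DIALOG\_SENTENCE\_COMPLETE, then a follow-up sent with GENIE\_DIALOG\_SENTENCE\_REWIND. So that the sketch compiles on its own, it declares simplified stand-ins for the SDK types and a stub `GenieDialog_query` that runs no model. The stand-in typedefs, the callback parameter list, the use of 0 as the success status, and the helpers `print_response` and `ask_then_rewind` are all assumptions made for illustration; only the enum values and the `GenieDialog_query` parameter list come from this page.

```c
#include <stdint.h>
#include <stdio.h>

/* --- Stand-ins so this sketch compiles without the SDK. ---
 * In a real build, remove this section and include the Genie dialog header.
 * The enum values are copied from this page; everything else in this section
 * is a simplified assumption, not the SDK's definition. */
typedef int32_t Genie_Status_t;            /* assumption: 0 means success */
typedef const void* GenieDialog_Handle_t;  /* assumption: opaque handle */
typedef enum {
  GENIE_DIALOG_SENTENCE_COMPLETE = 0,
  GENIE_DIALOG_SENTENCE_BEGIN = 1,
  GENIE_DIALOG_SENTENCE_CONTINUE = 2,
  GENIE_DIALOG_SENTENCE_END = 3,
  GENIE_DIALOG_SENTENCE_ABORT = 4,
  GENIE_DIALOG_SENTENCE_REWIND = 5,
} GenieDialog_SentenceCode_t;
typedef void (*GenieDialog_QueryCallback_t)(
    const char* response, const GenieDialog_SentenceCode_t sentenceCode,
    const void* userData);                 /* assumption: callback shape */

/* Records the sentence codes the stub receives, for inspection. */
static GenieDialog_SentenceCode_t g_seen_codes[8];
static int g_seen_count = 0;

/* Stub with the parameter list shown on this page; it runs no model. */
Genie_Status_t GenieDialog_query(const GenieDialog_Handle_t dialogHandle,
                                 const char* queryStr,
                                 const GenieDialog_SentenceCode_t sentenceCode,
                                 const GenieDialog_QueryCallback_t callback,
                                 const void* userData) {
  (void)dialogHandle;
  if (queryStr == NULL || callback == NULL) return -1;
  if (g_seen_count < 8) g_seen_codes[g_seen_count++] = sentenceCode;
  callback("(stub response)", GENIE_DIALOG_SENTENCE_COMPLETE, userData);
  return 0;
}
/* --- End of stand-ins. --- */

/* Hypothetical callback: prints each response string it receives. */
static void print_response(const char* response,
                           const GenieDialog_SentenceCode_t sentenceCode,
                           const void* userData) {
  (void)sentenceCode;
  (void)userData;
  printf("%s\n", response);
}

/* Hypothetical helper: sends a normal query, then a follow-up query that
 * asks the API to rewind the KV cache to the matching prefix first. */
static Genie_Status_t ask_then_rewind(GenieDialog_Handle_t dialog,
                                      const char* first, const char* followup,
                                      GenieDialog_QueryCallback_t callback,
                                      const void* userData) {
  Genie_Status_t status = GenieDialog_query(
      dialog, first, GENIE_DIALOG_SENTENCE_COMPLETE, callback, userData);
  if (status != 0) return status;  /* assumption: nonzero means failure */
  return GenieDialog_query(dialog, followup, GENIE_DIALOG_SENTENCE_REWIND,
                           callback, userData);
}
```

With a real dialog handle, `ask_then_rewind(dialog, firstQuery, followupQuery, print_response, NULL)` would issue the two calls in order. The only difference between them is the sentence code, which matches the description above: the query string for a rewind is passed exactly as for a normal query.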

Note

KV$ prefix match works well with the KV update method SMART\_MASK. With the KV update method POINTER\_SHIFT, memory register-related errors have been observed in a few cases for weight-shared bins. POINTER\_SHIFT shows no issues with decoder-only models (AR1, AR8, AR128, etc.).

In genie-t2t-run, use the ‘-w’ option to pass a rewind query.

For example:

    ./genie-t2t-run -c llama2-7b-htp.json \
                    -p "Answer in one sentence, what is the capital city of India?" \
                    -w "Answer in one sentence, what is the capital city of Russia?"

Last Published: Jun 04, 2026
