# Imagine clients

The Qualcomm AI Inference Suite SDK exposes two clients, each with a different programming paradigm: synchronous and asynchronous.

[`ImagineClient`](https://docs.qualcomm.com/doc/80-88545-1/topic/imagine_clients.html#imagine.ImagineClient) is the synchronous Imagine client. If you don’t need asynchronous
programming for your Python code, or if you’re not familiar with asynchronous
programming, use this client.

Otherwise, if you are leveraging [`asyncio`](https://docs.python.org/3/library/asyncio.html#module-asyncio "(in Python v3.14)") on your codebase, [`ImagineAsyncClient`](https://docs.qualcomm.com/doc/80-88545-1/topic/imagine_clients.html#imagine.ImagineAsyncClient) might be a better choice.

## Synchronous client

- *class* imagine.ImagineClient(*endpoint=None*, *api\_key=None*, *max\_retries=3*, *timeout=60*, *verify=False*, *proxy=None*, *debug=False*, *ctx=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient)

    - Synchronous Imagine client. Provides methods for communicating with the Imagine API.

- chat(*messages*, *model=None*, *frequency\_penalty=None*, *presence\_penalty=None*, *repetition\_penalty=None*, *stop=None*, *max\_seconds=None*, *ignore\_eos=None*, *skip\_special\_tokens=None*, *stop\_token\_ids=None*, *max\_tokens=None*, *temperature=None*, *top\_k=None*, *top\_p=None*, *tools=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.chat)

    - Invokes the non-streaming version of the chat endpoint that returns `ChatCompletionResponse` for a given prompt.

- Parameters:

    - - **messages** ([*Sequence*](https://docs.python.org/3/library/typing.html#typing.Sequence "(in Python v3.14)")*[*[*ChatMessage*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ChatMessage) *|* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*,* [*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*]* *]*) – A list of chat messages comprising the conversation so far.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The model to use for chat.
- **frequency\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim; defaults to None.
- **presence\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics; defaults to None.
- **repetition\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values &gt; 1 encourage the model to use new tokens, while values &lt; 1 encourage the model to repeat tokens; defaults to None.
- **stop** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*]* *|* *None*) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence; defaults to None.
- **max\_seconds** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – TBD; defaults to None.
- **ignore\_eos** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to ignore the end-of-sequence (EOS) token and continue generating tokens after the EOS token is generated; defaults to None.
- **skip\_special\_tokens** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to skip special tokens in the output; defaults to None.
- **stop\_token\_ids** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")*]* *]* *|* *None*) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens; defaults to None.
- **max\_tokens** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The maximum number of tokens that can be generated in translation; defaults to None.
- **temperature** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 makes the output more random, while lower values like 0.2 makes it more focused and deterministic; defaults to None.
- **top\_k** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens; defaults to None.
- **top\_p** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with `top_p` probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or <cite>temperature</cite> but not both; defaults to None.
- **tools** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*dict*](https://docs.python.org/3/library/stdtypes.html#dict "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*,* [*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*,* [*Any*](https://docs.python.org/3/library/typing.html#typing.Any "(in Python v3.14)")*]* *]* *]* *|* *None*) – A list of tools the model can call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model can generate JSON inputs for.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - `ChatCompletionResponse`

- Return type:

    - [*ChatCompletionResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ChatCompletionResponse)

- chat\_stream(*messages*, *model=None*, *frequency\_penalty=None*, *presence\_penalty=None*, *repetition\_penalty=None*, *stop=None*, *max\_seconds=None*, *ignore\_eos=None*, *skip\_special\_tokens=None*, *stop\_token\_ids=None*, *max\_tokens=None*, *temperature=None*, *top\_k=None*, *top\_p=None*, *stream\_options=None*, *tools=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.chat_stream)

    - Invokes chat endpoint streaming version that returns iterable `ChatCompletionStreamResponse` for a given prompt

- Parameters:

    - - **messages** ([*Sequence*](https://docs.python.org/3/library/typing.html#typing.Sequence "(in Python v3.14)")*[*[*ChatMessage*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ChatMessage) *|* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*,* [*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*]* *]*) – A list of chat-messages comprising the conversation so far.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The model to use for chat.
- **frequency\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim; defaults to None.
- **presence\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics; defaults to None.
- **repetition\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values &gt; 1 encourage the model to use new tokens, while values &lt; 1 encourage the model to repeat tokens; defaults to None.
- **stop** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*]* *|* *None*) – Sequences where the API will stop generating further tokens. The returned text contains the stop sequence; defaults to None.
- **max\_seconds** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – TBD; defaults to None.
- **ignore\_eos** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to ignore the end-of-sequence (EOS) token and continue generating tokens after the EOS token is generated; defaults to None.
- **skip\_special\_tokens** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to skip special tokens in the output; defaults to None.
- **stop\_token\_ids** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")*]* *]* *|* *None*) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens; defaults to None.
- **max\_tokens** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The maximum number of tokens that can be generated in translation; defaults to None.
- **temperature** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 makes the output more random, while lower values like 0.2 makes it more focused and deterministic; defaults to None.
- **top\_k** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens; defaults to None.
- **top\_p** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with `top_p` probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or <cite>temperature</cite> but not both; defaults to None.
- **stream\_options** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*,* [*Any*](https://docs.python.org/3/library/typing.html#typing.Any "(in Python v3.14)")*]* *|* *None*) – Configure streaming outputs, like whether to return token usage when streaming (`{"include_usage": True}`).

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - `ChatCompletionStreamResponse`

- Return type:

    - [*Iterable*](https://docs.python.org/3/library/typing.html#typing.Iterable "(in Python v3.14)")[[*ChatCompletionStreamResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ChatCompletionStreamResponse)]

- completion(*prompt*, *model=None*, *frequency\_penalty=None*, *presence\_penalty=None*, *repetition\_penalty=None*, *stop=None*, *max\_seconds=None*, *ignore\_eos=None*, *skip\_special\_tokens=None*, *stop\_token\_ids=None*, *max\_tokens=None*, *temperature=None*, *top\_k=None*, *top\_p=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.completion)

    - Invokes completions endpoint non-streaming version that returns CompletionResponse for a given prompt

- Parameters:

    - - **prompt** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")) – Prompt text for which completion needs to be generated.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The model to use for completion.
- **frequency\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim; defaults to None.
- **presence\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics; defaults to None.
- **repetition\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values &gt; 1 encourage the model to use new tokens, while values &lt; 1 encourage the model to repeat tokens; defaults to None.
- **stop** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*]* *|* *None*) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence; defaults to None.
- **max\_seconds** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – TBD; defaults to None.
- **ignore\_eos** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to ignore the end-of-sequence (EOS) token and continue generating tokens after the EOS token is generated; defaults to None.
- **skip\_special\_tokens** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to skip special tokens in the output; defaults to None.
- **stop\_token\_ids** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")*]* *]* *|* *None*) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens; defaults to None.
- **max\_tokens** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The maximum number of tokens that can be generated in translation; defaults to None.
- **temperature** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 makes the output more random, while lower values like 0.2 makes it more focused and deterministic; defaults to None.
- **top\_k** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens; defaults to None.
- **top\_p** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with `top_p` probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or <cite>temperature</cite> but not both; defaults to None.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - `CompletionResponse` object

- Return type:

    - [*CompletionResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.CompletionResponse)

- completion\_stream(*prompt*, *model=None*, *frequency\_penalty=None*, *presence\_penalty=None*, *repetition\_penalty=None*, *stop=None*, *max\_seconds=None*, *ignore\_eos=None*, *skip\_special\_tokens=None*, *stop\_token\_ids=None*, *max\_tokens=None*, *temperature=None*, *top\_k=None*, *top\_p=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.completion_stream)

    - Invokes completions endpoint streaming version that returns CompletionResponse for a given prompt.

- Parameters:

    - - **prompt** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")) – Prompt text for which completion needs to be generated
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The model to use for completion.
- **frequency\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim; defaults to None.
- **presence\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics; defaults to None.
- **repetition\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values &gt; 1 encourage the model to use new tokens, while values &lt; 1 encourage the model to repeat tokens; defaults to None.
- **stop** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*]* *|* *None*) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence; defaults to None.
- **max\_seconds** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – TBD; defaults to None.
- **ignore\_eos** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to ignore the end-of-sequence (EOS) token and continue generating tokens after the EOS token is generated; defaults to None.
- **skip\_special\_tokens** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to skip special tokens in the output; defaults to None.
- **stop\_token\_ids** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")*]* *]* *|* *None*) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens; defaults to None.
- **max\_tokens** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The maximum number of tokens that can be generated in translation; defaults to None.
- **temperature** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 makes the output more random, while lower values like 0.2 makes it more focused and deterministic; defaults to None.
- **top\_k** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens; defaults to None.
- **top\_p** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with `top_p` probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or <cite>temperature</cite> but not both; defaults to None.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - `CompletionStreamResponse` object

- Return type:

    - [*Iterable*](https://docs.python.org/3/library/typing.html#typing.Iterable "(in Python v3.14)")[[*CompletionStreamResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.CompletionStreamResponse)]

- embeddings(*text*, *model=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.embeddings)

    - An embeddings endpoint that returns embeddings for a single text

- Parameters:

    - - **text** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* [*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*]*) – The text to embed
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The embedding model to use

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - `EmbeddingResponse`: A response object containing the embeddings.

- Return type:

    - [*EmbeddingResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.EmbeddingResponse)

- get\_available\_models(*model\_type=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.get_available_models)

    - Returns a list of available models.

- Parameters:

    - **model\_type** (*ModelType*  *|* *None*) – Filter models by model type.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - Available models.

- Return type:

    - [list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[str](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")]

- get\_available\_models\_by\_type(*model\_type=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.get_available_models_by_type)

    - Returns a mapping of available models by model type.

- Parameters:

    - **model\_type** (*ModelType*  *|* *None*) – [`imagine.ModelType`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ModelType) Filter models by model type.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - Available models grouped by model type.

- Return type:

    - [dict](https://docs.python.org/3/library/stdtypes.html#dict "(in Python v3.14)")[*ModelType*, [list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[str](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")]]

- get\_chat\_history(*max\_items=1*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.get_chat_history)

    - Returns a list of Chat (response, request) pairs made by the user.

- Parameters:

    - **max\_items** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")) – The number of items to retrieve

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - Returns a list of Chat response, request pairs made by the user.

- Return type:

    - [list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[*ChatCompletionResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ChatCompletionResponse) | [*ChatCompletionRequest*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ChatCompletionRequest)]]

- get\_completion\_history(*max\_items=1*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.get_completion_history)

    - Returns a list of Completion (response, request) pairs made by the user.

- Parameters:

    - **max\_items** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")) – The number of items to retrieve

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - Returns a list of Completion response, request pairs made by the user.

- Return type:

    - [list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[*CompletionResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.CompletionResponse) | [*CompletionRequest*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.CompletionRequest)]]

- get\_embedding\_history(*max\_items=1*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.get_embedding_history)

    - Returns a list of Embedding (response, request) pairs made by the user.

- Parameters:

    - **max\_items** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")) – The number of items to retrieve

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - Returns a list of `EmbeddingResponse`, request pairs made by the user.

- Return type:

    - [list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[*EmbeddingResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.EmbeddingResponse) | [*EmbeddingRequest*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.EmbeddingRequest)]]

- get\_reranker\_history(*max\_items=1*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.get_reranker_history)

    - Returns a list of ReRanker response, request pairs made by the user.

- Parameters:

    - **max\_items** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")) – The number of items to retrieve

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - Returns a list of ReRanker response, request pairs made by the user.

- Return type:

    - [list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[*ReRankerResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ReRankerResponse) | [*ReRankerRequest*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ReRankerRequest)]]

- health\_check()[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.health_check)

    - Check the health of the server, including databases ands models.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - A `HealthResponse` object containing status of the server.

- Return type:

    - [*HealthResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.HealthResponse)

- images\_generate(*prompt*, *model=None*, *negative\_prompt='blurry'*, *seed=27*, *seed\_increment=100*, *n=1*, *num\_inference\_steps=20*, *size='512x512'*, *guidance\_scale=6.5*, *cache\_interval=None*, *response\_format='b64\_json'*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.images_generate)

    - Invokes images generate endpoint non-streaming version and returns an ImageResponse object

- Parameters:

    - - **prompt** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")) – The prompt to guide the image generation.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The model to be used for generation; defaults to None.
- **negative\_prompt** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – Characteristics to avoid in the image being generated; defaults to “blurry”.
- **seed** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The initial value used to generate random numbers. Set a unique seed for reproducible image results; defaults to 27.
- **seed\_increment** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The amount by which the seed value increases with each iteration. Adjust this to create a series of visually consistent, yet unique images; defaults to 100.
- **n** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – Number of images to be generated; defaults to 1.
- **num\_inference\_steps** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – — The total inference steps taken during image generation. More steps usually lead to a higher quality image at the expense of slower inference; defaults to 20.
- **size** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The width x height in pixels of the generated image; defaults to 512x512.
- **guidance\_scale** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Higher guidance scale encourages to generate images that are closely linked to the text prompt, usually at the expense of lower image quality; defaults to 6.5.
- **cache\_interval** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – \_description\_; defaults to None.
- **response\_format** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – “url” or “b64\_json”; defaults to “b64\_json”.

- Raises:

    - [**ImagineException**](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException) – [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - ImageResponse object

- Return type:

    - [*ImageResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImageResponse)

- images\_generate\_stream(*prompt*, *model=None*, *negative\_prompt='blurry'*, *seed=27*, *seed\_increment=100*, *n=1*, *num\_inference\_steps=20*, *size='512x512'*, *guidance\_scale=6.5*, *cache\_interval=None*, *response\_format='b64\_json'*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.images_generate_stream)

    - Invokes images generate endpoint streaming version and returns an Iterable ImageResponse object

- Parameters:

    - - **prompt** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")) – The prompt to guide the image generation.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The model to be used for generation; defaults to None.
- **negative\_prompt** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – Characteristics to avoid in the image being generated; defaults to “blurry”.
- **seed** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The initial value used to generate random numbers. Set a unique seed for reproducible image results; defaults to 27.
- **seed\_increment** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The amount by which the seed value increases with each iteration. Adjust this to create a series of visually consistent, yet unique images; defaults to 100.
- **n** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – Number of images to be generated; defaults to 1.
- **num\_inference\_steps** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – — The total inference steps taken during image generation. More steps usually lead to a higher quality image at the expense of slower inference; defaults to 20.
- **size** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The width x height in pixels of the generated image; defaults to 512x512.
- **guidance\_scale** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Higher guidance scale encourages to generate images that are closely linked to the text prompt, usually at the expense of lower image quality; defaults to 6.5.
- **cache\_interval** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – \_description\_; defaults to None.
- **response\_format** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – “url” or “b64\_json”; defaults to “b64\_json”.

- Raises:

    - [**ImagineException**](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException) – [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - ImageResponse object

- Return type:

    - [*Iterable*](https://docs.python.org/3/library/typing.html#typing.Iterable "(in Python v3.14)")[[*ImageResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImageResponse)]

- ping()[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.ping)

    - Ping the API to check if the Imagine server is reachable.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - A `PingResponse` object containing status of the server.

- Return type:

    - [*PingResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.PingResponse)

- reranker(*query*, *documents*, *model=None*, *top\_n=None*, *return\_documents=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.reranker)

    - Reranker endpoint receives as input a query, a list of documents, and other arguments such as the model name, and returns a response containing the reranking results.

- Parameters:

    - - **query** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")) – The query as a string.
- **documents** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*]*) – The documents to be reranked as a list of strings.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The reranker model to use.
- **top\_n** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The number of most relevant documents to return. If not specified, the reranking results of all documents will be returned.
- **return\_documents** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to return the documents in the response. Defaults to false

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - `ReRankerResponse` object: A response object containing the Similarity Score.

- Return type:

    - [*ReRankerResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ReRankerResponse)

- transcribe(*input\_file*, *model=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.transcribe)

    - Transcribe an audio file to text.

- Parameters:

    - - **input\_file** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* [*BinaryIO*](https://docs.python.org/3/library/typing.html#typing.BinaryIO "(in Python v3.14)")) – File object or path to the audio file.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – Name of the model generating the text.

- Returns:

    - Response with the transcribed audio.

- Return type:

    - [*TranscribeResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.TranscribeResponse)

- translate(*prompt*, *model*, *frequency\_penalty=None*, *presence\_penalty=None*, *repetition\_penalty=None*, *stop=None*, *max\_seconds=None*, *ignore\_eos=None*, *skip\_special\_tokens=None*, *stop\_token\_ids=None*, *max\_tokens=None*, *temperature=None*, *top\_k=None*, *top\_p=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.translate)

    - Invokes translate endpoint that returns `TranslateResponse` for a given prompt.

- Parameters:

    - - **prompt** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")) – Prompt text that needs to be translated.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")) – The model to use for translation.
- **frequency\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim; defaults to None.
- **presence\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics; defaults to None.
- **repetition\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values &gt; 1 encourage the model to use new tokens, while values &lt; 1 encourage the model to repeat tokens; defaults to None.
- **stop** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*]* *|* *None*) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence; defaults to None.
- **max\_seconds** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – TBD; defaults to None.
- **ignore\_eos** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to ignore the end-of-sequence (EOS) token and continue generating tokens after the EOS token is generated; defaults to None.
- **skip\_special\_tokens** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to skip special tokens in the output; defaults to None.
- **stop\_token\_ids** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")*]* *]* *|* *None*) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens; defaults to None.
- **max\_tokens** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The maximum number of tokens that can be generated in translation; defaults to None.
- **temperature** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 makes the output more random, while lower values like 0.2 makes it more focused and deterministic; defaults to None.
- **top\_k** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens; defaults to None.
- **top\_p** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with `top_p` probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or <cite>temperature</cite> but not both; defaults to None.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - `TranslateResponse` object

- Return type:

    - [*TranslateResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.TranslateResponse)

- usage(*aggregation\_duration=None*, *since=None*, *until=None*, *model=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/client.html#ImagineClient.usage)

    - Report usage statistics for the user.

- Parameters:

    - - **aggregation\_duration** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*)
- **since** ([*datetime*](https://docs.python.org/3/library/datetime.html#datetime.datetime "(in Python v3.14)") *|* *None*) – Since date to report usage statistics for.
- **until** ([*datetime*](https://docs.python.org/3/library/datetime.html#datetime.datetime "(in Python v3.14)") *|* *None*) – Until date to report usage statistics for.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – Filter usage statistics by model type.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - The usage report as a `UsageResponse` object,

- Return type:

    - [*UsageResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.UsageResponse)

## Asynchronous client

- *class* imagine.ImagineAsyncClient(*endpoint=None*, *api\_key=None*, *max\_retries=3*, *timeout=60*, *verify=False*, *max\_concurrent\_requests=64*, *proxy=None*, *debug=False*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient)

    - Asynchronous Imagine client. Provides methods for communicating with the Imagine API using [`asyncio`](https://docs.python.org/3/library/asyncio.html#module-asyncio "(in Python v3.14)").

- *async* chat(*messages*, *model=None*, *frequency\_penalty=None*, *presence\_penalty=None*, *repetition\_penalty=None*, *stop=None*, *max\_seconds=None*, *ignore\_eos=None*, *skip\_special\_tokens=None*, *stop\_token\_ids=None*, *max\_tokens=None*, *temperature=None*, *top\_k=None*, *top\_p=None*, *tools=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.chat)

    - Invokes the non-streaming version of the chat endpoint that returns `ChatCompletionResponse` for a given prompt.

- Parameters:

    - - **messages** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*Any*](https://docs.python.org/3/library/typing.html#typing.Any "(in Python v3.14)")*]*) – A list of chat-messages comprising the conversation so far.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The model to use for chat.
- **frequency\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim; defaults to None.
- **presence\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics; defaults to None.
- **repetition\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values &gt; 1 encourage the model to use new tokens, while values &lt; 1 encourage the model to repeat tokens; defaults to None.
- **stop** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*]* *|* *None*) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence; defaults to None.
- **max\_seconds** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – TBD; defaults to None.
- **ignore\_eos** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to ignore the end-of-sequence (EOS) token and continue generating tokens after the EOS token is generated; defaults to None.
- **skip\_special\_tokens** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to skip special tokens in the output; defaults to None.
- **stop\_token\_ids** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")*]* *]* *|* *None*) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens; defaults to None.
- **max\_tokens** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The maximum number of tokens that can be generated in translation; defaults to None.
- **temperature** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic; defaults to None.
- **top\_k** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens; defaults to None.
- **top\_p** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top\_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or <cite>temperature</cite> but not both; defaults to None.
- **tools** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*dict*](https://docs.python.org/3/library/stdtypes.html#dict "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*,* [*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*,* [*Any*](https://docs.python.org/3/library/typing.html#typing.Any "(in Python v3.14)")*]* *]* *]* *|* *None*) – A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - `ChatCompletionResponse` object

- Return type:

    - [*ChatCompletionResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ChatCompletionResponse)

- *async* chat\_stream(*messages*, *model=None*, *frequency\_penalty=None*, *presence\_penalty=None*, *repetition\_penalty=None*, *stop=None*, *max\_seconds=None*, *ignore\_eos=None*, *skip\_special\_tokens=None*, *stop\_token\_ids=None*, *max\_tokens=None*, *temperature=None*, *top\_k=None*, *top\_p=None*, *tools=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.chat_stream)

    - Invokes chat endpoint streaming version that returns `ChatCompletionStreamResponse` for a given prompt.

- Parameters:

    - - **messages** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*Any*](https://docs.python.org/3/library/typing.html#typing.Any "(in Python v3.14)")*]*) – A list of chat-messages comprising the conversation so far.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The model to use for chat.
- **frequency\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim; defaults to None.
- **presence\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics; defaults to None.
- **repetition\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values &gt; 1 encourage the model to use new tokens, while values &lt; 1 encourage the model to repeat tokens; defaults to None.
- **stop** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*]* *|* *None*) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence; defaults to None.
- **max\_seconds** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – TBD; defaults to None.
- **ignore\_eos** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to ignore the end-of-sequence (EOS) token and continue generating tokens after the EOS token is generated; defaults to None.
- **skip\_special\_tokens** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to skip special tokens in the output; defaults to None.
- **stop\_token\_ids** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")*]* *]* *|* *None*) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens; defaults to None.
- **max\_tokens** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The maximum number of tokens that can be generated in translation; defaults to None.
- **temperature** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic; defaults to None.
- **top\_k** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens; defaults to None.
- **top\_p** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top\_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or <cite>temperature</cite> but not both; defaults to None.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - `ChatCompletionStreamResponse` object

- Return type:

    - [*AsyncGenerator*](https://docs.python.org/3/library/typing.html#typing.AsyncGenerator "(in Python v3.14)")[[*ChatCompletionStreamResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ChatCompletionStreamResponse), None]

- *async* completion(*prompt*, *model=None*, *frequency\_penalty=None*, *presence\_penalty=None*, *repetition\_penalty=None*, *stop=None*, *max\_seconds=None*, *ignore\_eos=None*, *skip\_special\_tokens=None*, *stop\_token\_ids=None*, *max\_tokens=None*, *temperature=None*, *top\_k=None*, *top\_p=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.completion)

    - Invokes completions endpoint non-streaming version that returns `CompletionResponse` for a given prompt.

- Parameters:

    - - **prompt** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")) – Prompt text for which completion needs to be generated.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The model to use for completion.
- **frequency\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim; defaults to None.
- **presence\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics; defaults to None.
- **repetition\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values &gt; 1 encourage the model to use new tokens, while values &lt; 1 encourage the model to repeat tokens; defaults to None.
- **stop** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*]* *|* *None*) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence; defaults to None.
- **max\_seconds** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – TBD; defaults to None.
- **ignore\_eos** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to ignore the end-of-sequence (EOS) token and continue generating tokens after the EOS token is generated; defaults to None.
- **skip\_special\_tokens** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to skip special tokens in the output; defaults to None.
- **stop\_token\_ids** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")*]* *]* *|* *None*) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens; defaults to None.
- **max\_tokens** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The maximum number of tokens that can be generated in translation; defaults to None.
- **temperature** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic; defaults to None.
- **top\_k** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens; defaults to None.
- **top\_p** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top\_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or <cite>temperature</cite> but not both; defaults to None.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - `CompletionResponse` object

- Return type:

    - [*CompletionResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.CompletionResponse)

- *async* completion\_stream(*prompt*, *model=None*, *frequency\_penalty=None*, *presence\_penalty=None*, *repetition\_penalty=None*, *stop=None*, *max\_seconds=None*, *ignore\_eos=None*, *skip\_special\_tokens=None*, *stop\_token\_ids=None*, *max\_tokens=None*, *temperature=None*, *top\_k=None*, *top\_p=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.completion_stream)

    - Invokes completions endpoint streaming version that returns `CompletionResponse` for a given prompt.

- Parameters:

    - - **prompt** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")) – Prompt text for which completion needs to be generated.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The model to use for completion.
- **frequency\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim; defaults to None.
- **presence\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics; defaults to None.
- **repetition\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values &gt; 1 encourage the model to use new tokens, while values &lt; 1 encourage the model to repeat tokens; defaults to None.
- **stop** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*]* *|* *None*) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence; defaults to None.
- **max\_seconds** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – TBD; defaults to None.
- **ignore\_eos** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to ignore the end-of-sequence (EOS) token and continue generating tokens after the EOS token is generated; defaults to None.
- **skip\_special\_tokens** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to skip special tokens in the output; defaults to None.
- **stop\_token\_ids** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")*]* *]* *|* *None*) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens; defaults to None.
- **max\_tokens** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The maximum number of tokens that can be generated in translation; defaults to None.
- **temperature** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic; defaults to None.
- **top\_k** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens; defaults to None.
- **top\_p** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top\_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or <cite>temperature</cite> but not both; defaults to None.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - `CompletionStreamResponse` object

- Return type:

    - [*AsyncGenerator*](https://docs.python.org/3/library/typing.html#typing.AsyncGenerator "(in Python v3.14)")[[*CompletionStreamResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.CompletionStreamResponse), None]

- *async* embeddings(*text*, *model=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.embeddings)

    - An embeddings endpoint that returns embeddings for a single text.

- Parameters:

    - - **text** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* [*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*]*) – The text to embed
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The embedding model to use

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - `EmbeddingResponse`: A response object containing the embeddings.

- Return type:

    - [*EmbeddingResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.EmbeddingResponse)

- *async* get\_available\_models(*model\_type=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.get_available_models)

    - Returns a list of available models.

- Parameters:

    - **model\_type** (*ModelType*  *|* *None*) – Filter models by model type.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - Available models.

- Return type:

    - [list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[str](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")]

- *async* get\_available\_models\_by\_type(*model\_type=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.get_available_models_by_type)

    - Returns a mapping of available models by model type.

- Parameters:

    - **model\_type** (*ModelType*  *|* *None*) – [`imagine.ModelType`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ModelType) Filter models by model type.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - Available models grouped by model type.

- Return type:

    - [dict](https://docs.python.org/3/library/stdtypes.html#dict "(in Python v3.14)")[*ModelType*, [list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[str](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")]]

- *async* get\_chat\_history(*max\_items=1*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.get_chat_history)

    - Returns a list of Chat (response, request) pairs made by the user.

- Parameters:

    - **max\_items** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")) – The number of items to retrieve

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - Returns a list of Chat response, request pairs made by the user.

- Return type:

    - [list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[*ChatCompletionResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ChatCompletionResponse) | [*ChatCompletionRequest*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ChatCompletionRequest)]]

- *async* get\_completion\_history(*max\_items=1*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.get_completion_history)

    - Returns a list of Completion (response, request) pairs made by the user.

- Parameters:

    - **max\_items** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")) – The number of items to retrieve

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - Returns a list of Completion response, request pairs made by the user.

- Return type:

    - [list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[*CompletionResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.CompletionResponse) | [*CompletionRequest*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.CompletionRequest)]]

- *async* get\_embedding\_history(*max\_items=1*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.get_embedding_history)

    - Returns a list of Embedding (response, request) pairs made by the user.

- Parameters:

    - **max\_items** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")) – The number of items to retrieve

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - Returns a list of Embedding response, request pairs made by the user.

- Return type:

    - [list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[*EmbeddingResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.EmbeddingResponse) | [*EmbeddingRequest*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.EmbeddingRequest)]]

- *async* get\_reranker\_history(*max\_items=1*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.get_reranker_history)

    - Returns a list of ReRanker response, request pairs made by the user.

- Parameters:

    - **max\_items** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")) – The number of items to retrieve

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - Returns a list of ReRanker response, request pairs made by the user.

- Return type:

    - [list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[list](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")[[*ReRankerResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ReRankerResponse) | [*ReRankerRequest*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ReRankerRequest)]]

- *async* health\_check()[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.health_check)

    - Check the health of the server, including databases ands models.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - A `HealthResponse` object containing status of the server.

- Return type:

    - [*HealthResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.HealthResponse)

- *async* images\_generate(*prompt*, *model=None*, *negative\_prompt='blurry'*, *seed=27*, *seed\_increment=100*, *n=1*, *num\_inference\_steps=20*, *size='512x512'*, *guidance\_scale=6.5*, *cache\_interval=None*, *response\_format='b64\_json'*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.images_generate)

    - Invokes the non-streaming version of the images generate endpoint and returns an `ImageResponse` object.

- Parameters:

    - - **prompt** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")) – The prompt to guide the image generation.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The model to be used for generation; defaults to None.
- **negative\_prompt** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – Characteristics to avoid in the image being generated; defaults to “blurry”.
- **seed** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The initial value used to generate random numbers. Set a unique seed for reproducible image results; defaults to 27.
- **seed\_increment** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The amount by which the seed value increases with each iteration. Adjust this to create a series of visually consistent, yet unique images; defaults to 100.
- **n** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – Number of images to be generated; defaults to 1.
- **num\_inference\_steps** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – — The total inference steps taken during image generation. More steps usually lead to a higher quality image at the expense of slower inference; defaults to 20.
- **size** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The width x height in pixels of the generated image; defaults to 512x512.
- **guidance\_scale** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Higher guidance scale encourages to generate images that are closely linked to the text prompt, usually at the expense of lower image quality; defaults to 6.5.
- **cache\_interval** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – \_description\_; defaults to None.
- **response\_format** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – “url” or “b64\_json”; defaults to “b64\_json”.

- Raises:

    - [**ImagineException**](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException) – [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - `ImageResponse` object

- Return type:

    - [*ImageResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImageResponse)

- *async* images\_generate\_stream(*prompt*, *model=None*, *negative\_prompt='blurry'*, *seed=27*, *seed\_increment=100*, *n=1*, *num\_inference\_steps=20*, *size='512x512'*, *guidance\_scale=6.5*, *cache\_interval=None*, *response\_format='b64\_json'*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.images_generate_stream)

    - Invokes the streaming version of the images generate endpoint and returns an Iterable ImageResponse object.

- Parameters:

    - - **prompt** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")) – The prompt to guide the image generation.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The model to be used for generation; defaults to None.
- **negative\_prompt** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – Characteristics to avoid in the image being generated; defaults to “blurry”.
- **seed** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The initial value used to generate random numbers. Set a unique seed for reproducible image results; defaults to 27.
- **seed\_increment** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The amount by which the seed value increases with each iteration. Adjust this to create a series of visually consistent, yet unique images; defaults to 100.
- **n** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – Number of images to be generated; defaults to 1
- **num\_inference\_steps** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – — The total inference steps taken during image generation. More steps usually lead to a higher quality image at the expense of slower inference; defaults to 20.
- **size** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The width x height in pixels of the generated image; defaults to 512x512.
- **guidance\_scale** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Higher guidance scale encourages to generate images that are closely linked to the text prompt, usually at the expense of lower image quality; defaults to 6.5.
- **cache\_interval** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – \_description\_; defaults to None.
- **response\_format** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – “url” or “b64\_json”; defaults to “b64\_json”.

- Raises:

    - [**ImagineException**](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException) – [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - `ImageResponse` object

- Return type:

    - [*AsyncGenerator*](https://docs.python.org/3/library/typing.html#typing.AsyncGenerator "(in Python v3.14)")[[*ImageResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImageResponse), None]

- *async* ping()[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.ping)

    - Ping the API to check if the Imagine server is reachable.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - A `PingResponse` object containing status of the server.

- Return type:

    - [*PingResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.PingResponse)

- *async* reranker(*query*, *documents*, *model=None*, *top\_n=None*, *return\_documents=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.reranker)

    - A ReRanker endpoint that returns similarity score for an input pair

- Parameters:

    - - **query** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")) – The query as a string.
- **documents** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*]*) – The documents to be reranked as a list of strings.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – The reranker model to use.
- **top\_n** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The number of most relevant documents to return. If not specified, the reranking results of all documents will be returned.
- **return\_documents** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to return the documents in the response; defaults to false.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - `ReRankerResponse` object: A response object containing the Similarity Score

- Return type:

    - [*ReRankerResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ReRankerResponse)

- *async* transcribe(*input\_file*, *model=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.transcribe)

    - Transcribe an audio file to text.

- Parameters:

    - - **input\_file** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* [*BinaryIO*](https://docs.python.org/3/library/typing.html#typing.BinaryIO "(in Python v3.14)")) – File object or path to the audio file.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – Name of the model generating the text.

- Returns:

    - Response with the transcribed audio.

- Return type:

    - [*TranscribeResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.TranscribeResponse)

- *async* translate(*prompt*, *model*, *frequency\_penalty=None*, *presence\_penalty=None*, *repetition\_penalty=None*, *stop=None*, *max\_seconds=None*, *ignore\_eos=None*, *skip\_special\_tokens=None*, *stop\_token\_ids=None*, *max\_tokens=None*, *temperature=None*, *top\_k=None*, *top\_p=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.translate)

    - Invokes translate endpoint that returns `TranslateResponse` for a given prompt.

- Parameters:

    - - **prompt** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")) – Prompt text that needs to be translated.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")) – The model to use for translation.
- **frequency\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim; defaults to None.
- **presence\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics; defaults to None.
- **repetition\_penalty** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values &gt; 1 encourage the model to use new tokens, while values &lt; 1 encourage the model to repeat tokens; defaults to None.
- **stop** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)")*]* *|* *None*) – Sequences where the API will stop generating further tokens. The returned text will contain the stop sequence; defaults to None.
- **max\_seconds** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – TBD; defaults to None.
- **ignore\_eos** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to ignore the end-of-sequence (EOS) token and continue generating tokens after the EOS token is generated; defaults to None.
- **skip\_special\_tokens** ([*bool*](https://docs.python.org/3/library/functions.html#bool "(in Python v3.14)") *|* *None*) – Whether to skip special tokens in the output; defaults to None.
- **stop\_token\_ids** ([*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*list*](https://docs.python.org/3/library/stdtypes.html#list "(in Python v3.14)")*[*[*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)")*]* *]* *|* *None*) – List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens; defaults to None.
- **max\_tokens** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – The maximum number of tokens that can be generated in translation; defaults to None.
- **temperature** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic; defaults to None.
- **top\_k** ([*int*](https://docs.python.org/3/library/functions.html#int "(in Python v3.14)") *|* *None*) – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens; defaults to None.
- **top\_p** ([*float*](https://docs.python.org/3/library/functions.html#float "(in Python v3.14)") *|* *None*) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top\_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or <cite>temperature</cite> but not both; defaults to None.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - `TranslateResponse` object

- Return type:

    - [*TranslateResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.TranslateResponse)

- *async* usage(*aggregation\_duration=None*, *since=None*, *until=None*, *model=None*)[\[source\]](https://docs.qualcomm.com/doc/80-88545-1/topic/async_client.html#ImagineAsyncClient.usage)

    - Report usage statistics for the user.

- Parameters:

    - - **aggregation\_duration** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*)
- **since** ([*datetime*](https://docs.python.org/3/library/datetime.html#datetime.datetime "(in Python v3.14)") *|* *None*) – Since date to report usage statistics for.
- **until** ([*datetime*](https://docs.python.org/3/library/datetime.html#datetime.datetime "(in Python v3.14)") *|* *None*) – Until date to report usage statistics for.
- **model** ([*str*](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.14)") *|* *None*) – Filter usage statistics by model type.

- Raises:

    - `ImagineException` [`imagine.ImagineException`](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.ImagineException)

- Returns:

    - The usage report as a `UsageResponse` object.

- Return type:

    - [*UsageResponse*](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html#imagine.UsageResponse)

##  Next steps 

- Review [code samples of basic functions](https://docs.qualcomm.com/doc/80-88545-1/topic/imagine_clients.html#1_0_basic_usage.md) using the classes, method, and parameters for Imagine clients.
- Review the classes that work as [data transfer objects (DTOs)](https://docs.qualcomm.com/doc/80-88545-1/topic/dtos.html).
- Review the classes to use with [LangChain](https://docs.qualcomm.com/doc/80-88545-1/topic/langchain.html).

Last Published: Apr 17, 2026

[Previous Topic
SDK API reference](https://docs.qualcomm.com/bundle/publicresource/80-88545-1/topics/index_api.md) [Next Topic
Data transfer objects and exceptions](https://docs.qualcomm.com/bundle/publicresource/80-88545-1/topics/dtos.md)