# Add guardrails to an LLM

Add input and output guardrails to a large language model (LLM) to keep its answers contextually appropriate and within your organization’s designated purpose.
In this example, a wrapped `ImagineClient` is combined with a retrieval-augmented generation (RAG) system and basic `topical_guards` functionality so that you can specify blocked content and topics. The same system can also recognize defined inputs, such as greetings, and respond in a predefined manner.

Because guardrails add complexity and processing to each LLM call, they can increase cost and latency. Despite that increase, guardrails can produce long-term savings by preventing long, useless responses or responses that do not meet your organization’s guidelines. In addition, the RAG check lets you use a smaller, faster model as the first call, which reduces how often the LLM is used.

## Create a vector store and define the guardrail functions

This section provides examples of all of the setup steps necessary to implement and define the guardrail functions. The [completion examples](https://docs.qualcomm.com/doc/80-88545-1/topic/guarded_llm_example.html#completion-examples) call the functions created in this section to validate the user’s prompt against the guardrails.

### Set up the environment and import libraries

The code in this example imports the libraries and modules to work with the Imagine APIs and LangChain framework.

    import os
    from typing import List, Dict, Any
    import shutil
    from dotenv import load_dotenv
    
    from imagine.client import ImagineClient
    from imagine.langchain import ImagineEmbeddings
    
    from imagine.exceptions import (
        ImagineException,
    )
    from imagine.types.chat_completions import (
        ChatCompletionRequest,
        ChatCompletionResponse,
        ChatMessage,
    )
    from imagine.types.completions import (
        CompletionRequest,
        CompletionResponse,
    )
    
    from langchain_core.documents import Document
    from langchain.text_splitter import CharacterTextSplitter
    from langchain_community.vectorstores import Chroma

### Create a RAG vector store

Create a vector store so that you can store predefined content that is either allowed or disallowed and later perform RAG. The functions involved are:

- `create_documents`: Creates the documents from a directory containing the text files.
- `create_vector_store`: Creates the vector store from the documents.
- `query_vector_store`: Queries the vector store and retrieves the relevant documents.

    # Functions to enable RAG. See the simple RAG example for further details.

    # Split a text into chunks and wrap each chunk in a Document tagged with `type`
    def create_documents(transcript, type=None):
        text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=250)
        return [
            Document(page_content=chunk, metadata={"type": "" if type is None else type})
            for chunk in text_splitter.split_text(transcript)
        ]

    # Create (or rebuild) a persistent Chroma vector store from the documents
    def create_vector_store(docs, db_dir, store_name, embedding_fn):
        persistent_directory = os.path.join(db_dir, store_name)
        print(f"Persistent directory: {persistent_directory}")
        if os.path.exists(persistent_directory):
            print(f"\n--- Removing old vector store {store_name} ---")
            shutil.rmtree(persistent_directory, ignore_errors=True)
        if not os.path.exists(persistent_directory):
            print(f"\n--- Creating vector store {store_name} ---")
            Chroma.from_documents(docs, embedding_fn, persist_directory=persistent_directory)
            print(f"--- Finished creating vector store {store_name} ---")
        else:
            # Reached only if the old store could not be removed
            print(f"Vector store {store_name} already exists. No need to initialize.")

    # Query the vector store given the store name, query, and embedding function
    def query_vector_store(query, db_dir, store_name, embedding_fn, k=2, threshold=0.3):
        persistent_directory = os.path.join(db_dir, store_name)
        if os.path.exists(persistent_directory):
            db = Chroma(persist_directory=persistent_directory, embedding_function=embedding_fn)

            # Return at most k documents whose relevance score meets the threshold
            retriever = db.as_retriever(
                search_type="similarity_score_threshold",
                search_kwargs={"k": k, "score_threshold": threshold},
            )
            return retriever.invoke(query)

        print(f"Vector store {store_name} doesn't exist.")
        # Return an empty list so that callers can safely call len() on the result
        return []
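
To make the `k` and `threshold` arguments of `query_vector_store` more concrete, the following is a minimal, dependency-free sketch of threshold-based retrieval. The helper names (`cosine_similarity`, `retrieve`) and the two-dimensional vectors are hypothetical illustrations, not part of the Imagine SDK or LangChain, and the way Chroma actually computes relevance scores depends on its configured distance metric.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: List[float],
    store: Dict[str, List[float]],
    k: int = 2,
    threshold: float = 0.3,
) -> List[Tuple[str, float]]:
    """Return up to k (name, score) pairs whose score is at least the threshold."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in store.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return kept[:k]


# Toy 2-D "embeddings" (hypothetical values, not produced by a real model)
toy_store = {
    "cheese": [1.0, 0.0],
    "bees": [0.0, 1.0],
    "greeting": [-1.0, 0.0],
}

matches = retrieve([0.9, 0.1], toy_store, k=2, threshold=0.3)
print([name for name, _ in matches])  # only "cheese" clears the 0.3 threshold
```

Lowering the threshold admits weaker matches, and raising `k` allows more of them to be returned, which is the same trade-off the retriever settings control.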

### Define the topical guardrails

Topical guardrails are optional functionality that helps keep the LLM on topic and prevents it from responding to disallowed content. This example defines the `_topical_guardrail` function, which builds a `messages` list and sends it to the LLM so that the LLM determines whether the content in the user’s prompt is allowed. If the user’s prompt or the LLM response discusses a disallowed `{topic}`, the calling code raises an exception and the user is told that the question isn’t allowed. For production environments, tuning the prompts in `messages` is essential to ensure accurate, relevant, and effective output.

    def _topical_guardrail(client, prompt: str, topic: str):
        # Ask the LLM a single yes/no question: does the text discuss `topic`?
        messages = [
            {
                "role": "system",
                "content": f"You are an expert content checker. You determine whether the user is asking about specific content. You respond Yes if they are asking about that content and No otherwise! Do not answer any question other than Does the user discuss {topic}",
            },
            {"role": "user", "content": f"The user says : {prompt}:"},
            {
                "role": "assistant",
                "content": f"Does the user discuss {topic}?[yes/no] : answer[yes/no]:",
            },
        ]

        model = client.default_model_llm

        # temperature=0 keeps the yes/no answer as deterministic as possible
        request_body = ChatCompletionRequest(
            messages=messages, model=model,
            stream=False, temperature=0
        ).model_dump(exclude_none=True)

        response = client._request("post", "chat/completions", request_body)
        if not response:
            raise ImagineException("No response received")

        if not isinstance(response, dict):
            raise ImagineException("Unexpected response body")

        return ChatCompletionResponse(**response)
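
Code that calls `_topical_guardrail` has to interpret the model's free-text reply as a yes or a no. A plain substring test for "yes" would also match a reply such as "No, but yesterday...". The following is a minimal sketch of a stricter interpretation that looks only at the first word. The helper name `is_affirmative` is hypothetical and not part of the Imagine SDK, and a real model may phrase its answers in ways this does not anticipate.

```python
import re


def is_affirmative(answer: str) -> bool:
    """Return True only when the first word of the answer is 'yes' (case-insensitive)."""
    words = re.findall(r"[a-z]+", answer.lower())
    return bool(words) and words[0] == "yes"


print(is_affirmative(" Yes, the user discusses WiFi."))
print(is_affirmative("No, but yesterday they did."))
```

The first call reports an affirmative answer and the second does not, even though the second reply contains the letters "yes".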

### Define the input and output checks

The examples in this section demonstrate how to create guardrails by defining the input and output checks. The first part of the code defines `check_input_completion` to function as the input guardrail. It does the following:

- Specifies the maximum number of documents and minimum similarity score to include in its query.
- Queries the vector store for documents similar to the user’s `prompt`.

    - If all of the documents are `greeting`, it responds with the `CompletionResponse`.
    - If any of the `relevant_docs` are similar to the disallowed content, it raises an exception and denies the request.
- Calls the `_topical_guardrail` function to ask the LLM directly whether the prompt contains a `{topic}` from the list of prohibited content. This operation is more expensive than querying the vector store because it requires the LLM to generate a text response.

    - If `yes`, it raises an exception and denies the request.
    - If `no`, the code continues.

    def check_input_completion(client, prompt: str, topical_guards: List[str] = None):
        # Cheap check first: look for stored content that is similar to the prompt
        relevant_docs = query_vector_store(prompt, db_dir, store_name, embedding_fn, k=3, threshold=0.5) or []

        if len(relevant_docs) >= 1:
            # Treat the prompt as a greeting only if every match is tagged as one
            greeting = all(d.metadata.get("type") == "greeting" for d in relevant_docs)
            if greeting:
                return CompletionResponse(**{'choices': [{'finish_reason': 'stop', 'index': 0,
                                                   'text': ' Hello I am Imagine!'}],
                                      'created': 0.0, 'id': '0', 'model': 'Llama-3-8B', 'object': 'completion', 'generation_time': 0.0, 'usage': {'completion_tokens': 0, 'prompt_tokens': 0, 'total_tokens': 0},
                                      'ts': '2024-09-02T15:45:34.556072946Z'})
            raise ImagineException(f"Input like {prompt} are Not Allowed")

        # No topics supplied: skip the LLM-based topical check
        if topical_guards is None:
            topical_guards = []

        # More expensive check: ask the LLM about each prohibited topic
        for topic in topical_guards:
            topic_rail_response = _topical_guardrail(client, prompt, topic)
            if "yes" in topic_rail_response.first_content.lower():
                raise ImagineException(f"Questions about {topic} are Not Allowed")
        return None
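
The vector store branch of the input check reduces to a three-way decision based on the `type` metadata of the retrieved documents. The following is a minimal sketch of that decision on plain strings. The function name `classify_matches` and the returned labels are hypothetical and used only for illustration.

```python
from typing import List


def classify_matches(doc_types: List[str]) -> str:
    """Mirror the vector store branch of the input check on plain type labels."""
    if not doc_types:
        return "continue"  # nothing similar was stored: go on to the topical check
    if all(t == "greeting" for t in doc_types):
        return "greeting"  # every match is a greeting: return the predefined response
    return "blocked"       # at least one match is disallowed content


print(classify_matches([]))
print(classify_matches(["greeting", "greeting"]))
print(classify_matches(["greeting", "disallowed"]))
```

Note that a prompt matching both a greeting and disallowed content is blocked, because the greeting response requires all matches to be greetings.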

This part of the code defines the `check_output` function, which ensures that the output doesn’t contain content related to any `topic` in the `topical_guards` list. If the LLM answers `yes` for a topic, this example raises an exception, although you could handle the match differently.

    def check_output(client, prompt: str, topical_guards: List[str] = None):
        # Here `prompt` is the text to check, typically the LLM's generated output
        for topic in topical_guards or []:
            topic_rail_response = _topical_guardrail(client, prompt, topic)
            if "yes" in topic_rail_response.first_content.lower():
                raise ImagineException(f"Questions about {topic} are Not Allowed")
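
Raising an exception is only one way to react to a disallowed output. As one possible alternative, the following sketch returns a fallback message instead. Everything here is an assumption made for illustration: `guard_output` and `keyword_check` are hypothetical names, and `keyword_check` is a simple keyword stand-in for the LLM-based `_topical_guardrail` call so that the example runs without a client.

```python
from typing import Callable, Iterable


def guard_output(
    text: str,
    topics: Iterable[str],
    discusses: Callable[[str, str], bool],
    fallback: str = "Sorry, I can't help with that topic.",
) -> str:
    """Return the fallback message if any topic is detected, else the original text."""
    for topic in topics:
        if discusses(text, topic):
            return fallback
    return text


def keyword_check(text: str, topic: str) -> bool:
    """Stand-in detector: case-insensitive keyword match instead of an LLM call."""
    return topic.lower() in text.lower()


print(guard_output("WiFi operates at 2.4 GHz or 5 GHz.", ["WiFi"], keyword_check))
print(guard_output("Qualcomm is based in San Diego.", ["WiFi"], keyword_check))
```

A fallback keeps the calling code free of exception handling, at the cost of hiding why a response was withheld, so logging the matched topic may be worthwhile.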

### Configure the environment variables and endpoint

The examples in this section enable the guardrail system by configuring the necessary credentials and setting up the vector store.

1. Store your environment variables in a `.env` file and load them with the `load_dotenv` function from the `dotenv` library.

    load_dotenv()

2. Set the API key and API endpoint, identify the Chroma vector database directory, and initialize the client and embedding function that the guardrails use.

    api_key = os.getenv("IMAGINE_API_KEY")
    endpoint = os.getenv("IMAGINE_API_ENDPOINT")

    store_name = "topical_guards_vector_db"

    # Directory that holds the Chroma vector database
    db_dir = f"{os.getenv('IMAGINE_GUARDRAILS_CHROMA_DB')}/{store_name}"
    client = ImagineClient(endpoint, api_key, max_retries=3, timeout=60, verify=False)
    embedding_fn = ImagineEmbeddings(api_key=api_key, verify=False, endpoint=endpoint)
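
Because `os.getenv` returns `None` for an unset variable, a missing `IMAGINE_GUARDRAILS_CHROMA_DB` would silently produce a directory name that starts with `None/`. The following is a minimal sketch of a check you could run before creating the client. The helper names `missing_vars` and `require_env` are hypothetical, and the values in `example_env` are placeholders.

```python
import os
from typing import Iterable, List, Mapping

# Variable names taken from the configuration step above
REQUIRED_VARS = ("IMAGINE_API_KEY", "IMAGINE_API_ENDPOINT", "IMAGINE_GUARDRAILS_CHROMA_DB")


def missing_vars(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty in the given environment mapping."""
    return [name for name in names if not env.get(name)]


def require_env(names: Iterable[str] = REQUIRED_VARS, env: Mapping[str, str] = os.environ) -> None:
    """Raise a RuntimeError that lists every missing variable."""
    missing = missing_vars(names, env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))


# Demonstration with a plain dict instead of the real environment
example_env = {"IMAGINE_API_KEY": "placeholder-key", "IMAGINE_API_ENDPOINT": ""}
print(missing_vars(REQUIRED_VARS, example_env))
```

In a real script, calling `require_env()` right after `load_dotenv()` would stop early with a clear message instead of failing later inside the client or the vector store.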

### Create the vector store for disallowed content

This example uses the `create_documents` and `create_vector_store` functions that were created previously to create a list of documents tagged as either `disallowed` or `greeting`. This list is passed to `create_vector_store` to build the Chroma vector database.

    vdbdocs = []

    # Content that the input guardrail should block
    disallowed = ["Cheese", "Spanish Cheese", "French Cheese", "English Cheese", "Bees", "BeeKeeping", "Bee Keeping", "insults"]
    for text in disallowed:
        vdbdocs.extend(create_documents(text, "disallowed"))

    # Greetings that receive the predefined response
    greetings = ["Hello.", "Salutations", "How are you today?"]
    for text in greetings:
        vdbdocs.extend(create_documents(text, "greeting"))

    create_vector_store(vdbdocs, db_dir, store_name, embedding_fn)

### Define the `completion` function

This example shows how you can define a `completion` function that another part of your application can call in response to user prompts. The function calls the `check_input_completion` function that was defined previously to query the vector database and topical guardrails to see if the prompt is a greeting or disallowed content. If the user prompt is a greeting, the `completion` function immediately returns the predefined response. If the prompt is disallowed, the function raises an exception. See [define input and output checks](https://docs.qualcomm.com/doc/80-88545-1/topic/guarded_llm_example.html#input-output-checks) for more information about `check_input_completion`.

    def completion(client, prompt: str, topical_guards):
        # Returns a predefined response for greetings, raises for disallowed input,
        # or returns None when the prompt passes the input checks
        response = check_input_completion(client, prompt, topical_guards['input'])
        if response is None:
            response = client.completion(prompt)
            # Raises if the generated text touches a disallowed output topic
            check_output(client, response.first_text, topical_guards['output'])
        return response
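
The control flow of the guarded call can be exercised without an API key by replacing each stage with a stub. The following sketch does that. All names in it (`GuardrailError`, `guarded_call`, and the three stub functions) are hypothetical stand-ins for `ImagineException`, `completion`, `check_input_completion`, `client.completion`, and `check_output`, and the keyword rules replace the vector store and LLM checks.

```python
from typing import Callable, Optional


class GuardrailError(Exception):
    """Hypothetical stand-in for ImagineException."""


def guarded_call(
    prompt: str,
    check_input: Callable[[str], Optional[str]],
    generate: Callable[[str], str],
    check_output: Callable[[str], None],
) -> str:
    """Input check, then generation, then output check."""
    canned = check_input(prompt)
    if canned is not None:
        return canned          # predefined response: the model is never called
    output = generate(prompt)
    check_output(output)       # raises if the generated text is disallowed
    return output


def check_input(prompt: str) -> Optional[str]:
    lowered = prompt.lower()
    if lowered.startswith("hello"):
        return "Hello I am Imagine!"
    if "cheese" in lowered:
        raise GuardrailError(f"Input like {prompt} are Not Allowed")
    return None


def generate(prompt: str) -> str:
    return prompt + " ... (generated text)"


def check_output(text: str) -> None:
    if "wifi" in text.lower():
        raise GuardrailError("Questions about WiFi are Not Allowed")


for p in ["Hello there", "I like cheese", "WiFi was first defined in", "Qualcomm is a large company"]:
    try:
        print(guarded_call(p, check_input, generate, check_output))
    except GuardrailError as e:
        print(e)
```

Passing the stages in as callables keeps the ordering logic separate from the checks themselves, which makes it straightforward to unit test the wrapper before connecting it to a live endpoint.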

## Putting it all together: Completion examples with and without guardrails

The following examples build on all of the previous examples in this section to demonstrate how the LLM uses guardrails to prevent disallowed input and output.

### Completion without guardrails

This example shows the native `client.completion` method with no guardrails. Because there are no guardrails, the LLM generates a `CompletionResponse` as normal, with metadata and the text of the generated content.

    client.completion("WiFi was first defined in 1997 by the IEEE 802.11 standard ")

    CompletionResponse(id='cmp-73d32696-b038-488b-b8de-e52908fd9ace', object='completion', created=1725364700.0, model='Llama-3-8B', choices=[CompletionResponseChoice(index=0, text='1. Since then, WiFi has become a ubiquitous technology, used by billions of people around the world. WiFi is a wireless networking technology that allows devices to connect to the internet or communicate with each other without the use of cables or wires.\nWiFi uses radio waves to transmit data between devices. It operates on a specific frequency band, typically 2.4 GHz or 5 GHz, and uses a protocol called CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) to manage data transmission', finish_reason=<FinishReason.stop: 'stop'>)], usage=UsageInfo(prompt_tokens=18, total_tokens=118, completion_tokens=100), generation_time=2.6121716499328613)

### Completion with guardrails and disallowed output

This example calls the `completion(client, prompt, topical_guards)` function that was created previously. The exception is triggered because the generated output would discuss WiFi, which is explicitly defined as a topical guard for the output. The exception message is “Questions about WiFi are Not Allowed”.

    try:
        completion(client,"WiFi was first defined in 1997 by the IEEE 802.11 standard ",{"input":[],"output":["IEEE standards",'WiFi','WiFi was first defined in']})
    except Exception as e:
        print(e)

    /prj/qct/wise_scratch/environment/envs/imagine-sdk/lib/python3.12/site-packages/langchain_core/_api/deprecation.py:139: LangChainDeprecationWarning: The class `Chroma` was deprecated in LangChain 0.2.9 and will be removed in 0.4. An updated version of the class exists in the langchain-chroma package and should be used instead. To use it run `pip install -U langchain-chroma` and import as `from langchain_chroma import Chroma`.
      warn_deprecated(
    /prj/qct/wise_scratch/environment/envs/imagine-sdk/lib/python3.12/site-packages/langchain_core/vectorstores/base.py:796: UserWarning: No relevant docs were retrieved using the relevance score threshold 0.5
      warnings.warn(

    Questions about WiFi are Not Allowed

### Completion with guardrails and disallowed input

In this example, the exception is triggered because the prompt includes “cheese”, which was defined as disallowed content in the [create vector store for disallowed content](https://docs.qualcomm.com/doc/80-88545-1/topic/guarded_llm_example.html#vector-store) example. The exception message says that the input isn’t allowed.

    try:
        completion(client,"I like cheese, do you like Spanish cheese?",{"input":[],"output":["IEEE standards",'WiFi','WiFi was first defined in']})
    except Exception as e:
        print(e)

    Input like I like cheese, do you like Spanish cheese? are Not Allowed

The embeddings-based check can be quite broad, and the match doesn’t have to be exact. The `score_threshold` passed to the `similarity_score_threshold` retriever (set through the `threshold` argument of `query_vector_store`) controls whether a prompt is similar enough to trigger an exception. In this example, the prompt isn’t allowed because it matches `insults`, which was defined as disallowed content in the [create vector store for disallowed content](https://docs.qualcomm.com/doc/80-88545-1/topic/guarded_llm_example.html#vector-store) example.

    try:
        completion(client,"You are Ugly!",{"input":[],"output":["IEEE standards",'WiFi','WiFi was first defined in']})
    except Exception as e:
        print(e)

    Input like You are Ugly! are Not Allowed
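
To see how the threshold changes which prompts are blocked, the following sketch filters a fixed list of scored matches at several thresholds. The scores are invented for illustration and are not measurements from the embedding model, and `flagged` is a hypothetical helper.

```python
from typing import List, Tuple


def flagged(scored_matches: List[Tuple[str, float]], threshold: float) -> List[str]:
    """Return the names of stored entries whose score meets the threshold."""
    return [name for name, score in scored_matches if score >= threshold]


# Hypothetical relevance scores for one prompt against three stored entries
scores = [("insults", 0.55), ("Bees", 0.35), ("Cheese", 0.10)]

for threshold in (0.3, 0.5, 0.7):
    print(threshold, flagged(scores, threshold))
```

With these made-up scores, a threshold of 0.3 flags two entries, 0.5 flags only `insults`, and 0.7 flags nothing. A lower threshold therefore blocks more prompts, including some that are only loosely related to the disallowed content.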

### Completion with a `greeting` detected

In this example, the prompt is identified as a `greeting`, and `check_input_completion` returns the predefined response, whose text is then printed.

    try:
        print(completion(client,"Hello, How are you?",{"input":[],"output":["IEEE standards",'WiFi','WiFi was first defined in']}).first_text)
    except Exception as e:
        print(e)

    Hello I am Imagine!

### Completion passes all checks

This example shows a completion request for a prompt that is neither a `greeting` nor disallowed content. Because the input guardrail isn’t triggered, the prompt is passed to the LLM through a standard `client.completion` call to generate output. If the output also doesn’t contain disallowed content, the completion response is printed, as it is in this example.

    try:
        print(completion(client,"Qualcomm is a large company",{"input":[],"output":["IEEE standards",'WiFi','WiFi was first defined in']}).first_text)
    except Exception as e:
        print(e)

    /prj/qct/wise_scratch/environment/envs/imagine-sdk/lib/python3.12/site-packages/langchain_core/vectorstores/base.py:796: UserWarning: No relevant docs were retrieved using the relevance score threshold 0.5
      warnings.warn(

    that specializes in designing and manufacturing semiconductors, particularly for mobile devices. The company was founded in 1985 and is headquartered in San Diego, California. Qualcomm is known for its Snapdragon processors, which are used in many smartphones and tablets. The company also develops and licenses wireless technology, including CDMA, WCDMA, and LTE.
    Qualcomm has a diverse portfolio of products and services, including:
    1. Snapdragon processors: Qualcomm's Snapdragon processors are used in many smartphones and tablets,

### Add guardrails to other API calls

The previous examples in this section added guardrails to the `client.completion` method, but you can also use guardrails with other API endpoints. This example defines a function to create input guardrails for the `client.chat` method. If you change the method, ensure that you also change the expected response type object, which is `ChatCompletionResponse` for chat. Providing the expected response type object is particularly important when you override the LLM response with a predefined response. When using the embedding endpoint, only input guardrails are relevant because the output is an embedding vector rather than text. Similar reasoning applies to the image generation endpoint.

    def check_input_chat(client, prompt: str, topical_guards: List[str] = None):
        # Cheap check first: look for stored content that is similar to the prompt
        relevant_docs = query_vector_store(prompt, db_dir, store_name, embedding_fn, k=3, threshold=0.5) or []

        if len(relevant_docs) >= 1:
            # Treat the prompt as a greeting only if every match is tagged as one
            greeting = all(d.metadata.get("type") == "greeting" for d in relevant_docs)
            if greeting:
                # Chat responses carry the text in message.content rather than text
                return ChatCompletionResponse(**{'choices': [{'finish_reason': 'stop','index': 0,'message': {'role': 'assistant','content': "Hello, I am Imagine!"}}],
                                                 'created': 0.0, 'id': '0', 'model': 'Llama-3-8B', 'object': 'chat.completion',
                                                 'usage': {'completion_tokens': 0, 'prompt_tokens': 0, 'total_tokens': 0},'ts': '2024-09-03T10:59:55.105961418Z'})
            raise ImagineException(f"Input like {prompt} are Not Allowed")

        # No topics supplied: skip the LLM-based topical check
        if topical_guards is None:
            topical_guards = []

        # More expensive check: ask the LLM about each prohibited topic
        for topic in topical_guards:
            topic_rail_response = _topical_guardrail(client, prompt, topic)
            if "yes" in topic_rail_response.first_content.lower():
                raise ImagineException(f"Questions about {topic} are Not Allowed")
        return None
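
The completion and chat responses differ mainly in where the generated text lives: `choices[0].text` for completions and `choices[0].message.content` for chat. The following sketch builds both payload shapes as plain dictionaries, using field names copied from the examples in this topic. The function `canned_payload` is hypothetical, and the real response models may require additional fields, such as `ts` or `generation_time`, that this sketch omits.

```python
from typing import Any, Dict


def canned_payload(kind: str, text: str, model: str = "Llama-3-8B") -> Dict[str, Any]:
    """Build a predefined-response payload for the 'completion' or 'chat' shape."""
    base = {
        "id": "0",
        "created": 0.0,
        "model": model,
        "usage": {"completion_tokens": 0, "prompt_tokens": 0, "total_tokens": 0},
    }
    if kind == "completion":
        choice = {"finish_reason": "stop", "index": 0, "text": text}
        return {**base, "object": "completion", "choices": [choice]}
    if kind == "chat":
        message = {"role": "assistant", "content": text}
        choice = {"finish_reason": "stop", "index": 0, "message": message}
        return {**base, "object": "chat.completion", "choices": [choice]}
    raise ValueError(f"Unsupported endpoint kind: {kind}")


print(canned_payload("completion", "Hello I am Imagine!")["choices"][0]["text"])
print(canned_payload("chat", "Hello, I am Imagine!")["choices"][0]["message"]["content"])
```

Keeping the two shapes in one helper makes it harder to return a completion-style payload from a chat guardrail by mistake.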

This example wraps the `chat` function with the guardrails functionality defined in the previous example.

    def chat(client, prompt: str, topical_guards):
        # Returns a predefined response for greetings, raises for disallowed input,
        # or returns None when the prompt passes the input checks
        response = check_input_chat(client, prompt, topical_guards['input'])
        if response is None:
            response = client.chat(messages=[ChatMessage(role="user", content=prompt)])
            # Raises if the generated message touches a disallowed output topic
            check_output(client, response.first_content, topical_guards['output'])
        return response

This example calls the `chat` function, which raises an exception because the generated output discusses WiFi, a disallowed output topic.

    try:
        chat(client,"WiFi was first defined in ",{"input":[],"output":["IEEE standards",'WiFi']})
    except Exception as e:
        print(e)

    /prj/qct/wise_scratch/environment/envs/imagine-sdk/lib/python3.12/site-packages/langchain_core/vectorstores/base.py:796: UserWarning: No relevant docs were retrieved using the relevance score threshold 0.5
      warnings.warn(

    Questions about WiFi are Not Allowed

In this example, the prompt is identified as a “greeting”, and `check_input_chat` returns the predefined response.

    chat(client,"Hello! ",{"input":[],"output":["IEEE standards",'WiFi']}).first_content

    'Hello, I am Imagine!'

## Next steps

- Review how to [use the Qualcomm AI Inference Suite with ChromaDB and LangChain Community for RAG](https://docs.qualcomm.com/doc/80-88545-1/topic/guarded_llm_example.html#rag_with_chromadb.md).
- Learn how to [configure logging to debug applications](https://docs.qualcomm.com/doc/80-88545-1/topic/6_0_logging.html).

Last Published: Apr 17, 2026
