# Low-Rank Adaptation (LoRA) Tutorial

This tutorial walks you through deploying an LLM with LoRA adapters on a Snapdragon device using the QAIRT Tools Python API. For more details, see the [LoRA documentation](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-10/lora_intro.html).

Note

To skip the step-by-step breakdown, use the simplified version of this tutorial that ships with the QAIRT SDK at the following path:

> - `examples/QAIRT/python/lora_on_device_inference.py`

Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning technique that allows you to adapt
large language models for specific tasks without modifying the entire model. This tutorial demonstrates:

1. Creating QAIRT assets with LoRA adapters
2. Building a LoRA-enabled model for HTP backend
3. Performing inference with different LoRA adapters on device
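As background, the low-rank update behind LoRA can be sketched in a few lines of plain Python. This is only an illustration of the general technique, not the QAIRT implementation: the helper names (`matvec`, `lora_forward`) and the toy matrices are made up for this example, and `alpha` is applied as a plain multiplier on the adapter branch (some formulations additionally divide by the rank).

```python
# Minimal LoRA forward pass in plain Python (illustrative sketch only).

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, alpha, x):
    """Return W x + alpha * B (A x).

    W: frozen base weight, shape (d_out, d_in)
    A: down-projection, shape (r, d_in)
    B: up-projection, shape (d_out, r)
    alpha: scalar scaling factor applied to the adapter branch
    """
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + alpha * l for b, l in zip(base, low_rank)]


# Toy example: d_in = d_out = 3, rank r = 1.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[1.0, 1.0, 1.0]]
B = [[0.5], [0.0], [-0.5]]
x = [1.0, 2.0, 3.0]

y_base = lora_forward(W, A, B, alpha=0.0, x=x)  # adapter branch disabled
y_lora = lora_forward(W, A, B, alpha=1.0, x=x)  # adapter branch fully applied
print(y_base)  # [1.0, 2.0, 3.0]
print(y_lora)  # [4.0, 2.0, 0.0]
```

Only `A` and `B` are trained, so the adapter stores `r * (d_in + d_out)` values instead of `d_in * d_out`, which is why swapping adapters is much cheaper than swapping models.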

Configurations:

> - Host OS: Linux (x86\_64) with ADB (Android Debug Bridge) installed
> - Target device: Snapdragon Android device
> - Processor: Qualcomm NPU
> - Backend: HTP

## Step 1: Setup and Prerequisites

We recommend a machine with at least 64 GB of RAM for timely completion of the workflow. If you do not have sufficient RAM, we recommend increasing your swap memory.

> - This tutorial uses a base LLM (e.g., Llama 3.2-3B) with LoRA adapters. You can download the base model from Hugging Face, subject to its license terms: [Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B).
> - The tutorial assumes you have obtained:
> 
>     - N+1 ONNX models: one quantized ONNX base model (without any LoRA branches) and N ONNX graphs, one per use case.
> 
>         - For example, if there are three adapters (A, B, and C) and the expected use cases are A, B, C, and A+C, then N equals 4.
>     - M LoRA adapter weight files in SafeTensors format, one per adapter
>     - LoRA configuration files (adapter configs and attach point mappings)
>     - One PyTorch-to-ONNX name mapping for the base graph
>     - N+1 quantization encodings: one for the base model and one per LoRA use case
> - **Example input artifacts (before build):**
> 
>         <model_exports>/
>            Llama-3.2-3B_base/
>               onnx/
>                  llama3_2_base.onnx
>                  llama3_2_base.encodings
>                  llama3_2_base.data
>                  config.json
>                  tokenizer.json
>                  llama3_2_base_node_mapping.json
>            Llama-3.2-3B_elementary/
>               onnx/
>                  llama3_2_elementary.onnx
>                  llama3_2_elementary.encodings
>                  llama3_2_elementary.data
>                  elementary.json
>            Llama-3.2-3B_long/
>               onnx/
>                  llama3_2_long.onnx
>                  llama3_2_long.encodings
>                  llama3_2_long.data
>                  long.json
>            Llama-3.2-3B_elementary+long/
>               onnx/
>                  llama3_2_elementary_long.onnx
>                  llama3_2_elementary_long.encodings
>                  llama3_2_elementary_long.data
>                  elementary.json
>                  long.json
>            top_level_lora_meta.yaml
> 
> 
>     **Key points about input structure:**
> 
>     - Base model contains the unmodified model without LoRA branches
>     - Each LoRA use case directory contains the model with LoRA branches integrated
>     - Multi-adapter use cases (e.g., `elementary+long`) contain combined LoRA branches
>     - `.data` files contain external weight data for the ONNX models
>     - `top_level_lora_meta.yaml` defines all adapters and use cases
> - Upon completion of model preparation and building, the following directory structure is expected:
> 
>     **Example output structure (after build):**
> 
>         <container_output>/
>            models/
>               split_0/
>                  model.bin
>               split_1/
>                  model.bin
>                  elementary.bin
>                  long.bin
>                  function.bin
>                  elementary+long.bin
>               split_2/
>                  model.bin
>                  elementary.bin
>                  long.bin
>                  function.bin
>                  elementary+long.bin
>               split_3/
>                  model.bin
>                  elementary.bin
>                  long.bin
>                  function.bin
>                  elementary+long.bin
>               split_4/
>                  model.bin
>               use_cases.json
> 
> 
>     **Key points about output structure:**
> 
>     - `split_N/` directories contain the compiled model splits
>     - `model.bin` is the compiled base model for each split
>     - `<use_case>.bin` files are the compiled LoRA adapters for each use case
>     - Not all splits contain LoRA adapters (e.g., split\_0 and split\_4 may contain only the base model)
>     - `use_cases.json` defines the available LoRA use cases and their configurations
>     - Use case names like `elementary+long` represent multi-adapter combinations
> - The guide uses a **Snapdragon 8 Elite (SM8750) Android device** to demonstrate the workflow.
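The artifact counts listed in the prerequisites (N+1 ONNX models, M adapter weight files, N+1 encodings) can be sanity-checked with a small sketch. The helper `summarize_lora_inputs` is hypothetical and not part of QAIRT; it assumes multi-adapter use cases are named by joining adapter names with `+`, as in `elementary+long`.

```python
def summarize_lora_inputs(use_cases):
    """Derive the expected artifact counts from a list of use-case names.

    Assumption for illustration: a multi-adapter use case is written as
    adapter names joined by '+', e.g. 'elementary+long'.
    """
    adapters = sorted({name for uc in use_cases for name in uc.split("+")})
    n = len(use_cases)
    return {
        "adapters": adapters,
        "use_cases": n,                         # N
        "adapter_weight_files": len(adapters),  # M, one per adapter
        "onnx_models": n + 1,                   # base model + one per use case
        "encodings": n + 1,                     # base model + one per use case
    }


# Use cases from the example directory layout in this tutorial.
summary = summarize_lora_inputs(["elementary", "long", "elementary+long"])
for key, value in summary.items():
    print(f"{key}: {value}")
```

For the layout shown above this yields N = 3 use cases, M = 2 adapters, and 4 ONNX models, matching the four `onnx/` directories in the example tree.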

    import os
    from pathlib import Path

    import qairt
    from qairt import Device, DevicePlatformType
    from qairt.gen_ai_api.builders.gen_ai_builder_htp import GenAIBuilderHTP
    from qairt.gen_ai_api.builders.llama.builder import LlamaBuilderHTP
    from qairt.gen_ai_api.containers.gen_ai_container import GenAIContainer
    from qairt.gen_ai_api.containers.llm_container import LLMContainer
    from qairt.gen_ai_api.executors.t2t_executor import T2TExecutor
    from qairt.gen_ai_api.gen_ai_builder_factory import GenAIBuilderFactory
    from qairt.modules.lora.lora_config import (
        AdapterRunConfig,
        LoraBuilderInputConfig,
        UseCaseRunConfig,
    )

    ############################################################
    # Define paths
    # Path to the base model exports directory
    llama3_exports = "./llama_3.2_3b/<your_path>"

    # Path to LoRA configuration file (or you can create config programmatically)
    lora_config_path = "./llama_3.2_3b/lora_config.yaml"

Tip

Set the environment variable **QAIRT\_TMP\_DIR** to define an alternative default temporary directory path.
This is recommended because the build process creates temporary artifacts that can exhaust the space available in the default temporary location.

    os.environ["QAIRT_TMP_DIR"] = "./llm_scratch/"
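A slightly more defensive variant is sketched below, using only the Python standard library. It creates the scratch directory up front and reports the free space on that filesystem. The `MIN_FREE_GIB` threshold is a hypothetical value chosen for illustration, and resolving the path to an absolute one is an assumption about what is convenient, not a documented QAIRT requirement.

```python
import os
import shutil
from pathlib import Path

# Create the scratch directory before pointing QAIRT_TMP_DIR at it.
scratch_dir = Path("./llm_scratch").resolve()
scratch_dir.mkdir(parents=True, exist_ok=True)
os.environ["QAIRT_TMP_DIR"] = str(scratch_dir)

# Report free space on the filesystem that holds the scratch directory.
free_gib = shutil.disk_usage(scratch_dir).free / 2**30
MIN_FREE_GIB = 50  # hypothetical threshold, adjust for your model size
if free_gib < MIN_FREE_GIB:
    print(f"Warning: only {free_gib:.1f} GiB free in {scratch_dir}")
else:
    print(f"{free_gib:.1f} GiB free in {scratch_dir}")
```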

Tip

Building can be time- and memory-intensive, especially for large models. It may be beneficial to stop and resume the build between steps.
To support that workflow, the GenAI Builder API provides a caching mechanism that stores intermediate artifacts. Define your cache root here;
we will pass it into the factory in Step 4.

    CACHE_ROOT = "./llama3_cache"

## Step 2: Understanding LoRA configuration

LoRA configuration consists of three main components:

1. **Adapter configuration**: Defines the LoRA adapters with their parameters

    - name: Identifier for the adapter
    - rank: Rank of the low-rank matrices
    - alpha: Scaling factor for the adapter
    - target\_modules: Model layers to which the adapter is applied
2. **Use case configuration**: Defines how adapters are combined for specific tasks

    - name: Identifier for the use case
    - adapter\_names: List of adapters to use
    - adapter\_alphas: Scaling factors for each adapter
    - model\_name: Path to the base model
    - encodings: Path to quantization encodings
3. **Attach point mapping**: Maps framework module names to ONNX node names
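To make the relationship between adapters and use cases concrete, here is a sketch that models the fields listed above as plain Python dictionaries and checks them for consistency. The dictionary layout, the rank values, and the `target_modules` names are hypothetical; the actual `lora_config.yaml` schema may use different keys or nesting, so treat this as a mental model rather than a file format reference.

```python
# Hypothetical in-memory view of the adapter and use-case components.
lora_config = {
    "adapters": [
        {"name": "elementary", "rank": 16, "alpha": 1.0, "target_modules": ["q_proj", "v_proj"]},
        {"name": "long", "rank": 8, "alpha": 1.0, "target_modules": ["q_proj", "v_proj"]},
    ],
    "use_cases": [
        {"name": "elementary", "adapter_names": ["elementary"], "adapter_alphas": [1.0]},
        {"name": "long", "adapter_names": ["long"], "adapter_alphas": [1.0]},
        {
            "name": "elementary+long",
            "adapter_names": ["elementary", "long"],
            "adapter_alphas": [1.0, 0.5],
        },
    ],
}


def validate_lora_config(config):
    """Return a list of human-readable problems; an empty list means consistent."""
    errors = []
    known = {adapter["name"] for adapter in config["adapters"]}
    for use_case in config["use_cases"]:
        # Every adapter referenced by a use case must be defined.
        missing = [n for n in use_case["adapter_names"] if n not in known]
        if missing:
            errors.append(f"{use_case['name']}: unknown adapters {missing}")
        # Each referenced adapter needs exactly one alpha.
        if len(use_case["adapter_names"]) != len(use_case["adapter_alphas"]):
            errors.append(f"{use_case['name']}: adapter_names and adapter_alphas differ in length")
    return errors


print(validate_lora_config(lora_config))  # [] when the config is consistent
```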

## Step 3: Load LoRA configuration

If you already have a `lora_config.yaml` file, load it as follows:

    lora_input_config = LoraBuilderInputConfig(
        lora_config_path=lora_config_path,
        create_lora_graph=True,  # Set to True to create the max-rank concatenated LoRA graph
        quant_updatable_mode="adapter_only",  # Options: "none", "adapter_only", "all"
        alpha_tensor_name="lora_alpha",
    )

### Understanding Quant Updatable Mode

The `quant_updatable_mode` parameter controls which quantization encodings can be updated across use cases:

- **"none"**: No quantization encodings are updatable.
- **"adapter\_only"**: Only the quantization encodings of the LoRA/adapter branch (Conv -> Mul -> Conv) change across use cases. The base-branch quantization encodings remain the same.
- **"all"**: All quantization encodings are updatable.
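The three modes can be pictured as a filter over tensor names. The sketch below is purely conceptual: `updatable_encodings` is a hypothetical helper, and identifying adapter tensors by a `lora` substring is an assumed naming convention, not how QAIRT classifies tensors internally.

```python
def updatable_encodings(tensor_names, mode, is_adapter_tensor):
    """Return the tensors whose quantization encodings may change across use cases."""
    if mode == "none":
        return []
    if mode == "adapter_only":
        return [name for name in tensor_names if is_adapter_tensor(name)]
    if mode == "all":
        return list(tensor_names)
    raise ValueError(f"Unknown quant_updatable_mode: {mode!r}")


# Hypothetical tensor names; the "lora" substring convention is an assumption.
tensors = ["layer0.q_proj.weight", "layer0.q_proj.lora_A", "layer0.q_proj.lora_B"]


def is_lora(name):
    return "lora" in name


for mode in ("none", "adapter_only", "all"):
    print(mode, updatable_encodings(tensors, mode, is_lora))
```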

## Step 4: Obtain a GenAIBuilder Instance

Create a builder instance, using the optional cache root to store intermediate artifacts.

    llama_builder: GenAIBuilderHTP = GenAIBuilderFactory.create(llama3_exports, "HTP", cache_root=CACHE_ROOT)

The factory inspects the model's `config.json` to determine which builder is appropriate.
For this example, it selects `LlamaBuilderHTP` and returns a constructed instance of that subclass.
The assertion below confirms that the builder is of the expected type.

    assert isinstance(llama_builder, LlamaBuilderHTP)

## Step 5: Customize the GenAIBuilder

The builder’s compilation configuration must be customized for the intended device so that
Ahead-of-Time (AOT) compilation is specialized for that target. Below we set the target
to a Snapdragon 8 Elite (SM8750) device.

    llama_builder.set_targets(["chipset:SM8750"])

## Step 6: Set LoRA Configuration

Set the LoRA configuration on the builder:

    llama_builder.lora_config = lora_input_config

### What Happens During a LoRA Build?

When you set `lora_config` and build, the following steps occur:

1. **LoRA graph creation**: A max-rank concatenated LoRA graph is created

    - All adapters are combined into a single graph structure
    - The graph supports dynamic adapter selection at runtime
2. **Model transformation**: The base model is transformed with LoRA support

    - Embedding and LM head are split (required for LoRA)
    - Multi-head attention is converted to single-head attention (MHA v2)
3. **Conversion**: Both base model and LoRA adapters are converted

    - Base model uses provided encodings
    - LoRA adapters use separate encodings for fine-grained control
4. **Compilation**: The model is compiled for HTP backend

    - Adapter weights are prepared for efficient on-device switching
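The idea of combining adapters of different ranks into one graph rests on two simple matrix identities, sketched below in plain Python. This is only the underlying algebra: how QAIRT actually lays out its max-rank concatenated graph is not specified here, and the helper names and toy matrices are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scale(s, m):
    """Multiply every entry of a matrix by a scalar."""
    return [[s * v for v in row] for row in m]


def add(a, b):
    """Add two matrices elementwise."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Two toy adapters on the same 2x2 layer: rank 1 and rank 2.
B1, A1 = [[1.0], [2.0]], [[1.0, 0.0]]
B2, A2 = [[0.0, 1.0], [1.0, 0.0]], [[1.0, 1.0], [0.0, 2.0]]
alpha1, alpha2 = 1.0, 0.5

# Applying the adapters separately and summing their scaled updates...
separate = add(scale(alpha1, matmul(B1, A1)), scale(alpha2, matmul(B2, A2)))

# ...equals one pair of matrices concatenated along the rank dimension,
# with each adapter's alpha applied to its own rank rows
# (the same shape as a down-projection, multiply, up-projection branch).
B_cat = [r1 + r2 for r1, r2 in zip(B1, B2)]      # shape (2, 3)
A_cat = scale(alpha1, A1) + scale(alpha2, A2)    # shape (3, 2)
combined = matmul(B_cat, A_cat)
print(combined == separate)  # True

# Zero-padding a low-rank adapter up to the maximum rank leaves its update unchanged.
max_rank = 3
B1_pad = [row + [0.0] * (max_rank - len(row)) for row in B1]
A1_pad = A1 + [[0.0] * len(A1[0]) for _ in range(max_rank - len(A1))]
print(matmul(B1_pad, A1_pad) == matmul(B1, A1))  # True
```

Because every use case then fits the same tensor shapes, switching use cases can in principle be done by updating weights and alphas rather than by loading a different graph.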

## Step 7: Build the GenAIContainer

Once a target is set and LoRA configuration is provided, you can trigger the build process to build the LoRA-enabled model into an LLM container object.

    llama_lora_container: GenAIContainer = llama_builder.build()

Note

The container contains everything that is needed to execute on the prepared target. It can be saved to disk and copied to
another location, where it can be loaded to resume operation. The save/load functionality is demonstrated in Step 11.

### Caching with LoRA

LoRA builds can be time-consuming. Use caching to resume builds:

    existing_cache_dir = CACHE_ROOT
    genai_builder: GenAIBuilderHTP = GenAIBuilderFactory.create(llama3_exports, "HTP", cache_root=existing_cache_dir)
    genai_builder.lora_config = lora_input_config

    # The builder resumes from the last completed stage if configuration hasn't changed

## Step 8: Set up an Android Device

Connect your Android device via ADB and set the `ANDROID_SERIAL` environment variable.

Obtain the ADB device ID by running:

    adb devices

The output lists connected devices:

    List of devices attached
    abcd1234   device

Set `ANDROID_SERIAL` to the device ID shown:

    export ANDROID_SERIAL=abcd1234

If your device is connected to a remote machine, see the
[remote device troubleshooting](https://docs.qualcomm.com/doc/80-87189-2/topic/genai_builder.html#remote-device-troubleshooting) section in
the [LLM Inference on HTP](https://docs.qualcomm.com/doc/80-87189-2/topic/genai_builder.html#genai-builder) tutorial.

    android_serial = os.getenv("ANDROID_SERIAL")
    android_hostname = os.getenv("ANDROID_HOSTNAME")

    device_id = f"{android_serial}@{android_hostname}" if android_hostname else android_serial
    android_device = Device(identifier=device_id, type=DevicePlatformType.ANDROID)
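If you would rather pick the serial programmatically than export it by hand, the `adb devices` output shown earlier is easy to parse. The sketch below uses a hypothetical helper, `parse_adb_devices`, and runs on a sample string so that it works without a connected device.

```python
def parse_adb_devices(output):
    """Return the serials of devices that adb reports in the 'device' (ready) state."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Device lines look like "<serial> <state> ..."; the header line does not match.
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


# Sample output; 'efgh5678' is a made-up second device that is not yet authorized.
sample = "List of devices attached\nabcd1234\tdevice\nefgh5678\tunauthorized\n"
print(parse_adb_devices(sample))  # ['abcd1234']
```

To use it against a live connection, pass in the text returned by `subprocess.run(["adb", "devices"], capture_output=True, text=True, check=True).stdout`, then set `os.environ["ANDROID_SERIAL"]` to the serial you select.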

## Step 9: Create an Executor

The container generated in Step 7 is used to create an executor. The executor interfaces with the device
and performs inference. Each executor is specific to a target and an inference mode; in this example,
the executor is a *text-to-text executor*.

    from qairt.gen_ai_api.executors.gen_ai_executor import GenAIExecutor, GenerationExecutionResult

    llm: GenAIExecutor = llama_lora_container.get_executor(android_device, clean_up=False)

    # Ensure the appropriate executor is returned
    assert isinstance(llm, T2TExecutor)

## Step 10: Generate Text with LoRA Adapters

You can switch between different LoRA use cases at runtime. Below we define the prompt template according to the specification for
the Llama 3.2-3B chat variant.

    prompt_template = (
        "<|begin_of_text|>"
        "<|start_header_id|>{system}<|end_header_id|>{system_prompt}<|eot_id|>"
        "<|start_header_id|>{user}<|end_header_id|>{user_prompt}<|eot_id|>"
        "<|start_header_id|>{assistant}<|end_header_id|>"
    )
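Because the role names are the same in every example in this tutorial, it can be convenient to wrap the template in a small helper. The function `build_prompt` below is a hypothetical convenience wrapper, not part of the QAIRT API; it reuses the template string exactly as defined above.

```python
PROMPT_TEMPLATE = (
    "<|begin_of_text|>"
    "<|start_header_id|>{system}<|end_header_id|>{system_prompt}<|eot_id|>"
    "<|start_header_id|>{user}<|end_header_id|>{user_prompt}<|eot_id|>"
    "<|start_header_id|>{assistant}<|end_header_id|>"
)


def build_prompt(system_prompt, user_prompt):
    """Fill the chat template with fixed role names and the given messages."""
    return PROMPT_TEMPLATE.format(
        system="system",
        system_prompt=system_prompt,
        user="user",
        user_prompt=user_prompt,
        assistant="assistant",
    )


prompt = build_prompt(
    "You are a helpful coding assistant.",
    "Write a Python function to calculate fibonacci numbers.",
)
print(prompt)
```

The prompt ends with the assistant header and no `<|eot_id|>`, which leaves the model positioned to generate the assistant's turn.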

### Example 1: Generate with First Use Case

Configure the executor to use the first use case, which applies the single `long` adapter:

    # Configure the executor to use the first use case
    use_case_config_1 = UseCaseRunConfig(
        use_case_name="long",
        adapters=[AdapterRunConfig(adapter_name="long", alpha=1.0)],
    )

    prompt_1 = prompt_template.format(
        system="system",
        system_prompt="You are a helpful coding assistant.",
        user="user",
        user_prompt="Write a Python function to calculate fibonacci numbers.",
        assistant="assistant",
    )

    # Generate text with the task-specific adapter
    result_1: GenerationExecutionResult = llm.generate(prompt_1, lora_config=use_case_config_1)

Printing the result produces output similar to the following:

    print("Response with long use case:")
    print(result_1.generated_text)

    **Fibonacci Function in Python**
    =====================================
    Here is an efficient implementation of the Fibonacci function in Python, using a technique called "memoization" to store previously computed values and avoid redundant calculations.
    ```python
    def fibonacci(n, memo={}):
        """
        Calculate the nth Fibonacci number.
        Args:
            n (int): The position of the Fibonacci number to calculate.
            memo (dict): A dictionary to store previously computed Fibonacci numbers (default is an empty dictionary).
        Returns:
            int: The nth Fibonacci number.
        """
        if n in memo:
            return memo[n]
        if n <= 2:
            return 1
        memo[n] = fibonacci(n-1, memo) + fibonacci(n-2, memo)
        return memo[n]
    ```
    **Example Usage**
    ---------------
    ```python
    print(fibonacci(10))  # Output: 55
    print(fibonacci(20))  # Output: 6765
    ```
    **How it Works**
    ----------------
    The `fibonacci` function takes two arguments: `n`, the position of the Fibonacci number to calculate, and `memo`, a dictionary to store previously computed Fibonacci numbers.
    If `n` is already in the `memo` dictionary, we can simply return the stored value.
    If `n` is 0 or 1, we return 1, since these are the base cases of the Fibonacci sequence.
    For larger values of `n`, we calculate the `n`th Fibonacci number by recursively calling `fibonacci` with `n-1` and `n-2` and summing the results. We store the result in the `memo` dictionary before returning it.
    Note that this function uses a recursive approach, which may not be the most efficient way to calculate Fibonacci numbers for very large values of `n`. For larger values, you may want to consider using an iterative approach or a specialized algorithm like the "matrix exponentiation" method.

Metrics can be inspected and printed to the console:

    print("\nMetrics:")
    print(result_1.metrics)

    Timing (microseconds):

    Init = 1508001 us
    Prompt Processing Time = 11205564 us
    Token Generation Time = 13130841 us
    Adapter Switch Time = 152369 us

    Tokens per second (toks/sec):

    Prompt Processing Rate = 386.3987731933594 toks/sec
    Token Generation Rate = 30.615680694580078 toks/sec
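The reported values are easier to compare after converting microseconds to more familiar units. The sketch below copies the token generation and adapter switch numbers from the sample output above. It assumes the generation rate is simply tokens divided by elapsed time, which is an interpretation of the metrics rather than a documented definition.

```python
def tokens_from_metrics(time_us, rate_toks_per_sec):
    """Estimate a token count from an elapsed time (microseconds) and a rate (tokens/sec)."""
    return time_us / 1_000_000 * rate_toks_per_sec


# Values copied from the sample metrics above.
token_generation_time_us = 13130841
token_generation_rate = 30.615680694580078
adapter_switch_time_us = 152369

generated_tokens = tokens_from_metrics(token_generation_time_us, token_generation_rate)
switch_ms = adapter_switch_time_us / 1000

print(f"Approximate generated tokens: {generated_tokens:.0f}")  # 402
print(f"Adapter switch time: {switch_ms:.1f} ms")               # 152.4 ms
```

In this run the adapter switch took roughly 0.15 s, compared with about 13 s spent generating tokens.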

### Example 2: Generate with Multi-Adapter Use Case

Configure the executor to use multiple adapters:

    # Configure the executor to use multiple adapters
    use_case_config_2 = UseCaseRunConfig(
        use_case_name="elementary+long",
        adapters=[
            AdapterRunConfig(adapter_name="elementary", alpha=1.0),
            AdapterRunConfig(adapter_name="long", alpha=0.5),
        ],
    )

    prompt_2 = prompt_template.format(
        system="system",
        system_prompt="You are a medical coding assistant.",
        user="user",
        user_prompt="Explain how to implement a patient data validation function.",
        assistant="assistant",
    )

    # Generate text with multiple adapters
    result_2: GenerationExecutionResult = llm.generate(prompt_2, lora_config=use_case_config_2)

Printing the result produces output similar to the following:

    print("\nResponse with elementary+long use case:")
    print(result_2.generated_text)

    Implementing a patient data validation function is crucial to ensure that the medical records are accurate, complete, and adhere to the standards set by regulatory bodies such as the International Classification of Diseases (ICD) or the Centers for Disease Control and Prevention (CDC). Here's how you can implement this function:
    
    ### Understanding Validation Rules
    Before you start coding, it's essential to understand the rules for each data field. For example, for a date of birth, you need to ensure that the day is within the valid range (1-31 for most dates), and the month is valid, and the year is valid.
    
    ### Writing the Validation Function
    The function should take a patient's data as input and return an error code if there's a problem with the data. For instance, if the patient's birthdate has an invalid date, you could return a "date error."
    ```python
    def validate_patient_data(date_of_birth, name, gender, medical_condition):
        # Define a dictionary of possible invalid values for date of birth
        invalid_dates = {'01/31': "Invalid date of birth, the 31st is not a valid day of birth."}
        # Check the date of birth for errors
        if not (1 <= date_of_birth.day <= 31 and 1 <= date_of_birth.month <= 12):
            return invalid_dates.get(date_of_birth.strftime("%m/%d"), "Invalid date")
        # Check the name for invalid characters
        invalid_chars = set("!,;:/?_@#")
        if any(char in invalid_chars for char in name):
            return "Invalid characters in name."
        # Check the gender for invalid choices
        invalid_genders = set("invalid gender")
        if gender in invalid_genders:
            return "Invalid gender choice."
        # Check the medical condition for empty strings
        if not medical_condition:
            return "Invalid medical condition."
        # If no errors, return a success message
        return "Patient data is valid."
    # Example usage
    print(validate_patient_data(date_of_birth="12/20", name="John Doe", gender="Male", medical_condition="Hypertension"))
    ```
    ### Implementing the Function
    Once you understand the rules and have written the function, you can implement it in your coding environment. Make sure to include the validation checks in any code that processes patient data.
    
    ### Reviewing Patient Data
    After implementing the validation function, you should regularly review patient data to ensure it's accurate, complete, and adheres to the standards. This ensures that your medical records are reliable and can be used for diagnosis and treatment purposes.
    Remember, the goal of validation is to protect the integrity of your medical records, ensuring that they are correct and can be trusted for the purpose of healthcare.

    print("\nMetrics:")
    print(result_2.metrics)

    Timing (microseconds):

    Init = 1541774 us
    Prompt Processing Time = 12576064 us
    Token Generation Time = 18757799 us
    Adapter Switch Time = 151819 us

    Tokens per second (toks/sec):

    Prompt Processing Rate = 405.67950439453125 toks/sec
    Token Generation Rate = 30.22791862487793 toks/sec

### Example 3: Dynamic Alpha Adjustment

You can adjust adapter influence by changing alpha values:

    use_case_config_3 = UseCaseRunConfig(
        use_case_name="elementary+long",
        adapters=[
            AdapterRunConfig(adapter_name="elementary", alpha=0.2),  # Reduced influence
            AdapterRunConfig(adapter_name="long", alpha=1.2),  # Increased influence
        ],
    )

    prompt_3 = prompt_template.format(
        system="system",
        system_prompt="You are a helpful assistant.",
        user="user",
        user_prompt="What are best practices for medical software development?",
        assistant="assistant",
    )

    result_3: GenerationExecutionResult = llm.generate(prompt_3, lora_config=use_case_config_3)
    print("\nResponse with adjusted alphas:")
    print(result_3.generated_text)

    Best practices for medical software development are crucial to ensure that software is safe, reliable, and effective in supporting medical professionals and improving patient care. Here are some guidelines:
    1. **Follow HIPAA guidelines**: The Health Insurance Portability and Accountability Act (HIPAA) requires healthcare organizations to implement reasonable safeguards to protect patients' protected health information (PHI). Ensure that your software development practices, testing, and deployment processes meet these requirements.
    2. **Secure data storage and transmission**: Implement robust encryption and secure data storage to protect patient data from unauthorized access or breaches. Use secure protocols for transmitting sensitive data, such as encrypted data transmission or secure file sharing.
    3. **Compliance with regulatory requirements**: Familiarize yourself with regulations like HIPAA, ICD-10, and FDA regulations (e.g., FDA's 510(k) clearance for medical devices). Ensure that your software meets these standards and is designed to comply with regulatory requirements.
    4. **Use secure coding practices**: Follow secure coding practices, such as:
            * Input validation and sanitization
            * Error handling and exception handling
            * Secure data access and update
            * Use secure protocols for communication, like HTTPS
    5. **Validate user input**: Verify that user input is validated and sanitized to prevent errors, data corruption, or security vulnerabilities.
    6. **Test thoroughly**: Perform thorough testing, including:
            * Unit testing
            * Integration testing
            * System testing
            * User acceptance testing
            * Penetration testing (e.g., vulnerability testing)
    7. **Follow coding standards and best practices**: Use established coding standards, such as:
            * Follow the 12-factor app approach (for web applications)
            * Follow the 5S (Sort, Set in Order, Shine, Standardize, Sustain) principles
    8. **Use version control**: Use version control systems, such as Git, to track changes, collaborate, and maintain a history of your codebase.
    9. **Use secure deployment practices**: Implement secure deployment practices, such as:
            * Use secure communication protocols (e.g., HTTPS)
            * Use secure update mechanisms (e.g., secure file transfer)
            * Implement rollback strategies in case of deployment issues
    10. **Continuously monitor and improve**: Regularly review and update your software to ensure it remains secure, stable, and compliant with changing regulations and user needs.
    11. **Collaborate with medical professionals**: Engage with medical professionals to ensure that your software meets their needs and is compatible with their workflows.
    12. **Use medical-specific features**: Incorporate features that cater to medical professionals' needs, such as:
            * Electronic health record (EHR) integration
            * Medical terminology and coding
            * Clinical decision support systems (CDSS)
    13. **Stay up-to-date with industry trends and advancements**: Keep up-to-date with the latest advancements in medical software, regulatory requirements, and user needs.
    14. **Consider user experience**: Design your software with the user in mind, focusing on usability, accessibility, and a positive user experience.
    15. **Document and communicate effectively**: Clearly document your software's capabilities, limitations, and requirements, and communicate effectively with stakeholders, users, and regulatory bodies.
    By following these best practices, you can ensure that your medical software is designed, developed, and maintained to support the needs of medical professionals and improve patient care.

### Performance Considerations

- Switching between use cases has minimal overhead
- The first inference after switching may be slightly slower
- Multiple adapters in a use case have additive computational cost
- Alpha values can be tuned for optimal task performance

The executor pushes artifacts to the device. You can remove them explicitly with the following call:

    llm.clean_environment()

## Step 11 (Optional): Exporting the LoRA Container

You can save the LoRA-enabled container for later use. The saved container includes:

- Compiled base model binaries
- LoRA adapter weights for all use cases
- Configuration files for adapter management
- Quantization parameters

Save the container to disk:

    llama_lora_container.save("./llama3_lora_container", exist_ok=True)

Reload the container from a different environment:

    from qairt.gen_ai_api.containers.llm_container import LLMContainer

    # Load a previously saved LoRA container
    loaded_container = LLMContainer.load("./llama3_lora_container")

    # Create a new executor from the loaded container
    llm_reloaded = loaded_container.get_executor(android_device, clean_up=False)

    result_reloaded = llm_reloaded.generate(prompt_1, lora_config=use_case_config_1)
    print("\nResponse from reloaded container:")
    print(result_reloaded.generated_text)

    **Fibonacci Function in Python**
    =====================================
    Here is an efficient implementation of the Fibonacci function in Python, using a technique called "memoization" to store previously computed values and avoid redundant calculations.
    ```python
    def fibonacci(n, memo={}):
        """
        Calculate the nth Fibonacci number.
        Args:
            n (int): The position of the Fibonacci number to calculate.
            memo (dict): A dictionary to store previously computed Fibonacci numbers (default is an empty dictionary).
        Returns:
            int: The nth Fibonacci number.
        """
        if n in memo:
            return memo[n]
        if n <= 2:
            return 1
        memo[n] = fibonacci(n-1, memo) + fibonacci(n-2, memo)
        return memo[n]
    ```
    **Example Usage**
    ---------------
    ```python
    print(fibonacci(10))  # Output: 55
    print(fibonacci(20))  # Output: 6765
    ```
    **How it Works**
    ----------------
    The `fibonacci` function takes two arguments: `n`, the position of the Fibonacci number to calculate, and `memo`, a dictionary to store previously computed Fibonacci numbers.
    If `n` is already in the `memo` dictionary, we can simply return the stored value.
    If `n` is 0 or 1, we return 1, since these are the base cases of the Fibonacci sequence.
    For larger values of `n`, we calculate the `n`th Fibonacci number by recursively calling `fibonacci` with `n-1` and `n-2` and summing the results. We store the result in the `memo` dictionary before returning it.
    Note that this function uses a recursive approach, which may not be the most efficient way to calculate Fibonacci numbers for very large values of `n`. For larger values, you may want to consider using an iterative approach or a specialized algorithm like the "matrix exponentiation" method.

    llm_reloaded.clean_environment()

Last Published: May 26, 2026
