# Migration Guide: Notebook → Pipeline

This guide helps you migrate from the traditional notebook-based workflow to
the declarative Pipeline API. In the notebook approach you manually
orchestrate dozens of cells across model loading, adaptation, model
preparation, quantization, and export, and each step needs significant
boilerplate and careful sequencing.

There are two migration paths depending on how much control you need:

- **Level 2 — Beginner (Pipeline)** (recommended for supported models): Define
your entire workflow in a YAML recipe. The pipeline orchestrates every step
automatically. See [Getting Started with Pipeline](https://docs.qualcomm.com/doc/80-87189-2/topic/pipeline_getting_started.html).
- **Level 4 — Advanced** (for unsupported models or custom workflows): Replace
notebook code with cleaner, typed Python APIs from
`qairt.experimental.pipeline`. Same conceptual steps as the notebook,
but using the new building blocks instead of `genai_lib`. See
[Advanced Usage](https://docs.qualcomm.com/doc/80-87189-2/topic/pipeline_expert_usage.html).

## Key Differences

| Aspect | Notebook Workflow | Level 2 — Beginner (Pipeline) | Level 4 — Advanced |
| --- | --- | --- | --- |
| Model loading | Manual HuggingFace load + model-specific class instantiation | `model_loader` stage (automatic) | `QcAutoModelForCausalLM.from_pretrained(..., model_reauthoring=True)` |
| Adaptations | Manual attention replacement, linear-to-conv, forward function overrides | `apply_default_adaptations: true` | `Adapter.apply_adaptations(model, backend="HTP")` |
| Model preparation | Manual `prepare_model()` call with dummy data and converter args | Automatic (handled within stages) | Not required (handled by reauthoring) |
| Quantization | 15–25 cells: QuantSim setup, SeqMSE, LPBQ, calibration, encoding fixes | `quantization` stage + `recipe_name` | `LPBQ_SeqMSE_Recipe().apply(model, tokenizer, generator, ...)` |
| Compilation | Manual GenAI Builder or QNN conversion cells | `genai_builder` stage | `GenAIBuilderFactory.create(...).build()` |
| Configuration | Environment variables and notebook constants spread across cells | Single YAML recipe | Python function arguments |
| Caching | Manual checkpoint saves between cells | `enable_cache: true` | Not built-in (manual) |
| Evaluation | Custom perplexity evaluation code | `pipe.evaluate()` | `run_evaluation(metrics_config, model, tokenizer, ...)` |
| Resumption | Re-run from a specific cell (error-prone) | `LLMPipeline.load(cache_dir)` | Not built-in (manual) |

## Common Notebook Phases

Notebooks for different models follow the same general structure. The table
below maps each phase to its equivalent in both migration paths.

| Notebook Phase | Activity | Level 2 — Beginner (Pipeline) | Level 4 — Advanced |
| --- | --- | --- | --- |
| Environment setup | Model ID, context length, sequence length, backend, output paths | Recipe top-level config (`model_id_or_path`, `backend`, etc.) | Python variables passed to building block constructors |
| HuggingFace model load | Load pretrained weights, tokenizer, run baseline PPL | `model_loader` stage (automatic in `pipe.construct()`) | `QcAutoConfig.from_pretrained()` + `QcAutoModelForCausalLM.from_pretrained()` |
| Model adaptation | Replace attention modules, linear-to-conv, config attribute overrides | `apply_default_adaptations: true` in `model_loader` stage | `Adapter.apply_adaptations(model, backend="HTP")` |
| Model preparation | `prepare_model()` with static dummy inputs and converter args | Handled internally by the pipeline | Not required (reauthoring produces the prepared model) |
| Quantization setup | QuantSim creation, mixed precision config, embedding bitwidth | `stages.quantization` recipe config | `LLMGenerator(model, tokenizer, sequence_length, context_length)` |
| Quantization calibration | SeqMSE, LPBQ, activation calibration | `recipe_name: lpbq_seqmse` (or other recipe) | `LPBQ_SeqMSE_Recipe().apply(model, tokenizer, generator, ...)` |
| Post-quant evaluation | PPL evaluation of quantized model | `pipe.evaluate()` after `pipe.construct()` | `run_evaluation(metrics_config, model, tokenizer, ...)` |
| Export | ONNX + encodings export | `pipe.export(dir)` | `quant_result.export(path)` |
| Compilation | GenAI Builder build, QNN graph compilation | `genai_builder` stage | `GenAIBuilderFactory.create(...).build()` |
| Generation / validation | Manual inference loop | `pipe.generate(prompt, device=device)` | `container.get_executor(device).generate(prompt)` |

## Variable Mapping

The following table maps the most common notebook environment variables to
their YAML recipe equivalents (Level 2 — Beginner). Variable names vary
slightly between models (e.g., `SEQ_LENGTH` vs `ARN`) but map to the same
recipe keys. For Level 4 — Advanced, these variables are passed directly as
Python arguments.

| Notebook Variable | Recipe YAML Key (Level 2 — Beginner) |
| --- | --- |
| `MODEL_ID` | `model_id_or_path` |
| `CONTEXT_LENGTH` | `generator_config.context_length` |
| `SEQ_LENGTH` / `ARN` / `SEQUENCE_LENGTH` | `generator_config.sequence_length` |
| `BACKEND` | `backend` |
| `SOC` / `CHIPSET` / `PLATFORM_GEN` | `soc_details` |
| `NUM_CALIBRATION_BATCHES` | `stages.quantization.technique_kwargs.seqmse.num_batches` |
| `NUM_SEQMSE_BATCHES` | `stages.quantization.technique_kwargs.seqmse.num_batches` |
| `NUM_SEQMSE_CANDIDATES` | `stages.quantization.technique_kwargs.seqmse.num_candidates` |
| `TORCH_DTYPE` / `DTYPE` | `stages.model_loader.hf_pretrained_kwargs.dtype` |
| `DEVICE_MAP="cuda"` | `stages.model_loader.execution_environment: gpu` |
| `TRUST_REMOTE_CODE=True` | `stages.model_loader.hf_tokenizer_kwargs.trust_remote_code: true` |
| `ATTN_IMPLEMENTATION` | `stages.model_loader.hf_pretrained_kwargs.attn_implementation` |
| `NATIVE_KV` | `stages.genai_builder.native_kv` |
| `WEIGHT_SHARING` | `stages.genai_builder.weight_sharing` |
| `HVX_THREADS` | `stages.genai_builder.compile_options.graphs.hvx_threads` |
| `VTCM_SIZE` | `stages.genai_builder.compile_options.graphs.vtcm_size_in_mb` |
| `ACT_PRECISION` / `ACTIVATION_BITWIDTH` | `stages.genai_builder.calibration_options.act_precision` |
| `APPLY_DECODER_SEQMSE` / `APPLY_LM_HEAD_SEQMSE` | Implicit in `recipe_name: lpbq_seqmse` |
| `APPLY_DECODER_LPBQ` / `APPLY_LM_HEAD_LPBQ` | Implicit in `recipe_name: lpbq_seqmse` |
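
The mapping above can be applied mechanically. The following is a minimal sketch, not part of the Pipeline API: it reads a few of the notebook variables and builds a nested dictionary shaped like a recipe. The names `ENV_TO_RECIPE_KEY`, `INT_VARS`, `set_dotted`, and `recipe_from_env` are hypothetical helpers introduced for illustration. The sketch assumes that each dotted key in the table corresponds to nested YAML mappings, and that length and batch-count values should be stored as integers.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Notebook variable -> dotted recipe key, copied from a subset of the table.
# Aliases such as SEQ_LENGTH and ARN point at the same recipe key.
ENV_TO_RECIPE_KEY = {
    "MODEL_ID": "model_id_or_path",
    "BACKEND": "backend",
    "CONTEXT_LENGTH": "generator_config.context_length",
    "SEQ_LENGTH": "generator_config.sequence_length",
    "ARN": "generator_config.sequence_length",
    "SEQUENCE_LENGTH": "generator_config.sequence_length",
    "NUM_SEQMSE_BATCHES": "stages.quantization.technique_kwargs.seqmse.num_batches",
    "NUM_SEQMSE_CANDIDATES": "stages.quantization.technique_kwargs.seqmse.num_candidates",
}

# Assumption of this sketch: these variables hold integer values.
INT_VARS = {
    "CONTEXT_LENGTH", "SEQ_LENGTH", "ARN", "SEQUENCE_LENGTH",
    "NUM_SEQMSE_BATCHES", "NUM_SEQMSE_CANDIDATES",
}


def set_dotted(tree: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Store value in a nested dict, creating intermediate dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = tree
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value


def recipe_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build a recipe-shaped dict from notebook-style environment variables."""
    env = os.environ if env is None else env
    recipe: Dict[str, Any] = {}
    for var, dotted_key in ENV_TO_RECIPE_KEY.items():
        if var in env:
            raw = env[var]
            set_dotted(recipe, dotted_key, int(raw) if var in INT_VARS else raw)
    return recipe


# Example with placeholder values (the model ID is made up).
example = recipe_from_env({
    "MODEL_ID": "my-org/my-model",
    "ARN": "128",
    "CONTEXT_LENGTH": "4096",
    "NUM_SEQMSE_BATCHES": "20",
})
```

The resulting dictionary could then be written to a recipe file with a YAML library, for example PyYAML's `yaml.safe_dump`. Variables that are not set are simply left out, so any recipe defaults would still apply.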

## Code Comparison

### Before: Notebook Model Loading

A typical notebook loads the model, manually replaces attention modules,
sets backend-specific config attributes, and runs `prepare_model()`:

    import os
    import torch
    from transformers import AutoConfig, AutoTokenizer
    
    model_id = os.getenv("MODEL_ID")
    context_length = int(os.getenv("CONTEXT_LENGTH", 8273))
    arn = int(os.getenv("ARN", 2073))
    
    config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
    
    # Backend-specific config attribute overrides
    setattr(config, "return_new_key_value_only", True)
    setattr(config, "transposed_key_cache", True)
    setattr(config, "input_tokens_per_inference", arn)
    
    # Load model with model-specific class
    model = ModelSpecificClass.from_pretrained(
        model_id, config=config, torch_dtype=torch.float32,
        attn_implementation="eager",
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    tokenizer.model_max_length = context_length
    
    # Manual attention replacement
    from genai_lib.llm.dev.model_adaptation.<model>.adaptation import (
        QcAttention, QcModel,
    )
    modeling_module.Attention = QcAttention
    modeling_module.Model = QcModel
    
    # Linear-to-conv transformation for NSP backend
    from genai_lib.common.dev.model_adaptation.linear_to_conv import replace_linears_with_convs
    model = replace_linears_with_convs(model)
    
    # Model preparation — static shape graph for HTP
    from qti.aisw.preparer_api import prepare_model
    model.num_logits_to_return = arn
    prepared_model = prepare_model(model, dummy_input, ...)

**After (Level 2 — Beginner):**

    from qairt.experimental.pipeline.torch.llm.pipeline import LLMPipeline
    
    pipe = LLMPipeline.from_pretrained(model_id, recipe="recipe.yaml")

Loading, adaptation, and preparation are all handled by the `model_loader`
stage when `pipe.construct()` runs.

### Before: Notebook Quantization

A typical notebook creates a QuantSim instance, applies sequential techniques,
and calibrates — spread across 15–25 cells:

    from aimet_torch.v2.quantsim import QuantizationSimModel
    from aimet_torch.v2.seq_mse import apply_seq_mse, SeqMseParams
    from aimet_torch.v2.quantsim.config_utils import set_grouped_blockwise_quantization_for_weights
    
    quantsim = QuantizationSimModel(model, dummy_input=dummy, ...)
    # Helper assumed to be defined or imported earlier in the notebook
    set_matmul_second_input_producer_to_8bit_symmetric(quantsim)
    set_grouped_blockwise_quantization_for_weights(quantsim, ...)
    
    seqmse_params = SeqMseParams(num_batches=20, num_candidates=30)
    apply_seq_mse(quantsim, dataloader, seqmse_params, ...)
    quantsim.compute_encodings(calibration_callback, ...)

**After (Level 2 — Beginner):**

    stages:
      quantization:
        recipe_name: lpbq_seqmse
        technique_kwargs:
          seqmse:
            num_batches: 20
            num_candidates: 30

### Before: Full Notebook Workflow

**Notebook:** Dozens of cells of sequential code — environment setup,
adaptation, preparation, QuantSim, SeqMSE, LPBQ, calibration, export,
GenAI Builder compilation.

**After (Level 2 — Beginner):**

    from qairt.experimental.pipeline.torch.llm.pipeline import LLMPipeline
    from qairt.api.configs.device import Device
    
    pipe = LLMPipeline.from_pretrained(model_id, recipe="recipe.yaml")
    pipe.construct()
    
    device = Device(type="android", identifier="<serial>@<hostname>")
    result = pipe.generate("Hello, world!", device=device)
    result.print()
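
The tables earlier in this guide also list `pipe.evaluate()` and `pipe.export(dir)` as the Level 2 equivalents of the post-quantization evaluation and export phases. The sketch below shows one way these calls might be sequenced in a single function. It uses only the calls named in this guide; `run_pipeline` and its parameters are hypothetical, the return types of `evaluate()` and `generate()` are not documented here, and the ordering of evaluate before export is an assumption.

```python
def run_pipeline(model_id, recipe_path, export_dir, serial, hostname,
                 prompt="Hello, world!"):
    """Construct, evaluate, export, and run a prompt on a device (sketch)."""
    # Imports are deferred so this module can be loaded without the SDK installed.
    from qairt.api.configs.device import Device
    from qairt.experimental.pipeline.torch.llm.pipeline import LLMPipeline

    pipe = LLMPipeline.from_pretrained(model_id, recipe=recipe_path)
    pipe.construct()            # runs the stages defined in the recipe

    metrics = pipe.evaluate()   # the table places this after construct()
    pipe.export(export_dir)     # export artifacts to export_dir

    # Identifier format follows the "<serial>@<hostname>" pattern shown above.
    device = Device(type="android", identifier=f"{serial}@{hostname}")
    result = pipe.generate(prompt, device=device)
    return metrics, result
```

Calling `run_pipeline(...)` requires the QAIRT SDK and a reachable device; without them the function can still be imported and inspected.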

Last Published: Jun 19, 2026
