# Resource Profiler

The Resource Profiler captures **RAM (PSS), swap, GPU memory, and wall-clock time** for any instrumented workflow in the QAIRT toolkit. Use it to identify memory bottlenecks and measure execution times during model conversion, compilation, inference, and other resource-intensive operations.

## Key features

- Zero overhead when disabled — profiling decorators become no-op passthroughs
- Automatically instruments QAIRT workflows that use the `@resource_profile` decorator
- Rich terminal reports and structured JSON output
- Tracks peak memory watermarks to help size your build machine

## Enabling profiling

Profiling is **opt-in at runtime** and is controlled programmatically. Until you enable it, profiling stays off and instrumented code runs with no profiling overhead.

**Global enable/disable:**

```python
from qairt.api.profiler.resource_profiler import enable_profiling, disable_profiling

enable_profiling()   # Turn on profiling globally
# ... run your workflow ...
disable_profiling()  # Turn off profiling
```

**Scoped profiling:**

```python
from qairt.api.profiler.resource_profiler import profiling_scope

with profiling_scope():
    # Only code inside this block is profiled
    ...
```

The `profiling_scope()` context manager automatically restores the previous profiling state when exiting the block.

## Using the profiler

Any QAIRT API that is decorated with `@resource_profile` is automatically profiled when profiling is enabled. Enable profiling before running your workflow — either by calling `enable_profiling()` or by wrapping your code in a `profiling_scope()` block.

Built-in profiled operations include:

- **Model compilation** (`compile`) — via `qairt.api.compiler`
- **Model inference** (`inference`) — via `qairt.api.model`
- **Gen AI Builder steps** — all build stages (see [LLM Inference on HTP](https://docs.qualcomm.com/doc/80-87189-2/topic/genai_builder.html#genai-builder))

## Example: Profiling model compilation

```python
from qairt import convert, compile
from qairt.api.profiler.resource_profiler import ResourceProfiler, profiling_scope

with profiling_scope():
    # Convert and compile the model (automatically profiled)
    model = convert("model.onnx", backend="HTP")
    compiled = compile(model, backend="HTP")

# View the profiling report
report = ResourceProfiler().report()
```

By default, `report()` also prints the report to the terminal. To get the report object without printing it, pass `print_report=False` (see Programmatic access below).

## Instrumenting custom code

You can profile your own functions and classes using the `@resource_profile` decorator:

```python
from qairt.api.profiler.resource_profiler import resource_profile

# Profile a single function
@resource_profile(scope="function", event="my_app.preprocess")
def preprocess_data(data):
    ...

# Profile all public methods of a class
@resource_profile
class MyPipeline:
    def step_one(self):
        ...

    def step_two(self):
        ...
```

Decorator options:

- `scope="class"` (default): Instruments all public methods of a class
- `scope="function"`: Instruments only the decorated function
- `event`: Custom event name (defaults to the qualified function/method name)
- `flush_ram=True`: Run `gc.collect()` before each snapshot for more accurate deltas
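To make these options concrete, the sketch below defines a simplified, hypothetical stand-in named `toy_resource_profile`. It accepts the same option names but measures only wall-clock time. It is not the QAIRT decorator. Its class-scope rule (wrap plain functions whose names do not start with `_`) is an assumption about what "public methods" means, and the real decorator also records memory.

```python
import functools
import gc
import inspect
import time

EVENTS = []  # (event_name, seconds) pairs collected by this sketch


def _wrap(func, event, flush_ram):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if flush_ram:
            gc.collect()  # collect garbage first, as flush_ram=True is described to do
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            EVENTS.append((event, time.perf_counter() - start))

    return wrapper


def toy_resource_profile(target=None, *, scope="class", event=None, flush_ram=False):
    """Hypothetical stand-in mirroring the documented option names."""

    def decorate(obj):
        if scope == "class" and inspect.isclass(obj):
            for name, attr in list(vars(obj).items()):
                # Wrap public plain functions; skip names starting with "_".
                if not name.startswith("_") and inspect.isfunction(attr):
                    setattr(obj, name, _wrap(attr, event or attr.__qualname__, flush_ram))
            return obj
        # scope="function" (or a plain function): wrap just this callable.
        return _wrap(obj, event or obj.__qualname__, flush_ram)

    # Support both bare @toy_resource_profile and @toy_resource_profile(...).
    return decorate(target) if target is not None else decorate


@toy_resource_profile(scope="function", event="my_app.preprocess", flush_ram=True)
def preprocess_data(data):
    return [x * 2 for x in data]


@toy_resource_profile
class MyPipeline:
    def step_one(self):
        return 1

    def _helper(self):  # private: not instrumented
        return 2


preprocess_data([1, 2, 3])
MyPipeline().step_one()
MyPipeline()._helper()
print([name for name, _ in EVENTS])  # ['my_app.preprocess', 'MyPipeline.step_one']
```

When no `event` is given, the default event name is the method's `__qualname__` (for example, `MyPipeline.step_one`). This matches the "qualified function/method name" default described above.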

## Output

### Metrics captured

For each profiled build step, the profiler records:

| Metric | Description |
| --- | --- |
| RAM (PSS) at entry | Baseline memory before the step runs |
| RAM delta | Memory change during the step (uses peak watermark) |
| Swap at entry | Baseline swap usage |
| Swap delta | Swap change during the step (uses peak watermark) |
| GPU memory | GPU allocation delta (when CUDA is available) |
| Wall-clock time | Elapsed time (`time.perf_counter()`) |
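The table notes that the RAM and swap deltas use the peak watermark. One plausible reading matches the JSON example later on this page, where `ram_entry_bytes + ram_delta_bytes` equals `ram_peak_bytes`. Under that reading, the delta is the highest reading seen during the step minus the reading at entry. The sketch below assumes that interpretation. `watermark_delta` and the intermediate sample readings are hypothetical. The sample terminal table also shows a negative delta, so treat this as an illustration of the watermark idea rather than the profiler's exact formula.

```python
from typing import Sequence


def watermark_delta(entry_bytes: int, samples: Sequence[int]) -> int:
    """Highest reading observed during a step minus the reading at entry.

    The entry value is included in the max, so in this sketch a step that
    only frees memory reports 0 rather than a negative number.
    """
    return max([entry_bytes, *samples]) - entry_bytes


entry = 2_147_483_648  # ram_entry_bytes from the sample JSON on this page
# Hypothetical readings sampled while the step ran; the last one is the exit value.
samples = [4_000_000_000, 9_126_805_504, 3_000_000_000]

print(watermark_delta(entry, samples))  # 6979321856
print(samples[-1] - entry)              # 852516352 (exit minus entry, for contrast)
```

The contrast shows why a watermark is useful for sizing a machine. A step can briefly use far more memory than its exit value suggests.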

### Terminal report

Calling `ResourceProfiler().report()` displays a rich formatted summary panel followed by a detailed events table:

```text
+-------------------- Memory Profiling Summary ---------------------+
| Peak RAM:   65562.0 MB  (genaibuilder.convert [4/6])             |
| Peak Swap:  0.0 MB                                               |
| Peak GPU:   33.1 MB  (GenAIBuilderHTP.set_transformation_options)|
| Total Time: 40m 33.1s                                            |
| Peak Time:  9m 16.7s  (genaibuilder.transform)                   |
| Total RAM:  -24047.9 MB                                          |
+------------------------------------------------------------------+

                               Events
+----+----------------------------+----------+-----------+-----------+-----------+----------+
| #  | Event                      | Type     | RAM Entry | RAM Delta | RAM Exit  | Duration |
|    |                            |          |      (MB) |           |      (MB) |          |
+----+----------------------------+----------+-----------+-----------+-----------+----------+
| 1  | genaibuilder.transform     | function |  47834.7  | -12472 MB |  35362.1  | 9m 16.7s |
+----+----------------------------+----------+-----------+-----------+-----------+----------+
| 2  | genaibuilder.convert [1/6] | function |  42918.0  | +8274.4MB |  51192.4  | 4m 6.6s  |
+----+----------------------------+----------+-----------+-----------+-----------+----------+
| 3  | genaibuilder.compile [1/3] | function |  54915.1  | +5315.3MB |  60230.4  | 4m 10.8s |
+----+----------------------------+----------+-----------+-----------+-----------+----------+
| .. | ...                        |          |           |           |           |          |
+----+----------------------------+----------+-----------+-----------+-----------+----------+
```

The summary panel shows overall peak metrics, while the events table lists each profiled step with RAM entry/exit, deltas, and duration. When GPU profiling is available, additional GPU columns are displayed.
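If you post-process `wall_time_s` values from a JSON dump, you may want them in the same `Xm Y.Ys` style that the sample report uses. The helper below is a hypothetical reimplementation of that display format, inferred from the sample output. It is not part of the QAIRT API, and the seconds-only format for durations under a minute is an assumption.

```python
def format_duration(seconds: float) -> str:
    """Format seconds like the sample report, e.g. '9m 16.7s'.

    Durations under a minute are shown as '12.3s' (an assumed format).
    """
    total = round(seconds, 1)  # round first so 59.96 becomes '1m 0.0s', not '0m 60.0s'
    minutes, secs = divmod(total, 60)
    if minutes == 0:
        return f"{secs:.1f}s"
    return f"{int(minutes)}m {secs:.1f}s"


print(format_duration(556.7))   # 9m 16.7s
print(format_duration(750.0))   # 12m 30.0s
print(format_duration(12.34))   # 12.3s
```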

**Tip:** The event name in parentheses after each peak metric identifies the step that triggered that peak. Use it to pinpoint the most resource-intensive step in your workflow. Repeated events are disambiguated with `[N/M]` suffixes (e.g., `genaibuilder.convert [4/6]`).
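You can reproduce `[N/M]` labels with a two-pass count. The first pass counts how often each name occurs, and the second numbers each occurrence in order. `disambiguate` is a hypothetical helper written for illustration. The sample output suggests that N is the occurrence index and M is the total count, and that unique events keep their plain name. The profiler's exact rule is assumed here.

```python
from collections import Counter


def disambiguate(names):
    """Append [N/M] to names that repeat (N = occurrence index, M = total count)."""
    totals = Counter(names)
    seen = Counter()
    labels = []
    for name in names:
        if totals[name] == 1:
            labels.append(name)  # unique events keep their plain name
        else:
            seen[name] += 1
            labels.append(f"{name} [{seen[name]}/{totals[name]}]")
    return labels


print(disambiguate(["genaibuilder.transform", "genaibuilder.convert", "genaibuilder.convert"]))
# ['genaibuilder.transform', 'genaibuilder.convert [1/2]', 'genaibuilder.convert [2/2]']
```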

### Saving results to JSON

Use `report.dump()` to save profiling results for later analysis:

```python
from qairt.api.profiler.resource_profiler import ResourceProfiler

report = ResourceProfiler().report()

# Write to file
report.dump("build_profile.json")

# Or get as JSON string
json_str = report.dump_json()
```

JSON output structure:

```json
{
    "system_capacity": {
        "cpu_count": 16,
        "cpu_freq_mhz": 3600.0,
        "total_ram_gb": 64.0,
        "swap_total_gb": 8.0
    },
    "summary": {
        "ram_peak_bytes": 9126805504,
        "ram_peak_event": "compile",
        "swap_peak_bytes": 2254857830,
        "swap_peak_event": "compile",
        "gpu_peak_bytes": 0,
        "gpu_peak_event": ""
    },
    "events": [
        {
            "event": "compile",
            "type": "function",
            "ram_entry_bytes": 2147483648,
            "ram_delta_bytes": 6979321856,
            "swap_entry_bytes": 0,
            "swap_delta_bytes": 2254857830,
            "gpu_entry_bytes": 0,
            "gpu_delta_bytes": 0,
            "gpu_peak_bytes": 0,
            "wall_time_s": 750.0
        }
    ]
}
```
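The dump is plain JSON, so you can analyze it with the standard library alone. This sketch assumes the field names shown above. `summarize_profile` is a hypothetical helper, and the second event in the demo data is invented to show the sorting. The per-event peak is computed as entry plus delta, which matches the sample values but is an assumption about how the fields relate.

```python
import json
import tempfile
from pathlib import Path

GIB = 1024 ** 3


def summarize_profile(path):
    """Load a report written by report.dump() and list its events, slowest first."""
    data = json.loads(Path(path).read_text())
    rows = []
    for ev in data["events"]:
        rows.append({
            "event": ev["event"],
            "wall_time_s": ev["wall_time_s"],
            # Assumed: per-step peak = RAM at entry + watermark delta.
            "ram_peak_gib": (ev["ram_entry_bytes"] + ev["ram_delta_bytes"]) / GIB,
        })
    rows.sort(key=lambda r: r["wall_time_s"], reverse=True)
    return data["summary"], rows


# Demo data: a trimmed copy of the structure above plus one invented event.
sample = {
    "summary": {"ram_peak_bytes": 9126805504, "ram_peak_event": "compile"},
    "events": [
        {"event": "compile", "wall_time_s": 750.0,
         "ram_entry_bytes": 2147483648, "ram_delta_bytes": 6979321856},
        {"event": "convert", "wall_time_s": 120.0,  # hypothetical event
         "ram_entry_bytes": 1073741824, "ram_delta_bytes": 536870912},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "build_profile.json"
    path.write_text(json.dumps(sample))
    summary, rows = summarize_profile(path)

for row in rows:
    print(f"{row['event']:<10} {row['wall_time_s']:8.1f}s  peak {row['ram_peak_gib']:.2f} GiB")
print("RAM peak event:", summary["ram_peak_event"])
```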

### Programmatic access

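```python
from qairt.api.profiler.resource_profiler import ResourceProfiler

# Build the report without printing it to the terminal
report = ResourceProfiler().report(print_report=False)

report.ram_peak       # overall RAM peak in bytes
report.swap_peak      # overall swap peak in bytes
report.gpu_peak       # overall GPU peak in bytes
report.events         # list of ProfileMarker objects

for event in report.events:
    # The ":+" format adds the sign, so negative RAM deltas print correctly
    print(f"{event.event_name}: {event.duration_s:.1f}s, RAM {event.ram_delta_bytes / 1e9:+.2f} GB")
```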

## Recommendations

Based on profiling results, consider the following to optimize your workflow:

- **High RAM peak during compile**: Increase system RAM or swap space. For large models, 64 GB or more of RAM is recommended.
- **High swap usage**: Swap usage indicates the system is running low on RAM. This significantly slows down execution. Add more RAM or reduce model size.
- **Long wall-clock times**: Use caching mechanisms (where available) to skip already-completed steps on subsequent runs.
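You can turn the recommendations above into automated checks on a JSON dump. In this sketch, `check_profile` is hypothetical, and the 80% RAM threshold and 600-second "slow step" threshold are arbitrary example values. The sketch also assumes `total_ram_gb` is expressed in GiB.

```python
GIB = 1024 ** 3


def check_profile(report, ram_fraction=0.8, slow_s=600.0):
    """Return warnings for a report dict loaded from a report.dump() file."""
    warnings = []
    total_ram = report["system_capacity"]["total_ram_gb"] * GIB  # assumes GiB
    summary = report["summary"]
    if summary["ram_peak_bytes"] > ram_fraction * total_ram:
        warnings.append(f"High RAM peak in {summary['ram_peak_event']}: add RAM or swap")
    for ev in report["events"]:
        if ev["swap_delta_bytes"] > 0:
            warnings.append(f"Swap used in {ev['event']}: add RAM or reduce model size")
        if ev["wall_time_s"] > slow_s:
            warnings.append(f"Slow step {ev['event']}: consider caching completed steps")
    return warnings


# Subset of the sample JSON structure shown earlier on this page.
sample = {
    "system_capacity": {"total_ram_gb": 64.0},
    "summary": {"ram_peak_bytes": 9126805504, "ram_peak_event": "compile"},
    "events": [
        {"event": "compile", "swap_delta_bytes": 2254857830, "wall_time_s": 750.0},
    ],
}

for warning in check_profile(sample):
    print(warning)
```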

Last Published: May 26, 2026
