# Tutorial: Turning on various optimizations on HTP and HTP MCP Backends

Note

This page applies to Auto platforms (SA-series SoCs). For standard
Android/Windows/Linux targets, see the main [Tutorials](https://docs.qualcomm.com/doc/80-63442-10/topic/general_tutorials.html) page.

This tutorial explains how to enable graph optimizations and prepare optimized graphs on the HTP and HTP MCP backends.

The sections of the tutorial are as follows:

1. [Optimization levels](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_auto_optimization.html#optimization-levels)
2. [P points](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_auto_optimization.html#p-points)
3. [HTP Performance Estimates](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_auto_optimization.html#htp-performance-estimates)

## Optimization levels

On automotive platforms, HTP supports several graph optimization levels. Level 3 optimization (O=3) *may* yield the best-performing graph. However, experimentation is required because the highest optimization level
is not guaranteed to give the best performance.

When creating a serialized context binary with qnn-context-binary-generator, backend extension
parameters can be specified in the file passed to the `--config_file` argument. Full documentation can
be found in the [QNN HTP Backend Extensions](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_backend.html#qnn-htp-backend-extensions) section.

As shown in the sample HTP backend JSON config below, to enable a graph for O=3, specify the
optimization “O” value as 3:

**htp\_context.json**

    {
        "graphs": [
            {
                "vtcm_mb": 8,
                "O": 3,
                "graph_names": [
                    "qnn_model"
                ]
            }
        ],
        "devices": [
            {
                "dsp_arch": "v68",
                "soc_id": 62,
                "pd_session": "unsigned",
                "device_id": 0
            }
        ]
    }
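
Hand-edited JSON configs are easy to break, for example with a stray trailing comma. As a minimal sketch (not part of the SDK), the same config can be generated from Python so that the output is always well-formed JSON. The helper name `make_htp_config` and its default arguments are illustrative assumptions; the keys and values are those of the sample above.

```python
import json


def make_htp_config(graph_name, opt_level=3, vtcm_mb=8,
                    dsp_arch="v68", soc_id=62):
    """Build an HTP backend extensions config as a plain dict.

    Hypothetical helper for illustration; only the JSON keys and the
    default values come from the sample config in this tutorial.
    """
    return {
        "graphs": [
            {
                "vtcm_mb": vtcm_mb,
                "O": opt_level,
                "graph_names": [graph_name],
            }
        ],
        "devices": [
            {
                "dsp_arch": dsp_arch,
                "soc_id": soc_id,
                "pd_session": "unsigned",
                "device_id": 0,
            }
        ],
    }


if __name__ == "__main__":
    # json.dumps never emits trailing commas, so the result always parses.
    print(json.dumps(make_htp_config("qnn_model"), indent=4))
```

Redirecting the printed output to `htp_context.json` gives a file equivalent to the sample.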

When preparing a graph with O=3, specifying the device “soc\_id” that matches the target can turn on additional
algorithms that may further improve inference performance. Consult the HTP target table to select the appropriate DSP architecture (“dsp\_arch”) and SoC ID (“soc\_id”) values.

In the C API, the “O” value corresponds to the `QNN_HTP_GRAPH_OPTIMIZATION_TYPE_FINALIZE_OPTIMIZATION_FLAG` optimization type of `QnnHtpGraph_OptimizationOption_t`. More details can be found in the [QNN HTP Backend API](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_backend.html#qnn-htp-backend-api) section.

To prepare a context binary with HTP optimization parameters, run qnn-context-binary-generator with the `--config_file` argument set to the path of htp\_context.json.

    ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-context-binary-generator \
        --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
        --model model.so \
        --binary_file model.serialized \
        --profiling_level basic \
        --config_file htp_context.json
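
When the same invocation is repeated for several configs, it can help to assemble the argument list in one place. The sketch below is a hypothetical wrapper, not an SDK utility: the function name and the `/opt/qnn` fallback path are assumptions, while the tool path layout and the flags are the ones used in the command above.

```python
import os
import shlex


def context_binary_cmd(sdk_root, model, binary_file, config_file,
                       backend_lib="libQnnHtp.so",
                       profiling_level="basic"):
    """Return the argv list for qnn-context-binary-generator.

    Only flags that appear in this tutorial are used.
    """
    host = "x86_64-linux-clang"
    return [
        f"{sdk_root}/bin/{host}/qnn-context-binary-generator",
        "--backend", f"{sdk_root}/lib/{host}/{backend_lib}",
        "--model", model,
        "--binary_file", binary_file,
        "--profiling_level", profiling_level,
        "--config_file", config_file,
    ]


if __name__ == "__main__":
    # "/opt/qnn" is a placeholder used only when QNN_SDK_ROOT is unset.
    sdk_root = os.environ.get("QNN_SDK_ROOT", "/opt/qnn")
    argv = context_binary_cmd(sdk_root, "model.so", "model.serialized",
                              "htp_context.json")
    # Print a shell-quoted command line; on a host with the SDK installed,
    # the same list can be passed to subprocess.run(argv, check=True).
    print(shlex.join(argv))
```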

Similarly, for the HTP MCP backend, enable O=3 for a graph by specifying the optimization “O” value as 3:

**graph\_prepare (2 files)**

1. graph\_prepare.json:

        {
            "backend_extensions": {
                "shared_library_path": "libQnnHtpMcpNetRunExtensions.so",
                "config_file_path": "graph_prepare.conf"
            }
        }

2. graph\_prepare.conf (linked to from graph\_prepare.json above):

        {
            "graphs": [
                {
                    "graph_names": [
                        "qnn_model"
                    ],
                    "O": 3
                }
            ]
        }
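
Because the HTP MCP setup spans two files that must stay in sync, one option is to write both from a single script. This is a minimal sketch under stated assumptions: `write_mcp_configs` is a hypothetical helper, and how the tool resolves the relative `config_file_path` is not covered here, so both files are simply written to the same directory.

```python
import json
import tempfile
from pathlib import Path


def write_mcp_configs(out_dir, graph_name, opt_level=3):
    """Write graph_prepare.json and graph_prepare.conf into out_dir.

    File names and keys follow the samples above. Returns the path of
    graph_prepare.json, which is the file given to --config_file.
    """
    out_dir = Path(out_dir)
    conf_path = out_dir / "graph_prepare.conf"
    json_path = out_dir / "graph_prepare.json"

    conf = {"graphs": [{"graph_names": [graph_name], "O": opt_level}]}
    top = {
        "backend_extensions": {
            "shared_library_path": "libQnnHtpMcpNetRunExtensions.so",
            # Same relative file name as in the sample.
            "config_file_path": conf_path.name,
        }
    }
    conf_path.write_text(json.dumps(conf, indent=4) + "\n")
    json_path.write_text(json.dumps(top, indent=4) + "\n")
    return json_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_mcp_configs(tmp, "qnn_model").read_text())
```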

Next, to generate the serialized context binary, specify the **graph\_prepare.json** file using the `--config_file` flag as follows:

    $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
    $ cp ${QNN_SDK_ROOT}/lib/hexagon-v68/unsigned/libQnnHtpMcpV68.elf network.elf
    $ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-context-binary-generator \
        --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtpMcp.so \
        --model ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model_libs/x86_64-linux-clang/libInception_v3_quantized.so \
        --config_file graph_prepare.json \
        --binary_file Inception_v3_quantized_qpc.serialized

## P points

### Overview

P points are an advanced O=3 optimization feature that may yield better
performance for your model. They are available exclusively when `O=3`
(optimization level 3) is enabled and give you a way to experiment with
non-default compiler configurations when preparing a context binary.

Each P value selects a different pre-defined point in the compiler’s
internal configuration space, adjusting parameters that affect how the
HTP executes operations — for example, tradeoffs between latency and
DRAM bandwidth.

The best P point for a given model depends on your latency requirements
and available DDR bandwidth, and is found by experimenting across all
valid P points. There is no universal “best” P value; the right choice
depends on the characteristics of your network.

Once a graph compiles successfully with a P point, its execution output
is bit-accurate with the output of the same graph compiled without P points.

### Valid values

Valid values for P point are: 0 (default value that does not change any
parameters), 1, 2, 3, 4, 5, 6, 8, 13, 15, 16, 17, 19, 20, 21, 22, 23.

In other words, the valid values are 0 to 23 with the following values
**excluded**: 7, 9, 10, 11, 12, 14, 18.

Note

The set of valid P point values and the behavior of each value may
change from release to release. Always re-validate your chosen P value
when upgrading to a new SDK version.
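
Since the valid set has gaps, a small guard in a build script can catch an invalid value before a long preparation run. The sketch below hard-codes the values listed on this page; the constant and function names are assumptions, and the set should be treated as a snapshot that may need updating for other SDK releases.

```python
# Valid P points as listed on this page: 0..23 minus the excluded values.
# This is a snapshot of the documentation, not something queried from the SDK.
VALID_P_POINTS = frozenset(range(24)) - {7, 9, 10, 11, 12, 14, 18}


def check_p_point(p):
    """Return p unchanged, or raise ValueError if it is not a listed value."""
    if isinstance(p, bool) or not isinstance(p, int):
        raise ValueError(f"P must be an integer, got {p!r}")
    if p not in VALID_P_POINTS:
        raise ValueError(f"P={p} is not in the documented set of valid values")
    return p


if __name__ == "__main__":
    print(sorted(VALID_P_POINTS))
```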

### Workflow: where P points fit in the lifecycle

P points affect the **offline graph preparation** step only. The
workflow is:

1. **Convert your model** to a QNN model (`.so` or `.bin`) using
`qnn-tensorflow-converter`, `qnn-onnx-converter`, or equivalent.
*(P points have no effect at this stage.)*
2. **Prepare the context binary** using `qnn-context-binary-generator`
on x86, with `O=3` and your chosen `P` value in the HTP backend
extensions config. This is where P points take effect — they change
how the compiler generates the serialized context binary.
3. **Test the prepared graph** on your target hardware before deploying
to production. Because P point behavior can vary by network, always
validate functional correctness and performance after changing P.
4. **Deploy** the context binary to the target device. At runtime, the
P value has already been baked into the binary — no runtime flag is
needed.

### Caveats and warnings

1. **Always test before deploying.** When a model is prepared using a
P point, the prepared graph must be validated for both performance and
functional correctness before production deployment.

    - The supported set of P point values may change from release to
release.
    - The behavior of a given P point value may change from release to
release.
    - A given network may fail to prepare or execute correctly for a
given P point value.
2. **P points are independent.** Unlike O levels, where higher values
generally yield better performance at the cost of longer compile time,
there is no ordering or relationship between P point values. P=2 is
not “better” or “worse” than P=1 in any general sense.
3. **P only takes effect when O=3.** Setting `P` without `O=3` has
no effect.
4. **Specifying more than one P point results in undefined behavior.**
The outcome is not guaranteed and may change across releases.
5. **Performance varies by network.** A P value that improves one model
may have no effect or degrade performance on another. See
[Sample performance difference between two networks](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_backend.html#htp-p-points-optimization-figure)
as reference.
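
Caveat 3 is easy to violate silently, because a config with `P` but without `O=3` is still accepted as JSON. The following sketch lints one entry of the `graphs` array for that case. It assumes the `finalize_config` key described under Setting P points; `lint_graph_entry` is a hypothetical helper written for this tutorial, not an SDK feature.

```python
def lint_graph_entry(graph):
    """Return a list of warning strings for one "graphs" entry (a dict)."""
    warnings = []
    finalize = graph.get("finalize_config", {})
    if "P" in finalize:
        if graph.get("O") != 3:
            warnings.append("P is set but O is not 3, so P has no effect")
        if finalize["P"] == 0:
            warnings.append("P=0 is the default and changes no parameters")
    return warnings


if __name__ == "__main__":
    entry = {"graph_names": ["qnn_model"], "O": 2,
             "finalize_config": {"P": 1}}
    for message in lint_graph_entry(entry):
        print("warning:", message)
```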

**Figure: Sample performance difference between two networks**

### Setting P points (HTP backend)

P points are set via the `finalize_config` option in the HTP backend
extensions config, which corresponds to
`QNN_HTP_GRAPH_CONFIG_OPTION_FINALIZE_CONFIG` in
`QnnHtpGraph_CustomConfig_t`.

The following example enables `P=1` with `O=3` on the HTP backend:

**htp\_context.json**

    {
        "graphs": [
            {
                "graph_names": [
                    "<network-name>"
                ],
                "O": 3,
                "vtcm_mb": 8,
                "hvx_threads": 4,
                "finalize_config": {"P": 1}
            }
        ],
        "devices": [
            {
                "device_id": 0,
                "soc_id": 62,
                "dsp_arch": "v68"
            }
        ]
    }

To prepare a context binary with HTP optimization parameters, run qnn-context-binary-generator with the `--config_file` argument set to the path of htp\_context.json.

    ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-context-binary-generator \
        --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
        --model model.so \
        --binary_file model.serialized \
        --profiling_level basic \
        --config_file htp_context.json
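
Because the right P value is found by experiment, it can be convenient to generate one config per candidate and prepare a context binary from each. The sketch below only produces the configs. The file naming scheme `htp_context_P<n>.json` and the function name are assumptions, and the candidate list is a snapshot of the non-default values listed under Valid values.

```python
import copy
import json

# Non-default P points listed on this page (snapshot; may change per release).
CANDIDATE_P_POINTS = (1, 2, 3, 4, 5, 6, 8, 13, 15, 16, 17, 19, 20, 21, 22, 23)


def sweep_configs(base_config, p_points=CANDIDATE_P_POINTS):
    """Yield (file_name, config) pairs, one per P point.

    base_config is a dict shaped like htp_context.json. Each copy gets
    O=3 and exactly one P value on every graph entry; base_config itself
    is left unmodified.
    """
    for p in p_points:
        cfg = copy.deepcopy(base_config)
        for graph in cfg["graphs"]:
            graph["O"] = 3
            graph["finalize_config"] = {"P": p}
        yield f"htp_context_P{p}.json", cfg


if __name__ == "__main__":
    base = {
        "graphs": [{"graph_names": ["qnn_model"], "vtcm_mb": 8}],
        "devices": [{"device_id": 0, "soc_id": 62, "dsp_arch": "v68"}],
    }
    for name, cfg in sweep_configs(base, p_points=(1, 2)):
        print(name)
        print(json.dumps(cfg, indent=4))
```

Each prepared binary still has to be validated on target for both correctness and performance, as described under Caveats and warnings.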

### Setting P points (HTP MCP backend)

For the HTP MCP (multicore) backend, `finalize_config` is passed
through the `graph_prepare.conf` file and corresponds to
`QNN_HTP_MCP_GRAPH_CONFIG_OPTION_FINALIZE_CONFIG` in
`QnnHtpMcpGraph_CustomConfig_t`.

**graph\_prepare (2 files)**

1. graph\_prepare.json:

        {
            "backend_extensions": {
                "shared_library_path": "libQnnHtpMcpNetRunExtensions.so",
                "config_file_path": "graph_prepare.conf"
            }
        }

2. graph\_prepare.conf:

        {
            "graphs": [
                {
                    "graph_names": [
                        "qnn_model"
                    ],
                    "O": 3,
                    "finalize_config": {"P": 1}
                }
            ]
        }

Next, to generate the serialized context binary, specify the **graph\_prepare.json** file using the `--config_file` flag as follows:

    $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
    $ cp ${QNN_SDK_ROOT}/lib/hexagon-v68/unsigned/libQnnHtpMcpV68.elf network.elf
    $ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-context-binary-generator \
        --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtpMcp.so \
        --model ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model_libs/x86_64-linux-clang/libInception_v3_quantized.so \
        --config_file graph_prepare.json \
        --binary_file Inception_v3_quantized_qpc.serialized

## HTP Performance Estimates

QNN can provide performance estimates for a graph by using operator costs (i.e., execution cycle
predictions) to simulate the target hardware. These estimates project the HTP performance
of a graph along with a confidence level. The estimates are provided
“as is”, with no warranty of any kind. Every effort has been made to ensure their accuracy.

However, no representations are made regarding the estimates in terms of correctness, accuracy vs. silicon,
reliability, or otherwise. The estimates may vary, and may even show regressions, on a per-graph
basis across QNN releases. They are provided for
informational purposes only and should not be relied upon for any other purpose.

The following is a non-exhaustive list of assumptions and approximations made for the
performance estimates:

1. Performance estimates may vary across SDK versions for a graph.
2. Each execution cycle prediction of an op is perturbed by an amount that is reflective of
the actual errors we see while training models used for calculating the estimates. The
whole graph is then simulated using the lower and upper execution cycles estimate for
each op to produce the overall lower and upper estimates respectively. The lower and
upper estimates provided by these performance estimates are approximations of the
simulation accuracy vs. target silicon; they are not accuracy error bounds.
3. Performance estimates may assume kernel operator costs for operators that are not currently
modeled, which includes all operators derived from ‘custom ops’.
4. Performance estimates assume burst performance mode, that the HTP has full bandwidth to the
DDR, and that no other cores are using the DDR during the execution of the graph being
simulated.

    - This may not be true in reality, as the HTP has to share the DDR with other cores and other devices on the SoC (e.g. CPU, GPU, camera, etc.), which may or may not be active during the execution of the graph.

**Generating performance estimates:**

Generating performance estimates requires the correct soc\_id in the HTP backend extensions config.
For example, the following JSON file uses soc\_id = 52, which is the soc\_id for the SA8650, SA8775, and SA8255 targets:

**htp\_context.json**

    {
        "graphs": [
            {
                "graph_names": [
                    "graph1_name"
                ],
                "vtcm_mb": 8,
                "hvx_threads": 4
            }
        ],
        "devices": [
            {
                "device_id": 0,
                "soc_id": 52,
                "dsp_arch": "v73"
            }
        ]
    }

When running qnn-context-binary-generator on the HTP backend, specify the profiling level parameter and the htp\_context.json file (containing the correct soc\_id).

    $ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-context-binary-generator \
        --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
        --model model.so \
        --binary_file model.serialized \
        --profiling_level basic \
        --config_file htp_context.json

Passing the HTP profiling reader (`--reader libQnnHtpProfilingReader.so`) to qnn-profile-viewer is important to
get the correct layout of the performance estimates.

    $ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-profile-viewer \
        --input_log output/qnn-profiling-data.log \
        --reader libQnnHtpProfilingReader.so

Similarly, use the following commands for HTP MCP backend:

    $ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-context-binary-generator \
        --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtpMcp.so \
        --model model.so \
        --binary_file model.serialized \
        --profiling_level basic

    $ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-profile-viewer \
        --input_log output/qnn-profiling-data.log \
        --reader libQnnHtpMcpProfilingReader.so

**Performance estimate output:**

- Simulated (Accelerator exec cycles): Simulated execution cycles.
- Simulated (Accelerator exec cycles (lower estimate)): Lower estimate of the simulated execution cycles.
- Simulated (Accelerator exec cycles (upper estimate)): Upper estimate of the simulated execution cycles.
- Bandwidth Stats per HTP in bytes:

    1. Input Fill: Total reads from DDR for graph input related tensors (weights, bias, activations). This value counts all bytes of operators which do not have predecessors.
    2. Intermediate Fill: Total reads from DDR for compiler generated data transfer to satisfy VTCM size constraints. This value counts all bytes of compiler generated fill operators which have predecessors and successors and originate on the same HTP.
    3. Intermediate Spill: Total writes to DDR for compiler generated data transfer to satisfy VTCM size constraints. This value counts all bytes of compiler generated spill operators which have predecessors and successors and originate on the same HTP.
    4. Inter HTP Fill: Total reads from DDR for fills which were generated by a different HTP core. This value counts all bytes of compiler generated fill operators which do not have a predecessor, but have a successor.
    5. Inter HTP Spill: Total reads from DDR for spills which were generated by a different HTP core. This value counts all bytes of compiler generated spill operators which do not have a successor, but have a predecessor.
    6. Output Spill: Total writes to DDR for graph output related tensors. This value counts all bytes of operators which do not have successors.

**Sample profiler output Finalize Stats:**

1. With performance estimates:

        Finalize Stats:
            Accelerator (finalize) time : 193364  us
            Performance Estimates :
                Mode : Burst
                Simulated (Accelerator exec cycles) : 991608  cycles
                Simulated (Accelerator exec cycles (lower estimate)) : 921188  cycles
                Simulated (Accelerator exec cycles (upper estimate)) : 1094620  cycles
                Bandwidth Stats :
                    HTP ID : 0
                        Input Fill : 24524800  bytes
                        Intermediate Fill : 0  bytes
                        Intermediate Spill : 0  bytes
                        Inter HTP Fill : 0  bytes
                        Inter HTP Spill : 0  bytes
                        Output Spill : 2048  bytes

2. Without performance estimates:

        Finalize Stats:
            Accelerator (finalize) time : 193364  us
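
When comparing several prepared graphs, it can help to pull the numbers out of the Finalize Stats text and express the lower and upper estimates relative to the nominal one. The sketch below assumes the text layout of the sample output above, which may differ between releases, and a single HTP ID (with several HTP IDs the bandwidth labels would collide in the dictionary). The function names are hypothetical.

```python
import re

SAMPLE = """\
Finalize Stats:
    Accelerator (finalize) time : 193364  us
    Performance Estimates :
        Mode : Burst
        Simulated (Accelerator exec cycles) : 991608  cycles
        Simulated (Accelerator exec cycles (lower estimate)) : 921188  cycles
        Simulated (Accelerator exec cycles (upper estimate)) : 1094620  cycles
        Bandwidth Stats :
            HTP ID : 0
                Input Fill : 24524800  bytes
                Intermediate Fill : 0  bytes
                Intermediate Spill : 0  bytes
                Inter HTP Fill : 0  bytes
                Inter HTP Spill : 0  bytes
                Output Spill : 2048  bytes
"""

# Matches "<label> : <integer>  <unit>" where unit is us, cycles, or bytes.
LINE_RE = re.compile(r"^\s*(.+?)\s*:\s*(\d+)\s+(us|cycles|bytes)\s*$")


def parse_finalize_stats(text):
    """Return {label: int} for every line that carries a number and a unit."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def estimate_spread(stats):
    """Return (lower, upper) deviation from the nominal estimate, in percent."""
    nominal = stats["Simulated (Accelerator exec cycles)"]
    lower = stats["Simulated (Accelerator exec cycles (lower estimate))"]
    upper = stats["Simulated (Accelerator exec cycles (upper estimate))"]
    return (100.0 * (lower - nominal) / nominal,
            100.0 * (upper - nominal) / nominal)


if __name__ == "__main__":
    stats = parse_finalize_stats(SAMPLE)
    low, high = estimate_spread(stats)
    print(f"lower: {low:+.1f}%  upper: {high:+.1f}%")
```

For the sample above, the lower and upper estimates sit roughly 7.1% below and 10.4% above the nominal cycle count. As noted in the assumptions, these are approximations of simulation accuracy, not error bounds.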

Last Published: Jun 04, 2026
