# Pose estimation and display with LiteRT

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-pose-estimation-and-display.html](https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-pose-estimation-and-display.html)

These use cases run the HRNet LiteRT model on a single camera stream to estimate poses and show the results on a display.

## Use qtivoverlay plugin to apply pose estimation overlay

Run the use case on the target device:

    gst-launch-1.0 -e \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! waylandsink fullscreen=true sync=false \
    split. ! queue ! qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so \
    external-delegate-options="QNNExternalDelegate,backend_type=htp;" model=/etc/models/hrnet_pose_quantized.tflite ! queue ! \
    qtimlpostprocess results=2 module=hrnet labels=/etc/labels/hrnet_pose.json settings=/etc/labels/hrnet_settings.json ! text/x-raw ! queue ! metamux.

To stop the use case, use CTRL + C.
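
Before launching the pipeline, it can help to confirm that the model, label, and settings files and the required GStreamer elements are present on the target. The following is a minimal preflight sketch and not part of the documented use case: the helper functions `missing_files` and `missing_elements` are hypothetical names, and the sketch assumes that `gst-inspect-1.0` is installed alongside `gst-launch-1.0`. The paths and element names are copied from the command above.

```shell
#!/bin/sh
# Preflight sketch: report missing files and GStreamer elements before launch.
# Helper names are illustrative; paths and elements come from the command above.

# Print each argument that is not a readable file.
missing_files() {
    for path in "$@"; do
        [ -r "$path" ] || printf '%s\n' "$path"
    done
}

# Print each element that gst-inspect-1.0 cannot find
# (every element is reported if gst-inspect-1.0 itself is unavailable).
missing_elements() {
    for element in "$@"; do
        gst-inspect-1.0 "$element" >/dev/null 2>&1 || printf '%s\n' "$element"
    done
}

files=$(missing_files \
    /etc/models/hrnet_pose_quantized.tflite \
    /etc/labels/hrnet_pose.json \
    /etc/labels/hrnet_settings.json)
elements=$(missing_elements qtiqmmfsrc qtimetamux qtivoverlay qtimlvconverter \
    qtimltflite qtimlpostprocess waylandsink)

# The unquoted expansions are intentional: printf repeats its format per word.
[ -z "$files" ] || printf 'Missing file: %s\n' $files
[ -z "$elements" ] || printf 'Missing element: %s\n' $elements
[ -n "$files$elements" ] || echo "Preflight OK"
```

The script only reports problems and never stops the shell, so it can be run before either use case on this page.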

The following figure shows the flow of the use case execution:

1. Identify the poses of people in scenes from a video stream that comes from a camera source.
2. Overlay the available poses using overlaylib.
3. Display the results.

Figure: Pipeline for pose estimation overlay
                
<svg xmlns="http://www.w3.org/2000/svg" width="940" height="349.974597930908203" viewbox="0 0 940 349.974597930908203">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".5" y=".499774932861328" width="939" height="348.974609375" rx="7.5" ry="7.5" style="fill: #fafafa;"></rect>
      <path d="M932,1c3.859741210939319,0,7,3.140233993530273,7,7v333.974597930908203c0,3.8597412109375-3.140258789060681,7-7,7H8c-3.859771728515625,0-7-3.1402587890625-7-7V8c0-3.859766006469727,3.140228271484375-7,7-7h924M932,0H8C3.581771850585938,0,0,3.581764221191406,0,8v333.974597930908203c0,4.418212890625,3.581771850585938,8,8,8h924c4.418334960939319,0,8-3.581787109375,8-8V8c0-4.418235778808594-3.581665039060681-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(741.492431640625 326.066074371337891)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="721.241394846414551" y="313.974597930908203" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(840.07421875 326.066074371337891)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="819.823176259271349" y="313.974597930908203" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <g>
          <rect x="20" y="20.000005443602277" width="160" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
          <text transform="translate(73.429725646972656 48.506705760955811)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
        </g>
        <g>
          <line x1="180" y1="45.000003814697266" x2="199.976654052734375" y2="45.000003814697266" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
          <polygon points="198.955352783203125 48.490436553955078 205 45.000003814697266 198.955352783203125 41.509574890136719 198.955352783203125 48.490436553955078"></polygon>
        </g>
        <g>
          <line x1="285" y1="68.974582672119141" x2="285" y2="88.951221466064453" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
          <polygon points="281.50958251953125 87.929927825927734 285 93.974582672119141 288.49041748046875 87.929927825927734 281.50958251953125 87.929927825927734"></polygon>
        </g>
        <g>
          <rect x="205" y="20.000005443602277" width="160" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
          <text transform="translate(273.910198211669922 48.506705760955811)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
        </g>
        <g>
          <rect x="205" y="93.974581043215949" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
          <text transform="translate(229.527484893798828 123.650539398193359)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
        </g>
        <g>
          <line x1="285" y1="143.974582672119141" x2="285" y2="163.951221466064453" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
          <polygon points="281.50958251953125 162.929920196533203 285 168.974582672119141 288.49041748046875 162.929920196533203 281.50958251953125 162.929920196533203"></polygon>
        </g>
        <g>
          <rect x="205" y="168.974581043215949" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
          <text transform="translate(250.820453643798828 198.650547027587891)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimltflite</tspan></text>
        </g>
        <g>
          <line x1="285" y1="218.974582672119141" x2="285" y2="238.951221466064453" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
          <polygon points="281.50958251953125 237.929920196533203 285 243.974567413330078 288.49041748046875 237.929920196533203 281.50958251953125 237.929920196533203"></polygon>
        </g>
        <g>
          <rect x="205" y="243.974581043215949" width="160" height="49.999999999998181" rx="4" ry="4" style="fill: #2a2aea;"></rect>
          <text transform="translate(222.722797393798828 273.650547027587891)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
        </g>
        <g>
          <line x1="365" y1="45.000003814697266" x2="384.97662353515625" y2="45.000003814697266" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
          <polygon points="383.955322265625 48.490436553955078 390 45.000003814697266 383.955322265625 41.509574890136719 383.955322265625 48.490436553955078"></polygon>
        </g>
        <g>
          <rect x="390" y="20.000005443602277" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
          <text transform="translate(427.296905517578125 48.506705760955811)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimetamux</tspan></text>
        </g>
        <g>
          <line x1="550" y1="45.000003814697266" x2="569.97662353515625" y2="45.000003814697266" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
          <polygon points="568.955322265625 48.490436553955078 575 45.000003814697266 568.955322265625 41.509574890136719 568.955322265625 48.490436553955078"></polygon>
        </g>
        <g>
          <rect x="575" y="20.000005443602277" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
          <text transform="translate(620.437530517578125 48.506705760955811)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtivoverlay</tspan></text>
        </g>
        <g>
          <line x1="735" y1="45.000003814697266" x2="754.97662353515625" y2="45.000003814697266" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
          <polygon points="753.955322265625 48.490436553955078 760 45.000003814697266 753.955322265625 41.509574890136719 753.955322265625 48.490436553955078"></polygon>
        </g>
        <g>
          <rect x="760" y="20.000005443602277" width="160" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
          <text transform="translate(796.09771728515625 48.506705760955811)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">waylandsink</tspan></text>
        </g>
      </g>
      <g>
        <polyline points="365 271.338306427001953 470 271.338306427001953 470 74.922946929931641" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
        <polygon points="473.49041748046875 75.944240570068359 470 69.899585723876953 466.50958251953125 75.944240570068359 473.49041748046875 75.944240570068359"></polygon>
      </g>
    </g>
  </g>
</svg>

The following table lists the processing stages of the pipeline in execution order:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-display__ol_l2f_zgm_vbc"><li class="li">Captures the video stream from the camera. The tee element then creates two copies of the stream:<ol class="ol" type="a" id="single-camera-stream-with-pose-estimation-and-display__ol_m2f_zgm_vbc"><li class="li">One is sent to the qtimetamux plugin to retain the video stream.</li><li class="li">The other is sent to the ML inferencing pipeline.</li></ol></li></ol> |
| **Preprocessing** | |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-display__ol_xsf_q5l_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-pose-estimation-and-display__ul_ff2_twl_vbc"><li class="li">Color conversion</li><li class="li">Scaling up or down</li><li class="li">Normalization of the stream data when the model expects floating-point input values</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The HRNet model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | |
| [qtimltflite](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimltflite.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-display__ol_bwn_s5l_vbc"><li class="li">Loads the HRNet model.</li><li class="li">Modifies the graph for the chosen delegate.</li><li class="li">Receives the tensor stream on its sink pad.</li><li class="li">Runs the inference and produces a tensor stream with the pose estimation results on its source pad.</li></ol> |
| **Postprocessing** | |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-display__ol_gr1_w5l_vbc"><li class="li">Receives the inference tensors from an HRNet model on its sink pad.</li><li class="li">Converts the tensors into formats such as video or text that the multimedia plugins can process later.</li><li class="li">Applies the threshold to the chosen number of results.</li><li class="li">Loads the corresponding modules of the pose estimation models. <p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="single-camera-stream-with-pose-estimation-and-display__ol_lyh_txn_vbc"><li class="li">Loads the HRNet submodule.</li><li class="li">Produces the results as structured text.</li><li class="li">Sends them to the sink pad of qtimetamux.</li></ol></li></ol> |
| [qtimetamux](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimetamux.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-display__ol_ll3_x5l_vbc"><li class="li">Receives the video stream and the text stream with the corresponding pose results on its sink pads.</li><li class="li">Produces GST buffers with the contents of the video stream from its sink pad.</li><li class="li">Adds the poses from the data sink pad to the GST buffer meta (meta muxing) on its source pad.</li></ol> |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-70022-50/topic/qtioverlay.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-display__ol_wst_y5l_vbc"><li class="li">Receives the multiplexed stream.</li><li class="li">Overlays the poses on the video frame using CL.</li><li class="li">Produces GST buffers with overlays on its source pad.</li></ol> |
| **Output** | |
| [waylandsink](https://docs.qualcomm.com/doc/80-70022-50/topic/waylandsink.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-display__ol_pxz_4xn_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Submits the video stream to Weston.</li><li class="li">Weston renders the following on a local display device:<ol class="ol" type="a" id="single-camera-stream-with-pose-estimation-and-display__ol_uxg_yxn_vbc"><li class="li">The video stream captured from the camera.</li><li class="li">The poses generated for several people in that scene.</li></ol></li></ol> |
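
To relate the table to the command, the pipeline can be split into one shell variable per stage. This is only a sketch: the variable names are illustrative, and the element properties are copied unchanged from the command at the start of this section. The script prints the assembled command instead of running it.

```shell
#!/bin/sh
# Sketch: the overlay pipeline, split into one variable per table stage.
# Variable names are illustrative; element properties match the command above.
MODEL=/etc/models/hrnet_pose_quantized.tflite
LABELS=/etc/labels/hrnet_pose.json
SETTINGS=/etc/labels/hrnet_settings.json

# Camera source and the tee that creates the two copies of the stream.
SOURCE="qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split"
# Display branch: meta muxing, overlay, and output.
DISPLAY_BRANCH="split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! waylandsink fullscreen=true sync=false"
# ML branch: preprocessing, inferencing, and postprocessing.
PREPROCESS="split. ! queue ! qtimlvconverter"
INFERENCE="qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options=\"QNNExternalDelegate,backend_type=htp;\" model=$MODEL"
POSTPROCESS="qtimlpostprocess results=2 module=hrnet labels=$LABELS settings=$SETTINGS ! text/x-raw"

PIPELINE="$SOURCE $DISPLAY_BRANCH $PREPROCESS ! queue ! $INFERENCE ! queue ! $POSTPROCESS ! queue ! metamux."

# Print the command for review. To run it, let the shell re-parse the string
# so that the quoted delegate options stay a single argument:
#   eval "gst-launch-1.0 -e $PIPELINE"
echo "gst-launch-1.0 -e $PIPELINE"
```

Keeping the model, label, and settings paths in their own variables makes it easier to point the same pipeline at files stored elsewhere.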

## Use qtivcomposer to mix original frame with pose estimation mask

Run the use case on the target device:

    gst-launch-1.0 -e --gst-debug=2 \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtivcomposer name=mixer ! queue ! waylandsink fullscreen=true sync=false \
    split. ! queue ! qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options="QNNExternalDelegate,backend_type=htp;" \
    model=/etc/models/hrnet_pose_quantized.tflite ! queue ! qtimlpostprocess results=2 module=hrnet labels=/etc/labels/hrnet_pose.json settings=/etc/labels/hrnet_settings.json \
     ! video/x-raw,format=BGRA,width=640,height=360 ! queue ! mixer.

To stop the use case, use CTRL + C.
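
In the command above, the mask caps after qtimlpostprocess (640x360) are half the camera resolution (1280x720) in each dimension. The following sketch derives the mask caps from the camera size under the assumption that you want to keep that ratio. The variable names are illustrative, and the source does not state which other resolutions the camera or qtimlpostprocess accept.

```shell
#!/bin/sh
# Sketch: derive the BGRA mask caps from the camera resolution.
# Assumption: the mask stays at half the camera size, as in the command above.
CAM_WIDTH=1280
CAM_HEIGHT=720

MASK_WIDTH=$((CAM_WIDTH / 2))
MASK_HEIGHT=$((CAM_HEIGHT / 2))

CAMERA_CAPS="video/x-raw,format=NV12_Q08C,width=$CAM_WIDTH,height=$CAM_HEIGHT,framerate=30/1"
MASK_CAPS="video/x-raw,format=BGRA,width=$MASK_WIDTH,height=$MASK_HEIGHT"

echo "camera caps: $CAMERA_CAPS"
echo "mask caps:   $MASK_CAPS"
```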

The following figure shows the flow of the use case execution:

1. Identify the poses of people in scenes from a video stream that comes from a camera source.
2. Compose the poses and video stream using qtivcomposer.
3. Display the results.

Figure: Pipeline for pose estimation mask using qtivcomposer
                
<svg xmlns="http://www.w3.org/2000/svg" width="755" height="349.974590301513672" viewbox="0 0 755 349.974590301513672">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".5" y=".499797821044922" width="754" height="348.974609375" rx="7.499999999999944" ry="7.499999999999944" style="fill: #fafafa;"></rect>
      <path d="M747,1c3.85980224609375,0,7,3.140201568603516,7,7v333.974590301513672c0,3.85980224609375-3.14019775390625,7-7,7H8c-3.85980224609375,0-7-3.14019775390625-7-7V8c0-3.859798431396484,3.14019775390625-7,7-7h739M747,0H8C3.581695556640625,0,0,3.581701278686523,0,8v333.974590301513672c0,4.418304443359375,3.581695556640625,8,8,8h739c4.41827392578125,0,8-3.581695556640625,8-8V8c0-4.418298721313477-3.58172607421875-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(564.54132080078125 326.066082000732422)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="544.290339966666579" y="313.974590301513672" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(663.123138427734375 326.066082000732422)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="642.872121379523378" y="313.974590301513672" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <rect x="20" y="20.000014873722648" width="160" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(73.429729461669922 48.506718635559082)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
      </g>
      <g>
        <line x1="180" y1="45.000011444091797" x2="199.976654052734375" y2="45.000011444091797" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="198.955352783203125 48.490444183349609 205 45.000011444091797 198.955352783203125 41.50958251953125 198.955352783203125 48.490444183349609"></polygon>
      </g>
      <g>
        <line x1="285" y1="68.974590301513672" x2="285" y2="88.951229095458984" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="281.50958251953125 87.929935455322266 285 93.974590301513672 288.49041748046875 87.929935455322266 281.50958251953125 87.929935455322266"></polygon>
      </g>
      <g>
        <rect x="205" y="20.000014873722648" width="160" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(273.910202026367188 48.506718635559082)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
      </g>
      <g>
        <rect x="205" y="93.97459047333632" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(229.527496337890625 123.650547027587891)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
      </g>
      <g>
        <line x1="285" y1="143.974590301513672" x2="285" y2="163.951229095458984" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="281.50958251953125 162.929943084716797 285 168.974590301513672 288.49041748046875 162.929943084716797 281.50958251953125 162.929943084716797"></polygon>
      </g>
      <g>
        <rect x="205" y="168.97459047333632" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(250.820465087890625 198.650554656982422)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimltflite</tspan></text>
      </g>
      <g>
        <line x1="285" y1="218.974590301513672" x2="285" y2="238.951244354248047" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="281.50958251953125 237.929943084716797 285 243.974590301513672 288.49041748046875 237.929943084716797 281.50958251953125 237.929943084716797"></polygon>
      </g>
      <g>
        <rect x="205" y="243.97459047333632" width="160" height="49.999999999999091" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(222.722808837890625 273.650554656982422)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
      </g>
      <g>
        <line x1="365" y1="45.000011444091797" x2="384.97662353515625" y2="45.000011444091797" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="383.955322265625 48.490444183349609 390 45.000011444091797 383.955322265625 41.50958251953125 383.955322265625 48.490444183349609"></polygon>
      </g>
      <g>
        <rect x="390" y="20.000014873722648" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(421.140655517578125 48.506718635559082)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtivcomposer</tspan></text>
      </g>
      <g>
        <line x1="550" y1="45.000011444091797" x2="569.97662353515625" y2="45.000011444091797" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="568.955322265625 48.490444183349609 575 45.000011444091797 568.955322265625 41.50958251953125 568.955322265625 48.490444183349609"></polygon>
      </g>
      <g>
        <rect x="575" y="20.000014873722648" width="160" height="50" rx="4.000000000000019" ry="4.000000000000019" style="fill: #007884;"></rect>
        <text transform="translate(611.09771728515625 48.506718635559082)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">waylandsink</tspan></text>
      </g>
    </g>
    <g>
      <polyline points="365 268.974590301513672 470 268.974590301513672 470 75.847972869873047" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
      <polygon points="473.49041748046875 76.869258880615234 470 70.824611663818359 466.50958251953125 76.869258880615234 473.49041748046875 76.869258880615234"></polygon>
    </g>
  </g>
</svg>

The following table lists the processing stages of the pipeline in execution order:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-display__ol_gqx_gzn_vbc"><li class="li">Captures the video stream from the camera. The tee element then creates two copies of the stream:<ul class="ul" id="single-camera-stream-with-pose-estimation-and-display__ol_hqx_gzn_vbc"><li class="li">One is sent to the qtivcomposer plugin to retain the video stream.</li><li class="li">The other is sent to the ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-display__ol_iqx_gzn_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-pose-estimation-and-display__ul_jqx_gzn_vbc"><li class="li">Color conversion</li><li class="li">Scaling up or down</li><li class="li">Normalization of the stream data when the model expects floating-point input values</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The HRNet model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | |
| [qtimltflite](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimltflite.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-display__ol_kqx_gzn_vbc"><li class="li">Loads the HRNet model.</li><li class="li">Modifies the graph for the chosen delegate.</li><li class="li">Receives the tensor stream on its sink pad.</li><li class="li">Runs the inference and produces a tensor stream with the pose estimation results on its source pad.</li></ol> |
| **Postprocessing** | |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-display__ol_lqx_gzn_vbc"><li class="li">Receives the inference tensors from an HRNet model on its sink pad.</li><li class="li">Converts the tensors into formats such as video or text that the multimedia plugins can process later.</li><li class="li">Applies the threshold to the chosen number of results.</li><li class="li">Loads the corresponding modules of the pose estimation models. <p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="single-camera-stream-with-pose-estimation-and-display__ol_mqx_gzn_vbc"><li class="li">Loads the HRNet submodule.</li><li class="li">Produces the results as video frames with the poses drawn.</li><li class="li">Sends them to the sink pad of qtivcomposer.</li></ol></li></ol> |
| [qtivcomposer](https://docs.qualcomm.com/doc/80-70022-50/topic/qtivcomposer.html) | <ol class="ol"><li class="li">Receives the original video stream and the video stream of poses on its sink pads.</li><li class="li">Produces GST buffers on its source pad with the contents composed from the video streams on its sink pads.</li></ol> |
| **Output** | |
| [waylandsink](https://docs.qualcomm.com/doc/80-70022-50/topic/waylandsink.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-display__ol_pqx_gzn_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Submits the video stream to Weston.</li><li class="li">Weston renders the following on a local display device:<ol class="ol" type="a" id="single-camera-stream-with-pose-estimation-and-display__ol_qqx_gzn_vbc"><li class="li">The video stream captured from the camera.</li><li class="li">The poses generated for several people in that scene.</li></ol></li></ol> |
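
Comparing the two commands on this page, the variants differ in the element that merges the two branches and in the caps that follow qtimlpostprocess. The sketch below captures that difference. `select_variant` and the variant names `overlay` and `composer` are hypothetical helpers for illustration, not plugin options.

```shell
#!/bin/sh
# Sketch: what changes between the overlay and composer variants.
# select_variant and its argument values are illustrative names only.
select_variant() {
    case "$1" in
        overlay)
            # Poses travel as text and are drawn by qtivoverlay.
            MERGE="qtimetamux name=metamux ! queue ! qtivoverlay"
            POST_CAPS="text/x-raw"
            TARGET="metamux."
            ;;
        composer)
            # Poses travel as a BGRA video mask and are mixed by qtivcomposer.
            MERGE="qtivcomposer name=mixer"
            POST_CAPS="video/x-raw,format=BGRA,width=640,height=360"
            TARGET="mixer."
            ;;
        *)
            echo "unknown variant: $1" >&2
            return 1
            ;;
    esac
}

select_variant composer
echo "display branch: split. ! queue ! $MERGE ! queue ! waylandsink fullscreen=true sync=false"
echo "ML branch tail: ! $POST_CAPS ! queue ! $TARGET"
```

The source, preprocessing, and inferencing stages are the same in both commands, so only these three values need to change when switching between the variants.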


Last Published: Feb 20, 2026
