# Pose estimation and encode with LiteRT

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-pose-estimation-and-encode.html](https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-pose-estimation-and-encode.html)

These use cases run the HRNet LiteRT model on a single camera stream to perform pose estimation, and encode the stream as an H.264 bitstream.

Note: On Ubuntu Server, `sudo` access is required to write the encoded stream to the `/etc/media` folder.
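
If you script the use case, a small check can decide whether to run gst-launch-1.0 with `sudo`. This is a minimal sketch, not part of the SDK: the function name `sudo_prefix` is hypothetical, and it only tests whether the current user can write to the output directory (by default `/etc/media`, the filesink location used in this section).

```shell
#!/bin/sh
# Hypothetical helper (not part of the Qualcomm SDK): print "sudo" when the
# current user cannot write to the output directory, or an empty line when
# it can. The directory defaults to /etc/media, the filesink location used
# by the pipelines in this section. A missing directory also yields "sudo".
sudo_prefix() {
  out_dir="${1:-/etc/media}"
  if [ -d "$out_dir" ] && [ -w "$out_dir" ]; then
    echo ""
  else
    echo "sudo"
  fi
}

# Usage on the target device (leave $(sudo_prefix) unquoted so that an
# empty result expands to nothing):
#   $(sudo_prefix) gst-launch-1.0 -e qtiqmmfsrc name=camsrc ! ...
```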

## Use qtivoverlay plugin to apply pose estimation overlay

Run the use case on the target device:

    gst-launch-1.0 -e \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! v4l2h264enc capture-io-mode=4 output-io-mode=5 ! \
    h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so \
    external-delegate-options="QNNExternalDelegate,backend_type=htp;" model=/etc/models/hrnet_pose_quantized.tflite ! queue ! \
    qtimlpostprocess results=2 module=hrnet labels=/etc/labels/hrnet_pose.json settings=/etc/labels/hrnet_settings.json ! text/x-raw ! queue ! metamux.
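
Before running the command, it can help to confirm that the model, label, and settings files it references exist and that every element it uses is installed. The following is a minimal sketch with hypothetical function names (`check_files`, `check_elements`); it relies only on `test -f` and on `gst-inspect-1.0 <element>` returning a non-zero exit status when it cannot find the element.

```shell
#!/bin/sh
# Hypothetical pre-flight checks for the pose estimation pipeline.

# Report each path that is not a regular file; return 1 if any are missing.
check_files() {
  files_missing=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      files_missing=1
    fi
  done
  return "$files_missing"
}

# Report each element that gst-inspect-1.0 cannot find; return 1 if any are missing.
check_elements() {
  elements_missing=0
  for e in "$@"; do
    if ! gst-inspect-1.0 "$e" > /dev/null 2>&1; then
      echo "missing element: $e" >&2
      elements_missing=1
    fi
  done
  return "$elements_missing"
}

# Usage on the target device, with the paths and elements from the command above:
#   check_files /etc/models/hrnet_pose_quantized.tflite \
#     /etc/labels/hrnet_pose.json /etc/labels/hrnet_settings.json &&
#   check_elements qtiqmmfsrc qtimetamux qtivoverlay qtimlvconverter \
#     qtimltflite qtimlpostprocess v4l2h264enc h264parse mp4mux
```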

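To stop the use case, press Ctrl+C. Because the command uses the `-e` option, gst-launch-1.0 sends an end-of-stream (EOS) event before shutting down, which lets mp4mux finalize the MP4 file.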
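
For unattended recordings, one option is to let `timeout` deliver the interrupt instead of pressing Ctrl+C. `timeout -s INT` sends SIGINT, the signal that Ctrl+C generates, so gst-launch-1.0 started with `-e` still sends EOS on shutdown. This sketch assumes GNU coreutils `timeout` is available on the target; the wrapper name `record_for` is hypothetical, and BusyBox `timeout` may differ in options and exit status.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command and send it SIGINT (like Ctrl+C)
# after the given number of seconds. Assumes GNU coreutils timeout.
record_for() {
  seconds="$1"
  shift
  # GNU timeout exits with status 124 when it had to stop the command,
  # even if the command then shut down cleanly after the signal.
  timeout -s INT "$seconds" "$@"
}

# Usage on the target device: record for 30 seconds by prefixing the
# command above, for example:
#   record_for 30 gst-launch-1.0 -e qtiqmmfsrc name=camsrc ! ...
```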

The use case runs in the following stages, as the figure below shows:

1. Identify the poses of people in the scene from the video stream that comes from the camera source.
2. Overlay the detected poses on the video frames using qtivoverlay.
3. Encode the stream as an H.264 bitstream.
4. Multiplex the stream in an MP4 container and store it as an MP4 file.

Figure: Pipeline for pose estimation and encode using qtivoverlay
                
<svg xmlns="http://www.w3.org/2000/svg" width="1179.272262573244006" height="357.535699844360352" viewbox="0 0 1179.272262573244006 357.535699844360352">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".499923706054688" y=".500055313110352" width="1178.27197265625" height="356.53515625" rx="7.499999999999885" ry="7.499999999999885" style="fill: #fafafa;"></rect>
      <path d="M1171.272262573244006,1c3.85986328125,0,7,3.140153884887695,7,7v341.535699844360352c0,3.859832763671875-3.14013671875,7-7,7H8c-3.859846115112305,0-7-3.140167236328125-7-7V8c0-3.859846115112305,3.140153884887695-7,7-7h1163.272262573244006M1171.272262573244006,0H8C3.581692695617676,0,0,3.581846237182617,0,8v341.535699844360352c0,4.41815185546875,3.581692695617676,8,8,8h1163.272262573244006c4.418334960941138,0,8-3.58184814453125,8-8V8c0-4.418153762817383-3.581665039058862-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(989.016128540039062 333.627191543579102)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="968.765087611898707" y="321.535699844360352" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(1087.597915649414062 333.627191543579102)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="1067.346869024755506" y="321.535699844360352" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <rect x="20.000030883573345" y="20.00006906611361" width="120" height="50" rx="3.999999999999999" ry="3.999999999999999" style="fill: #007884;"></rect>
        <text transform="translate(53.429763793945312 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
      </g>
      <g>
        <rect x="166.068092529188107" y="20.00006906611361" width="120" height="50" rx="3.999999999999998" ry="3.999999999999998" style="fill: #007884;"></rect>
        <text transform="translate(214.978309631347656 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
      </g>
      <g>
        <rect x="146.068092529188107" y="96.841128549261157" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(170.595199584960938 125.348001480102539)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
      </g>
      <g>
        <line x1="140.5340576171875" y1="45.000070571899414" x2="160.510696411132812" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="159.489410400390625 48.490503311157227 165.5340576171875 45.000070571899414 159.489410400390625 41.509637832641602 159.489410400390625 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="311.602123351993214" y="20.00006906611361" width="120" height="50" rx="3.999999999999995" ry="3.999999999999995" style="fill: #2a2aea;"></rect>
        <text transform="translate(328.89906120300293 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimetamux</tspan></text>
      </g>
      <g>
        <line x1="286.068099975585938" y1="45.000070571899414" x2="306.044723510743097" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="305.023452758789062 48.490503311157227 311.068099975585938 45.000070571899414 305.023452758789062 41.509637832641602 305.023452758789062 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="457.13615417480105" y="20.00006906611361" width="120" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(482.573715209960938 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtivoverlay</tspan></text>
      </g>
      <g>
        <line x1="431.602127075195312" y1="45.000070571899414" x2="451.578750610351562" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="450.557479858398438 48.490503311157227 456.602127075195312 45.000070571899414 450.557479858398438 41.509637832641602 450.557479858398438 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="602.670184997608885" y="20.00006906611361" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(617.146804809570312 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">v4l2h264enc</tspan></text>
      </g>
      <g>
        <line x1="577.136154174804688" y1="45.000070571899414" x2="597.112777709960938" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="596.091476440429688 48.490503311157227 602.136154174804688 45.000070571899414 596.091476440429688 41.509637832641602 596.091476440429688 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="748.204215820416721" y="20.00006906611361" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
        <text transform="translate(770.411300659179688 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">h264parse</tspan></text>
      </g>
      <g>
        <line x1="722.670211791992188" y1="45.000070571899414" x2="742.646835327148438" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="741.625534057617188 48.490503311157227 747.670211791992188 45.000070571899414 741.625534057617188 41.509637832641602 741.625534057617188 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="893.738246643224556" y="20.00006906611361" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
        <text transform="translate(922.359420776367188 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">mp4mux</tspan></text>
      </g>
      <g>
        <line x1="868.204208374023438" y1="45.000070571899414" x2="888.180831909179688" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="887.159591674804688 48.490503311157227 893.204208374023438 45.000070571899414 887.159591674804688 41.509637832641602 887.159591674804688 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="1039.272277466034211" y="20.00006906611361" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(1073.838790893554688 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">filesink</tspan></text>
      </g>
      <g>
        <line x1="1013.738204956054688" y1="45.000070571899414" x2="1033.714889526365369" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="1032.693649291994006 48.490503311157227 1038.738204956054688 45.000070571899414 1032.693649291994006 41.509637832641602 1032.693649291994006 48.490503311157227"></polygon>
      </g>
      <g>
        <line x1="226.068099975585938" y1="70.461648941040039" x2="226.068099975585938" y2="90.438287734985352" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577667236328125 89.416994094848633 226.068099975585938 95.461648941040039 229.558517456054688 89.416994094848633 222.577667236328125 89.416994094848633"></polygon>
      </g>
      <g>
        <rect x="146.068092529188107" y="173.776685994341278" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(191.888168334960938 202.283563613891602)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimltflite</tspan></text>
      </g>
      <g>
        <line x1="226.068099975585938" y1="147.397211074829102" x2="226.068099975585938" y2="167.373849868774414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577667236328125 166.352548599243164 226.068099975585938 172.397211074829102 229.558517456054688 166.352548599243164 222.577667236328125 166.352548599243164"></polygon>
      </g>
      <g>
        <rect x="146.068092529188107" y="251.535640314990815" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(163.790512084960938 280.042520523071289)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
      </g>
      <g>
        <line x1="226.068099975585938" y1="225.156167984008789" x2="226.068099975585938" y2="245.132806777954102" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577667236328125 244.111505508423761 226.068099975585938 250.156152725219727 229.558517456054688 244.111505508423761 222.577667236328125 244.111505508423761"></polygon>
      </g>
      <g>
        <polyline points="306.068099975585938 276.535638809204102 371.602127075195312 276.535638809204102 371.602127075195312 75.485010147094727" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
        <polygon points="375.092544555664062 76.506303787231445 371.602127075195312 70.461648941040039 368.111679077148438 76.506303787231445 375.092544555664062 76.506303787231445"></polygon>
      </g>
    </g>
  </g>
</svg>

The following table describes the processing stages of the pipeline in order:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-encode__ol_l2f_zgm_vbc"><li class="li">Captures the video stream (source) from the camera. The tee element then creates two copies of the source:<ul class="ul" id="single-camera-stream-with-pose-estimation-and-encode__ol_m2f_zgm_vbc"><li class="li">One stream is sent to the qtimetamux plugin to retain the video stream.</li><li class="li">The other stream is sent to the ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-encode__ol_xsf_q5l_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-pose-estimation-and-encode__ul_ff2_twl_vbc"><li class="li">Color conversion</li><li class="li">Scaling up or down</li><li class="li">Normalization of the stream data when the model expects floating-point values as input</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The HRNet model uses this tensor stream for inference.</p></li></ol> |
| **Inferencing** | |
| [qtimltflite](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimltflite.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-encode__ol_bwn_s5l_vbc"><li class="li">Loads the HRNet model.</li><li class="li">Modifies the graph for the chosen delegate (in this pipeline, the QNN external delegate with the HTP backend).</li><li class="li">Receives the tensor stream on its sink pad.</li><li class="li">Runs inference and produces a tensor stream with the pose estimation results on its source pad.</li></ol> |
| **Postprocessing** | |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-encode__ol_gr1_w5l_vbc"><li class="li">Receives the inference tensors from the HRNet model on its sink pad.</li><li class="li">Converts the tensors into formats, such as video or text, that the downstream multimedia plugins can process.</li><li class="li">Applies the threshold and limits the output to the chosen number of results (results=2 in this pipeline).</li><li class="li">Loads the module that corresponds to the pose estimation model.<p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="single-camera-stream-with-pose-estimation-and-encode__ol_lyh_txn_vbc"><li class="li">Loads the HRNet submodule.</li><li class="li">Produces the results as text structures.</li><li class="li">Sends them to the sink pad of qtimetamux.</li></ol></li></ol> |
| [qtimetamux](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimetamux.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-encode__ol_ll3_x5l_vbc"><li class="li">Receives the video stream and the text stream with the corresponding pose results on its sink pads.</li><li class="li">Produces GStreamer buffers with the contents of the video stream from its video sink pad.</li><li class="li">Attaches the poses from its data sink pad to the buffers as GStreamer metadata (meta muxing) and pushes them on its source pad.</li></ol> |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-70022-50/topic/qtioverlay.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-encode__ol_wst_y5l_vbc"><li class="li">Receives the video stream with the pose metadata from qtimetamux.</li><li class="li">Draws the poses on each video frame using OpenCL.</li><li class="li">Pushes the buffers with the overlays on its source pad.</li></ol> |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-encode__ol_b25_j14_vbc"><li class="li">Applies the encoding parameters to each frame of the video stream that it receives on its sink pad.</li><li class="li">Encodes the frames into an H.264 bitstream and pushes it on its source pad.</li></ol> |
| h264parse | Parses the H.264 bitstream and adds information about it (for example, the stream format and alignment) to the GStreamer caps and buffers for mp4mux. |
| mp4mux | Receives the encoded buffers and multiplexes them into the MP4 container format. |
| **Output** | |
| filesink | Writes the resulting stream to the /etc/media/video.mp4 file. |
| Playback | On the host computer, pull video.mp4 from the target device and play it in a media player:<br>`scp root@<IP address of target device>:/etc/media/video.mp4 <destination directory>` |
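
A quick way to confirm, on the host, that the pulled file was closed cleanly is to look for the `moov` box, the MP4 index that mp4mux in its default configuration writes when it receives EOS (which is why the command uses `-e`). The sketch below is a heuristic byte search, not an MP4 parser, and the function name `mp4_finalized` is hypothetical. `gst-play-1.0` in the usage comment is the standard GStreamer command-line player; any media player works.

```shell
#!/bin/sh
# Hypothetical check: succeed if the file is non-empty and contains the
# bytes "moov". Heuristic only: it does not parse the MP4 box structure.
mp4_finalized() {
  [ -s "$1" ] && LC_ALL=C grep -a -q moov "$1"
}

# Usage on the host after copying the file with scp:
#   mp4_finalized video.mp4 && gst-play-1.0 video.mp4
```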

## Use qtivcomposer to mix original frame with pose estimation mask

Run the use case on the target device:

    gst-launch-1.0 -e --gst-debug=2 \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtivcomposer name=mixer sink_1::dimensions="<1920,1080>" ! queue ! video/x-raw,format=NV12,width=1920,height=1080,interlace-mode=progressive,colorimetry=bt601 ! \
    v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options="QNNExternalDelegate,backend_type=htp;" \
    model=/etc/models/hrnet_pose_quantized.tflite ! queue ! qtimlpostprocess results=2 module=hrnet labels=/etc/labels/hrnet_pose.json settings=/etc/labels/hrnet_settings.json \
     ! video/x-raw,format=BGRA,width=640,height=360 ! queue ! mixer.
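
This command adds `--gst-debug=2`, which prints GStreamer warnings and errors to the console. To keep that output for later review, the standard `GST_DEBUG_FILE` environment variable redirects the debug log to a file, and `GST_DEBUG_NO_COLOR=1` drops the terminal color codes. A minimal sketch; the wrapper name `run_with_gst_log` and the log path are hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with the GStreamer debug log
# redirected to a file and terminal color codes disabled.
run_with_gst_log() {
  log_file="$1"
  shift
  GST_DEBUG_FILE="$log_file" GST_DEBUG_NO_COLOR=1 "$@"
}

# Usage on the target device, prefixing the command above:
#   run_with_gst_log /tmp/pose_composer.log gst-launch-1.0 -e --gst-debug=2 \
#     qtiqmmfsrc name=camsrc ! ...
```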

To stop the use case, press Ctrl+C.

The use case runs in the following stages, as the figure below shows:

1. Identify the poses of people in the scene from the video stream that comes from the camera source.
2. Compose the pose video stream and the original video stream using qtivcomposer.
3. Encode the composed stream as an H.264 bitstream.
4. Multiplex the stream in an MP4 container and store it as an MP4 file.

Figure: Pipeline for pose estimation and encode using qtivcomposer
                
<svg xmlns="http://www.w3.org/2000/svg" width="1053.934104919439051" height="355.535463333129883" viewbox="0 0 1053.934104919439051 355.535463333129883">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".500267028808594" y=".499940872192383" width="1052.93359375" height="354.53515625" rx="7.499999999999957" ry="7.499999999999957" style="fill: #fafafa;"></rect>
      <path d="M1045.934104919439051,1c3.85986328125,0,7,3.140132904052734,7,7v339.535463333129883c0,3.85986328125-3.14013671875,7-7,7H8c-3.859870910644531,0-7-3.14013671875-7-7V8c0-3.859867095947266,3.140129089355469-7,7-7h1037.934104919439051M1045.934104919439051,0H8C3.581733703613281,0,0,3.581731796264648,0,8v339.535463333129883c0,4.41827392578125,3.581733703613281,8,8,8h1037.934104919439051c4.418212890619543,0,8-3.58172607421875,8-8V8c0-4.418268203735352-3.581787109380457-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(863.426353454589844 331.626955032348633)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="843.175314784286456" y="319.535463333129883" width="16" height="16" rx="1.999999999999986" ry="1.999999999999986" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(962.008140563964844 331.626955032348633)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="941.757096197143255" y="319.535463333129883" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <rect x="20.000055258294196" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
    <text transform="translate(53.429786682128906 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
    <rect x="166.068116903908958" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
    <text transform="translate(214.97833251953125 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
    <rect x="146.068116903908958" y="96.841006478947747" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
    <text transform="translate(170.595218658447266 125.347871780395508)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
    <g>
      <line x1="140.000053405761719" y1="44.999948501586914" x2="160.510719299316406" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="159.489433288574219 48.490381240844727 165.534080505372003 44.999948501586914 159.489433288574219 41.509515762329102 159.489433288574219 48.490381240844727"></polygon>
    </g>
    <rect x="312.064964669495566" y="19.999946995800201" width="140" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
    <text transform="translate(333.205726623535156 49.675605297088623)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtivcomposer</tspan></text>
    <g>
      <line x1="286.068107604980469" y1="44.999948501586914" x2="306.044761657714844" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="305.023460388183594 48.490381240844727 311.068107604980469 44.999948501586914 305.023460388183594 41.509515762329102 305.023460388183594 48.490381240844727"></polygon>
    </g>
    <g>
      <line x1="452.064964294433594" y1="44.999948501586914" x2="472.041587829589844" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="471.020286560058594 48.490381240844727 477.064964294433594 44.999948501586914 471.020286560058594 41.509515762329102 471.020286560058594 48.490381240844727"></polygon>
    </g>
    <rect x="477.331980080897665" y="19.999946995800201" width="119.999999999999091" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
    <text transform="translate(491.808616638183594 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">v4l2h264enc</tspan></text>
    <rect x="622.866010903704591" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
    <text transform="translate(645.073081970214844 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">h264parse</tspan></text>
    <g>
      <line x1="597.331993103027344" y1="44.999948501586914" x2="617.308616638183594" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="616.287315368652344 48.490381240844727 622.331993103027344 44.999948501586914 616.287315368652344 41.509515762329102 616.287315368652344 48.490381240844727"></polygon>
    </g>
    <rect x="768.400041726512427" y="19.999946995800201" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
    <text transform="translate(797.021202087402344 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">mp4mux</tspan></text>
    <g>
      <line x1="742.865989685058594" y1="44.999948501586914" x2="762.842674255371094" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="761.821372985839844 48.490381240844727 767.865989685058594 44.999948501586914 761.821372985839844 41.509515762329102 761.821372985839844 48.490381240844727"></polygon>
    </g>
    <rect x="913.934072549323901" y="19.999946995800201" width="120.000000000005457" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
    <text transform="translate(948.500633239746094 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">filesink</tspan></text>
    <g>
      <line x1="888.400047302246094" y1="44.999948501586914" x2="908.376670837402344" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="907.355369567871094 48.490381240844727 913.400047302246094 44.999948501586914 907.355369567871094 41.509515762329102 907.355369567871094 48.490381240844727"></polygon>
    </g>
    <g>
      <line x1="226.068107604980469" y1="70.461526870727539" x2="226.068107604980469" y2="90.438165664671942" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="222.577690124511719 89.416872024536133 226.068107604980469 95.461526870727539 229.558555603027344 89.416872024536133 222.577690124511719 89.416872024536133"></polygon>
    </g>
    <rect x="146.068116903908958" y="173.776563924028778" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
    <text transform="translate(191.888187408447266 202.283449172973633)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimltflite</tspan></text>
    <g>
      <line x1="226.068107604980469" y1="147.39708137512207" x2="226.068107604980469" y2="167.373720169067383" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="222.577690124511719 166.352434158325195 226.068107604980469 172.39708137512207 229.558555603027344 166.352434158325195 222.577690124511719 166.352434158325195"></polygon>
    </g>
    <rect x="146.068116903908958" y="249.535518244678315" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
    <text transform="translate(163.790531158447266 278.042390823364258)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
    <g>
      <line x1="226.068107604980469" y1="223.776567459106445" x2="226.068107604980469" y2="243.753206253051758" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="222.577690124511719 242.731904983520508 226.068107604980469 248.776552200317383 229.558555603027344 242.731904983520508 222.577690124511719 242.731904983520508"></polygon>
    </g>
    <g>
      <polyline points="306.068107604980469 274.535524368286133 382.064964294433594 274.535524368286133 382.064964294433594 75.484888076782227" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
      <polygon points="385.555381774902344 76.506181716918945 382.064964294433594 70.461526870727539 378.574546813964844 76.506181716918945 385.555381774902344 76.506181716918945"></polygon>
    </g>
  </g>
</svg>

The following table describes the processing stages of the pipeline in order:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-encode__ol_gxv_t14_vbc"><li class="li">Captures the video stream (source) from the camera. The tee element then creates two copies of the source:<ul class="ul" id="single-camera-stream-with-pose-estimation-and-encode__ol_hxv_t14_vbc"><li class="li">One stream is sent to the qtivcomposer plugin to retain the video stream.</li><li class="li">The other stream is sent to the ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-encode__ol_ixv_t14_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-pose-estimation-and-encode__ul_jxv_t14_vbc"><li class="li">Color conversion</li><li class="li">Scaling up or down</li><li class="li">Normalization of the stream data when the model expects floating-point values as input</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The HRNet model uses this tensor stream for inference.</p></li></ol> |
| **Inferencing** | |
| [qtimltflite](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimltflite.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-encode__ol_kxv_t14_vbc"><li class="li">Loads the HRNet model.</li><li class="li">Modifies the graph for the chosen delegate (in this pipeline, the QNN external delegate with the HTP backend).</li><li class="li">Receives the tensor stream on its sink pad.</li><li class="li">Runs inference and produces a tensor stream with the pose estimation results on its source pad.</li></ol> |
| **Postprocessing** |  |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-encode__ol_lxv_t14_vbc"><li class="li">Receives the inference tensors from the HRNet model on its sink pad.</li><li class="li">Converts the tensors into formats, such as video or text, that downstream multimedia plugins can process.</li><li class="li">Applies the threshold to the results and limits the output to the chosen number of results (<code>results=2</code> in this pipeline).</li><li class="li">Loads the module that corresponds to the pose estimation model.<p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="single-camera-stream-with-pose-estimation-and-encode__ol_mxv_t14_vbc"><li class="li">Loads the HRNet module (<code>module=hrnet</code>).</li><li class="li">Produces the pose estimation results as text (<code>text/x-raw</code>).</li><li class="li">Sends them to the sink pad of qtimetamux.</li></ol></li></ol> |
| qtimetamux | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-encode__ol_wjj_x14_vbc"><li class="li">Receives the original video stream and the pose estimation results (text) on its sink pads.</li><li class="li">Attaches the results as metadata to the corresponding video buffers and produces the multiplexed stream on its source pad.</li></ol> |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-70022-50/topic/qtioverlay.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-encode__ol_oxv_t14_vbc"><li class="li">Receives the multiplexed stream from qtimetamux.</li><li class="li">Overlays the poses on the video frames using CL.</li><li class="li">Produces GST buffers with the overlays on its source pad.</li></ol> |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | <ol class="ol" id="single-camera-stream-with-pose-estimation-and-encode__ol_pxv_t14_vbc"><li class="li">Applies the encoding parameters to each frame of the video stream that it receives on its sink pad.</li><li class="li">Encodes the frames into an H.264 bitstream and sends it on its source pad.</li></ol> |
| h264parse | Parses the H.264 bitstream and adds information about it to the GStreamer buffer metadata. |
| mp4mux | Receives the encoded buffers and packages them in the MP4 container format. |
| **Output** |  |
| filesink | Stores the resulting stream in the `/etc/media/video.mp4` file. |
| Playback | From the host computer, pull video.mp4 from the target device and play it on a media player:<br>`scp root@<IP address of target device>:/etc/media/video.mp4 <destination directory>` |
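
Before playing the pulled file, it can help to confirm that mp4mux finalized it. In its default (non-fragmented) mode, mp4mux typically writes the `moov` index box only when it receives end-of-stream, which is why the pipeline runs `gst-launch-1.0` with `-e`: pressing CTRL + C then sends EOS instead of stopping immediately. A file written without EOS usually has `ftyp` and media data but no `moov`, and most players reject it. The script below is a minimal sketch under that assumption, not part of the Qualcomm tooling: it walks the top-level ISO base media file format boxes with `od` and `dd` and reports whether `ftyp` comes first, every box fits in the file, and a `moov` box is present. The function names `list_top_level_boxes` and `check_mp4_complete` are hypothetical. It checks only the container structure and does not decode the H.264 stream.

```shell
#!/usr/bin/env bash
# Minimal sketch (assumption, not Qualcomm tooling): inspect the top-level
# ISO BMFF boxes of an MP4 file to check that mp4mux finalized it.

# Print the four-character type of each top-level box, one per line.
# Returns 1 if the file is missing or the box structure is truncated or malformed.
list_top_level_boxes() {
  local file=$1 total offset=0 size box_type b
  [ -f "$file" ] || return 1
  total=$(( $(wc -c < "$file") ))
  while [ "$offset" -lt "$total" ]; do
    # Bytes 0-3 of a box: 32-bit big-endian box size.
    set -- $(od -An -tu1 -j "$offset" -N 4 "$file" 2>/dev/null)
    [ "$#" -eq 4 ] || return 1
    size=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # Bytes 4-7: four-character box type, for example ftyp, moov, or mdat.
    box_type=$(dd if="$file" bs=1 skip=$((offset + 4)) count=4 2>/dev/null)
    if [ "$size" -eq 1 ]; then
      # size == 1: a 64-bit "largesize" follows the type field.
      set -- $(od -An -tu1 -j $((offset + 8)) -N 8 "$file" 2>/dev/null)
      [ "$#" -eq 8 ] || return 1
      size=0
      for b in "$@"; do size=$(( (size << 8) | b )); done
    elif [ "$size" -eq 0 ]; then
      # size == 0: the box extends to the end of the file.
      size=$(( total - offset ))
    fi
    # A box is at least 8 bytes (size + type); this also guarantees progress.
    [ "$size" -ge 8 ] || return 1
    # The box must not run past the end of the file.
    [ $(( offset + size )) -le "$total" ] || return 1
    printf '%s\n' "$box_type"
    offset=$(( offset + size ))
  done
}

# Succeed only if the box structure is intact, the first box is ftyp,
# and a moov box is present.
check_mp4_complete() {
  local boxes
  boxes=$(list_top_level_boxes "$1") || return 1
  [ "$(printf '%s\n' "$boxes" | head -n 1)" = "ftyp" ] || return 1
  printf '%s\n' "$boxes" | grep -qx 'moov'
}

# When run with file arguments, report on each file.
if [ "$#" -gt 0 ]; then
  status=0
  for f in "$@"; do
    if check_mp4_complete "$f"; then
      echo "$f: ftyp and moov boxes present"
    else
      echo "$f: not a finalized MP4 file" >&2
      status=1
    fi
  done
  exit "$status"
fi
```

For example, save the script as `check_mp4.sh` on the host and run `bash check_mp4.sh <destination directory>/video.mp4` after the `scp` step. A nonzero exit status suggests that the pipeline stopped without EOS or that the copy is incomplete.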


Last Published: Feb 20, 2026
