# Object detection and encode with LiteRT

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-object-detection-and-encode.html](https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-object-detection-and-encode.html)

This use case uses a YOLOv5 LiteRT model to identify objects in a scene. The bounding boxes of the detected objects are either overlaid on or composed with the video stream, and the resulting stream is encoded as an H.264 bitstream.

Note: On Ubuntu Server, `sudo` access is necessary to write the encoded stream to the `/etc/media` folder.
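One way to tell in advance whether `sudo` is needed is to test write access to the output folder. The following is a minimal sketch; the helper name `check_output_dir` is hypothetical, and `/etc/media` is the folder used by the commands in this topic.

```shell
# check_output_dir DIR: print "writable" if DIR exists and the current user can
# write to it, otherwise print "not-writable". (Hypothetical helper name.)
check_output_dir() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo "writable"
    else
        echo "not-writable"
    fi
}

# /etc/media is the output folder used by the pipelines in this topic.
check_output_dir /etc/media
```

If the helper prints `not-writable`, running the pipeline with `sudo` (or choosing a different `location=` for filesink) is a likely remedy.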

## Use qtivoverlay plugin to apply bounding box overlay

Run the use case on the target device:

    gst-launch-1.0 -e qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split split. ! \
    queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! \
    queue ! filesink location=/etc/media/video.mp4 split. ! queue ! qtimlvconverter ! queue ! qtimltflite delegate=external \
    external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options="QNNExternalDelegate,backend_type=htp;" model=/etc/models/yolov5.tflite ! queue ! \
    qtimlpostprocess settings="{\"confidence\": 75.0}" results=10 module=yolov5 labels=/etc/labels/yolov5.json ! text/x-raw ! queue ! metamux.

To stop the use case, press CTRL + C.
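Before launching the pipeline, it can help to confirm that the elements and files it refers to are present. The sketch below is an illustration only: the helper names and the `GST_INSPECT` override are hypothetical, while the element names and file paths are the ones used in the command above. It relies on the `--exists` option of `gst-inspect-1.0`, which returns a nonzero status when an element is not found.

```shell
# The inspect tool can be overridden, for example for testing (hypothetical variable).
GST_INSPECT="${GST_INSPECT:-gst-inspect-1.0}"

# missing_elements NAME...: print each element that the inspect tool cannot find.
missing_elements() {
    for element in "$@"; do
        "$GST_INSPECT" --exists "$element" >/dev/null 2>&1 || echo "$element"
    done
}

# missing_files PATH...: print each path that is not a readable regular file.
missing_files() {
    for path in "$@"; do
        { [ -f "$path" ] && [ -r "$path" ]; } || echo "$path"
    done
}

# Elements and files referenced by the overlay pipeline in this section.
missing_elements qtiqmmfsrc qtimetamux qtivoverlay v4l2h264enc h264parse mp4mux \
    qtimlvconverter qtimltflite qtimlpostprocess
missing_files /etc/models/yolov5.tflite /etc/labels/yolov5.json
```

An empty output suggests that everything the command needs was found; any printed name points to a missing plugin, model, or labels file.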

The following figure shows the flow of the use case execution:

1. Identify objects in the video stream coming from the camera source.
2. Overlay bounding boxes over the detected objects using overlaylib.
3. Encode the stream as an H.264 bitstream.
4. Multiplex the stream into an MP4 container and store it as an MP4 file.
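The steps above map onto segments of the `gst-launch-1.0` command. The sketch below regroups the same arguments by step so the correspondence is easier to see; it is a restructuring of the command in this section, not a different pipeline. The function name `overlay_pipeline` and the `DRY_RUN` switch are hypothetical. By default the script only prints the argument list, one argument per line.

```shell
# overlay_pipeline: assemble the overlay pipeline by step (hypothetical wrapper).
overlay_pipeline() {
    # Camera source, split into two branches by tee.
    set -- \
        qtiqmmfsrc name=camsrc ! \
        video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! \
        queue ! tee name=split

    # Steps 2 to 4: overlay the boxes, encode as H.264, mux into MP4, store.
    set -- "$@" \
        split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! \
        v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! \
        mp4mux ! queue ! filesink location=/etc/media/video.mp4

    # Step 1: inference branch that feeds detection results back to metamux.
    set -- "$@" \
        split. ! queue ! qtimlvconverter ! queue ! \
        qtimltflite delegate=external \
        external-delegate-path=libQnnTFLiteDelegate.so \
        'external-delegate-options=QNNExternalDelegate,backend_type=htp;' \
        model=/etc/models/yolov5.tflite ! queue ! \
        qtimlpostprocess 'settings={"confidence": 75.0}' results=10 \
        module=yolov5 labels=/etc/labels/yolov5.json ! \
        text/x-raw ! queue ! metamux.

    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$@"          # print only, nothing is launched
    else
        gst-launch-1.0 -e "$@"      # run on the target device
    fi
}

overlay_pipeline
```

Setting `DRY_RUN=0` on the target device runs the assembled pipeline instead of printing it.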

Figure: Pipeline for bounding box overlay and encode

<svg xmlns="http://www.w3.org/2000/svg" width="1180" height="357.535699844360352" viewbox="0 0 1180 357.535699844360352">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".5" y=".500055313110352" width="1179" height="356.53515625" rx="7.5" ry="7.5" style="fill: #fafafa;"></rect>
      <path d="M1172,1c3.85986328125,0,7,3.140153884887695,7,7v341.535699844360352c0,3.859832763671875-3.14013671875,7-7,7H8c-3.859846115112305,0-7-3.140167236328125-7-7V8c0-3.859846115112305,3.140153884887695-7,7-7h1164M1172,0H8C3.581692695617676,0,0,3.581846237182617,0,8v341.535699844360352c0,4.41815185546875,3.581692695617676,8,8,8h1164c4.418334960939319,0,8-3.58184814453125,8-8V8c0-4.418153762817383-3.581665039060681-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(982.564453125 333.627176284790039)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="962.31348127842648" y="321.535699844360352" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(1081.146240234375 333.627176284790039)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="1060.89526269127964" y="321.535699844360352" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <rect x="20.363876708768657" y="20.00006906611361" width="120" height="50" rx="3.999999999999999" ry="3.999999999999999" style="fill: #007884;"></rect>
        <text transform="translate(53.793609619140625 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
      </g>
      <g>
        <rect x="166.431938354383419" y="20.00006906611361" width="120" height="50" rx="3.999999999999998" ry="3.999999999999998" style="fill: #007884;"></rect>
        <text transform="translate(215.342155456542969 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
      </g>
      <g>
        <rect x="146.431938354383419" y="96.841128549261157" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(170.95904541015625 125.348001480102539)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
      </g>
      <g>
        <line x1="140.897903442382812" y1="45.000070571899414" x2="160.874542236328125" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="159.853256225585938 48.490503311157227 165.897903442382812 45.000070571899414 159.853256225585938 41.509637832641602 159.853256225585938 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="311.965969177188526" y="20.00006906611361" width="120" height="50" rx="3.999999999999995" ry="3.999999999999995" style="fill: #2a2aea;"></rect>
        <text transform="translate(329.262907028198242 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimetamux</tspan></text>
      </g>
      <g>
        <line x1="286.43194580078125" y1="45.000070571899414" x2="306.408569335938409" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="305.387298583984375 48.490503311157227 311.43194580078125 45.000070571899414 305.387298583984375 41.509637832641602 305.387298583984375 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="457.499999999996362" y="20.00006906611361" width="120" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(482.93756103515625 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtioverlay</tspan></text>
      </g>
      <g>
        <line x1="431.965972900390625" y1="45.000070571899414" x2="451.942596435546875" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="450.92132568359375 48.490503311157227 456.965972900390625 45.000070571899414 450.92132568359375 41.509637832641602 450.92132568359375 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="603.034030822804198" y="20.00006906611361" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(617.510650634765625 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">v4l2h264enc</tspan></text>
      </g>
      <g>
        <line x1="577.5" y1="45.000070571899414" x2="597.47662353515625" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="596.455322265625 48.490503311157227 602.5 45.000070571899414 596.455322265625 41.509637832641602 596.455322265625 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="748.568061645612033" y="20.00006906611361" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
        <text transform="translate(770.775146484375 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">h264parse</tspan></text>
      </g>
      <g>
        <line x1="723.0340576171875" y1="45.000070571899414" x2="743.01068115234375" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="741.9893798828125 48.490503311157227 748.0340576171875 45.000070571899414 741.9893798828125 41.509637832641602 741.9893798828125 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="894.102092468419869" y="20.00006906611361" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
        <text transform="translate(922.7232666015625 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">mp4mux</tspan></text>
      </g>
      <g>
        <line x1="868.56805419921875" y1="45.000070571899414" x2="888.544677734375" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="887.5234375 48.490503311157227 893.56805419921875 45.000070571899414 887.5234375 41.509637832641602 887.5234375 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="1039.636123291229524" y="20.00006906611361" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(1074.20263671875 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">filesink</tspan></text>
      </g>
      <g>
        <line x1="1014.10205078125" y1="45.000070571899414" x2="1034.078735351560681" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="1033.057495117189319 48.490503311157227 1039.10205078125 45.000070571899414 1033.057495117189319 41.509637832641602 1033.057495117189319 48.490503311157227"></polygon>
      </g>
      <g>
        <line x1="226.43194580078125" y1="70.461648941040039" x2="226.43194580078125" y2="90.438287734985352" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.941513061523438 89.416994094848633 226.43194580078125 95.461648941040039 229.92236328125 89.416994094848633 222.941513061523438 89.416994094848633"></polygon>
      </g>
      <g>
        <rect x="146.431938354383419" y="173.776685994341278" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(192.25201416015625 202.283563613891602)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimltflite</tspan></text>
      </g>
      <g>
        <line x1="226.43194580078125" y1="147.397211074829102" x2="226.43194580078125" y2="167.373849868774414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.941513061523438 166.352548599243164 226.43194580078125 172.397211074829102 229.92236328125 166.352548599243164 222.941513061523438 166.352548599243164"></polygon>
      </g>
      <g>
        <rect x="146.431938354383419" y="251.535640314990815" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(164.15435791015625 280.042520523071289)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
      </g>
      <g>
        <line x1="226.43194580078125" y1="225.156167984008789" x2="226.43194580078125" y2="245.132806777954102" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.941513061523438 244.111505508423761 226.43194580078125 250.156152725219727 229.92236328125 244.111505508423761 222.941513061523438 244.111505508423761"></polygon>
      </g>
      <g>
        <polyline points="306.43194580078125 276.535638809204102 371.965972900390625 276.535638809204102 371.965972900390625 75.485010147094727" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
        <polygon points="375.456390380859375 76.506303787231445 371.965972900390625 70.461648941040039 368.47552490234375 76.506303787231445 375.456390380859375 76.506303787231445"></polygon>
      </g>
    </g>
  </g>
</svg>

The following table describes the sequential processing stages of the pipeline:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode__ol_l2f_zgm_vbc"><li class="li">Collects the video stream (source); the tee element creates two copies of the source:<ul class="ul" id="single-camera-stream-with-object-detection-and-encode__ol_m2f_zgm_vbc"><li class="li">One stream is sent to the qtimetamux plugin to retain the video stream.</li><li class="li">The other stream is sent to the ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | **Preprocessing** |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode__ol_xsf_q5l_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-object-detection-and-encode__ul_ff2_twl_vbc"><li class="li">Color conversion</li><li class="li">Scaling down or up</li><li class="li">Normalization of the stream data when the model expects floating-point values as input</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The object detection model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | **Inferencing** |
| [qtimltflite](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimltflite.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode__ol_ufn_2lm_vbc"><li class="li">Loads the object detection model.</li><li class="li">Modifies the graph for the chosen delegate.</li><li class="li">Receives the tensor stream on its sink pad.</li><li class="li">Runs the inference and produces a tensor stream with the object detection results on its source pad.</li></ol> |
| **Postprocessing** | **Postprocessing** |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode__ol_ky5_grn_vbc"><li class="li">Receives the inference tensors from the object detection model.</li><li class="li">Converts the inference tensors on its sink pad into formats, such as video or text, that the multimedia plugins can process later.</li><li class="li">Applies the confidence threshold and limits the output to the chosen number of results.</li><li class="li">Loads the corresponding modules for detection models.<p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="single-camera-stream-with-object-detection-and-encode__ol_jcd_wnk_5bc"><li class="li">Loads the YOLOv5 submodule.</li><li class="li">Produces results as structures of text.</li><li class="li">Sends them to the sink pad of qtimetamux.</li></ol></li></ol> |
| [qtimetamux](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimetamux.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode__ol_ll3_x5l_vbc"><li class="li">Receives the video stream and the text stream with the corresponding bounding box results on its sink pads.</li><li class="li">Produces GST buffers with the contents of the video stream from its sink pad.</li><li class="li">Adds the bounding boxes from the data sink pad to the GST buffer meta as GstVideoRegionOfInterest (meta muxing) on its source pad.</li></ol> |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-70022-50/topic/qtioverlay.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode__ol_wst_y5l_vbc"><li class="li">Receives the multiplexed stream.</li><li class="li">Overlays the bounding boxes on the video frame using CL.</li><li class="li">Produces GST buffers with overlays on its source pad.</li></ol> |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode__ol_wsc_bsn_vbc"><li class="li">Applies parameters to each frame of the video stream that it receives on its sink pad.</li><li class="li">Encodes the stream into a bitstream and sends it over its source pad.</li></ol> |
| h264parse | Adds more information about the bitstream to the GStreamer buffer meta. |
| mp4mux | Receives these buffers and creates containers with format specification buffers. |
| **Output** | **Output** |
| filesink | Stores the resulting stream in the /etc/media/video.mp4 file. |
| Playback | From the host computer, pull video.mp4 from the target device and play it in a media player:<br>`scp root@<IP address of target device>:/etc/media/video.mp4 <destination directory>` |
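The qtimlpostprocess row above mentions a threshold and a chosen number of results, which correspond to the `confidence` setting and the `results` property in the command. The following sketch only illustrates the idea on made-up data and is not how the plugin is implemented. It assumes that `confidence` is a percentage and that `results` caps the number of detections kept; the helper name and the `label confidence` input format are hypothetical.

```shell
# filter_detections THRESHOLD MAX: read "label confidence" lines on stdin, keep
# those at or above THRESHOLD, strongest first, and print at most MAX of them.
filter_detections() {
    awk -v t="$1" '$2 + 0 >= t + 0' | LC_ALL=C sort -k2,2 -n -r | head -n "$2"
}

# Made-up detections, filtered with the values used in this topic (75.0 and 10).
printf '%s\n' 'person 91.2' 'dog 40.5' 'car 78.0' 'bicycle 75.0' |
    filter_detections 75.0 10
```

With this sample input, `dog 40.5` is dropped and the remaining three detections are printed from highest to lowest confidence.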

## Use qtivcomposer to mix original frame with bounding box mask

Run the use case on the target device:

    gst-launch-1.0 -e \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtivcomposer name=mixer ! queue ! video/x-raw,format=NV12,width=1920,height=1080,interlace-mode=progressive,colorimetry=bt601 ! \
    v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so \
    external-delegate-options="QNNExternalDelegate,backend_type=htp;" model=/etc/models/yolov5.tflite ! queue ! \
    qtimlpostprocess settings="{\"confidence\": 75.0}" results=10 module=yolov5 labels=/etc/labels/yolov5.json ! \
    video/x-raw,format=BGRA,width=640,height=360 ! queue ! mixer.

To stop the use case, press CTRL + C.
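The command above uses three frame sizes: 1280x720 from the camera, 1920x1080 on the qtivcomposer output, and 640x360 for the bounding box mask. All three share the same 16:9 aspect ratio, which presumably keeps the mask aligned with the frame when the streams are scaled; that reading is an assumption, not something this topic states. If you change any of the sizes, a small check such as the following sketch (the helper name `same_aspect` is hypothetical) can confirm that the ratios still match.

```shell
# same_aspect W1 H1 W2 H2: succeed when both sizes have the same aspect ratio.
# Cross-multiplication avoids floating-point division.
same_aspect() {
    [ $(( $1 * $4 )) -eq $(( $2 * $3 )) ]
}

# Sizes taken from the caps in the qtivcomposer command.
same_aspect 1280 720 1920 1080 && echo "camera and composer output match"
same_aspect 1280 720 640 360 && echo "camera and mask match"
```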

The following figure shows the flow of the use case execution:

1. Identify objects in the video stream coming from the camera source.
2. Using qtivcomposer, compose the bounding boxes of the detected objects with the original video stream.
3. Encode this stream as an H.264 bitstream.
4. Multiplex the stream into an MP4 container and store it as an MP4 file.

Figure: Pipeline for bounding box mask and encode with qtivcomposer

<svg xmlns="http://www.w3.org/2000/svg" width="1451.056915283208582" height="160.4840087890625" viewbox="0 0 1451.056915283208582 160.4840087890625">
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <rect x=".500030517578125" y=".50042724609375" width="1450.056640625" height="159.4833984375" rx="7.5" ry="7.5" style="fill: #fafafa;"></rect>
      <path d="M1443.056915283208582,1c3.859802246090112,0,7,3.14019775390625,7,7v144.4840087890625c0,3.85980224609375-3.140197753909888,7-7,7H8c-3.859832763671875,0-7-3.14019775390625-7-7V8c0-3.85980224609375,3.140167236328125-7,7-7h1435.056915283208582M1443.056915283208582,0H8C3.581817626953125,0,0,3.58184814453125,0,8v144.4840087890625c0,4.4183349609375,3.581817626953125,8,8,8h1435.056915283208582c4.418334960930224,0,8-3.5816650390625,8-8V8c0-4.41815185546875-3.581665039069776-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(1261.550140380859375 136.57550048828125)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="1241.299095551647042" y="124.4840087890625" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(1360.131927490234375 136.57550048828125)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="1339.880876964503841" y="124.4840087890625" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_3" data-name="Layer 3">
    <g>
      <rect x="64.009374944371302" y="20.000034932025301" width="94.085224091331838" height="84.570987947263347" rx="4" ry="4" style="fill: #2a2aea;"></rect>
      <g>
        <line x1="158.0946044921875" y1="62.2855224609375" x2="171.788223266601562" y2="62.2855224609375" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="170.621017456054688 66.27459716796875 177.529205322264716 62.2855224609375 170.621017456054688 58.29644775390625 170.621017456054688 66.27459716796875"></polygon>
      </g>
      <line x1="48.574773890908546" y1="62.285528905656975" x2="64.009374944371302" y2="62.285528905656975" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <g>
        <line x1="788.53387451171875" y1="72.7698974609375" x2="802.227508544921875" y2="72.7698974609375" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="801.060302734375 76.75897216796875 807.968475341796875 72.7698974609375 801.060302734375 68.78082275390625 801.060302734375 76.75897216796875"></polygon>
      </g>
      <path d="M690.820141910025995,20.000034932025301h89.713734934081003c4.415320195951949,0,8,3.584679804048001,8,8v69.565137401088577c0,3.818614043792579-3.100230115765385,6.918844159557921-6.918844159557921,6.918844159557921h-90.794890774523083c-4.415320195951949,0-8-3.584679804047994-8-8V28.000034932025301c0-4.415320195951999,3.584679804048051-8,8-8Z" style="fill: #2a2aea;"></path>
      <text transform="translate(69.582820892333984 66.610363006591797)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtiqmmfsrc</tspan></text>
      <rect x="177.985937085704791" y="20.000034932025301" width="64.167217158848871" height="84.570987947263347" rx="4" ry="4" style="fill: #007884;"></rect>
      <g>
        <line x1="242.153152465820312" y1="72.7698974609375" x2="255.846771240234375" y2="72.7698974609375" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="254.679580688476562 76.75897216796875 261.587753295898438 72.7698974609375 254.679580688476562 68.78082275390625 254.679580688476562 76.75897216796875"></polygon>
      </g>
      <text transform="translate(198.979305267333984 66.610363006591797)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
      <text transform="translate(692.974910736083984 66.567394256591797)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimetamux</tspan></text>
      <path d="M1216.957639514377661,20.000034932025301h89.713734934081003c4.415320195952063,0,8,3.584679804048001,8,8v69.565137401088577c0,3.818614043792579-3.100230115765271,6.918844159557921-6.918844159557921,6.918844159557921h-90.794890774523083c-4.415320195952063,0-8-3.584679804047994-8-8V28.000034932025301c0-4.415320195951999,3.584679804047937-8,8-8Z" style="fill: #2a2aea;"></path>
      <text transform="translate(1227.252254486083984 66.567394256591797)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtioverlay</tspan></text>
      <g>
        <line x1="242.153152465820312" y1="31.6297607421875" x2="677.07916259765625" y2="31.6297607421875" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="675.911972045898438 35.61883544921875 682.820144653320312 31.6297607421875 675.911972045898438 27.64068603515625 675.911972045898438 35.61883544921875"></polygon>
      </g>
      <g>
        <line x1="789.152496337890625" y1="31.6297607421875" x2="1203.216644287109375" y2="31.6297607421875" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="1202.049468994140625 35.61883544921875 1208.957611083984375 31.6297607421875 1202.049468994140625 27.64068603515625 1202.049468994140625 35.61883544921875"></polygon>
      </g>
      <rect x="261.587755298020056" y="41.055775532222469" width="126.856481920905935" height="63.42824096044933" rx="4" ry="4" style="fill: #2a2aea;"></rect>
      <g>
        <line x1="388.444236755371094" y1="72.7698974609375" x2="402.137853622436523" y2="72.7698974609375" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="400.970663070678711 76.75897216796875 407.878838539123535 72.7698974609375 400.970663070678711 68.78082275390625 400.970663070678711 76.75897216796875"></polygon>
      </g>
      <rect x="407.952096441529648" y="41.055775532222469" width="95.142361440675813" height="63.42824096044933" rx="4" ry="4" style="fill: #2a2aea;"></rect>
      <g>
        <line x1="503.094459533691406" y1="72.7698974609375" x2="516.788078308105469" y2="72.7698974609375" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="515.620880126953125 76.75897216796875 522.529060363769531 72.7698974609375 515.620880126953125 68.78082275390625 515.620880126953125 76.75897216796875"></polygon>
      </g>
      <g>
        <line x1="663.385543823242188" y1="72.7698974609375" x2="677.07916259765625" y2="72.7698974609375" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="675.911972045898438 76.75897216796875 682.820144653320312 72.7698974609375 675.911972045898438 68.78082275390625 675.911972045898438 76.75897216796875"></polygon>
      </g>
      <rect x="522.529058935668218" y="41.055775532222469" width="140.856481920895021" height="63.42824096044933" rx="4" ry="4" style="fill: #2a2aea;"></rect>
      <text transform="translate(269.542293548583984 77.093761444091797)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
      <text transform="translate(421.342586517333984 77.094738006591797)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimltflite</tspan></text>
      <text transform="translate(530.679732799530029 77.093761444091797)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
      <rect x="807.968477897569755" y="41.055775532222469" width="114.005008262241972" height="63.42824096044933" rx="4" ry="4" style="fill: #007884;"></rect>
      <g>
        <line x1="922.2174072265625" y1="72.7698974609375" x2="935.9110107421875" y2="72.7698974609375" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="934.74383544921875 76.75897216796875 941.652008056640625 72.7698974609375 934.74383544921875 68.78082275390625 934.74383544921875 76.75897216796875"></polygon>
      </g>
      <rect x="941.651998131057553" y="41.055775532222469" width="114.005008262241972" height="63.42824096044933" rx="4" ry="4" style="fill: #007884;"></rect>
      <g>
        <line x1="1055.657012939453125" y1="72.7698974609375" x2="1069.350616455078125" y2="72.7698974609375" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="1068.183441162109375 76.75897216796875 1075.091583251953125 72.7698974609375 1068.183441162109375 68.78082275390625 1068.183441162109375 76.75897216796875"></polygon>
      </g>
      <g>
        <line x1="1189.523040771484375" y1="72.7698974609375" x2="1203.216644287109375" y2="72.7698974609375" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="1202.049468994140625 76.75897216796875 1208.957611083984375 72.7698974609375 1202.049468994140625 68.78082275390625 1202.049468994140625 76.75897216796875"></polygon>
      </g>
      <path d="M1343.343211044359123,20.000034932025301h79.713734934079184c4.415320195952063,0,8,3.584679804048001,8,8v69.565137401088691c0,3.818614043792522-3.100230115765271,6.918844159557807-6.918844159557693,6.918844159557807h-80.794890774521491c-4.415320195952063,0-8-3.584679804047994-8-8V28.000034932025301c0-4.415320195951999,3.584679804047937-8,8-8Z" style="fill: #007884;"></path>
      <text transform="translate(1357.766482770442963 66.567394256591797)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">filesink</tspan></text>
      <g>
        <line x1="1314.671356201171875" y1="72.7698974609375" x2="1328.365020751953125" y2="72.7698974609375" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="1327.197784423828125 76.75897216796875 1334.105987548828125 72.7698974609375 1327.197784423828125 68.78082275390625 1327.197784423828125 76.75897216796875"></polygon>
      </g>
      <g>
        <line x1="1314.671356201171875" y1="31.6297607421875" x2="1328.365020751953125" y2="31.6297607421875" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="1327.197784423828125 35.61883544921875 1334.105987548828125 31.6297607421875 1327.197784423828125 27.64068603515625 1327.197784423828125 35.61883544921875"></polygon>
      </g>
      <rect x="1075.335518364543532" y="41.055775532222469" width="114.005008262243791" height="63.42824096044933" rx="4" ry="4" style="fill: #007884;"></rect>
      <text transform="translate(819.447566986083984 77.093761444091797)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">v4l2h264enc</tspan></text>
      <text transform="translate(960.860652923583984 77.094738006591797)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">h264parse</tspan></text>
      <text transform="translate(1100.959285736083984 77.093761444091797)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">mp4mux</tspan></text>
      <path d="M34.376152213897512,47.361236914959591h-9.327682494184046l-4.663841247091113,5.596609496511519h-5.596609496511519c-2.060614596715823,0-3.731072997672527,1.670458400956704-3.731072997672527,3.731072997674346v16.789828489530919c0,2.060616375827522,1.670458400956704,3.731072997674346,3.731072997672527,3.731072997674346h29.848583981389311c2.060616375829341,0,3.731072997672527-1.670456621846824,3.731072997672527-3.731072997674346v-16.789828489530919c0-2.060614596717642-1.670456621843186-3.731072997674346-3.731072997672527-3.731072997674346h-5.5966094965097l-4.663841247092932-5.596609496511519Z" style="fill: none; stroke: #000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 2px;"></path>
      <circle cx="29.712310966805489" cy="62.285528905656975" r="5.596609496511973" style="fill: none; stroke: #000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 2px;"></circle>
    </g>
  </g>
</svg>

The following table describes the sequential processing stages of the pipeline execution:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode__ol_wqx_ntn_vbc"><li class="li">Collects the video stream (source) and creates two copies of the source:<ul class="ul" id="single-camera-stream-with-object-detection-and-encode__ol_xqx_ntn_vbc"><li class="li">One stream is sent to the qtimetamux plugin to retain the video stream.</li><li class="li">The other stream is sent to an ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | **Preprocessing** |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode__ol_yqx_ntn_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-object-detection-and-encode__ul_zqx_ntn_vbc"><li class="li">Color conversion</li><li class="li">Scaling down or up</li><li class="li">Normalization of the stream data when the model expects floating-point values as input</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The object detection model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | **Inferencing** |
| [qtimltflite](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimltflite.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode__ol_arx_ntn_vbc"><li class="li">Loads the object detection model.</li><li class="li">Modifies the graph for the chosen delegate.</li><li class="li">Receives the tensor stream on its sink pad.</li><li class="li">Runs the inference and produces a tensor stream with the object detection results on its source pad.</li></ol> |
| **Postprocessing** | **Postprocessing** |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode__ol_brx_ntn_vbc"><li class="li">Receives the inference tensors from the object detection model on its sink pad.</li><li class="li">Converts the inference tensors into formats, such as video or text, that the downstream multimedia plugins can process.</li><li class="li">Applies the confidence threshold and keeps at most the chosen number of results.</li><li class="li">Loads the corresponding modules for detection models. <p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="single-camera-stream-with-object-detection-and-encode__ol_crx_ntn_vbc"><li class="li">Loads the YOLOv5 submodule.</li><li class="li">Produces video frames that contain only the bounding boxes to overlay on the detected objects.</li><li class="li">Sends them to the sink pad of qtivcomposer.</li></ol></li></ol> |
| [qtivcomposer](https://docs.qualcomm.com/doc/80-70022-50/topic/qtivcomposer.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode__ol_wgh_rtn_vbc"><li class="li">Receives the original video stream and the video stream with bounding boxes on its sink pads.</li><li class="li">Composes the streams received on its sink pads and produces the composed video on its source pad.</li></ol> |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode__ol_frx_ntn_vbc"><li class="li">Applies the encoding parameters to each frame of the video stream that it receives on its sink pad.</li><li class="li">Encodes the frames into an H.264 bitstream and sends it over its source pad.</li></ol> |
| h264parse | Adds more information about the bitstream to the GStreamer buffer meta. |
| mp4mux | Receives the parsed H.264 buffers and multiplexes them into an MP4 container. |
| **Output** | **Output** |
| filesink | Stores the resulting stream in the `/etc/media/video.mp4` file. |
| Playback | From the host computer, pull video.mp4 from the target device and play it on a media player:<br>`scp root@<IP address of target device>:/etc/media/video.mp4 <destination directory>` |
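
The playback step in the table can be sketched as a small host-side helper script. This is a minimal sketch and not part of the documented use case: the function names (`remote_spec`, `pull_video`, `inspect_video`), the `TARGET_IP` and `DEST_DIR` variables, and the default address `192.0.2.10` are hypothetical placeholders. It also assumes that `scp` and the GStreamer `gst-discoverer-1.0` tool are installed on the host computer.

```shell
#!/bin/sh
# Hypothetical placeholders: set these to match your own setup.
TARGET_IP="${TARGET_IP:-192.0.2.10}"   # assumed example address, not a real device
DEST_DIR="${DEST_DIR:-.}"              # local destination directory on the host
REMOTE_FILE="/etc/media/video.mp4"     # file written by filesink in this use case

# Build the scp source argument in the form root@<IP>:<path>.
remote_spec() {
    printf 'root@%s:%s' "$1" "$2"
}

# Copy the recording from the target device to the host computer.
pull_video() {
    scp "$(remote_spec "$TARGET_IP" "$REMOTE_FILE")" "$DEST_DIR/"
}

# Print the container and stream details of the local copy,
# for example to check that it holds an H.264 video stream.
inspect_video() {
    gst-discoverer-1.0 "$DEST_DIR/$(basename "$REMOTE_FILE")"
}

# Typical use on the host, after setting TARGET_IP and DEST_DIR:
#   pull_video && inspect_video
```

Nothing runs until one of the functions is called, so the script can be sourced safely before the target device is reachable. Inspecting the file first is optional; any media player that supports H.264 in an MP4 container can play the local copy directly.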

Last Published: Feb 20, 2026
