# Object detection and display with LiteRT

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-object-detection-and-display.html](https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-object-detection-and-display.html)

These use cases use a YOLOv5 LiteRT model to detect objects in a camera scene. The bounding boxes for the detected objects are either overlaid on the video frames or composed with them, and the result is displayed.

## Use qtivoverlay plugin to apply bounding box overlay

Run the use case on the target device:

    gst-launch-1.0 -e --gst-debug=2 qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! \
    queue ! tee name=split split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! waylandsink fullscreen=true split. ! queue ! \
    qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so \
    external-delegate-options="QNNExternalDelegate,backend_type=htp;" model=/etc/models/yolov5.tflite ! queue ! \
    qtimlpostprocess settings="{\"confidence\": 75.0}" results=10 module=yolov5 labels=/etc/labels/yolov5.json ! text/x-raw ! queue ! metamux.

To stop the use case, press CTRL + C.
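Before launching, it can help to confirm that the model, the labels file, and the GStreamer elements named in the command are present on the target. The following is a minimal sketch, not part of the Qualcomm tooling: the function names `missing_files`, `missing_elements`, and `preflight` are hypothetical, and the default paths are the ones used in the command above. It relies on the `--exists` option of `gst-inspect-1.0`. It does not check the `libQnnTFLiteDelegate.so` delegate library.

```shell
#!/bin/sh
# Pre-flight check for the object detection pipeline (illustrative sketch).
# MODEL and LABELS default to the paths used in the command on this page.
MODEL="${MODEL:-/etc/models/yolov5.tflite}"
LABELS="${LABELS:-/etc/labels/yolov5.json}"

# Elements that appear in the overlay pipeline.
ELEMENTS="qtiqmmfsrc tee queue qtimetamux qtivoverlay waylandsink qtimlvconverter qtimltflite qtimlpostprocess"

# Print every file argument that is not readable.
missing_files() {
  for f in "$@"; do
    [ -r "$f" ] || printf '%s\n' "$f"
  done
}

# Print every element that gst-inspect-1.0 cannot find.
# If gst-inspect-1.0 itself is absent, every element is reported.
missing_elements() {
  for e in "$@"; do
    if ! command -v gst-inspect-1.0 >/dev/null 2>&1 ||
       ! gst-inspect-1.0 --exists "$e" >/dev/null 2>&1; then
      printf '%s\n' "$e"
    fi
  done
}

# Return 0 only when nothing is missing; otherwise list the problems.
preflight() {
  # $ELEMENTS is intentionally unquoted so that it splits into words.
  problems="$(missing_files "$MODEL" "$LABELS"; missing_elements $ELEMENTS)"
  if [ -n "$problems" ]; then
    printf 'Missing:\n%s\n' "$problems" >&2
    return 1
  fi
  echo "Pre-flight check passed"
}
```

Calling `preflight` before the `gst-launch-1.0` command would then report a missing file or element by name instead of leaving it to a pipeline negotiation error.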

The following figure shows the flow of the use case execution:

1. Identifies objects in the scene from a video stream that comes from a camera source.
2. Overlays bounding boxes over the detected objects using overlaylib.
3. Displays the results.

Figure: Pipeline for bounding box overlay

<svg xmlns="http://www.w3.org/2000/svg" width="940" height="349.974597930908203" viewbox="0 0 940 349.974597930908203">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".5" y=".499774932861328" width="939" height="348.974609375" rx="7.5" ry="7.5" style="fill: #fafafa;"></rect>
      <path d="M932,1c3.859741210939319,0,7,3.140233993530273,7,7v333.974597930908203c0,3.8597412109375-3.140258789060681,7-7,7H8c-3.859771728515625,0-7-3.1402587890625-7-7V8c0-3.859766006469727,3.140228271484375-7,7-7h924M932,0H8C3.581771850585938,0,0,3.581764221191406,0,8v333.974597930908203c0,4.418212890625,3.581771850585938,8,8,8h924c4.418334960939319,0,8-3.581787109375,8-8V8c0-4.418235778808594-3.581665039060681-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(750.4930419921875 326.066089630126953)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="730.242044548313061" y="313.974597930908203" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(849.0748291015625 326.066089630126953)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="828.82382596116986" y="313.974597930908203" width="16" height="16" rx="1.999999999999972" ry="1.999999999999972" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <g>
          <rect x="20" y="20.000005443602277" width="160" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
          <text transform="translate(73.429725646972656 48.506705760955811)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
        </g>
        <g>
          <line x1="180" y1="45.000003814697266" x2="199.976654052734375" y2="45.000003814697266" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
          <polygon points="198.955352783203125 48.490436553955078 205 45.000003814697266 198.955352783203125 41.509574890136719 198.955352783203125 48.490436553955078"></polygon>
        </g>
        <g>
          <line x1="285" y1="68.974582672119141" x2="285" y2="88.951221466064453" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
          <polygon points="281.50958251953125 87.929927825927734 285 93.974582672119141 288.49041748046875 87.929927825927734 281.50958251953125 87.929927825927734"></polygon>
        </g>
        <g>
          <rect x="205" y="20.000005443602277" width="160" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
          <text transform="translate(273.910198211669922 48.506705760955811)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
        </g>
        <g>
          <rect x="205" y="93.974581043215949" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
          <text transform="translate(229.527484893798828 123.650539398193359)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
        </g>
        <g>
          <line x1="285" y1="143.974582672119141" x2="285" y2="163.951221466064453" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
          <polygon points="281.50958251953125 162.929920196533203 285 168.974582672119141 288.49041748046875 162.929920196533203 281.50958251953125 162.929920196533203"></polygon>
        </g>
        <g>
          <rect x="205" y="168.974581043215949" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
          <text transform="translate(250.820453643798828 198.650547027587891)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimltflite</tspan></text>
        </g>
        <g>
          <line x1="285" y1="218.974582672119141" x2="285" y2="238.951221466064453" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
          <polygon points="281.50958251953125 237.929920196533203 285 243.974567413330078 288.49041748046875 237.929920196533203 281.50958251953125 237.929920196533203"></polygon>
        </g>
        <g>
          <rect x="205" y="243.974581043215949" width="160" height="49.999999999998181" rx="4" ry="4" style="fill: #2a2aea;"></rect>
          <text transform="translate(222.722797393798828 273.650547027587891)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
        </g>
        <g>
          <line x1="365" y1="45.000003814697266" x2="384.97662353515625" y2="45.000003814697266" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
          <polygon points="383.955322265625 48.490436553955078 390 45.000003814697266 383.955322265625 41.509574890136719 383.955322265625 48.490436553955078"></polygon>
        </g>
        <g>
          <rect x="390" y="20.000005443602277" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
          <text transform="translate(427.296905517578125 48.506705760955811)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimetamux</tspan></text>
        </g>
        <g>
          <line x1="550" y1="45.000003814697266" x2="569.97662353515625" y2="45.000003814697266" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
          <polygon points="568.955322265625 48.490436553955078 575 45.000003814697266 568.955322265625 41.509574890136719 568.955322265625 48.490436553955078"></polygon>
        </g>
        <g>
          <rect x="575" y="20.000005443602277" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
          <text transform="translate(620.437530517578125 48.506705760955811)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtivoverlay</tspan></text>
        </g>
        <g>
          <line x1="735" y1="45.000003814697266" x2="754.97662353515625" y2="45.000003814697266" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
          <polygon points="753.955322265625 48.490436553955078 760 45.000003814697266 753.955322265625 41.509574890136719 753.955322265625 48.490436553955078"></polygon>
        </g>
        <g>
          <rect x="760" y="20.000005443602277" width="160" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
          <text transform="translate(796.09771728515625 48.506705760955811)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">waylandsink</tspan></text>
        </g>
      </g>
      <g>
        <polyline points="365 271.338306427001953 470 271.338306427001953 470 74.922946929931641" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
        <polygon points="473.49041748046875 75.944240570068359 470 69.899585723876953 466.50958251953125 75.944240570068359 473.49041748046875 75.944240570068359"></polygon>
      </g>
    </g>
  </g>
</svg>

The following table lists the sequential processing stages of the pipeline:

Table: Pipeline processing stages for bounding box overlay

| Process | Description |
| --- | --- |
| qtiqmmfsrc | <ol><li>Collects the video stream from the camera. The tee element then creates two copies of the stream:<ul><li>One copy goes to the qtimetamux plugin, which retains the video stream.</li><li>The other copy goes to the ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | |
| qtimlvconverter | <ol><li>Receives the video stream on its sink pad.</li><li>Performs preprocessing:<ul><li>Color conversion</li><li>Scaling up or down</li><li>Normalization of the stream data when the model expects floating-point input</li></ul></li><li>Converts the video stream to a tensor stream on its source pad.<p>The object detection model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | |
| qtimltflite | <ol><li>Loads the object detection model.</li><li>Modifies the graph for the chosen delegate.</li><li>Receives the tensor stream on its sink pad.</li><li>Runs the inference and produces a tensor stream with the object detection results on its source pad.</li></ol> |
| **Postprocessing** | |
| qtimlpostprocess | <ol><li>Receives the inference tensors from the object detection model on its sink pad.</li><li>Converts the inference tensors into formats, such as video or text, that the downstream multimedia plugins can process.</li><li>Applies the confidence threshold and limits the output to the chosen number of results.</li><li>Loads the module that corresponds to the detection model.<p>In this use case, qtimlpostprocess does the following:</p><ol type="a"><li>Loads the YOLOv5 module.</li><li>Produces the results as text structures.</li><li>Sends them to the sink pad of qtimetamux.</li></ol></li></ol> |
| qtimetamux | <ol><li>Receives the video stream and the text stream with the corresponding bounding box results on its sink pads.</li><li>Produces GST buffers with the contents of the video stream from its sink pad.</li><li>Adds the bounding boxes from its data sink pad to the GST buffer metadata as GstVideoRegionOfInterest (meta muxing) and outputs the buffers on its source pad.</li></ol> |
| qtivoverlay | <ol><li>Receives the multiplexed stream.</li><li>Overlays the bounding boxes on the video frame using CL.</li><li>Produces GST buffers with the overlays on its source pad.</li></ol> |
| **Output** | |
| waylandsink | <ol><li>Receives the video stream on its sink pad.</li><li>Submits the video stream to Weston.</li><li>Weston renders the video stream, with the bounding boxes generated for the objects in the scene, on a local display device.</li></ol> |
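The postprocessing stage combines a confidence threshold (`"confidence": 75.0` in the command) with a cap on the number of results (`results=10`). The following sketch illustrates that idea on plain text. It is a conceptual illustration and not the qtimlpostprocess implementation: the `filter_detections` function and the sample detections are made up, and it assumes that confidence is expressed as a percentage, as the value 75.0 suggests.

```shell
#!/bin/sh
# Conceptual sketch: keep detections at or above a confidence threshold,
# then keep at most N of them, highest confidence first.
# Input: one detection per line as "<label> <confidence-percent>".
filter_detections() {
  threshold="$1"
  max_results="$2"
  # LC_ALL=C keeps "." as the decimal separator regardless of locale.
  LC_ALL=C awk -v t="$threshold" '$2 + 0 >= t + 0' |
    LC_ALL=C sort -k2,2nr |
    head -n "$max_results"
}

# Hypothetical detections; a threshold of 75.0 and a cap of 2 results.
printf '%s\n' 'person 92.1' 'chair 60.0' 'dog 80.5' 'cup 75.0' |
  filter_detections 75.0 2
```

With this sample input, `chair` falls below the threshold and `cup` is dropped by the cap of two results, so only `person` and `dog` remain.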

## Use qtivcomposer to mix original frame with bounding box mask

Run the use case on the target device:

    gst-launch-1.0 -e --gst-debug=2 qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split split. ! \
    queue ! qtivcomposer name=mixer sink_1::dimensions="<1920,1080>" ! queue ! waylandsink fullscreen=true split. ! queue ! qtimlvconverter ! queue ! \
    qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options="QNNExternalDelegate,backend_type=htp;" \
    model=/etc/models/yolov5.tflite ! queue ! qtimlpostprocess settings="{\"confidence\": 75.0}" results=10 module=yolov5 labels=/etc/labels/yolov5.json \
     ! video/x-raw,format=BGRA,width=640,height=360 ! queue ! mixer.

To stop the use case, press CTRL + C.
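The command mentions three frame sizes: 1280x720 from the camera, 640x360 for the bounding box mask, and 1920x1080 for `sink_1::dimensions` on the composer. All three are 16:9. The sketch below checks that with integer arithmetic; the `same_aspect` helper is hypothetical. Why this matters is an assumption on my part rather than something the source states: if the composer scales the mask to the `sink_1` dimensions, a mismatched aspect ratio would presumably stretch the boxes relative to the video.

```shell
#!/bin/sh
# Return success when w1:h1 and w2:h2 have the same aspect ratio.
# Cross-multiplication avoids floating point, which POSIX sh lacks.
# Usage: same_aspect w1 h1 w2 h2
same_aspect() {
  [ $(( $1 * $4 )) -eq $(( $2 * $3 )) ]
}

# Sizes taken from the command above.
same_aspect 1280 720 640 360 && echo "camera vs mask: same aspect ratio"
same_aspect 640 360 1920 1080 && echo "mask vs composer sink_1: same aspect ratio"
```

When changing any one of the three sizes, rerunning the check with the new values is a quick way to see whether the others need to change with it.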

The following figure shows the flow of the use case execution:

1. Identifies objects in the scene from a video stream that comes from a camera source.
2. Composes the following using qtivcomposer:
    1. Bounding boxes over objects detected.
    2. Original video stream.
3. Displays the results.

Figure: Pipeline for bounding box mask with qtivcomposer

<svg xmlns="http://www.w3.org/2000/svg" width="755" height="349.974590301513672" viewbox="0 0 755 349.974590301513672">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".5" y=".499797821044922" width="754" height="348.974609375" rx="7.499999999999944" ry="7.499999999999944" style="fill: #fafafa;"></rect>
      <path d="M747,1c3.85980224609375,0,7,3.140201568603516,7,7v333.974590301513672c0,3.85980224609375-3.14019775390625,7-7,7H8c-3.85980224609375,0-7-3.14019775390625-7-7V8c0-3.859798431396484,3.14019775390625-7,7-7h739M747,0H8C3.581695556640625,0,0,3.581701278686523,0,8v333.974590301513672c0,4.418304443359375,3.581695556640625,8,8,8h739c4.41827392578125,0,8-3.581695556640625,8-8V8c0-4.418298721313477-3.58172607421875-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(564.49249267578125 325.765972137451172)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="544.241457639080181" y="313.674465179443359" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(663.07427978515625 325.765972137451172)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="642.823239051933342" y="313.674465179443359" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <rect x="20" y="20.000014873722648" width="160" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(73.429729461669922 48.506718635559082)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
      </g>
      <g>
        <line x1="180" y1="45.000011444091797" x2="199.976654052734375" y2="45.000011444091797" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="198.955352783203125 48.490444183349609 205 45.000011444091797 198.955352783203125 41.50958251953125 198.955352783203125 48.490444183349609"></polygon>
      </g>
      <g>
        <line x1="285" y1="68.974590301513672" x2="285" y2="88.951229095458984" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="281.50958251953125 87.929935455322266 285 93.974590301513672 288.49041748046875 87.929935455322266 281.50958251953125 87.929935455322266"></polygon>
      </g>
      <g>
        <rect x="205" y="20.000014873722648" width="160" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(273.910202026367188 48.506718635559082)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
      </g>
      <g>
        <rect x="205" y="93.97459047333632" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(229.527496337890625 123.650547027587891)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
      </g>
      <g>
        <line x1="285" y1="143.974590301513672" x2="285" y2="163.951229095458984" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="281.50958251953125 162.929943084716797 285 168.974590301513672 288.49041748046875 162.929943084716797 281.50958251953125 162.929943084716797"></polygon>
      </g>
      <g>
        <rect x="205" y="168.97459047333632" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(250.820465087890625 198.650554656982422)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimltflite</tspan></text>
      </g>
      <g>
        <line x1="285" y1="218.974590301513672" x2="285" y2="238.951244354248047" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="281.50958251953125 237.929943084716797 285 243.974590301513672 288.49041748046875 237.929943084716797 281.50958251953125 237.929943084716797"></polygon>
      </g>
      <g>
        <rect x="205" y="243.97459047333632" width="160" height="49.999999999999091" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(222.722808837890625 273.650554656982422)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
      </g>
      <g>
        <line x1="365" y1="45.000011444091797" x2="384.97662353515625" y2="45.000011444091797" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="383.955322265625 48.490444183349609 390 45.000011444091797 383.955322265625 41.50958251953125 383.955322265625 48.490444183349609"></polygon>
      </g>
      <g>
        <rect x="390" y="20.000014873722648" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(421.140655517578125 48.506718635559082)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtivcomposer</tspan></text>
      </g>
      <g>
        <line x1="550" y1="45.000011444091797" x2="569.97662353515625" y2="45.000011444091797" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="568.955322265625 48.490444183349609 575 45.000011444091797 568.955322265625 41.50958251953125 568.955322265625 48.490444183349609"></polygon>
      </g>
      <g>
        <rect x="575" y="20.000014873722648" width="160" height="50" rx="4.000000000000019" ry="4.000000000000019" style="fill: #007884;"></rect>
        <text transform="translate(611.09771728515625 48.506718635559082)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">waylandsink</tspan></text>
      </g>
    </g>
    <g>
      <polyline points="365 268.974590301513672 470 268.974590301513672 470 75.847972869873047" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
      <polygon points="473.49041748046875 76.869258880615234 470 70.824611663818359 466.50958251953125 76.869258880615234 473.49041748046875 76.869258880615234"></polygon>
    </g>
  </g>
</svg>

The following table lists the sequential processing stages of the pipeline:

Table: Pipeline processing stages for bounding box mask with qtivcomposer

| Process | Description |
| --- | --- |
| qtiqmmfsrc | <ol><li>Collects the video stream from the camera. The tee element then creates two copies of the stream:<ul><li>One copy goes to the qtivcomposer plugin, which retains the video stream.</li><li>The other copy goes to the ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | |
| qtimlvconverter | <ol><li>Receives the video stream on its sink pad.</li><li>Performs preprocessing:<ul><li>Color conversion</li><li>Scaling up or down</li><li>Normalization of the stream data when the model expects floating-point input</li></ul></li><li>Converts the video stream to a tensor stream on its source pad.<p>The object detection model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | |
| qtimltflite | <ol><li>Loads the object detection model.</li><li>Modifies the graph for the chosen delegate.</li><li>Receives the tensor stream on its sink pad.</li><li>Runs the inference and produces a tensor stream with the object detection results on its source pad.</li></ol> |
| **Postprocessing** | |
| qtimlpostprocess | <ol><li>Receives the inference tensors from the object detection model on its sink pad.</li><li>Converts the inference tensors into formats, such as video or text, that the downstream multimedia plugins can process.</li><li>Applies the confidence threshold and limits the output to the chosen number of results.</li><li>Loads the module that corresponds to the detection model.<p>In this use case, qtimlpostprocess does the following:</p><ol type="a"><li>Loads the YOLOv5 module.</li><li>Produces video frames that contain only the bounding boxes, which can be overlaid on the objects.</li><li>Sends them to the sink pad of qtivcomposer.</li></ol></li></ol> |
| qtivcomposer | <ol><li>Receives the original video stream and the video stream with bounding boxes on its sink pads.</li><li>Produces, on its source pads, content composed of the video streams received on its sink pads.</li></ol> |
| **Output** | |
| waylandsink | <ol><li>Receives the video stream on its sink pad.</li><li>Submits the video stream to Weston.</li><li>Weston displays the following on a local display device:<ol type="a"><li>The video stream captured from the camera.</li><li>The bounding boxes drawn over the allowed number of objects identified in the scene.</li></ol></li></ol> |

Last Published: Feb 20, 2026