# Camera encode, object detection, and display

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/camera-encode-file-detection-yolov8-overlay-display.html](https://docs.qualcomm.com/doc/80-70022-50/topic/camera-encode-file-detection-yolov8-overlay-display.html)

The **gst-camera-two-stream-encode-file-detection-display.py** application encodes the camera stream and saves it to a file. It also uses a YOLOX LiteRT model to detect objects in the camera stream, overlays bounding boxes on the detected objects, and displays the results.

Note: This application isn't supported on Ubuntu Server.

Figure: Pipeline for camera encode and object detection

<svg id="Layer_2" data-name="Layer 2" xmlns="http://www.w3.org/2000/svg" width="1179.636054039003284" height="356.767681121826172" viewbox="0 0 1179.636054039003284 356.767681121826172">
  <g>
    <rect x=".499945640563965" y=".499950408935547" width="1178.6357421875" height="355.767578125" rx="7.499999999999883" ry="7.499999999999883" style="fill: #fafafa;"></rect>
    <path d="M1171.636054039003284,1c3.859741210941138,0,7,3.140153884887695,7,7v340.767681121826172c0,3.85986328125-3.140258789058862,7-7,7H7.999999046325684c-3.859845161437988,0-6.999999046325684-3.14013671875-6.999999046325684-7V8C1,4.140153884887695,4.140153884887695,1,7.999999046325684,1h1163.6360549926776M1171.636054039003284,0H7.999999046325684C3.581691741943359,0,0,3.581691741943359,0,8v340.767681121826172c0,4.418304443359375,3.581691741943359,8,7.999999046325684,8h1163.6360549926776c4.41845703125,0,8-3.581695556640625,8-8V8c0-4.418308258056641-3.58154296875-8-8-8h0Z" style="fill: #d2d7e1;"></path>
  </g>
  <g>
    <g>
      <rect x="20.363822349333532" y="19.999943074481052" width="120" height="50" rx="3.999999999999999" ry="3.999999999999999" style="fill: #007884;"></rect>
      <text transform="translate(53.79355525970459 48.506644725799561)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
    </g>
    <g>
      <rect x="166.431883994948294" y="19.999943074481052" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
      <text transform="translate(215.342101097106934 48.506644725799561)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
    </g>
    <g>
      <rect x="153.931883994948294" y="96.841002557628599" width="145" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
      <text transform="translate(170.958991050720215 125.347873687744141)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
    </g>
    <g>
      <line x1="140.897849082946777" y1="44.999942779541016" x2="160.87448787689209" y2="44.999942779541016" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="159.853201866149902 48.490375518798828 165.897849082946777 44.999942779541016 159.853201866149902 41.509513854980469 159.853201866149902 48.490375518798828"></polygon>
    </g>
    <g>
      <rect x="311.965914817752491" y="19.999943074481052" width="120" height="50" rx="3.999999999999995" ry="3.999999999999995" style="fill: #2a2aea;"></rect>
      <text transform="translate(329.262852668762207 48.506644725799561)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimetamux</tspan></text>
    </g>
    <g>
      <line x1="286.431891441345215" y1="44.999942779541016" x2="306.408514976502374" y2="44.999942779541016" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="305.38724422454834 48.490375518798828 311.431891441345215 44.999942779541016 305.38724422454834 41.509513854980469 305.38724422454834 48.490375518798828"></polygon>
    </g>
    <g>
      <rect x="457.499945640561236" y="19.999943074481052" width="120" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
      <text transform="translate(482.937506675720215 48.506644725799561)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtioverlay</tspan></text>
    </g>
    <g>
      <line x1="431.96591854095459" y1="44.999942779541016" x2="451.94254207611084" y2="44.999942779541016" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="450.921271324157715 48.490375518798828 456.96591854095459 44.999942779541016 450.921271324157715 41.509513854980469 450.921271324157715 48.490375518798828"></polygon>
    </g>
    <g>
      <rect x="603.033976463368163" y="19.999943074481052" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
      <text transform="translate(617.51059627532959 48.506644725799561)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">v4l2h264enc</tspan></text>
    </g>
    <g>
      <line x1="577.499945640563965" y1="44.999942779541016" x2="597.476569175720215" y2="44.999942779541016" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="596.455267906188965 48.490375518798828 602.499945640563965 44.999942779541016 596.455267906188965 41.509513854980469 596.455267906188965 48.490375518798828"></polygon>
    </g>
    <g>
      <rect x="748.568007286175998" y="19.999943074481052" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
      <text transform="translate(770.775092124938965 48.506644725799561)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">h264parse</tspan></text>
    </g>
    <g>
      <line x1="723.034003257751465" y1="44.999942779541016" x2="743.010626792907715" y2="44.999942779541016" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="741.989325523376465 48.490375518798828 748.034003257751465 44.999942779541016 741.989325523376465 41.509513854980469 741.989325523376465 48.490375518798828"></polygon>
    </g>
    <g>
      <rect x="894.102038108983834" y="19.999943074481052" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
      <text transform="translate(922.723212242126465 48.506644725799561)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">mp4mux</tspan></text>
    </g>
    <g>
      <line x1="868.567999839782715" y1="44.999942779541016" x2="888.544623374938965" y2="44.999942779541016" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="887.523383140563965 48.490375518798828 893.567999839782715 44.999942779541016 887.523383140563965 41.509513854980469 887.523383140563965 48.490375518798828"></polygon>
    </g>
    <g>
      <rect x="1039.636068931793488" y="19.999943074481052" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
      <text transform="translate(1074.202582359313965 48.506644725799561)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">filesink</tspan></text>
    </g>
    <g>
      <line x1="1014.101996421813965" y1="44.999942779541016" x2="1034.078680992124646" y2="44.999942779541016" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="1033.057440757753284 48.490375518798828 1039.101996421813965 44.999942779541016 1033.057440757753284 41.509513854980469 1033.057440757753284 48.490375518798828"></polygon>
    </g>
    <g>
      <line x1="226.431891441345215" y1="70.461528778076172" x2="226.431891441345215" y2="90.438167572021484" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="222.941458702087402 89.416873931884766 226.431891441345215 95.461528778076172 229.922308921813965 89.416873931884766 222.941458702087402 89.416873931884766"></polygon>
    </g>
    <g>
      <rect x="153.931883994948294" y="173.77656000270872" width="145" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
      <text transform="translate(192.251959800720215 202.283443450927734)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimltflite</tspan></text>
    </g>
    <g>
      <line x1="226.431891441345215" y1="147.397075653076172" x2="226.431891441345215" y2="167.373714447021484" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="222.941458702087402 166.352428436279297 226.431891441345215 172.397075653076172 229.922308921813965 166.352428436279297 222.941458702087402 166.352428436279297"></polygon>
    </g>
    <g>
      <rect x="153.931883994948294" y="251.535514323359166" width="145" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
      <text transform="translate(164.154303550720215 280.042400360107422)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
    </g>
    <g>
      <line x1="226.431891441345215" y1="225.156032562255859" x2="226.431891441345215" y2="245.132671356201172" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
      <polygon points="222.941458702087402 244.111370086669922 226.431891441345215 250.156047821044922 229.922308921813965 244.111370086669922 222.941458702087402 244.111370086669922"></polygon>
    </g>
    <g>
      <polyline points="298.931891441345215 276.535503387451172 371.96591854095459 276.535503387451172 371.96591854095459 75.484889984130859" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
      <polygon points="375.45633602142334 76.506175994873047 371.96591854095459 70.461528778076172 368.475470542907715 76.506175994873047 375.45633602142334 76.506175994873047"></polygon>
    </g>
  </g>
  <g>
    <g>
      <text transform="translate(990.129157066345215 332.859157562255859)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
      <rect x="969.878148833648083" y="320.767681121826172" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
    </g>
    <g>
      <text transform="translate(1088.710944175720215 332.859157562255859)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
      <rect x="1068.459930246503063" y="320.767681121826172" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
    </g>
  </g>
</svg>

For information about the plugins used in this pipeline, see [Pipeline flow](https://docs.qualcomm.com/doc/80-70022-50/topic/camera-encode-file-detection-yolov8-overlay-display.html#camera-encode-file-detection-yolov8-overlay-display__section_mty_hyk_bdc).

## Run the application on the target device

1. Ensure that you complete the [Prerequisites](https://docs.qualcomm.com/doc/80-70022-50/topic/prerequisites-for-python-sample-applications.html).
2. Run the camera encode and object detection script on the target device:

        gst-camera-two-stream-encode-file-detection-display.py
3. To display the available help options, run the following command:

        gst-camera-two-stream-encode-file-detection-display.py -h
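
If you want to start the sample from another Python program instead of typing the command, a minimal sketch could look like the following. It assumes the script is on `PATH`, as the commands above imply. `build_command` and `run_sample` are hypothetical helper names, and `-h` is the only option used because it is the only one named in this topic.

```python
import shutil
import subprocess

SCRIPT = "gst-camera-two-stream-encode-file-detection-display.py"


def build_command(show_help=False):
    """Return the argument list for the sample script (hypothetical helper)."""
    cmd = [SCRIPT]
    if show_help:
        cmd.append("-h")  # the only option mentioned in this topic
    return cmd


def run_sample(show_help=False):
    """Run the script if it is found on PATH.

    Returns the exit code, or None when the script is not installed.
    """
    if shutil.which(SCRIPT) is None:
        return None
    return subprocess.run(build_command(show_help), check=False).returncode
```

Calling `run_sample(show_help=True)` on the target device should print the same help text as step 3. On a machine without the sample installed, it returns `None` instead of raising an error.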

The Python script uses the following files by default:

Table: Default model and label files for gst-camera-two-stream-encode-file-detection-display.py

| File | Path |
| :--- | :--- |
| Detection model (YOLOX) | /etc/models/yolox\_quantized.tflite |
| Detection labels | /etc/labels/yolox.json |
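
Before running the application, it can help to confirm that these files are present on the target device. The following is a minimal sketch, not part of the sample application. The paths are copied from the table above, and `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

# Default paths taken from the table above.
DEFAULT_FILES = {
    "model": "/etc/models/yolox_quantized.tflite",
    "labels": "/etc/labels/yolox.json",
}


def missing_files(files=DEFAULT_FILES):
    """Return the names of entries whose path is not an existing regular file."""
    return [name for name, path in files.items() if not Path(path).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing default files:", ", ".join(missing))
    else:
        print("All default files are present.")
```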

## Expected output

The application saves the encoded video to /etc/media/test.mp4.
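
To check quickly that a recording was written, you can look for the `ftyp` box that MP4 files normally start with. The sketch below is an illustration only. `looks_like_mp4` is a hypothetical helper, and the check does not prove that the file is complete or playable.

```python
from pathlib import Path

OUTPUT_PATH = "/etc/media/test.mp4"  # output location stated in this topic


def looks_like_mp4(path=OUTPUT_PATH):
    """Return True if the file exists and bytes 4..8 spell the 'ftyp' box type."""
    candidate = Path(path)
    if not candidate.is_file():
        return False
    with candidate.open("rb") as stream:
        header = stream.read(8)  # 4-byte box size followed by 4-byte box type
    return len(header) == 8 and header[4:8] == b"ftyp"


if __name__ == "__main__":
    print("Output looks like MP4:", looks_like_mp4())
```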

## Pipeline flow

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | Collects two video streams from the camera:<ul class="ul" id="camera-encode-file-detection-yolov8-overlay-display__ul_wcz_11l_bdc"><li class="li">One stream is saved to a file.</li><li class="li">The second stream is used for detection. It's split using tee and sent to the following:<ul class="ul" id="camera-encode-file-detection-yolov8-overlay-display__ul_nsj_htk_bdc"><li class="li">qtimetamux to retain the video stream.</li><li class="li">qtimlvconverter to convert the video stream to input tensors for the object detection inference.</li></ul></li></ul> |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | Encodes H.264 video. |
| h264parse | Parses H.264 video. |
| mp4mux | Multiplexes the video data. |
| filesink | Saves the video data to a file. |
| **Preprocessing** | **Preprocessing** |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlvconverter.html) | <ol class="ol" id="camera-encode-file-detection-yolov8-overlay-display__ol_i5w_4wl_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="camera-encode-file-detection-yolov8-overlay-display__ol_zdw_qwl_vbc"><li class="li">Color conversion</li><li class="li">Scaling up or down</li><li class="li">Normalization of the stream data when the model expects floating-point input</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The object detection model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | **Inferencing** |
| [qtimltflite](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimltflite.html) | <ol class="ol" id="camera-encode-file-detection-yolov8-overlay-display__ol_u1l_cxl_vbc"><li class="li">Loads the model.</li><li class="li">Modifies the graph for the chosen delegate.</li><li class="li">Receives the tensor stream on its sink pad.</li><li class="li">Runs the inference and produces a tensor stream with the inference results on its source pad.</li></ol> |
| **Postprocessing** | **Postprocessing** |
| qtimlpostprocess | <ol class="ol" id="camera-encode-file-detection-yolov8-overlay-display__ol_ky5_grn_vbc"><li class="li">Receives the inference tensors from the object detection model on its sink pad.</li><li class="li">Converts the inference tensors into formats such as video or text that the multimedia plugins can process later.</li><li class="li">Applies the threshold to the chosen number of results.</li><li class="li">Loads the corresponding modules for detection models. <p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="camera-encode-file-detection-yolov8-overlay-display__ol_jcd_wnk_5bc"><li class="li">Loads the YOLOX submodule.</li><li class="li">Produces results as structures of text.</li><li class="li">Sends them to the sink pad of qtimetamux.</li></ol></li></ol> |
| [qtimetamux](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimetamux.html) | <ol class="ol" id="camera-encode-file-detection-yolov8-overlay-display__ol_ll3_x5l_vbc"><li class="li">Receives the video stream and the text stream with the corresponding bounding box results on its sink pads.</li><li class="li">Produces GST buffers with the contents of the video stream from its sink pad.</li><li class="li">Adds the bounding boxes from the data sink pad as GstVideoRegionOfInterest to the GST buffer metadata (meta muxing) on its source pad.</li></ol> |
| [qtioverlay](https://docs.qualcomm.com/doc/80-70022-50/topic/qtioverlay.html) | <ol class="ol" id="camera-encode-file-detection-yolov8-overlay-display__ol_wst_y5l_vbc"><li class="li">Receives the multiplexed stream.</li><li class="li">Overlays the bounding boxes on the video frame using CL.</li><li class="li">Produces GST buffers with overlays on its source pad.</li></ol> |
| **Output** | **Output** |
| [waylandsink](https://docs.qualcomm.com/doc/80-70022-50/topic/waylandsink.html) | <ol class="ol" id="camera-encode-file-detection-yolov8-overlay-display__ol_cgt_mwl_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Submits the video stream to Weston.</li><li class="li">Weston renders the video stream on a local display device.</li></ol> |
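
To see how the elements connect, the arrows in the pipeline figure can be written down as a small graph and walked in plain Python. This is only a sketch of the topology as drawn, not the application's source code and not a runnable GStreamer pipeline. The node names are the labels in the figure (`camsrc` stands for the camera source), the display branch is left out because the figure doesn't draw it, and `all_paths` is a hypothetical helper.

```python
# Edges transcribed from the pipeline figure: element -> downstream elements.
PIPELINE_EDGES = {
    "camsrc": ["tee"],
    "tee": ["qtimetamux", "qtimlvconverter"],
    "qtimlvconverter": ["qtimltflite"],
    "qtimltflite": ["qtimlpostprocess"],
    "qtimlpostprocess": ["qtimetamux"],
    "qtimetamux": ["qtioverlay"],
    "qtioverlay": ["v4l2h264enc"],
    "v4l2h264enc": ["h264parse"],
    "h264parse": ["mp4mux"],
    "mp4mux": ["filesink"],
    "filesink": [],
}


def all_paths(graph, start, end, _prefix=()):
    """Return every simple path from start to end using depth-first search."""
    prefix = _prefix + (start,)
    if start == end:
        return [list(prefix)]
    found = []
    for downstream in graph.get(start, []):
        if downstream not in prefix:  # never revisit an element
            found.extend(all_paths(graph, downstream, end, prefix))
    return found


if __name__ == "__main__":
    for path in all_paths(PIPELINE_EDGES, "camsrc", "filesink"):
        print(" -> ".join(path))
```

The walk finds two routes from the camera to the file: the video route that goes from tee straight to qtimetamux, and the inference route that passes through qtimlvconverter, qtimltflite, and qtimlpostprocess before rejoining at qtimetamux.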

## Related information

[Object detection](https://docs.qualcomm.com/doc/80-70022-50/topic/gst-ai-object-detection.html)

**Parent Topic:** [Run Python-based applications](https://docs.qualcomm.com/doc/80-70022-50/topic/python-sample-applications.html)

Last Published: Feb 20, 2026
