# Object detection and encode with Neural Processing SDK 

Source: [https://docs.qualcomm.com/doc/80-70023-50/topic/single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd.html](https://docs.qualcomm.com/doc/80-70023-50/topic/single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd.html)

This use case runs a YOLOX (`yolox.dlc`) object detection model with the Qualcomm Neural Processing SDK to identify objects in a camera stream. The pipeline overlays or composes bounding boxes over the detected objects, and then encodes the stream as an H.264 bitstream.

Download the [YOLOX](https://aihub.qualcomm.com/iot/models/yolox?searchTerm=yolox%29) Qualcomm AI runtime model with w8a8 precision from AI Hub. The YOLOX model uses the YOLOv8 postprocessing module.

Note: On Ubuntu Server, `sudo` access is required to write the encoded stream to the `/etc/media` folder.
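
Before launching either pipeline, it can help to confirm that the model and label files are readable and that the output directory is writable. The following is a minimal sketch, not part of the SDK: the function name `check_pipeline_inputs` is hypothetical, and the paths in the usage comment are the ones used by the commands in this section.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of the SDK).
# Arguments: <model file> <labels file> <output directory>
# Returns 0 when all three are usable, 1 otherwise.
check_pipeline_inputs() {
    model="$1"
    labels="$2"
    outdir="$3"
    status=0

    [ -r "$model" ] || { echo "model not readable: $model" >&2; status=1; }
    [ -r "$labels" ] || { echo "labels not readable: $labels" >&2; status=1; }
    # On Ubuntu Server this check is expected to fail without sudo.
    if [ ! -d "$outdir" ] || [ ! -w "$outdir" ]; then
        echo "output directory not writable: $outdir" >&2
        status=1
    fi
    return "$status"
}

# Example call with the paths used in this section:
#   check_pipeline_inputs /etc/models/yolox-yolo-x-w8a8.dlc /etc/labels/yolox.json /etc/media
```

A non-zero return status means at least one of the three checks failed; the messages on standard error name the failing path.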

## Use qtivoverlay plugin to apply detection overlay

Run the use case on the target device:

    gst-launch-1.0 -e \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp model=/etc/models/yolox-yolo-x-w8a8.dlc tensors="<boxes,scores,class_idx>" ! queue ! \
    qtimlpostprocess settings="{\"confidence\": 70.0}" results=5 module=yolov8 labels=/etc/labels/yolox.json ! text/x-raw ! queue ! metamux.

To stop the use case, use CTRL + C.
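
The `settings` property in the command above carries a small JSON object, which is why the inner quotes are escaped. To try different thresholds without editing the escapes by hand, the value can be built by a shell function. This is a sketch under the assumption that `confidence` is the only key needed, as in the command above; the function name `postprocess_settings` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON passed to qtimlpostprocess settings=...
# The threshold is emitted unchanged, so pass it in the same form as the
# original command (for example 70.0).
postprocess_settings() {
    printf '{"confidence": %s}' "$1"
}

# Example use inside the pipeline, replacing the hand-escaped string:
#   qtimlpostprocess settings="$(postprocess_settings 50.0)" results=5 module=yolov8 labels=/etc/labels/yolox.json
```

After shell expansion, `settings="$(postprocess_settings 70.0)"` yields the same argument as the escaped form in the original command.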

The following steps and figure show the flow of the use case execution:

1. Identify objects in a video stream from a camera source.
2. Overlay bounding boxes over the detected objects using overlaylib.
3. Encode this stream as an H.264 bitstream.
4. Multiplex the stream into an MP4 container and store it as an MP4 file.

Figure: Pipeline for bounding box overlay and encode

<svg id="Layer_2" data-name="Layer 2" xmlns="http://www.w3.org/2000/svg" width="1073.568058013916016" height="448.228012084960938" viewbox="0 0 1073.568058013916016 448.228012084960938">
  <defs>
    <style>.svg-1 .cls-1 { fill: none; stroke: #000; stroke-miterlimit: 10 }
.svg-1 .cls-2 { fill: #fff; font-size: 16px }
.svg-1 .cls-2,.svg-1 .cls-3 { font-family: Roboto-Regular, Roboto }
.svg-1 .cls-4 { fill: #007884 }
.svg-1 .cls-5 { fill: #d2d7e1 }
.svg-1 .cls-6 { fill: #2a2aea }
.svg-1 .cls-3 { font-size: 14px }
.svg-1 .cls-7 { fill: #fafafa }</style>
  </defs>
  <g>
    <rect class="cls-7" x=".500186920166016" y=".500228881835938" width="1072.5673828125" height="447.22802734375" rx="7.499999999999946" ry="7.499999999999946"></rect>
    <path class="cls-5" d="M1065.568058013916016,1c3.85986328125,0,7,3.140132904052734,7,7v432.228012084960938c0,3.85986328125-3.14013671875,7-7,7H8.000003814697266c-3.859870910644531,0-7.000003814697266-3.14013671875-7.000003814697266-7V8c0-3.859867095947266,3.140132904052734-7,7.000003814697266-7h1057.56805419921875M1065.568058013916016,0H8.000003814697266C3.581737518310547,0,0,3.581729888916016,0,8v432.228012084960938c0,4.418243408203125,3.581737518310547,8,8.000003814697266,8h1057.56805419921875c4.418212890630457,0,8-3.581756591796875,8-8V8c0-4.418270111083984-3.581787109369543-8-8-8h0Z"></path>
  </g>
  <g>
    <g>
      <text class="cls-3" transform="translate(881.060031890869141 424.319470802449359)"><tspan x="0" y="0">Qualcomm </tspan></text>
      <rect class="cls-6" x="860.809052765556771" y="412.228029799924116" width="16" height="16" rx="2" ry="2"></rect>
    </g>
    <g>
      <text class="cls-3" transform="translate(979.641819000244141 424.319470802449359)"><tspan x="0" y="0">Open source</tspan></text>
      <rect class="cls-4" x="959.390834178415389" y="412.228029799924116" width="16" height="16" rx="2" ry="2"></rect>
    </g>
  </g>
  <g>
    <rect class="cls-4" x="19.999953651705255" y="110.692399021678284" width="160" height="50" rx="4.000000000000007" ry="4.000000000000007"></rect>
    <text class="cls-2" transform="translate(88.910175323486328 139.199109474324359)"><tspan x="0" y="0">tee</tspan></text>
  </g>
  <g>
    <rect class="cls-6" x="19.999953651705255" y="187.53345850482583" width="160" height="49.999999999999091" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(44.527057647705078 216.040326515339984)"><tspan x="0" y="0">qtimlvconverter</tspan></text>
  </g>
  <g>
    <rect class="cls-6" x="205.533984474510362" y="110.692399021678284" width="120" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(222.83091926574707 139.199109474324359)"><tspan x="0" y="0">qtimetamux</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="179.999958038330078" y1="135.692395607136859" x2="199.793003082275391" y2="135.692395607136859"></line>
    <polygon points="198.625797271728516 139.681470314168109 205.533969879150391 135.692395607136859 198.625797271728516 131.703320900105609 198.625797271728516 139.681470314168109"></polygon>
  </g>
  <g>
    <rect class="cls-6" x="351.068015297318198" y="110.692399021678284" width="120" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(372.630573272705078 139.199109474324359)"><tspan x="0" y="0">qtivoverlay</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="325.533969879150391" y1="135.692395607136859" x2="344.793003082275391" y2="135.692395607136859"></line>
    <polygon points="343.625797271728516 139.681470314168109 350.533969879150391 135.692395607136859 343.625797271728516 131.703320900105609 343.625797271728516 139.681470314168109"></polygon>
  </g>
  <g>
    <rect class="cls-4" x="496.602046120126033" y="110.692399021678284" width="120" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(511.078647613525391 139.199109474324359)"><tspan x="0" y="0">v4l2h264enc</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="471.067996978759766" y1="135.692395607136859" x2="490.327030181884766" y2="135.692395607136859"></line>
    <polygon points="489.159854888916016 139.681470314168109 496.067996978759766 135.692395607136859 489.159854888916016 131.703320900105609 489.159854888916016 139.681470314168109"></polygon>
  </g>
  <g>
    <rect class="cls-4" x="642.136076942933869" y="110.692399021678284" width="120" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(664.343143463134766 139.199109474324359)"><tspan x="0" y="0">h264parse</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="616.602054595947266" y1="135.692395607136859" x2="635.861087799072266" y2="135.692395607136859"></line>
    <polygon points="634.693851470947266 139.681470314168109 641.602054595947266 135.692395607136859 634.693851470947266 131.703320900105609 634.693851470947266 139.681470314168109"></polygon>
  </g>
  <g>
    <rect class="cls-4" x="787.670107765741704" y="110.692399021678284" width="120" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(816.291324615478516 139.199109474324359)"><tspan x="0" y="0">mp4mux</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="762.136051177978516" y1="135.692395607136859" x2="781.395084381103516" y2="135.692395607136859"></line>
    <polygon points="780.227909088134766 139.681470314168109 787.136051177978516 135.692395607136859 780.227909088134766 131.703320900105609 780.227909088134766 139.681470314168109"></polygon>
  </g>
  <g>
    <rect class="cls-4" x="933.204138588551359" y="110.692399021678284" width="120.000000000005457" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(967.770694732666016 139.199109474324359)"><tspan x="0" y="0">filesink</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="907.670108795166016" y1="135.692395607136859" x2="926.929141998291016" y2="135.692395607136859"></line>
    <polygon points="925.761905670166016 139.681470314168109 932.670108795166016 135.692395607136859 925.761905670166016 131.703320900105609 925.761905670166016 139.681470314168109"></polygon>
  </g>
  <g>
    <line class="cls-1" x1="99.999958038330078" y1="161.153973976277484" x2="99.999958038330078" y2="181.792462745808734"></line>
    <polygon points="96.010883331298828 180.625287452839984 99.999958038330078 187.533460060261859 103.989017486572266 180.625287452839984 96.010883331298828 180.625287452839984"></polygon>
  </g>
  <g>
    <rect class="cls-6" x="19.999953651705255" y="264.469015949905042" width="160" height="50" rx="4.000000000000007" ry="4.000000000000007"></rect>
    <text class="cls-2" transform="translate(64.679508209228516 294.144864479207172)"><tspan x="0" y="0">qtimlsnpe</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="99.999958038330078" y1="238.089536110066547" x2="99.999958038330078" y2="258.728024879597797"></line>
    <polygon points="96.010883331298828 257.560834327839984 99.999958038330078 264.469022194050922 103.989017486572266 257.560834327839984 96.010883331298828 257.560834327839984"></polygon>
  </g>
  <g>
    <rect class="cls-6" x="19.999953651705255" y="342.227970270555488" width="160" height="50.000000000000909" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(37.722370147705078 370.734830299519672)"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="99.999958038330078" y1="314.469022194050922" x2="99.999958038330078" y2="336.486989418172016"></line>
    <polygon points="96.010883331298828 335.319791237019672 99.999958038330078 342.227971473836078 103.989017486572266 335.319791237019672 96.010883331298828 335.319791237019672"></polygon>
  </g>
  <g>
    <polyline class="cls-1" points="179.999957374904625 367.227969436171406 265.533984474514 367.227977065565938 265.533984474514 166.177340774062031"></polyline>
    <polygon points="269.02440195498275 167.198642043594191 265.533984474514 161.153994826797316 262.04356699404525 167.198642043594191 269.02440195498275 167.198642043594191"></polygon>
  </g>
  <rect class="cls-4" x="19.999953651705255" y="19.769234376033637" width="160" height="50" rx="4.000000000000007" ry="4.000000000000007"></rect>
  <text class="cls-2" transform="translate(78.082019805908203 48.275952736043109)"><tspan x="0" y="0">filesrc</tspan></text>
  <g>
    <line class="cls-1" x1="179.999957374904625" y1="44.769229201797316" x2="199.258960060452409" y2="44.769229201797316"></line>
    <polygon points="198.09178476748275 48.758303908828566 204.999957374904625 44.769229201797316 198.09178476748275 40.780185012344191 198.09178476748275 48.758303908828566"></polygon>
  </g>
  <rect class="cls-4" x="204.999953651705255" y="19.769234376033637" width="160" height="50" rx="4" ry="4"></rect>
  <text class="cls-2" transform="translate(253.703109741210938 48.275952736043109)"><tspan x="0" y="0">qtdemux</tspan></text>
  <g>
    <line class="cls-1" x1="364.999957374904625" y1="44.769229201797316" x2="384.258960060452409" y2="44.769229201797316"></line>
    <polygon points="383.09178476748275 48.758303908828566 389.999957374904625 44.769229201797316 383.09178476748275 40.780185012344191 383.09178476748275 48.758303908828566"></polygon>
  </g>
  <rect class="cls-4" x="389.999953651705255" y="19.769234376033637" width="160.000000000000909" height="50" rx="4" ry="4"></rect>
  <text class="cls-2" transform="translate(432.207004547119141 48.275952736043109)"><tspan x="0" y="0">h264parse</tspan></text>
  <g>
    <line class="cls-1" x1="549.9999268573265" y1="44.769229201797316" x2="569.2589600604515" y2="44.769229201797316"></line>
    <polygon points="568.09178476748275 48.758303908828566 574.9999268573265 44.769229201797316 568.09178476748275 40.780185012344191 568.09178476748275 48.758303908828566"></polygon>
  </g>
  <g>
    <polyline class="cls-1" points="99.999713234279625 104.951419143203566 99.999728493068687 90.461612014297316 654.99998789248275 90.461520461562031 654.99998789248275 70.230716018203566"></polyline>
    <polygon points="103.988787941310875 103.784213332656691 99.999713234279625 110.692385940078566 96.010653786037437 103.784213332656691 103.988787941310875 103.784213332656691"></polygon>
  </g>
  <rect class="cls-4" x="574.999953651706164" y="19.769234376033637" width="160" height="50" rx="4" ry="4"></rect>
  <text class="cls-2" transform="translate(609.378879547119141 48.275952736043109)"><tspan x="0" y="0">v4l2h264dec</tspan></text>
</svg>

The following table describes the sequential processing stages of the pipeline:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70023-50/topic/qtiqmmfsrc.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_l2f_zgm_vbc"><br>                                    <li class="li">Collects the video stream (source) and creates two copies of<br>                                        the source:<ul class="ul" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_m2f_zgm_vbc"><br>                                            <li class="li">One stream is sent to qtimetamux plugin to retain<br>                                                the video stream.</li><br><br>                                            <li class="li">The other stream is sent to an ML inferencing<br>                                                pipeline.</li><br><br>                                        </ul><br></li><br><br>                                </ol> |
| **Preprocessing** | **Preprocessing** |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70023-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_xsf_q5l_vbc"><br>                                    <li class="li">Receives the video stream on its sink pad.</li><br><br>                                    <li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ul_ff2_twl_vbc"><br>                                            <li class="li">Color conversion</li><br><br>                                            <li class="li">Scaling down/up</li><br><br>                                            <li class="li">Normalization on the stream data when the model<br>                                                expects the floating point values as input</li><br><br>                                        </ul><br></li><br><br>                                    <li class="li">Converts the video stream to a tensor stream on its source<br>                                            pad.<p class="p">The object detection model uses this tensor<br>                                            stream for inferencing.</p><br></li><br><br>                                </ol> |
| **Inferencing** | **Inferencing** |
| [qtimlsnpe](https://docs.qualcomm.com/doc/80-70023-50/topic/qtimlsnpe.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_ufn_2lm_vbc"><br>                                    <li class="li">Loads the object detection model.</li><br><br>                                    <li class="li">Modifies the graph for the chosen delegate.</li><br><br>                                    <li class="li">Receives the tensor stream on its sinkpad.</li><br><br>                                    <li class="li">Runs the inference and produces a tensor stream with the<br>                                        object detection results on its source pad.</li><br><br>                                </ol> |
| **Postprocessing** | **Postprocessing** |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_ky5_grn_vbc"><br>                                    <li class="li"> Receives the inference tensors from the object detection<br>                                        model. </li><br><br>                                    <li class="li">Converts the inference tensors on its sinkpad into formats<br>                                        like video or text that the multimedia plugins can process<br>                                        later.</li><br><br>                                    <li class="li">Applies the threshold to the chosen number of results. </li><br><br>                                    <li class="li">Loads the corresponding modules for detection models. <p class="p">In<br>                                            this use case, qtimlpostprocess does the following:<br>                                            </p><ol class="ol" type="a" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_jcd_wnk_5bc"><br>                                            <li class="li">Loads the YOLOv8 submodule. </li><br><br>                                            <li class="li">Produces results as structures of text.</li><br><br>                                            <li class="li">Sends them to the sinkpad of qtimetamux.</li><br><br>                                        </ol><br></li><br><br>                                </ol> |
| [qtimetamux](https://docs.qualcomm.com/doc/80-70023-50/topic/qtimetamux.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_ll3_x5l_vbc"><br>                                    <li class="li">Receives video stream and text stream with the bounding box<br>                                        results corresponding to the video stream on its<br>                                        sinkpads.</li><br><br>                                    <li class="li">Produces GST buffers with the contents of the video stream<br>                                        from its sink pad.</li><br><br>                                    <li class="li">Adds the bounding boxes as<br>                                            <code class="ph codeph">GstVideoRegionOfInterest</code> from data<br>                                        sinkpad to GST buffers meta (meta muxing) on its source<br>                                        pad.</li><br><br>                                </ol> |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-70023-50/topic/qtioverlay.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_wst_y5l_vbc"><br>                                    <li class="li">Receives the multiplexed stream.</li><br><br>                                    <li class="li">Overlays the bounding boxes on the VideoFrame using CL. </li><br><br>                                    <li class="li">Produces GST buffers with overlays in its source pad.</li><br><br>                                </ol> |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70023-50/topic/v4l2h264enc.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_wsc_bsn_vbc"><br>                                    <li class="li">Applies parameters to each frame of the video stream it's<br>                                        receiving on its sinkpad.</li><br><br>                                    <li class="li">Encodes it into bitstream and sends it over its<br>                                        sourcepad.</li><br><br>                                </ol> |
| h264parse | Adds more information about the bitstream to the GStreamer buffer<br>                                meta. |
| mp4mux | Receives these buffers and creates containers with format<br>                                specification buffers. |
| **Output** | **Output** |
| filesink | Stores the resulting stream in the /etc/media/video.mp4 file. |
| Playback | From the host computer, pull video.mp4 from the target device and play it on a media player:<br>`scp root@<IP address of target device>:/etc/media/video.mp4 <destination directory>` |
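
After copying the file to the host, a quick check can tell whether the recording was finalized. `mp4mux` writes the index (the `moov` box) when the pipeline reaches end-of-stream, which is what the `-e` flag of `gst-launch-1.0` arranges when CTRL + C is pressed. The sketch below is a rough heuristic, not a full parser: it checks for the `ftyp` box type at byte offset 4 and searches the file for the string `moov`. The function name `looks_like_finalized_mp4` is hypothetical.

```shell
#!/bin/sh
# Hypothetical heuristic check for a finalized MP4 file.
# Returns 0 if bytes 4..7 spell "ftyp" and the string "moov" occurs in the file.
looks_like_finalized_mp4() {
    file="$1"
    [ -s "$file" ] || return 1
    # The type of the first box sits right after its 4-byte size field.
    brand=$(dd if="$file" bs=1 skip=4 count=4 2>/dev/null)
    [ "$brand" = "ftyp" ] || return 1
    # Byte-wise search; a chance match inside media data is possible.
    LC_ALL=C grep -q "moov" "$file"
}

# Example on the host after the scp step:
#   looks_like_finalized_mp4 video.mp4 && echo "recording looks complete"
```

A file that fails this check was most likely cut off before end-of-stream and may not play.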

## Use qtivcomposer to mix original frame with detection mask

Run the use case on the target device:

    gst-launch-1.0 -e \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtivcomposer name=mixer ! queue ! video/x-raw,format=NV12,width=1920,height=1080,interlace-mode=progressive,colorimetry=bt601 ! \
    v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp model=/etc/models/yolox-yolo-x-w8a8.dlc tensors="<boxes,scores,class_idx>" ! queue ! \
    qtimlpostprocess settings="{\"confidence\": 70.0}" results=5 module=yolov8 labels=/etc/labels/yolox.json ! video/x-raw,width=640,height=360 ! queue ! mixer.

To stop the use case, use CTRL + C.
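
For unattended recordings, the pipeline can be stopped after a fixed duration by sending the same interrupt signal that CTRL + C produces. This is a minimal sketch, assuming the `timeout` utility from GNU coreutils is available on the target; the wrapper name `run_for_seconds` is hypothetical. Because `gst-launch-1.0` is started with `-e`, the interrupt still lets the pipeline reach end-of-stream so that the MP4 file is closed properly.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command and send it SIGINT after N seconds.
# Usage: run_for_seconds <seconds> <command> [args...]
# timeout exits with status 124 when the time limit was reached.
run_for_seconds() {
    secs="$1"
    shift
    timeout -s INT "$secs" "$@"
}

# Example: record for 30 seconds, then stop as if CTRL + C was pressed.
# Append the full pipeline from the command above after "-e".
#   run_for_seconds 30 gst-launch-1.0 -e <pipeline>
```

An exit status of 124 from the wrapper indicates that the time limit triggered the stop; any other status comes from the wrapped command itself.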

The following steps and figure show the flow of the use case execution:

1. Identify objects in a video stream from a camera source.
2. Compose the bounding boxes of the detected objects with the original video stream using qtivcomposer.
3. Encode this stream as an H.264 bitstream.
4. Multiplex the stream into an MP4 container and store it as an MP4 file.

Figure: Pipeline for bounding box mask and encode with qtivcomposer

<svg id="Layer_2" data-name="Layer 2" xmlns="http://www.w3.org/2000/svg" width="947.865921020507812" height="448.549873352050781" viewbox="0 0 947.865921020507812 448.549873352050781">
  <defs>
    <style>.svg-2 .cls-1 { fill: none; stroke: #000; stroke-miterlimit: 10 }
.svg-2 .cls-2 { fill: #fff; font-size: 16px }
.svg-2 .cls-2,.svg-2 .cls-3 { font-family: Roboto-Regular, Roboto }
.svg-2 .cls-4 { fill: #007884 }
.svg-2 .cls-5 { fill: #d2d7e1 }
.svg-2 .cls-6 { fill: #2a2aea }
.svg-2 .cls-3 { font-size: 14px }
.svg-2 .cls-7 { fill: #fafafa }</style>
  </defs>
  <g>
    <rect class="cls-7" x=".500198364257812" y=".50006103515625" width="946.8662109375" height="447.5498046875" rx="7.500000000000007" ry="7.500000000000007"></rect>
    <path class="cls-5" d="M939.865921020507812,1c3.8597412109375,0,7,3.140228271484375,7,7v432.549873352050781c0,3.859766006469727-3.1402587890625,7-7,7H8c-3.859771728515625,0-7-3.140233993530273-7-7V8c0-3.859771728515625,3.140228271484375-7,7-7h931.865921020507812M939.865921020507812,0H8C3.581764221191406,0,0,3.581756591796875,0,8v432.549873352050781c0,4.418235778808594,3.581764221191406,8,8,8h931.865921020507812c4.4183349609375,0,8-3.581764221191406,8-8V8c0-4.418243408203125-3.5816650390625-8-8-8h0Z"></path>
  </g>
  <g>
    <g>
      <text class="cls-3" transform="translate(757.439041137695312 424.64129638671875)"><tspan x="0" y="0">Qualcomm </tspan></text>
      <rect class="cls-6" x="737.188051817740416" y="412.549824622436063" width="16" height="16" rx="2" ry="2"></rect>
    </g>
    <g>
      <text class="cls-3" transform="translate(856.020858764648438 424.64129638671875)"><tspan x="0" y="0">Open source</tspan></text>
      <rect class="cls-4" x="835.769833230593576" y="412.549824622436063" width="16" height="16" rx="1.999999999999986" ry="1.999999999999986"></rect>
    </g>
  </g>
  <g>
    <rect class="cls-4" x="19.999986593745234" y="111.014308285106381" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(88.910202026367188 139.52099609375)"><tspan x="0" y="0">tee</tspan></text>
  </g>
  <g>
    <rect class="cls-6" x="19.999986593745234" y="187.855367768253927" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(44.527084350585938 216.36224365234375)"><tspan x="0" y="0">qtimlvconverter</tspan></text>
  </g>
  <g>
    <rect class="cls-6" x="205.996834359330933" y="111.014308285106381" width="140" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(227.137601852416992 140.689971923828125)"><tspan x="0" y="0">qtivcomposer</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="179.999984741210938" y1="136.014312744140625" x2="199.258987426757812" y2="136.014312744140625"></line>
    <polygon points="198.091812133789062 140.003387451171875 204.999984741210938 136.014312744140625 198.091812133789062 132.025238037109375 198.091812133789062 140.003387451171875"></polygon>
  </g>
  <g>
    <line class="cls-1" x1="345.996841430664062" y1="136.014312744140625" x2="365.255844116210938" y2="136.014312744140625"></line>
    <polygon points="364.088668823242188 140.003387451171875 370.996841430664062 136.014312744140625 364.088668823242188 132.025238037109375 364.088668823242188 140.003387451171875"></polygon>
  </g>
  <g>
    <rect class="cls-4" x="371.263849770733032" y="111.014308285106381" width="119.999999999999091" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(385.740478515625 139.52099609375)"><tspan x="0" y="0">v4l2h264enc</tspan></text>
  </g>
  <g>
    <rect class="cls-4" x="516.797880593539958" y="111.014308285106381" width="120" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(539.004959106445312 139.52099609375)"><tspan x="0" y="0">h264parse</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="491.263870239257812" y1="136.014312744140625" x2="510.522842407226562" y2="136.014312744140625"></line>
    <polygon points="509.355667114257812 140.003387451171875 516.263870239257812 136.014312744140625 509.355667114257812 132.025238037109375 509.355667114257812 140.003387451171875"></polygon>
  </g>
  <g>
    <rect class="cls-4" x="662.331911416347793" y="111.014308285106381" width="120" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(690.953079223632812 139.52099609375)"><tspan x="0" y="0">mp4mux</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="636.797866821289062" y1="136.014312744140625" x2="656.056900024414062" y2="136.014312744140625"></line>
    <polygon points="654.889724731445312 140.003387451171875 661.797866821289062 136.014312744140625 654.889724731445312 132.025238037109375 654.889724731445312 140.003387451171875"></polygon>
  </g>
  <g>
    <rect class="cls-4" x="807.865942239159267" y="111.014308285106381" width="120.000000000009095" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(842.432510375976562 139.52099609375)"><tspan x="0" y="0">filesink</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="782.331924438476562" y1="136.014312744140625" x2="801.590957641601562" y2="136.014312744140625"></line>
    <polygon points="800.423721313476562 140.003387451171875 807.331924438476562 136.014312744140625 800.423721313476562 132.025238037109375 800.423721313476562 140.003387451171875"></polygon>
  </g>
  <g>
    <line class="cls-1" x1="99.999984741210938" y1="161.47589111328125" x2="99.999984741210938" y2="180.734893798828125"></line>
    <polygon points="96.01092529296875 179.567718505859375 99.999984741210938 186.47589111328125 103.989044189453125 179.567718505859375 96.01092529296875 179.567718505859375"></polygon>
  </g>
  <g>
    <rect class="cls-6" x="19.999986593745234" y="264.790925213334958" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(64.679428100585938 293.297805786132812)"><tspan x="0" y="0">qtimlsnpe</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="99.999984741210938" y1="238.411453247070312" x2="99.999984741210938" y2="257.67047119140625"></line>
    <polygon points="96.01092529296875 256.503265380859375 99.999984741210938 263.411453247070312 103.989044189453125 256.503265380859375 96.01092529296875 256.503265380859375"></polygon>
  </g>
  <g>
    <rect class="cls-6" x="19.999986593745234" y="342.549879533984495" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(37.722396850585938 371.056747436523438)"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="99.999984741210938" y1="316.170394897460938" x2="99.999984741210938" y2="335.429412841796875"></line>
    <polygon points="96.01092529296875 334.262222290039062 99.999984741210938 341.170402526855469 103.989044189453125 334.262222290039062 96.01092529296875 334.262222290039062"></polygon>
  </g>
  <g>
    <polyline class="cls-1" points="179.999984741210938 367.549873352050781 275.996963500976562 367.549880981445312 275.996963500976562 167.216888427734375"></polyline>
    <polygon points="279.986038208007812 168.384063720703125 275.996963500976562 161.47589111328125 272.007919311523438 168.384063720703125 279.986038208007812 168.384063720703125"></polygon>
  </g>
  <rect class="cls-4" x="20.000223287972403" y="20.045578671308249" width="160" height="50" rx="4" ry="4"></rect>
  <text class="cls-2" transform="translate(78.082275390625 48.552276611328125)"><tspan x="0" y="0">filesrc</tspan></text>
  <g>
    <line class="cls-1" x1="180.000228881835938" y1="45.04559326171875" x2="199.259231567382812" y2="45.04559326171875"></line>
    <polygon points="198.092056274414062 49.034637451171875 205.000228881835938 45.04559326171875 198.092056274414062 41.0565185546875 198.092056274414062 49.034637451171875"></polygon>
  </g>
  <rect class="cls-4" x="205.000223287972403" y="20.045578671308249" width="160" height="50" rx="4" ry="4"></rect>
  <text class="cls-2" transform="translate(253.70338249206543 48.552276611328125)"><tspan x="0" y="0">qtdemux</tspan></text>
  <g>
    <line class="cls-1" x1="365.000228881835938" y1="45.04559326171875" x2="384.259231567382812" y2="45.04559326171875"></line>
    <polygon points="383.092056274414062 49.034637451171875 390.000228881835938 45.04559326171875 383.092056274414062 41.0565185546875 383.092056274414062 49.034637451171875"></polygon>
  </g>
  <rect class="cls-4" x="390.000223287972403" y="20.045578671308249" width="160" height="50" rx="4" ry="4"></rect>
  <text class="cls-2" transform="translate(432.207244873046875 48.552276611328125)"><tspan x="0" y="0">h264parse</tspan></text>
  <g>
    <line class="cls-1" x1="550.000198364257812" y1="45.04559326171875" x2="569.259231567382812" y2="45.04559326171875"></line>
    <polygon points="568.092056274414062 49.034637451171875 575.000198364257812 45.04559326171875 568.092056274414062 41.0565185546875 568.092056274414062 49.034637451171875"></polygon>
  </g>
  <g>
    <polyline class="cls-1" points="99.999984741210938 105.2733154296875 99.999984741210938 90.552734375 655.000198364257812 90.552734375 655.000198364257812 70.091156005859375"></polyline>
    <polygon points="103.989044189453125 104.10614013671875 99.999984741210938 111.014312744140625 96.01092529296875 104.10614013671875 103.989044189453125 104.10614013671875"></polygon>
  </g>
  <rect class="cls-4" x="575.000223287972403" y="20.045578671308249" width="160" height="50" rx="4" ry="4"></rect>
  <text class="cls-2" transform="translate(609.379165649414062 48.552276611328125)"><tspan x="0" y="0">v4l2h264dec</tspan></text>
</svg>

The following table describes the sequential processing stages of the pipeline:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70023-50/topic/qtiqmmfsrc.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_wqx_ntn_vbc"><br>                                    <li class="li">Collects the video stream (source) and creates two copies of<br>                                        the source:<ul class="ul" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_xqx_ntn_vbc"><br>                                            <li class="li">One stream is sent to the qtivcomposer plugin to retain<br>                                                the video stream.</li><br><br>                                            <li class="li">The other stream is sent to an ML inferencing<br>                                                pipeline.</li><br><br>                                        </ul><br></li><br><br>                                </ol> |
| **Preprocessing** | **Preprocessing** |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70023-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_yqx_ntn_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ul_zqx_ntn_vbc"><li class="li">Color conversion</li><li class="li">Scaling down or up</li><li class="li">Normalization of the stream data when the model expects floating-point values as input</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The object detection model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | **Inferencing** |
| [qtimlsnpe](https://docs.qualcomm.com/doc/80-70023-50/topic/qtimlsnpe.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_arx_ntn_vbc"><li class="li">Loads the object detection model.</li><li class="li">Modifies the graph for the chosen delegate.</li><li class="li">Receives the tensor stream on its sink pad.</li><li class="li">Runs the inference and produces a tensor stream with the object detection results on its source pad.</li></ol> |
| **Postprocessing** | **Postprocessing** |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_brx_ntn_vbc"><li class="li">Receives the inference tensors from the object detection model on its sink pad.</li><li class="li">Converts the inference tensors into formats, such as video or text, that the multimedia plugins can process later.</li><li class="li">Applies the confidence threshold and limits the output to the chosen number of results.</li><li class="li">Loads the corresponding modules for detection models.<p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_crx_ntn_vbc"><li class="li">Loads the YOLOv8 submodule.</li><li class="li">Produces video frames with only bounding boxes that can be overlaid on objects.</li><li class="li">Sends them to the sink pad of qtivcomposer.</li></ol></li></ol> |
| [qtivcomposer](https://docs.qualcomm.com/doc/80-70023-50/topic/qtivcomposer.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_wgh_rtn_vbc"><li class="li">Receives the original video stream and the video stream with bounding boxes on its sink pads.</li><li class="li">On its source pad, produces content that is composed of the video streams received on its sink pads.</li></ol> |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70023-50/topic/v4l2h264enc.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_frx_ntn_vbc"><li class="li">Applies the configured parameters to each frame of the video stream that it receives on its sink pad.</li><li class="li">Encodes the stream into an H.264 bitstream and sends it over its source pad.</li></ol> |
| h264parse | Adds more information about the bitstream to the GStreamer buffer meta. |
| mp4mux | Receives the parsed buffers and muxes them into an MP4 container. |
| **Output** | **Output** |
| filesink | Stores the resulting stream in the `/etc/media/video.mp4` file. |
| Playback | Pull video.mp4 from the target device to the host computer and play it on a media player:<br>`scp root@<IP address of target device>:/etc/media/video.mp4 <destination directory>` |
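
The thresholding step in the qtimlpostprocess row can be illustrated with a minimal Python sketch. This is not the plugin's implementation. It assumes that the `confidence` setting (70.0 in the pipeline) is a percentage, that a detection at exactly the threshold is kept, and that `results=5` keeps the highest-confidence detections after thresholding. The `Detection` class and `filter_detections` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical container for one decoded detection."""
    label: str
    confidence: float  # percent (0-100); assumed to match "confidence": 70.0
    box: tuple         # (x, y, width, height) in pixels; layout is an assumption


def filter_detections(detections, threshold=70.0, max_results=5):
    """Drop detections below the threshold, then keep the top max_results.

    The order of operations (threshold first, then limit) is an assumption
    about how the two pipeline settings interact.
    """
    kept = [d for d in detections if d.confidence >= threshold]
    # Highest confidence first, so the limit keeps the strongest detections.
    kept.sort(key=lambda d: d.confidence, reverse=True)
    return kept[:max_results]


if __name__ == "__main__":
    sample = [
        Detection("person", 91.0, (40, 60, 120, 300)),
        Detection("dog", 65.0, (300, 200, 80, 90)),
        Detection("car", 75.5, (500, 220, 200, 110)),
    ]
    for det in filter_detections(sample):
        print(det.label, det.confidence, det.box)
```

Raising the threshold reduces false positives at the cost of missing weaker detections. Lowering `max_results` caps how many bounding boxes are drawn per frame.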


Last Published: Mar 27, 2026
