# Object detection and encode with Neural Processing SDK 

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd.html](https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd.html)

This use case runs a YOLOX object detection model (a `.dlc` file) with the Qualcomm Neural Processing SDK to identify objects in a camera stream. It then overlays or composes bounding boxes over the detected objects and encodes the stream as an H.264 bitstream.

Download the [YOLOX](https://aihub.qualcomm.com/iot/models/yolox?searchTerm=yolox%29) model for the Qualcomm AI runtime with w8a8 precision from AI Hub. The YOLOX model uses the YOLOv8 postprocessing module.

Note: On Ubuntu Server, `sudo` access is required to write the encoded stream to the `/etc/media` folder.
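
Before launching a pipeline, it can help to confirm that the model, the labels file, and the output folder are in place. The following is a minimal sketch, not part of the SDK: `check_prereqs` is a hypothetical helper, and the paths passed to it are assumed to be the ones used in the commands on this page.

```shell
# Hypothetical pre-flight helper (not part of the SDK).
# Usage: check_prereqs <model.dlc> <labels.json> <output-dir>
check_prereqs() {
    model=$1
    labels=$2
    outdir=$3
    status=0
    # Both input files must exist and be readable by the current user.
    for f in "$model" "$labels"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            status=1
        fi
    done
    # The encoded stream is written here; on Ubuntu Server this may need sudo.
    if [ ! -d "$outdir" ] || [ ! -w "$outdir" ]; then
        echo "not a writable directory: $outdir" >&2
        status=1
    fi
    return "$status"
}
```

For example, `check_prereqs /etc/models/yolox-yolo-x-w8a8.dlc /etc/labels/yolox.json /etc/media && echo ready` prints `ready` only when all three checks pass.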

## Use qtivoverlay plugin to apply detection overlay

Run the use case on the target device:

    gst-launch-1.0 -e \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp model=/etc/models/yolox-yolo-x-w8a8.dlc layers="</Mul_5, /Concat_15, /Cast_1>" ! queue ! \
    qtimlpostprocess settings="{\"confidence\": 70.0}" results=5 module=yolov8 labels=/etc/labels/yolox.json ! text/x-raw ! queue ! metamux.

To stop the use case, use CTRL + C.
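
For unattended runs, the CTRL + C step can be scripted. The sketch below assumes that a `timeout` utility accepting `-s` (for example, the one from GNU coreutils) is available on the target; `run_for` is a hypothetical helper name. It sends SIGINT after a fixed time, and because the command uses `-e`, gst-launch-1.0 forces an end-of-stream on interrupt before shutting down, which gives mp4mux the chance to finalize the MP4 file.

```shell
# Hypothetical wrapper (not part of the SDK): run a command and send it
# SIGINT after the given number of seconds, which mimics pressing CTRL + C.
# Usage: run_for <seconds> <command> [args...]
run_for() {
    seconds=$1
    shift
    # "timeout -s INT" delivers SIGINT instead of the default SIGTERM.
    timeout -s INT "$seconds" "$@"
}
```

For example, prefixing the command above with `run_for 30` would stop the recording after about 30 seconds.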

The following steps and figure show the flow of the use case execution:

1. Detect objects in a video stream coming from a camera source.
2. Overlay bounding boxes on the detected objects using overlaylib.
3. Encode the stream as an H.264 bitstream.
4. Multiplex the stream into an MP4 container and store it as an MP4 file.

Figure: Pipeline for bounding box overlay and encode

<svg xmlns="http://www.w3.org/2000/svg" width="1179.272262573244006" height="357.535699844360352" viewbox="0 0 1179.272262573244006 357.535699844360352">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".499923706054688" y=".500055313110352" width="1178.27197265625" height="356.53515625" rx="7.499999999999885" ry="7.499999999999885" style="fill: #fafafa;"></rect>
      <path d="M1171.272262573244006,1c3.85986328125,0,7,3.140153884887695,7,7v341.535699844360352c0,3.859832763671875-3.14013671875,7-7,7H8c-3.859846115112305,0-7-3.140167236328125-7-7V8c0-3.859846115112305,3.140153884887695-7,7-7h1163.272262573244006M1171.272262573244006,0H8C3.581692695617676,0,0,3.581846237182617,0,8v341.535699844360352c0,4.41815185546875,3.581692695617676,8,8,8h1163.272262573244006c4.418334960941138,0,8-3.58184814453125,8-8V8c0-4.418153762817383-3.581665039058862-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(987.072891235351562 333.627191543579102)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="966.821861664928292" y="321.535699844360352" width="16" height="16" rx="2.000000000000014" ry="2.000000000000014" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(1085.654678344726562 333.627191543579102)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="1065.403643077781453" y="321.535699844360352" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <rect x="20.000030883573345" y="20.00006906611361" width="120" height="50" rx="3.999999999999999" ry="3.999999999999999" style="fill: #007884;"></rect>
        <text transform="translate(53.429763793945312 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
      </g>
      <g>
        <rect x="166.068092529188107" y="20.00006906611361" width="120" height="50" rx="3.999999999999998" ry="3.999999999999998" style="fill: #007884;"></rect>
        <text transform="translate(214.978309631347656 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
      </g>
      <g>
        <rect x="146.068092529188107" y="96.841128549261157" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(170.595199584960938 125.348001480102539)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
      </g>
      <g>
        <line x1="140.5340576171875" y1="45.000070571899414" x2="160.510696411132812" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="159.489410400390625 48.490503311157227 165.5340576171875 45.000070571899414 159.489410400390625 41.509637832641602 159.489410400390625 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="311.602123351993214" y="20.00006906611361" width="120" height="50" rx="3.999999999999995" ry="3.999999999999995" style="fill: #2a2aea;"></rect>
        <text transform="translate(328.89906120300293 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimetamux</tspan></text>
      </g>
      <g>
        <line x1="286.068099975585938" y1="45.000070571899414" x2="306.044723510743097" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="305.023452758789062 48.490503311157227 311.068099975585938 45.000070571899414 305.023452758789062 41.509637832641602 305.023452758789062 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="457.13615417480105" y="20.00006906611361" width="120" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(482.573715209960938 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtioverlay</tspan></text>
      </g>
      <g>
        <line x1="431.602127075195312" y1="45.000070571899414" x2="451.578750610351562" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="450.557479858398438 48.490503311157227 456.602127075195312 45.000070571899414 450.557479858398438 41.509637832641602 450.557479858398438 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="602.670184997608885" y="20.00006906611361" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(617.146804809570312 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">v4l2h264enc</tspan></text>
      </g>
      <g>
        <line x1="577.136154174804688" y1="45.000070571899414" x2="597.112777709960938" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="596.091476440429688 48.490503311157227 602.136154174804688 45.000070571899414 596.091476440429688 41.509637832641602 596.091476440429688 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="748.204215820416721" y="20.00006906611361" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
        <text transform="translate(770.411300659179688 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">h264parse</tspan></text>
      </g>
      <g>
        <line x1="722.670211791992188" y1="45.000070571899414" x2="742.646835327148438" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="741.625534057617188 48.490503311157227 747.670211791992188 45.000070571899414 741.625534057617188 41.509637832641602 741.625534057617188 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="893.738246643224556" y="20.00006906611361" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
        <text transform="translate(922.359420776367188 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">mp4mux</tspan></text>
      </g>
      <g>
        <line x1="868.204208374023438" y1="45.000070571899414" x2="888.180831909179688" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="887.159591674804688 48.490503311157227 893.204208374023438 45.000070571899414 887.159591674804688 41.509637832641602 887.159591674804688 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="1039.272277466034211" y="20.00006906611361" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(1073.838790893554688 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">filesink</tspan></text>
      </g>
      <g>
        <line x1="1013.738204956054688" y1="45.000070571899414" x2="1033.714889526365369" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="1032.693649291994006 48.490503311157227 1038.738204956054688 45.000070571899414 1032.693649291994006 41.509637832641602 1032.693649291994006 48.490503311157227"></polygon>
      </g>
      <g>
        <line x1="226.068099975585938" y1="70.461648941040039" x2="226.068099975585938" y2="90.438287734985352" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577667236328125 89.416994094848633 226.068099975585938 95.461648941040039 229.558517456054688 89.416994094848633 222.577667236328125 89.416994094848633"></polygon>
      </g>
      <g>
        <rect x="146.068092529188107" y="173.776685994341278" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(190.747543334960938 202.283563613891602)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlsnpe</tspan></text>
      </g>
      <g>
        <line x1="226.068099975585938" y1="147.397211074829102" x2="226.068099975585938" y2="167.373849868774414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577667236328125 166.352548599243164 226.068099975585938 172.397211074829102 229.558517456054688 166.352548599243164 222.577667236328125 166.352548599243164"></polygon>
      </g>
      <g>
        <rect x="146.068092529188107" y="251.535640314990815" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(163.790512084960938 280.042520523071289)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
      </g>
      <g>
        <line x1="226.068099975585938" y1="225.156167984008789" x2="226.068099975585938" y2="245.132806777954102" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577667236328125 244.111505508423761 226.068099975585938 250.156152725219727 229.558517456054688 244.111505508423761 222.577667236328125 244.111505508423761"></polygon>
      </g>
      <g>
        <polyline points="306.068099975585938 276.535638809204102 371.602127075195312 276.535638809204102 371.602127075195312 75.485010147094727" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
        <polygon points="375.092544555664062 76.506303787231445 371.602127075195312 70.461648941040039 368.111679077148438 76.506303787231445 375.092544555664062 76.506303787231445"></polygon>
      </g>
    </g>
  </g>
</svg>

The following table lists the processing stages of the pipeline in sequence:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_l2f_zgm_vbc"><li class="li">Collects the video stream (source) and creates two copies of the source:<ul class="ul" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_m2f_zgm_vbc"><li class="li">One stream is sent to qtimetamux plugin to retain the video stream.</li><li class="li">The other stream is sent to an ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | **Preprocessing** |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_xsf_q5l_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ul_ff2_twl_vbc"><li class="li">Color conversion</li><li class="li">Scaling down/up</li><li class="li">Normalization on the stream data when the model expects the floating point values as input</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The object detection model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | **Inferencing** |
| [qtimlsnpe](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlsnpe.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_ufn_2lm_vbc"><li class="li">Loads the object detection model.</li><li class="li">Modifies the graph for the chosen delegate.</li><li class="li">Receives the tensor stream on its sink pad.</li><li class="li">Runs the inference and produces a tensor stream with the object detection results on its source pad.</li></ol> |
| **Postprocessing** | **Postprocessing** |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_ky5_grn_vbc"><li class="li">Receives the inference tensors from the object detection model.</li><li class="li">Converts the inference tensors on its sink pad into formats like video or text that the multimedia plugins can process later.</li><li class="li">Applies the threshold to the chosen number of results.</li><li class="li">Loads the corresponding modules for detection models.<p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_jcd_wnk_5bc"><li class="li">Loads the YOLOv8 submodule.</li><li class="li">Produces results as structures of text.</li><li class="li">Sends them to the sink pad of qtimetamux.</li></ol></li></ol> |
| [qtimetamux](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimetamux.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_ll3_x5l_vbc"><li class="li">Receives the video stream and the text stream with the bounding box results corresponding to the video stream on its sink pads.</li><li class="li">Produces GST buffers with the contents of the video stream from its sink pad.</li><li class="li">Adds the bounding boxes from the data sink pad as <code class="ph codeph">GstVideoRegionOfInterest</code> to the GST buffer meta (meta muxing) on its source pad.</li></ol> |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-70022-50/topic/qtioverlay.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_wst_y5l_vbc"><li class="li">Receives the multiplexed stream.</li><li class="li">Overlays the bounding boxes on the VideoFrame using CL.</li><li class="li">Produces GST buffers with overlays on its source pad.</li></ol> |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_wsc_bsn_vbc"><li class="li">Applies parameters to each frame of the video stream it receives on its sink pad.</li><li class="li">Encodes the stream into a bitstream and sends it over its source pad.</li></ol> |
| h264parse | Adds more information about the bitstream to the GStreamer buffer meta. |
| mp4mux | Receives these buffers and multiplexes them into an MP4 container. |
| **Output** | **Output** |
| filesink | Stores the resulting stream in the /etc/media/video.mp4 file. |
| Playback | Pull video.mp4 from the target device to the host computer and play it on a media player:<br>`scp root@<IP address of target device>:/etc/media/video.mp4 <destination directory>` |
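
Before copying the file to the host, a quick check on the target can confirm that the recording exists and is not empty. This is a minimal sketch: `verify_recording` is a hypothetical helper, and it assumes that `gst-discoverer-1.0` (a GStreamer tool from gst-plugins-base) may or may not be installed on the target, so the stream inspection is skipped when the tool is missing.

```shell
# Hypothetical helper (not part of the SDK): check that a recording exists
# and is non-empty, then print its stream information if possible.
# Usage: verify_recording <file.mp4>
verify_recording() {
    file=$1
    # -s is true only for an existing file with a size greater than zero.
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file" >&2
        return 1
    fi
    # Assumption: gst-discoverer-1.0 is optional on the target.
    if command -v gst-discoverer-1.0 >/dev/null 2>&1; then
        gst-discoverer-1.0 "$file"
    else
        echo "gst-discoverer-1.0 not found; skipping stream inspection" >&2
    fi
}
```

For example, `verify_recording /etc/media/video.mp4` can be run on the target after the pipeline has been stopped.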

## Use qtivcomposer to mix original frame with detection mask

Run the use case on the target device:

    gst-launch-1.0 -e \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtivcomposer name=mixer ! queue ! video/x-raw,format=NV12,width=1920,height=1080,interlace-mode=progressive,colorimetry=bt601 ! \
    v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp model=/etc/models/yolox-yolo-x-w8a8.dlc layers="</Mul_5, /Concat_15, /Cast_1>" ! queue ! \
    qtimlpostprocess settings="{\"confidence\": 70.0}" results=5 module=yolov8 labels=/etc/labels/yolox.json ! video/x-raw,width=640,height=360 ! queue ! mixer.

To stop the use case, use CTRL + C.
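
Both commands on this page pass the detection threshold to qtimlpostprocess as an escaped JSON string. As a minimal sketch, a small shell helper can build that string so the threshold is easier to change; `postprocess_settings` is a hypothetical name, not part of the SDK, and only the `confidence` key shown in the commands is assumed.

```shell
# Hypothetical helper (not part of the SDK): prints the JSON object that the
# commands on this page pass to the qtimlpostprocess "settings" property.
# Usage: postprocess_settings <confidence>
postprocess_settings() {
    confidence=$1
    # Single quotes keep the JSON double quotes literal; no escaping needed.
    printf '{"confidence": %s}' "$confidence"
}
```

With this helper, `settings="{\"confidence\": 70.0}"` could be written as `settings="$(postprocess_settings 70.0)"`; the shell passes the same argument to gst-launch-1.0 in both forms.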

The following steps and figure show the flow of the use case execution:

1. Detect objects in a video stream coming from a camera source.
2. Compose the bounding boxes of the detected objects with the original video stream using qtivcomposer.
3. Encode the stream as an H.264 bitstream.
4. Multiplex the stream into an MP4 container and store it as an MP4 file.

Figure: Pipeline for bounding box mask and encode with qtivcomposer

<svg xmlns="http://www.w3.org/2000/svg" width="1053.934104919439051" height="357.535463333129883" viewbox="0 0 1053.934104919439051 357.535463333129883">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".500267028808594" y=".499940872192383" width="1052.93359375" height="356.53515625" rx="7.499999999999957" ry="7.499999999999957" style="fill: #fafafa;"></rect>
      <path d="M1045.934104919439051,1c3.85986328125,0,7,3.140132904052734,7,7v341.535463333129883c0,3.85986328125-3.14013671875,7-7,7H8c-3.859870910644531,0-7-3.14013671875-7-7V8c0-3.859867095947266,3.140129089355469-7,7-7h1037.934104919439051M1045.934104919439051,0H8C3.581733703613281,0,0,3.581731796264648,0,8v341.535463333129883c0,4.41827392578125,3.581733703613281,8,8,8h1037.934104919439051c4.418212890619543,0,8-3.58172607421875,8-8V8c0-4.418268203735352-3.581787109380457-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(856.862510681152344 333.627168655395508)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="836.611544611447243" y="321.535699844360352" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(955.444297790527344 333.627168655395508)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="935.193326024300404" y="321.535699844360352" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <rect x="20.000055258294196" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(53.429786682128906 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
      </g>
      <g>
        <rect x="166.068116903908958" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(214.97833251953125 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
      </g>
      <g>
        <rect x="146.068116903908958" y="96.841006478947747" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(170.595218658447266 125.347871780395508)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
      </g>
      <g>
        <line x1="140.534080505372003" y1="44.999948501586914" x2="160.510719299316406" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="159.489433288574219 48.490381240844727 165.534080505372003 44.999948501586914 159.489433288574219 41.509515762329102 159.489433288574219 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="312.064964669495566" y="19.999946995800201" width="140" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(333.205726623535156 49.675605297088623)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtivcomposer</tspan></text>
      </g>
      <g>
        <line x1="286.068107604980469" y1="44.999948501586914" x2="306.044761657714844" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="305.023460388183594 48.490381240844727 311.068107604980469 44.999948501586914 305.023460388183594 41.509515762329102 305.023460388183594 48.490381240844727"></polygon>
      </g>
      <g>
        <line x1="452.064964294433594" y1="44.999948501586914" x2="472.041587829589844" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="471.020286560058594 48.490381240844727 477.064964294433594 44.999948501586914 471.020286560058594 41.509515762329102 471.020286560058594 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="477.331980080897665" y="19.999946995800201" width="119.999999999999091" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(491.808616638183594 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">v4l2h264enc</tspan></text>
      </g>
      <g>
        <rect x="622.866010903704591" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(645.073081970214844 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">h264parse</tspan></text>
      </g>
      <g>
        <line x1="597.331993103027344" y1="44.999948501586914" x2="617.308616638183594" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="616.287315368652344 48.490381240844727 622.331993103027344 44.999948501586914 616.287315368652344 41.509515762329102 616.287315368652344 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="768.400041726512427" y="19.999946995800201" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
        <text transform="translate(797.021202087402344 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">mp4mux</tspan></text>
      </g>
      <g>
        <line x1="742.865989685058594" y1="44.999948501586914" x2="762.842674255371094" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="761.821372985839844 48.490381240844727 767.865989685058594 44.999948501586914 761.821372985839844 41.509515762329102 761.821372985839844 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="913.934072549323901" y="19.999946995800201" width="120.000000000005457" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(948.500633239746094 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">filesink</tspan></text>
      </g>
      <g>
        <line x1="888.400047302246094" y1="44.999948501586914" x2="908.376670837402344" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="907.355369567871094 48.490381240844727 913.400047302246094 44.999948501586914 907.355369567871094 41.509515762329102 907.355369567871094 48.490381240844727"></polygon>
      </g>
      <g>
        <line x1="226.068107604980469" y1="70.461526870727539" x2="226.068107604980469" y2="90.438165664671942" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 89.416872024536133 226.068107604980469 95.461526870727539 229.558555603027344 89.416872024536133 222.577690124511719 89.416872024536133"></polygon>
      </g>
      <g>
        <rect x="146.068116903908958" y="173.776563924028778" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(190.747562408447266 202.283449172973633)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlsnpe</tspan></text>
      </g>
      <g>
        <line x1="226.068107604980469" y1="147.39708137512207" x2="226.068107604980469" y2="167.373720169067383" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 166.352434158325195 226.068107604980469 172.39708137512207 229.558555603027344 166.352434158325195 222.577690124511719 166.352434158325195"></polygon>
      </g>
      <g>
        <rect x="146.068116903908958" y="251.535518244678315" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(170.817874908447266 280.042390823364258)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvdetection</tspan></text>
      </g>
      <g>
        <line x1="226.068107604980469" y1="225.156038284301758" x2="226.068107604980469" y2="245.132692337036133" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 244.111391067505792 226.068107604980469 250.156038284301758 229.558555603027344 244.111391067505792 222.577690124511719 244.111391067505792"></polygon>
      </g>
      <g>
        <polyline points="306.068107604980469 276.535524368286133 382.064964294433594 276.535524368286133 382.064964294433594 75.484888076782227" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
        <polygon points="385.555381774902344 76.506181716918945 382.064964294433594 70.461526870727539 378.574546813964844 76.506181716918945 385.555381774902344 76.506181716918945"></polygon>
      </g>
    </g>
  </g>
</svg>

The following table lists the processing stages of the pipeline in sequence:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_wqx_ntn_vbc"><li class="li">Collects the video stream (source) and creates two copies of the source:<ul class="ul" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_xqx_ntn_vbc"><li class="li">One stream is sent to the qtivcomposer plugin to retain the video stream.</li><li class="li">The other stream is sent to an ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | **Preprocessing** |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_yqx_ntn_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ul_zqx_ntn_vbc"><li class="li">Color conversion</li><li class="li">Scaling down/up</li><li class="li">Normalization on the stream data when the model expects the floating point values as input</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The object detection model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | **Inferencing** |
| [qtimlsnpe](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlsnpe.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_arx_ntn_vbc"><li class="li">Loads the object detection model.</li><li class="li">Modifies the graph for the chosen delegate.</li><li class="li">Receives the tensor stream on its sinkpad.</li><li class="li">Runs the inference and produces a tensor stream with the object detection results on its source pad.</li></ol> |
| **Postprocessing** | **Postprocessing** |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_brx_ntn_vbc"><li class="li">Receives the inference tensors from the object detection model.</li><li class="li">Converts the inference tensors on its sinkpad into formats like video or text that the multimedia plugins can process later.</li><li class="li">Applies the threshold to the chosen number of results.</li><li class="li">Loads the corresponding modules for detection models. <p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_crx_ntn_vbc"><li class="li">Loads the YOLOv8 submodule.</li><li class="li">Produces video frames with only bounding boxes that can be overlaid on objects.</li><li class="li">Sends them to the sinkpad of qtivcomposer.</li></ol></li></ol> |
| [qtivcomposer](https://docs.qualcomm.com/doc/80-70022-50/topic/qtivcomposer.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_wgh_rtn_vbc"><li class="li">Receives the original video stream and the video stream with bounding boxes on its sinkpads.</li><li class="li">On its sourcepads, produces content that's composed of the video streams processed from its sinkpads.</li></ol> |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-encode-with-mobilenet-v2-ssd__ol_frx_ntn_vbc"><li class="li">Applies parameters to each frame of the video stream that it receives on its sinkpad.</li><li class="li">Encodes the stream into an H.264 bitstream and sends it over its sourcepad.</li></ol> |
| h264parse | Adds more information about the bitstream to the GStreamer buffer meta. |
| mp4mux | Receives these buffers and muxes them into an MP4 container. |
| **Output** | **Output** |
| Filesink | Stores the resulting stream in the `/etc/media/video.mp4` file. |
| Playback | From the host computer, pull video.mp4 from the target device and play it on a media player:<br>`scp root@<IP address of target device>:/etc/media/video.mp4 <destination directory>` |
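
The playback step in the table can be wrapped in a small host-side script. The following is a minimal sketch, not part of the SDK: the function names `build_pull_cmd` and `pull_video`, the variables `TARGET_IP`, `DEST_DIR`, and `RUN`, and the placeholder address `192.0.2.10` are all assumptions made for illustration. Only the `root` user and the `/etc/media/video.mp4` path come from the table above. By default the script only prints the `scp` command so that it can be checked before anything is copied.

```shell
#!/bin/sh
# Sketch: pull the encoded file from the target device to the host computer.
# TARGET_IP, DEST_DIR and RUN are hypothetical variables for this example;
# the remote user and path are taken from the Playback row of the table.

REMOTE_FILE=/etc/media/video.mp4

# Print the scp command for a given target IP ($1) and destination directory ($2).
build_pull_cmd() {
    printf 'scp root@%s:%s %s\n' "$1" "$REMOTE_FILE" "$2"
}

# Dry run by default: show the command instead of running it.
# Set RUN=1 in the environment to actually copy the file.
pull_video() {
    if [ "${RUN:-0}" = "1" ]; then
        scp "root@$1:$REMOTE_FILE" "$2"
    else
        build_pull_cmd "$1" "$2"
    fi
}

# 192.0.2.10 is a documentation-only placeholder address, not a real device.
pull_video "${TARGET_IP:-192.0.2.10}" "${DEST_DIR:-.}"
```

To perform the copy, set `RUN=1` together with `TARGET_IP` and `DEST_DIR` for your setup, then open the copied video.mp4 in a media player on the host computer.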

## Known issue

The YOLOX model currently available in AI hub doesn't produce the expected output. The issue will be fixed in a future release.

**Parent Topic:** [Qualcomm Neural Processing SDK use cases](https://docs.qualcomm.com/doc/80-70022-50/topic/qualcomm-neural-processing-sdk-use-cases.html)

Last Published: Feb 20, 2026
