# Image classification and encode with Neural Processing SDK

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1.html](https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1.html)

The use cases implement the InceptionV3 image classification model with Qualcomm Neural Processing SDK to classify scenes from a single camera stream and either overlay or compose the classification labels. The streams are then encoded.

You can use any publicly available classification model with LiteRT and convert it to `.dlc` format. For instructions, see [TensorFlow Model Conversion](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-2/model_conv_tensorflow.html).

Note: For Ubuntu Server, `sudo` access is necessary to write the encoded stream to the `/etc/media` folder.
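Before launching a pipeline, it can help to confirm that the model and label files are readable and that the output folder is writable. The following is a minimal preflight sketch; `check_inputs` is a hypothetical helper and not part of any SDK, and the paths are the ones used by the commands on this page.

```shell
#!/bin/sh
# check_inputs OUT_DIR FILE...: hypothetical preflight helper.
# Succeeds only if every FILE is readable and OUT_DIR is writable.
check_inputs() {
    out_dir=$1
    shift
    status=0
    # Every remaining argument is a file that the pipeline needs to read.
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            status=1
        fi
    done
    # The encoded stream is written here; on Ubuntu Server this may need sudo.
    if [ ! -w "$out_dir" ]; then
        echo "not writable: $out_dir (try running with sudo)" >&2
        status=1
    fi
    return "$status"
}

if check_inputs /etc/media /etc/models/inceptionv3.dlc /etc/labels/classification.json; then
    echo "preflight OK"
else
    echo "preflight failed"
fi
```

If the check reports that `/etc/media` is not writable, run the `gst-launch-1.0` command with `sudo`.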

## Use qtivoverlay plugin to apply classification overlay

Run the use case on the target device:

    gst-launch-1.0 -e \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! video/x-raw,format=NV12_Q08C,width=1280,height=720,interlace-mode=progressive,colorimetry=bt601 ! \
    v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp model=/etc/models/inceptionv3.dlc ! queue ! qtimlpostprocess \
    settings="{\"confidence\": 40.0}" results=2 module=mobilenet-softmax labels=/etc/labels/classification.json ! text/x-raw ! queue ! metamux.

To stop the use case, use CTRL + C.
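CTRL + C sends SIGINT to `gst-launch-1.0`, and the `-e` flag in the command forces an end-of-stream event before shutdown so that the muxer can write a readable file. For unattended runs, the same signal can be sent after a fixed duration. The sketch below assumes that the `timeout` utility is available on the target device; `record_for` is a hypothetical helper name.

```shell
#!/bin/sh
# record_for SECONDS COMMAND...: run COMMAND and send it SIGINT (the signal
# that CTRL + C sends) after SECONDS. Hypothetical helper; assumes `timeout`.
record_for() {
    secs=$1
    shift
    timeout -s INT "$secs" "$@"
}

# Stand-in command so that the sketch runs anywhere. On the target device,
# replace `sleep 3` with the full `gst-launch-1.0 -e ...` command.
record_for 1 sleep 3 || echo "stopped with status $?"
```

GNU `timeout` exits with status 124 when the time limit was reached, which distinguishes a timed stop from a pipeline that ended on its own.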

The following figure shows the flow of the use case execution:

1. Classify scenes from a video stream coming through a camera source.
2. Overlay the classification labels using overlaylib.
3. Encode this stream as an H.264 bitstream.
4. Multiplex the stream in an MP4 container and store it as an MP4 file.

Figure: Pipeline for classification overlay and encode

<svg xmlns="http://www.w3.org/2000/svg" width="1179.272262573244006" height="357.535699844360352" viewbox="0 0 1179.272262573244006 357.535699844360352">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".499923706054688" y=".500055313110352" width="1178.27197265625" height="356.53515625" rx="7.499999999999885" ry="7.499999999999885" style="fill: #fafafa;"></rect>
      <path d="M1171.272262573244006,1c3.85986328125,0,7,3.140153884887695,7,7v341.535699844360352c0,3.859832763671875-3.14013671875,7-7,7H8c-3.859846115112305,0-7-3.140167236328125-7-7V8c0-3.859846115112305,3.140153884887695-7,7-7h1163.272262573244006M1171.272262573244006,0H8C3.581692695617676,0,0,3.581846237182617,0,8v341.535699844360352c0,4.41815185546875,3.581692695617676,8,8,8h1163.272262573244006c4.418334960941138,0,8-3.58184814453125,8-8V8c0-4.418153762817383-3.581665039058862-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(978.444961547851562 333.627191543579102)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="958.193951702738559" y="321.535699844360352" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(1077.026748657226562 333.627191543579102)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="1056.77573311559172" y="321.535699844360352" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <rect x="20.000030883573345" y="20.00006906611361" width="120" height="50" rx="3.999999999999999" ry="3.999999999999999" style="fill: #007884;"></rect>
        <text transform="translate(53.429763793945312 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
      </g>
      <g>
        <rect x="166.068092529188107" y="20.00006906611361" width="120" height="50" rx="3.999999999999998" ry="3.999999999999998" style="fill: #007884;"></rect>
        <text transform="translate(214.978309631347656 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
      </g>
      <g>
        <rect x="146.068092529188107" y="96.841128549261157" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(170.595199584960938 125.348001480102539)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
      </g>
      <g>
        <line x1="140.5340576171875" y1="45.000070571899414" x2="160.510696411132812" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="159.489410400390625 48.490503311157227 165.5340576171875 45.000070571899414 159.489410400390625 41.509637832641602 159.489410400390625 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="311.602123351993214" y="20.00006906611361" width="120" height="50" rx="3.999999999999995" ry="3.999999999999995" style="fill: #2a2aea;"></rect>
        <text transform="translate(328.89906120300293 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimetamux</tspan></text>
      </g>
      <g>
        <line x1="286.068099975585938" y1="45.000070571899414" x2="306.044723510743097" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="305.023452758789062 48.490503311157227 311.068099975585938 45.000070571899414 305.023452758789062 41.509637832641602 305.023452758789062 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="457.13615417480105" y="20.00006906611361" width="120" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(482.573715209960938 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtivoverlay</tspan></text>
      </g>
      <g>
        <line x1="431.602127075195312" y1="45.000070571899414" x2="451.578750610351562" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="450.557479858398438 48.490503311157227 456.602127075195312 45.000070571899414 450.557479858398438 41.509637832641602 450.557479858398438 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="602.670184997608885" y="20.00006906611361" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(617.146804809570312 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">v4l2h264enc</tspan></text>
      </g>
      <g>
        <line x1="577.136154174804688" y1="45.000070571899414" x2="597.112777709960938" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="596.091476440429688 48.490503311157227 602.136154174804688 45.000070571899414 596.091476440429688 41.509637832641602 596.091476440429688 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="748.204215820416721" y="20.00006906611361" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
        <text transform="translate(770.411300659179688 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">h264parse</tspan></text>
      </g>
      <g>
        <line x1="722.670211791992188" y1="45.000070571899414" x2="742.646835327148438" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="741.625534057617188 48.490503311157227 747.670211791992188 45.000070571899414 741.625534057617188 41.509637832641602 741.625534057617188 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="893.738246643224556" y="20.00006906611361" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
        <text transform="translate(922.359420776367188 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">mp4mux</tspan></text>
      </g>
      <g>
        <line x1="868.204208374023438" y1="45.000070571899414" x2="888.180831909179688" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="887.159591674804688 48.490503311157227 893.204208374023438 45.000070571899414 887.159591674804688 41.509637832641602 887.159591674804688 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="1039.272277466034211" y="20.00006906611361" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(1073.838790893554688 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">filesink</tspan></text>
      </g>
      <g>
        <line x1="1013.738204956054688" y1="45.000070571899414" x2="1033.714889526365369" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="1032.693649291994006 48.490503311157227 1038.738204956054688 45.000070571899414 1032.693649291994006 41.509637832641602 1032.693649291994006 48.490503311157227"></polygon>
      </g>
      <g>
        <line x1="226.068099975585938" y1="70.461648941040039" x2="226.068099975585938" y2="90.438287734985352" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577667236328125 89.416994094848633 226.068099975585938 95.461648941040039 229.558517456054688 89.416994094848633 222.577667236328125 89.416994094848633"></polygon>
      </g>
      <g>
        <rect x="146.068092529188107" y="173.776685994341278" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(190.747543334960938 202.283563613891602)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlsnpe</tspan></text>
      </g>
      <g>
        <line x1="226.068099975585938" y1="147.397211074829102" x2="226.068099975585938" y2="167.373849868774414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577667236328125 166.352548599243164 226.068099975585938 172.397211074829102 229.558517456054688 166.352548599243164 222.577667236328125 166.352548599243164"></polygon>
      </g>
      <g>
        <rect x="146.068092529188107" y="251.535640314990815" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(163.790512084960938 280.042520523071289)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
      </g>
      <g>
        <line x1="226.068099975585938" y1="225.156167984008789" x2="226.068099975585938" y2="245.132806777954102" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577667236328125 244.111505508423761 226.068099975585938 250.156152725219727 229.558517456054688 244.111505508423761 222.577667236328125 244.111505508423761"></polygon>
      </g>
      <g>
        <polyline points="306.068099975585938 276.535638809204102 371.602127075195312 276.535638809204102 371.602127075195312 75.485010147094727" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
        <polygon points="375.092544555664062 76.506303787231445 371.602127075195312 70.461648941040039 368.111679077148438 76.506303787231445 375.092544555664062 76.506303787231445"></polygon>
      </g>
    </g>
  </g>
</svg>

The following table lists the processing stages of the pipeline in order of execution:

| Process | Description |
| --- | --- |
| Source | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_g41_cf5_vbc"><li class="li">The video stream is collected from a camera source plugin and two copies are created:<ul class="ul" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_kh3_dyv_r1c"><li class="li">One stream is sent to the qtimetamux plugin to retain the video stream.</li><li class="li">The other stream is sent to an ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | **Preprocessing** |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_xsf_q5l_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ul_ff2_twl_vbc"><li class="li">Color conversion</li><li class="li">Scaling down/up</li><li class="li">Normalization of the stream data when the model expects floating-point values as input</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The classification model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | **Inferencing** |
| [qtimlsnpe](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlsnpe.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_bwn_s5l_vbc"><li class="li">Loads the model.</li><li class="li">Modifies the graph for the chosen delegate.</li><li class="li">Receives the tensor stream on its sink pad.</li><li class="li">Runs the inference and produces a tensor stream with the inference results on its source pad.</li></ol> |
| **Postprocessing** | **Postprocessing** |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_gr1_w5l_vbc"><li class="li">Receives the inference tensors from a classification model on its sink pad.</li><li class="li">Converts the tensors into formats such as video or text that the multimedia plugins can process later.</li><li class="li">Applies the confidence threshold and limits the output to the chosen number of results.</li><li class="li">Loads the corresponding modules of the classification models.<p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_rrb_1xl_vbc"><li class="li">Loads the submodule of the model.</li><li class="li">Produces results as text with classification labels.</li><li class="li">Sends them to the sink pad of qtimetamux.</li></ol></li></ol> |
| [qtimetamux](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimetamux.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_ll3_x5l_vbc"><li class="li">Receives the video stream and the text stream with the corresponding classification results on its sink pads.</li><li class="li">Produces GST buffers with the contents of the video stream received on its sink pad.</li><li class="li">Adds the classification results from the data sink pad to the GST buffer meta (meta muxing) and sends the buffers over its source pad.</li></ol> |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-70022-50/topic/qtioverlay.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_wst_y5l_vbc"><li class="li">Receives the multiplexed stream.</li><li class="li">Overlays the classification labels on the video frame using CL.</li><li class="li">Produces GST buffers with overlays on its source pad.</li></ol> |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_h41_cf5_vbc"><li class="li">Applies parameters to each frame of the video stream that it receives on its sink pad.</li><li class="li">Encodes the stream into a bitstream and sends it over its source pad.</li></ol> |
| h264parse | Adds more information about the bitstream to GStreamer buffer meta. |
| mp4mux | Receives these buffers and creates containers with format specification buffers. |
| **Output** | **Output** |
| Filesink | Stores the resulting stream in the /etc/media/video.mp4 file. |
| Playback | On the host computer, pull video.mp4 from the target device and play it on a media player:<br>`scp root@<IP address of target device>:/etc/media/video.mp4 <destination directory>` |
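The playback step in the table can be scripted on the host computer. The following is a minimal sketch; `pull_recording` is a hypothetical helper that only prints the `scp` command for a given target address and destination directory, and `192.0.2.10` is a documentation-only placeholder address.

```shell
#!/bin/sh
# pull_recording TARGET_IP [DEST_DIR]: hypothetical helper that prints the
# copy command. It only prints; remove `echo` to actually copy the file.
pull_recording() {
    target_ip=$1
    dest_dir=${2:-.}
    echo scp "root@${target_ip}:/etc/media/video.mp4" "${dest_dir}/"
}

# Placeholder address and directory; substitute your own values.
pull_recording 192.0.2.10 "$HOME/Videos"
```

Printing the command first makes it easy to confirm the address and destination before any file is transferred.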

## Use qtivcomposer to mix original frame with classification mask

Run the use case on the target device:

    gst-launch-1.0 -e --gst-debug=2 \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtivcomposer name=mixer sink_1::position="<30, 30>" sink_1::dimensions="<320, 180>" ! queue ! video/x-raw,format=NV12,width=1920,height=1080,interlace-mode=progressive,colorimetry=bt601 ! \
    v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp model=/etc/models/inceptionv3.dlc ! queue ! qtimlpostprocess settings="{\"confidence\": 40.0}" \
    results=2 module=mobilenet-softmax labels=/etc/labels/classification.json ! video/x-raw,format=BGRA,width=640,height=360 ! queue ! mixer.

To stop the use case, use CTRL + C.
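The `settings` property in the command above is written as `settings="{\"confidence\": 40.0}"`. The backslashes only protect the inner quotes from the shell; after quote removal, the argument carries a plain JSON object. The sketch below shows this at the shell level only and makes no claim about how the plugin parses the value; `postprocess_settings` is a hypothetical helper name.

```shell
#!/bin/sh
# postprocess_settings THRESHOLD: print the JSON object used for the
# `settings` property. Hypothetical helper; it emits only the "confidence"
# key that appears on this page.
postprocess_settings() {
    printf '{"confidence": %s}\n' "$1"
}

# After the shell removes the escaping, the value from the command is the
# same JSON text that the helper prints.
escaped="{\"confidence\": 40.0}"
[ "$escaped" = "$(postprocess_settings 40.0)" ] && echo "same JSON: $escaped"
```

A helper like this keeps the threshold in one variable when the same pipeline is run with different values.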

The following figure shows the flow of the use case execution:

1. Classify scenes from a video stream coming through a camera source.
2. Compose classification labels and video stream together using qtivcomposer.
3. Encode this stream as an H.264 bitstream.
4. Multiplex the stream in an MP4 container and store it as an MP4 file.
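In the command for this use case, the composer output is 1920x1080 and `sink_1` is given `position="<30, 30>"` and `dimensions="<320, 180>"`. Assuming that `position` is the top-left corner and `dimensions` is the width and height, both in output pixels, the following sketch checks that such a rectangle lies inside the output frame. `fits_in_frame` is a hypothetical helper, not part of the plugin.

```shell
#!/bin/sh
# fits_in_frame FRAME_W FRAME_H X Y W H: succeed if the rectangle lies fully
# inside the frame. Hypothetical helper; assumes position is the top-left
# corner and dimensions are width and height in output pixels.
fits_in_frame() {
    frame_w=$1 frame_h=$2 x=$3 y=$4 w=$5 h=$6
    [ "$x" -ge 0 ] && [ "$y" -ge 0 ] &&
        [ $((x + w)) -le "$frame_w" ] && [ $((y + h)) -le "$frame_h" ]
}

# Values from the command: 1920x1080 output, sink_1 at <30, 30>, size <320, 180>.
if fits_in_frame 1920 1080 30 30 320 180; then
    echo "inset spans x 30..$((30 + 320)), y 30..$((30 + 180))"
fi
```

The same check can be reused when moving the classification inset to another corner of the frame.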

Figure: Pipeline for classification and encode with qtivcomposer

<svg xmlns="http://www.w3.org/2000/svg" width="1053.934104919439051" height="357.535463333129883" viewbox="0 0 1053.934104919439051 357.535463333129883">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".500267028808594" y=".499940872192383" width="1052.93359375" height="356.53515625" rx="7.499999999999957" ry="7.499999999999957" style="fill: #fafafa;"></rect>
      <path d="M1045.934104919439051,1c3.85986328125,0,7,3.140132904052734,7,7v341.535463333129883c0,3.85986328125-3.14013671875,7-7,7H8c-3.859870910644531,0-7-3.14013671875-7-7V8c0-3.859867095947266,3.140129089355469-7,7-7h1037.934104919439051M1045.934104919439051,0H8C3.581733703613281,0,0,3.581731796264648,0,8v341.535463333129883c0,4.41827392578125,3.581733703613281,8,8,8h1037.934104919439051c4.418212890619543,0,8-3.58172607421875,8-8V8c0-4.418268203735352-3.581787109380457-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(863.507194519042969 333.626955032348633)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="843.256182127905049" y="321.535463333129883" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(962.088981628417969 333.626955032348633)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="941.83796354075821" y="321.535463333129883" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <rect x="20.000055258294196" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(53.429786682128906 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
      </g>
      <g>
        <rect x="166.068116903908958" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(214.97833251953125 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
      </g>
      <g>
        <rect x="146.068116903908958" y="96.841006478947747" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(170.595218658447266 125.347871780395508)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
      </g>
      <g>
        <line x1="140.534080505372003" y1="44.999948501586914" x2="160.510719299316406" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="159.489433288574219 48.490381240844727 165.534080505372003 44.999948501586914 159.489433288574219 41.509515762329102 159.489433288574219 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="312.064964669495566" y="19.999946995800201" width="140" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(333.205726623535156 49.675605297088623)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtivcomposer</tspan></text>
      </g>
      <g>
        <line x1="286.068107604980469" y1="44.999948501586914" x2="306.044761657714844" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="305.023460388183594 48.490381240844727 311.068107604980469 44.999948501586914 305.023460388183594 41.509515762329102 305.023460388183594 48.490381240844727"></polygon>
      </g>
      <g>
        <line x1="452.064964294433594" y1="44.999948501586914" x2="472.041587829589844" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="471.020286560058594 48.490381240844727 477.064964294433594 44.999948501586914 471.020286560058594 41.509515762329102 471.020286560058594 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="477.331980080897665" y="19.999946995800201" width="119.999999999999091" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(491.808616638183594 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">v4l2h264enc</tspan></text>
      </g>
      <g>
        <rect x="622.866010903704591" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(645.073081970214844 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">h264parse</tspan></text>
      </g>
      <g>
        <line x1="597.331993103027344" y1="44.999948501586914" x2="617.308616638183594" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="616.287315368652344 48.490381240844727 622.331993103027344 44.999948501586914 616.287315368652344 41.509515762329102 616.287315368652344 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="768.400041726512427" y="19.999946995800201" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
        <text transform="translate(797.021202087402344 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">mp4mux</tspan></text>
      </g>
      <g>
        <line x1="742.865989685058594" y1="44.999948501586914" x2="762.842674255371094" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="761.821372985839844 48.490381240844727 767.865989685058594 44.999948501586914 761.821372985839844 41.509515762329102 761.821372985839844 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="913.934072549323901" y="19.999946995800201" width="120.000000000005457" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(948.500633239746094 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">filesink</tspan></text>
      </g>
      <g>
        <line x1="888.400047302246094" y1="44.999948501586914" x2="908.376670837402344" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="907.355369567871094 48.490381240844727 913.400047302246094 44.999948501586914 907.355369567871094 41.509515762329102 907.355369567871094 48.490381240844727"></polygon>
      </g>
      <g>
        <line x1="226.068107604980469" y1="70.461526870727539" x2="226.068107604980469" y2="90.438165664671942" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 89.416872024536133 226.068107604980469 95.461526870727539 229.558555603027344 89.416872024536133 222.577690124511719 89.416872024536133"></polygon>
      </g>
      <g>
        <rect x="146.068116903908958" y="173.776563924028778" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(190.747562408447266 202.283449172973633)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlsnpe</tspan></text>
      </g>
      <g>
        <line x1="226.068107604980469" y1="147.39708137512207" x2="226.068107604980469" y2="167.373720169067383" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 166.352434158325195 226.068107604980469 172.39708137512207 229.558555603027344 166.352434158325195 222.577690124511719 166.352434158325195"></polygon>
      </g>
      <g>
        <rect x="146.068116903908958" y="251.535518244678315" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(163.790531158447266 280.042390823364258)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
      </g>
      <g>
        <line x1="226.068107604980469" y1="225.156038284301758" x2="226.068107604980469" y2="245.132692337036133" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 244.111391067505792 226.068107604980469 250.156038284301758 229.558555603027344 244.111391067505792 222.577690124511719 244.111391067505792"></polygon>
      </g>
      <g>
        <polyline points="306.068107604980469 276.535524368286133 371.602134704589844 276.535524368286133 371.602134704589844 75.484888076782227" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
        <polygon points="375.092582702636719 76.506181716918945 371.602134704589844 70.461526870727539 368.111717224122003 76.506181716918945 375.092582702636719 76.506181716918945"></polygon>
      </g>
    </g>
  </g>
</svg>

The following table lists the processing stages of the pipeline in order of execution:

| Process | Description |
| --- | --- |
| Source | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_acs_jk5_vbc"><li class="li">The video stream is collected from a camera source plugin and two copies are created:<ul class="ul" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ul_bcs_jk5_vbc"><li class="li">One stream is sent to the qtivcomposer plugin to retain the video stream.</li><li class="li">The other stream is sent to an ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | **Preprocessing** |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_ccs_jk5_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ul_dcs_jk5_vbc"><li class="li">Color conversion</li><li class="li">Scaling down/up</li><li class="li">Normalization of the stream data when the model expects floating-point values as input</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The classification model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | **Inferencing** |
| [qtimlsnpe](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlsnpe.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_ecs_jk5_vbc"><li class="li">Loads the model.</li><li class="li">Modifies the graph for the chosen delegate.</li><li class="li">Receives the tensor stream on its sink pad.</li><li class="li">Runs the inference and produces a tensor stream with the inference results on its source pad.</li></ol> |
| **Postprocessing** | **Postprocessing** |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_fcs_jk5_vbc"><li class="li">Receives the inference tensors from a classification model on its sink pad.</li><li class="li">Converts the tensors into formats such as video or text that the multimedia plugins can process later.</li><li class="li">Applies the threshold to the chosen number of results.</li><li class="li">Loads the corresponding modules of the classification models.<p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_gcs_jk5_vbc"><li class="li">Loads the submodule of the model.</li><li class="li">Produces results as video frames with classification labels.</li><li class="li">Sends them to the sink pad of qtivcomposer.</li></ol></li></ol> |
| [qtivcomposer](https://docs.qualcomm.com/doc/80-70022-50/topic/qtivcomposer.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_gv1_wjk_5bc"><li class="li">Receives the original video stream and the video stream with classification results on its sink pads.</li><li class="li">Produces GStreamer buffers on its source pad, composed from the video streams on its sink pads.</li></ol> |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_jcs_jk5_vbc"><li class="li">Applies parameters to each frame of the video stream that it receives on its sink pad.</li><li class="li">Encodes the stream into a bitstream and sends it over its source pad.</li></ol> |
| h264parse | Adds more information about the bitstream to GStreamer buffer meta. |
| mp4mux | Receives these buffers and muxes them into an MP4 container. |
| **Output** | **Output** |
| filesink | Stores the resulting stream in the `/etc/media/video.mp4` file. |
| Playback | Pull `video.mp4` from the target device to the host computer and play it in a media player:<br>`scp root@<IP address of target device>:/etc/media/video.mp4 <destination directory>` |

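The thresholding step in the qtimlpostprocess row can be pictured as a filter over scored labels. The following is a minimal sketch, not the plugin's implementation: `select_labels` is a hypothetical helper, the scores are made up, and it assumes that the confidence setting is a percentage and that the results value caps how many labels are kept. The values 40.0 and 2 mirror the `confidence` setting and `results=2` in the overlay pipeline earlier on this page.

```shell
#!/bin/sh
# Illustrative sketch only; qtimlpostprocess may work differently internally.
# select_labels is a hypothetical helper, not part of any Qualcomm tool.
#   $1     confidence threshold, assumed to be a percentage (for example 40.0)
#   $2     maximum number of results to keep (for example 2)
#   stdin  one candidate per line in the form: score<TAB>label
select_labels() {
    tab=$(printf '\t')
    # Highest score first, drop scores below the threshold, keep at most $2 lines.
    sort -t "$tab" -k1,1nr |
        awk -F '\t' -v threshold="$1" '$1 + 0 >= threshold + 0' |
        head -n "$2"
}

# Hypothetical scores for a single frame.
# Keeps the two candidates at or above 40.0: tabby and Egyptian cat.
printf '62.5\ttabby\n21.0\ttiger cat\n45.0\tEgyptian cat\n3.1\tlynx\n' |
    select_labels 40.0 2
```
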
**Parent Topic:** [Qualcomm Neural Processing SDK use cases](https://docs.qualcomm.com/doc/80-70022-50/topic/qualcomm-neural-processing-sdk-use-cases.html)

Last Published: Feb 20, 2026
