# Image classification and encode with Neural Processing SDK

Source: [https://docs.qualcomm.com/doc/80-70023-50/topic/single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1.html](https://docs.qualcomm.com/doc/80-70023-50/topic/single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1.html)

The use cases on this page run the InceptionV3 image classification model with the Qualcomm Neural Processing SDK to classify scenes from a single camera stream and either overlay or compose the classification labels on the video. The resulting stream is then encoded.

You can use any publicly available classification model with LiteRT and convert it to `.dlc` format. For instructions, see [TensorFlow Model Conversion](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-2/model_conv_tensorflow.html).

Note: For Ubuntu Server, `sudo` access is necessary to write the encoded stream to the `/etc/media` folder.
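Before launching either pipeline, it can help to confirm that the model, the labels file, and the output folder are in place. The following is a minimal sketch, assuming the paths used in the commands on this page; `check_prereqs` is a hypothetical helper name, not part of the SDK.

```shell
#!/bin/sh
# Pre-flight check (illustrative): verify the pipeline inputs and output folder.
# check_prereqs is a hypothetical helper, not an SDK tool.
check_prereqs() {
    model=$1
    labels=$2
    outdir=$3
    status=0
    [ -r "$model" ]  || { echo "missing model: $model" >&2; status=1; }
    [ -r "$labels" ] || { echo "missing labels: $labels" >&2; status=1; }
    if [ ! -d "$outdir" ]; then
        echo "missing output folder: $outdir" >&2
        status=1
    elif [ ! -w "$outdir" ]; then
        echo "no write access to $outdir (try running with sudo)" >&2
        status=1
    fi
    return "$status"
}

# Example call with the paths used on this page:
# check_prereqs /etc/models/inceptionv3.dlc /etc/labels/classification.json /etc/media
```

The function returns a non-zero status when any check fails, so it can gate the `gst-launch-1.0` command with `&&`.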

## Use qtivoverlay plugin to apply classification overlay

Run the use case on the target device:

    gst-launch-1.0 -e \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! video/x-raw,format=NV12_Q08C,width=1280,height=720,interlace-mode=progressive,colorimetry=bt601 ! \
    v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp model=/etc/models/inceptionv3.dlc ! queue ! qtimlpostprocess \
    settings="{\"confidence\": 40.0}" results=2 module=mobilenet-softmax labels=/etc/labels/classification.json ! text/x-raw ! queue ! metamux.

To stop the use case, use CTRL + C.
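The `confidence` setting and the `results` property in the command above can be pictured as a filter followed by a cap. The sketch below is an illustration in plain shell, not the plugin's implementation; it assumes that `confidence` is a percentage threshold and that `results` limits how many labels are kept. `filter_top` and the sample labels are hypothetical.

```shell
#!/bin/sh
# Illustration only, not the plugin's implementation.
# filter_top (hypothetical helper) reads "label score" lines on stdin,
# keeps scores at or above a threshold, and prints the top N by score.
# Assumption: scores are percentages and labels contain no spaces.
filter_top() {
    threshold=$1
    max_results=$2
    awk -v t="$threshold" '$2 + 0 >= t + 0' |
        LC_ALL=C sort -k2,2nr |
        head -n "$max_results"
}

# Sample scores are made up for the example.
printf '%s\n' 'tabby 62.5' 'tiger_cat 41.0' 'lynx 40.5' 'sofa 12.0' |
    filter_top 40.0 2
```

With a threshold of 40.0 and a cap of 2, three of the four sample labels pass the threshold, and only the two highest-scoring ones are printed.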

The following figure shows the flow of the use case execution:

1. Classify scenes from a video stream coming through a camera source.
2. Overlay the classification labels using overlaylib.
3. Encode this stream as an H.264 bitstream.
4. Multiplex the stream in an MP4 container and store it as an MP4 file.

Figure: Pipeline for classification overlay and encode

<svg id="Layer_2" data-name="Layer 2" xmlns="http://www.w3.org/2000/svg" width="1073.568058013917835" height="448.22801399230957" viewbox="0 0 1073.568058013917835 448.22801399230957">
  <defs>
    <style>.svg-1 .cls-1 { fill: none; stroke: #000; stroke-miterlimit: 10 }
.svg-1 .cls-2 { fill: #fff; font-size: 16px }
.svg-1 .cls-2,.svg-1 .cls-3 { font-family: Roboto-Regular, Roboto }
.svg-1 .cls-4 { fill: #007884 }
.svg-1 .cls-5 { fill: #d2d7e1 }
.svg-1 .cls-6 { fill: #2a2aea }
.svg-1 .cls-3 { font-size: 14px }
.svg-1 .cls-7 { fill: #fafafa }</style>
  </defs>
  <g>
    <rect class="cls-7" x=".499820709228516" y=".500244140625" width="1072.568359375" height="447.2275390625" rx="7.500000000000005" ry="7.500000000000005"></rect>
    <path class="cls-5" d="M1065.568058013917835,1c3.85986328125,0,7,3.14013671875,7,7v432.228012084960938c0,3.859867095947266-3.14013671875,7-7,7H8.000003814697266c-3.859870910644531,0-7-3.140132904052734-7-7V8c0-3.85986328125,3.140129089355469-7,7-7h1057.568054199220569M1065.568058013917835,0H8.000003814697266C3.581737518310547,0,0,3.581756591796875,0,8v432.228012084960938c0,4.418270111083984,3.581737518310547,8.000001907348633,8.000003814697266,8.000001907348633h1057.568054199220569c4.418212890621362,0,8-3.581731796264648,8-8.000001907348633V8c0-4.418243408203125-3.581787109378638-8-8-8h0Z"></path>
  </g>
  <g>
    <g>
      <text class="cls-3" transform="translate(881.060062408447266 424.31951904296875)"><tspan x="0" y="0">Qualcomm </tspan></text>
      <rect class="cls-6" x="860.809068687771287" y="412.228057189924584" width="16" height="16" rx="1.999999999999986" ry="1.999999999999986"></rect>
    </g>
    <g>
      <text class="cls-3" transform="translate(979.641849517822266 424.31951904296875)"><tspan x="0" y="0">Open source</tspan></text>
      <rect class="cls-4" x="959.390850100628086" y="412.228057189924584" width="16" height="16" rx="2" ry="2"></rect>
    </g>
  </g>
  <g>
    <rect class="cls-4" x="19.99996957391977" y="110.692426411677843" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(88.910190582275391 139.199127197265625)"><tspan x="0" y="0">tee</tspan></text>
  </g>
  <g>
    <rect class="cls-6" x="19.99996957391977" y="187.53348589482539" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(44.527072906494141 216.04034423828125)"><tspan x="0" y="0">qtimlvconverter</tspan></text>
  </g>
  <g>
    <rect class="cls-6" x="205.534000396724878" y="110.692426411677843" width="120" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(222.830936431884766 139.199127197265625)"><tspan x="0" y="0">qtimetamux</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="179.999973297119141" y1="135.692413330078125" x2="199.258975982666016" y2="135.692413330078125"></line>
    <polygon points="198.091800689697266 139.681488037109375 204.999973297119141 135.692413330078125 198.091800689697266 131.703369140625 198.091800689697266 139.681488037109375"></polygon>
  </g>
  <g>
    <rect class="cls-6" x="351.068031219532713" y="110.692426411677843" width="120" height="50" rx="3.999999999999995" ry="3.999999999999995"></rect>
    <text class="cls-2" transform="translate(372.630588531494141 139.199127197265625)"><tspan x="0" y="0">qtivoverlay</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="325.534000396728516" y1="135.692413330078125" x2="344.793003082275391" y2="135.692413330078125"></line>
    <polygon points="343.625827789306641 139.681488037109375 350.534000396728516 135.692413330078125 343.625827789306641 131.703369140625 343.625827789306641 139.681488037109375"></polygon>
  </g>
  <g>
    <rect class="cls-4" x="496.602062042340549" y="110.692426411677843" width="120" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(511.078678131103516 139.199127197265625)"><tspan x="0" y="0">v4l2h264enc</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="471.068058013916016" y1="135.692413330078125" x2="490.327030181884766" y2="135.692413330078125"></line>
    <polygon points="489.159854888916016 139.681488037109375 496.068058013916016 135.692413330078125 489.159854888916016 131.703369140625 489.159854888916016 139.681488037109375"></polygon>
  </g>
  <g>
    <rect class="cls-4" x="642.136092865148385" y="110.692426411677843" width="120" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(664.343173980712891 139.199127197265625)"><tspan x="0" y="0">h264parse</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="616.602054595947266" y1="135.692413330078125" x2="635.861087799072266" y2="135.692413330078125"></line>
    <polygon points="634.693912506103516 139.681488037109375 641.602054595947266 135.692413330078125 634.693912506103516 131.703369140625 634.693912506103516 139.681488037109375"></polygon>
  </g>
  <g>
    <rect class="cls-4" x="787.67012368795622" y="110.692426411677843" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991"></rect>
    <text class="cls-2" transform="translate(816.291324615478516 139.199127197265625)"><tspan x="0" y="0">mp4mux</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="762.136112213134766" y1="135.692413330078125" x2="781.395084381103516" y2="135.692413330078125"></line>
    <polygon points="780.227909088134766 139.681488037109375 787.136112213134766 135.692413330078125 780.227909088134766 131.703369140625 780.227909088134766 139.681488037109375"></polygon>
  </g>
  <g>
    <rect class="cls-4" x="933.204154510765875" y="110.692426411677843" width="120.000000000001819" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(967.770694732666016 139.199127197265625)"><tspan x="0" y="0">filesink</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="907.670108795166016" y1="135.692413330078125" x2="926.929141998291016" y2="135.692413330078125"></line>
    <polygon points="925.761966705322266 139.681488037109375 932.670108795166016 135.692413330078125 925.761966705322266 131.703369140625 925.761966705322266 139.681488037109375"></polygon>
  </g>
  <g>
    <line class="cls-1" x1="99.999973297119141" y1="161.154022216796875" x2="99.999973297119141" y2="180.41302490234375"></line>
    <polygon points="96.010898590087891 179.245819091796875 99.999973297119141 186.154022216796875 103.989032745361328 179.245819091796875 96.010898590087891 179.245819091796875"></polygon>
  </g>
  <g>
    <rect class="cls-6" x="19.99996957391977" y="264.469043339904601" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(64.679523468017578 294.1448974609375)"><tspan x="0" y="0">qtimlsnpe</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="99.999973297119141" y1="238.089569091796875" x2="99.999973297119141" y2="257.348587036132812"></line>
    <polygon points="96.010898590087891 256.181396484375 99.999973297119141 263.089569091796875 103.989032745361328 256.181396484375 96.010898590087891 256.181396484375"></polygon>
  </g>
  <g>
    <rect class="cls-6" x="19.99996957391977" y="342.227997660555047" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(37.722385406494141 370.73486328125)"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
  </g>
  <g>
    <line class="cls-1" x1="99.999973297119141" y1="315.848526000976562" x2="99.999973297119141" y2="335.1075439453125"></line>
    <polygon points="96.010898590087891 333.940338134765625 99.999973297119141 340.848526000976562 103.989032745361328 333.940338134765625 96.010898590087891 333.940338134765625"></polygon>
  </g>
  <g>
    <polyline class="cls-1" points="179.999973297119141 367.227996826171875 265.534000396728516 367.228004455566406 265.534000396728516 166.894989013671875"></polyline>
    <polygon points="269.523075103759766 168.06219482421875 265.534000396728516 161.154022216796875 261.544925689697266 168.06219482421875 269.523075103759766 168.06219482421875"></polygon>
  </g>
  <rect class="cls-4" x="19.99996957391977" y="19.769261766033196" width="160" height="50" rx="4" ry="4"></rect>
  <text class="cls-2" transform="translate(78.082035064697266 48.275970458984375)"><tspan x="0" y="0">filesrc</tspan></text>
  <g>
    <line class="cls-1" x1="179.999973297119141" y1="44.769256591796875" x2="199.258975982666925" y2="44.769256591796875"></line>
    <polygon points="198.091800689697266 48.758331298828125 204.999973297119141 44.769256591796875 198.091800689697266 40.78021240234375 198.091800689697266 48.758331298828125"></polygon>
  </g>
  <rect class="cls-4" x="204.99996957391977" y="19.769261766033196" width="160" height="50" rx="4" ry="4"></rect>
  <text class="cls-2" transform="translate(253.703126907348633 48.275970458984375)"><tspan x="0" y="0">qtdemux</tspan></text>
  <g>
    <line class="cls-1" x1="364.999973297119141" y1="44.769256591796875" x2="384.258975982666925" y2="44.769256591796875"></line>
    <polygon points="383.091800689697266 48.758331298828125 389.999973297119141 44.769256591796875 383.091800689697266 40.78021240234375 383.091800689697266 48.758331298828125"></polygon>
  </g>
  <rect class="cls-4" x="389.99996957391977" y="19.769261766033196" width="160.000000000000909" height="50" rx="4" ry="4"></rect>
  <text class="cls-2" transform="translate(432.207019805908203 48.275970458984375)"><tspan x="0" y="0">h264parse</tspan></text>
  <g>
    <line class="cls-1" x1="549.999942779541016" y1="44.769256591796875" x2="569.258975982666016" y2="44.769256591796875"></line>
    <polygon points="568.091800689697266 48.758331298828125 574.999942779541016 44.769256591796875 568.091800689697266 40.78021240234375 568.091800689697266 48.758331298828125"></polygon>
  </g>
  <g>
    <polyline class="cls-1" points="99.999729156494141 104.951446533203125 99.999744415283203 90.461639404296875 655.000003814697266 90.461547851561591 655.000003814697266 70.230743408203125"></polyline>
    <polygon points="103.988803863525391 103.78424072265625 99.999729156494141 110.692413330078125 96.010669708251953 103.78424072265625 103.988803863525391 103.78424072265625"></polygon>
  </g>
  <rect class="cls-4" x="574.99996957392068" y="19.769261766033196" width="160" height="50" rx="4" ry="4"></rect>
  <text class="cls-2" transform="translate(609.378910064697266 48.275970458984375)"><tspan x="0" y="0">v4l2h264dec</tspan></text>
</svg>

The following table provides the sequential processing stages of the pipeline execution:

| Process | Description |
| --- | --- |
| Source | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_g41_cf5_vbc"><li class="li">The video stream is collected from a camera source plugin and two copies are created:<ul class="ul" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_kh3_dyv_r1c"><li class="li">One stream is sent to the qtimetamux plugin to retain the video stream.</li><li class="li">The other stream is sent to an ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | **Preprocessing** |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70023-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_xsf_q5l_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ul_ff2_twl_vbc"><li class="li">Color conversion</li><li class="li">Scaling up or down</li><li class="li">Normalization of the stream data when the model expects floating-point values as input</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The classification model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | **Inferencing** |
| [qtimlsnpe](https://docs.qualcomm.com/doc/80-70023-50/topic/qtimlsnpe.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_bwn_s5l_vbc"><li class="li">Loads the model.</li><li class="li">Modifies the graph for the chosen delegate.</li><li class="li">Receives the tensor stream on its sink pad.</li><li class="li">Runs the inference and produces a tensor stream with the inference results on its source pad.</li></ol> |
| **Postprocessing** | **Postprocessing** |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_gr1_w5l_vbc"><li class="li">Receives the inference tensors from the classification model on its sink pad.</li><li class="li">Converts the tensors into formats such as video or text that the multimedia plugins can process later.</li><li class="li">Applies the confidence threshold and limits the output to the chosen number of results.</li><li class="li">Loads the corresponding modules of the classification models. <p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_rrb_1xl_vbc"><li class="li">Loads the submodule of the model.</li><li class="li">Produces results as text with classification labels (the pipeline requests `text/x-raw` output).</li><li class="li">Sends them to the sink pad of qtimetamux.</li></ol></li></ol> |
| [qtimetamux](https://docs.qualcomm.com/doc/80-70023-50/topic/qtimetamux.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_ll3_x5l_vbc"><li class="li">Receives the video stream and the text stream with the corresponding classification results on its sink pads.</li><li class="li">Produces GST buffers with the contents of the video stream received on its sink pad.</li><li class="li">Adds the classification result from the data sink pad to the GST buffer meta (meta muxing) and sends the buffer over its source pad.</li></ol> |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-70023-50/topic/qtioverlay.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_wst_y5l_vbc"><li class="li">Receives the multiplexed stream.</li><li class="li">Overlays the classification labels on the video frame using CL.</li><li class="li">Produces GST buffers with overlays on its source pad.</li></ol> |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70023-50/topic/v4l2h264enc.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_h41_cf5_vbc"><li class="li">Applies parameters to each frame of the video stream that it receives on its sink pad.</li><li class="li">Encodes the frames into an H.264 bitstream and sends it over its source pad.</li></ol> |
| h264parse | Adds more information about the bitstream to the GStreamer buffer meta. |
| mp4mux | Receives these buffers and multiplexes them into an MP4 container, together with the format specification buffers. |
| **Output** | **Output** |
| filesink | Stores the resulting stream in the `/etc/media/video.mp4` file. |
| Playback | Pull video.mp4 from the target device to the host computer and play it in a media player:<br>`scp root@<IP address of target device>:/etc/media/video.mp4 <destination directory>` |
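The normalization step listed for qtimlvconverter in the table above can be pictured with a small worked example. This is a sketch of the common `(pixel - mean) / sigma` form, not the plugin's code; the mean and sigma values are assumptions chosen for illustration, and `normalize` is a hypothetical helper.

```shell
#!/bin/sh
# Illustration only: map 8-bit pixel values to floats with
# (pixel - mean) / sigma. The mean and sigma passed below are assumed
# example values, not values read from the plugin or the model.
normalize() {
    mean=$1
    sigma=$2
    awk -v m="$mean" -v s="$sigma" '{ printf "%.4f\n", ($1 - m) / s }'
}

# Normalize the darkest, middle, and brightest 8-bit values.
printf '%s\n' 0 128 255 | normalize 128 128
```

With these assumed parameters, the 8-bit range 0 to 255 lands roughly in the interval from -1 to 1, which is a typical input range for floating-point classification models.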

## Use qtivcomposer to mix original frame with classification mask

Run the use case on the target device:

    gst-launch-1.0 -e --gst-debug=2 \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtivcomposer name=mixer sink_1::position="<30, 30>" sink_1::dimensions="<320, 180>" ! queue ! video/x-raw,format=NV12,width=1920,height=1080,interlace-mode=progressive,colorimetry=bt601 ! \
    v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp model=/etc/models/inceptionv3.dlc ! queue ! qtimlpostprocess settings="{\"confidence\": 40.0}" \
    results=2 module=mobilenet-softmax labels=/etc/labels/classification.json ! video/x-raw,format=BGRA,width=640,height=360 ! queue ! mixer.

To stop the use case, use CTRL + C.
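The `sink_1::position` and `sink_1::dimensions` values in the command above place the classification mask as a small window on the composed frame. As a rough sketch, the values can be derived from the output frame size; `overlay_geometry`, the divisor, and the margin are hypothetical names and parameters introduced here for illustration.

```shell
#!/bin/sh
# overlay_geometry (hypothetical helper) prints qtivcomposer pad properties
# for a window scaled down from the output frame by an integer divisor and
# offset from the top-left corner by a margin in pixels.
overlay_geometry() {
    frame_w=$1
    frame_h=$2
    divisor=$3
    margin=$4
    w=$((frame_w / divisor))
    h=$((frame_h / divisor))
    printf 'sink_1::position="<%d, %d>" sink_1::dimensions="<%d, %d>"\n' \
        "$margin" "$margin" "$w" "$h"
}

# A 1920x1080 output frame, one-sixth scale, 30-pixel margin.
overlay_geometry 1920 1080 6 30
```

For a 1920x1080 output frame, a divisor of 6 and a margin of 30 give the same position and dimensions as the command above; a different frame size or divisor only changes the arguments.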

The following figure shows the flow of the use case execution:

- Classify scenes from a video stream coming through a camera source.
- Compose classification labels and video stream together using qtivcomposer.
- Encode this stream as an H.264 bitstream.
- Multiplex the stream in an MP4 container and store it as an MP4 file.

Figure: Pipeline for classification and encode with qtivcomposer

<svg xmlns="http://www.w3.org/2000/svg" width="947.865921020507812" height="447.6268310546875" viewbox="0 0 947.865921020507812 447.6268310546875">
  <defs>
    <style>.svg-2 .cls-1 { fill: none; stroke: #000; stroke-miterlimit: 10 }
.svg-2 .cls-2 { fill: #fff; font-size: 16px }
.svg-2 .cls-2,.svg-2 .cls-3 { font-family: Roboto-Regular, Roboto }
.svg-2 .cls-4 { fill: #007884 }
.svg-2 .cls-5 { fill: #d2d7e1 }
.svg-2 .cls-6 { fill: #2a2aea }
.svg-2 .cls-3 { font-size: 14px }
.svg-2 .cls-7 { fill: #fafafa }</style>
  </defs>
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect class="cls-7" x=".499954223632812" y=".4998779296875" width="946.86572265625" height="446.62646484375" rx="7.499999999999999" ry="7.499999999999999"></rect>
      <path class="cls-5" d="M939.865921020507812,1c3.8597412109375,0,7,3.140235900878906,7,7v431.6268310546875c0,3.859771728515625-3.1402587890625,7-7,7H8c-3.859771728515625,0-7-3.140228271484375-7-7V8c0-3.859764099121094,3.140228271484375-7,7-7h931.865921020507812M939.865921020507812,0H8C3.581649780273438,0,0,3.581771850585938,0,8v431.6268310546875c0,4.418243408203125,3.581649780273438,8,8,8h931.865921020507812c4.418212890625,0,8-3.581756591796875,8-8V8c0-4.418228149414062-3.581787109375-8-8-8h0Z"></path>
    </g>
    <g>
      <g>
        <text class="cls-3" transform="translate(757.439010620117188 423.71820068359375)"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect class="cls-6" x="737.187998228979268" y="411.626708984375" width="16" height="16" rx="2" ry="2"></rect>
      </g>
      <g>
        <text class="cls-3" transform="translate(856.020797729492188 423.71820068359375)"><tspan x="0" y="0">Open source</tspan></text>
        <rect class="cls-4" x="835.769779641832429" y="411.626708984375" width="16" height="16" rx="2" ry="2"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <rect class="cls-4" x="19.999933004983177" y="110.091192647045318" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(88.910148620605469 138.597890853881836)"><tspan x="0" y="0">tee</tspan></text>
    </g>
    <g>
      <rect class="cls-6" x="19.999933004983177" y="186.932252130192865" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(44.52703857421875 215.439117431640625)"><tspan x="0" y="0">qtimlvconverter</tspan></text>
    </g>
    <g>
      <rect class="cls-6" x="205.996780770569785" y="110.091192647045318" width="140" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(227.137542724609375 139.76685094833374)"><tspan x="0" y="0">qtivcomposer</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="179.999923706054688" y1="135.091194152832031" x2="199.258956909179688" y2="135.091194152832031"></line>
      <polygon points="198.091751098632812 139.080253601074219 204.999923706054688 135.091194152832031 198.091751098632812 131.102127075195312 198.091751098632812 139.080253601074219"></polygon>
    </g>
    <g>
      <line class="cls-1" x1="345.996780395507812" y1="135.091194152832031" x2="365.255813598632812" y2="135.091194152832031"></line>
      <polygon points="364.088577270507812 139.080253601074219 370.996780395507812 135.091194152832031 364.088577270507812 131.102127075195312 364.088577270507812 139.080253601074219"></polygon>
    </g>
    <g>
      <rect class="cls-4" x="371.263796181971884" y="110.091192647045318" width="119.999999999999091" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(385.740432739257812 138.597890853881836)"><tspan x="0" y="0">v4l2h264enc</tspan></text>
    </g>
    <g>
      <rect class="cls-4" x="516.79782700477881" y="110.091192647045318" width="120" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(539.004898071289062 138.597890853881836)"><tspan x="0" y="0">h264parse</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="491.263809204101562" y1="135.091194152832031" x2="510.522842407226562" y2="135.091194152832031"></line>
      <polygon points="509.355606079101562 139.080253601074219 516.263809204101562 135.091194152832031 509.355606079101562 131.102127075195312 509.355606079101562 139.080253601074219"></polygon>
    </g>
    <g>
      <rect class="cls-4" x="662.331857827586646" y="110.091192647045318" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991"></rect>
      <text class="cls-2" transform="translate(690.953018188476562 138.597890853881836)"><tspan x="0" y="0">mp4mux</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="636.797805786132812" y1="135.091194152832031" x2="656.056838989257812" y2="135.091194152832031"></line>
      <polygon points="654.889663696289062 139.080253601074219 661.797805786132812 135.091194152832031 654.889663696289062 131.102127075195312 654.889663696289062 139.080253601074219"></polygon>
    </g>
    <g>
      <rect class="cls-4" x="807.865888650398119" y="110.091192647045318" width="120.000000000005457" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(842.432449340820312 138.597890853881836)"><tspan x="0" y="0">filesink</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="782.331863403320312" y1="135.091194152832031" x2="801.590896606445312" y2="135.091194152832031"></line>
      <polygon points="800.423660278320312 139.080253601074219 807.331863403320312 135.091194152832031 800.423660278320312 131.102127075195312 800.423660278320312 139.080253601074219"></polygon>
    </g>
    <g>
      <line class="cls-1" x1="99.999923706054688" y1="160.552772521972656" x2="99.999923706054688" y2="179.811790466308594"></line>
      <polygon points="96.010879516601562 178.644599914550781 99.999923706054688 185.552772521972656 103.988998413085938 178.644599914550781 96.010879516601562 178.644599914550781"></polygon>
    </g>
    <g>
      <rect class="cls-6" x="19.999933004983177" y="263.867809575273895" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(64.67938232421875 292.37469482421875)"><tspan x="0" y="0">qtimlsnpe</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="99.999923706054688" y1="237.488327026367188" x2="99.999923706054688" y2="256.747344970703125"></line>
      <polygon points="96.010879516601562 255.580154418945312 99.999923706054688 262.488327026367188 103.988998413085938 255.580154418945312 96.010879516601562 255.580154418945312"></polygon>
    </g>
    <g>
      <rect class="cls-6" x="19.999933004983177" y="341.626763895923432" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(37.72235107421875 370.133636474609375)"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="99.999923706054688" y1="315.247283935546875" x2="99.999923706054688" y2="334.506317138671875"></line>
      <polygon points="96.010879516601562 333.339111328125 99.999923706054688 340.247283935546875 103.988998413085938 333.339111328125 96.010879516601562 333.339111328125"></polygon>
    </g>
    <g>
      <polyline class="cls-1" points="179.999923706054688 366.62677001953125 265.533950805664062 366.62677001953125 265.533950805664062 166.29376220703125"></polyline>
      <polygon points="269.523025512695312 167.460952758789062 265.533950805664062 160.552772521972656 261.544906616210938 167.460952758789062 269.523025512695312 167.460952758789062"></polygon>
    </g>
    <rect class="cls-4" x="20.000169699211256" y="20.045627678893652" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(78.082229614257812 48.552330017089844)"><tspan x="0" y="0">filesrc</tspan></text>
    <g>
      <line class="cls-1" x1="180.000167846679688" y1="45.045627593994141" x2="199.259170532226562" y2="45.045627593994141"></line>
      <polygon points="198.091995239257812 49.034690856933594 205.000167846679688 45.045627593994141 198.091995239257812 41.056564331054688 198.091995239257812 49.034690856933594"></polygon>
    </g>
    <rect class="cls-4" x="205.000169699211256" y="20.045627678893652" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(253.703323364257812 48.552330017089844)"><tspan x="0" y="0">qtdemux</tspan></text>
    <g>
      <line class="cls-1" x1="365.000198364257812" y1="45.045627593994141" x2="384.259170532226562" y2="45.045627593994141"></line>
      <polygon points="383.091995239257812 49.034690856933594 390.000198364257812 45.045627593994141 383.091995239257812 41.056564331054688 383.091995239257812 49.034690856933594"></polygon>
    </g>
    <rect class="cls-4" x="390.000169699211256" y="20.045627678893652" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(432.207199096679688 48.552330017089844)"><tspan x="0" y="0">h264parse</tspan></text>
    <g>
      <line class="cls-1" x1="550.000198364257812" y1="45.045627593994141" x2="569.259170532226562" y2="45.045627593994141"></line>
      <polygon points="568.091995239257812 49.034690856933594 575.000198364257812 45.045627593994141 568.091995239257812 41.056564331054688 568.091995239257812 49.034690856933594"></polygon>
    </g>
    <g>
      <polyline class="cls-1" points="99.999923706054688 104.350208282470703 99.999923706054688 90.091192245483398 655.000198364257812 90.091192245483398 655.000198364257812 70.091192722320557"></polyline>
      <polygon points="103.988998413085938 103.183015823364258 99.999923706054688 110.091194152832031 96.010879516601562 103.183015823364258 103.988998413085938 103.183015823364258"></polygon>
    </g>
    <rect class="cls-4" x="575.000169699211256" y="20.045627678893652" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(609.379104614257812 48.552330017089844)"><tspan x="0" y="0">v4l2h264dec</tspan></text>
  </g>
</svg>

The following table provides the sequential processing stages of the pipeline execution:

| Process | Description |
| --- | --- |
| Source | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ol_acs_jk5_vbc"><li class="li">The video stream is collected from a camera source plugin and two copies are created:<ul class="ul" id="single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1__ul_bcs_jk5_vbc"><li class="li">One stream is sent to the qtivcomposer plugin to retain the video stream.</li><li class="li">The other stream is sent to an ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | **Preprocessing** |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70023-50/topic/qtimlvconverter.html) | <ol><li>Receives the video stream on its sink pad.</li><li>Performs preprocessing:<ul><li>Color conversion</li><li>Scaling up or down</li><li>Normalization of the stream data when the model expects floating-point values as input</li></ul></li><li>Converts the video stream to a tensor stream on its source pad.<p>The classification model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | **Inferencing** |
| [qtimlsnpe](https://docs.qualcomm.com/doc/80-70023-50/topic/qtimlsnpe.html) | <ol><li>Loads the model.</li><li>Modifies the graph for the chosen delegate.</li><li>Receives the tensor stream on its sink pad.</li><li>Runs the inference and produces a tensor stream with the inference results on its source pad.</li></ol> |
| **Postprocessing** | **Postprocessing** |
| qtimlpostprocess | <ol><li>Receives the inference tensors from a classification model on its sink pad.</li><li>Converts the tensors into formats, such as video or text, that the multimedia plugins can process later.</li><li>Applies the confidence threshold to the chosen number of results.</li><li>Loads the module that corresponds to the classification model.<p>In this use case, qtimlpostprocess does the following:</p><ol type="a"><li>Loads the submodule of the model.</li><li>Produces results as video frames with classification labels.</li><li>Sends them to the sink pad of qtivcomposer.</li></ol></li></ol> |
| [qtivcomposer](https://docs.qualcomm.com/doc/80-70023-50/topic/qtivcomposer.html) | <ol><li>Receives the original video stream and the video stream with classification results on its sink pads.</li><li>On its source pad, produces GStreamer buffers composed of the video streams from its sink pads.</li></ol> |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70023-50/topic/v4l2h264enc.html) | <ol><li>Applies parameters to each frame of the video stream that it receives on its sink pad.</li><li>Encodes the stream into a bitstream and sends it over its source pad.</li></ol> |
| h264parse | Adds more information about the bitstream to the GStreamer buffer metadata. |
| mp4mux | Receives these buffers and wraps them in an MP4 container together with the format-specification data. |
| **Output** | **Output** |
| filesink | Stores the resulting stream in the `/etc/media/video.mp4` file. |
| Playback | Pull `video.mp4` from the target device to the host computer and play it on a media player:<br>`scp root@<IP address of target device>:/etc/media/video.mp4 <destination directory>` |
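
The playback step in the table can be wrapped in a small host-side helper. The following is a minimal sketch, not part of the SDK: the function names `build_pull_cmd` and `pull_video` are hypothetical, and the IP address shown in the usage note is a placeholder. It assumes that `scp` is available on the host and that the target device accepts `root` logins, as in the command in the table.

```shell
#!/bin/sh
# Sketch only: function names are illustrative and not part of the SDK.

# Print the scp command that copies the encoded file from the target device.
# $1 = IP address of the target device (required)
# $2 = destination directory on the host (defaults to the current directory)
build_pull_cmd() {
    if [ -z "$1" ]; then
        echo "usage: build_pull_cmd <target-ip> [destination-dir]" >&2
        return 1
    fi
    printf 'scp root@%s:/etc/media/video.mp4 %s\n' "$1" "${2:-.}"
}

# Copy the file, then confirm that a non-empty video.mp4 arrived on the host.
pull_video() {
    dest="${2:-.}"
    cmd=$(build_pull_cmd "$1" "$dest") || return 1
    echo "Running: $cmd"
    scp "root@$1:/etc/media/video.mp4" "$dest" || return 1
    [ -s "$dest/video.mp4" ]
}
```

For example, after sourcing the script, `pull_video 192.0.2.10 "$HOME/Videos"` copies the file from a device at the placeholder address `192.0.2.10`. If GStreamer is installed on the host, running `gst-discoverer-1.0` on the copied file reports its container and codec before you open it in a media player.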

Last Published: Mar 27, 2026