# Image classification and encode with LiteRT

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-image-classification-and-encode.html](https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-image-classification-and-encode.html)

These use cases use the InceptionV3 LiteRT model to classify scenes from a single camera stream, either overlay or compose the classification labels onto the video, and then encode the stream.

Note: For Ubuntu Server, `sudo` access is necessary to write the encoded stream to the `/etc/media` folder.
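Because the output folder may need elevated permissions, it can help to check write access before launching a pipeline. The following is a minimal sketch; `can_write_dir` is a hypothetical helper name, not part of any Qualcomm or GStreamer tool.

```shell
# Hypothetical helper: report whether the current user can write to a directory.
# Prints "writable" if the path is an existing directory with write permission,
# otherwise prints "not-writable".
can_write_dir() {
    dir="$1"
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "writable"
    else
        echo "not-writable"
    fi
}
```

For example, `can_write_dir /etc/media` printing `not-writable` suggests that the pipeline should be run with `sudo` or that the `filesink` location should point elsewhere.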

## Use qtivoverlay plugin to apply classification overlay

Run the use case on the target device:

    gst-launch-1.0 -e qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split split. ! \
    queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! v4l2h264enc capture-io-mode=4 output-io-mode=5 ! \
    h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 split. ! queue ! qtimlvconverter ! queue ! \
    qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options="QNNExternalDelegate,backend_type=htp;" \
    model=/etc/models/inception_v3_quantized.tflite ! queue ! qtimlpostprocess settings="{\"confidence\": 40.0}" results=2 module=mobilenet-softmax \
    labels=/etc/labels/classification.json ! text/x-raw ! queue ! metamux.

To stop the use case, use CTRL + C.
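The command above references a model file and a labels file by absolute path. As a sketch of a pre-flight check, the hypothetical helper `missing_inputs` below prints every path from its argument list that is not a readable regular file, so that a missing input is reported before the pipeline starts.

```shell
# Hypothetical helper: print each argument that is not a readable regular file.
# Prints nothing when every path is present and readable.
missing_inputs() {
    for path in "$@"; do
        if [ ! -f "$path" ] || [ ! -r "$path" ]; then
            echo "$path"
        fi
    done
}
```

A possible invocation with the paths used in this use case is `missing_inputs /etc/models/inception_v3_quantized.tflite /etc/labels/classification.json`.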

The following figure shows the flow of the use case execution:

1. Identify the stream coming through a camera source.
2. Overlay the classification labels using overlaylib.
3. Encode the stream as an H.264 bitstream.
4. Multiplex the stream in an MP4 container and store it as an MP4 file.

Figure: Pipeline for classification overlay and encode

<svg xmlns="http://www.w3.org/2000/svg" width="1180" height="357.535699844360352" viewbox="0 0 1180 357.535699844360352">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".5" y=".500055313110352" width="1179" height="356.53515625" rx="7.5" ry="7.5" style="fill: #fafafa;"></rect>
      <path d="M1172,1c3.85986328125,0,7,3.140153884887695,7,7v341.535699844360352c0,3.859832763671875-3.14013671875,7-7,7H8c-3.859846115112305,0-7-3.140167236328125-7-7V8c0-3.859846115112305,3.140153884887695-7,7-7h1164M1172,0H8C3.581692695617676,0,0,3.581846237182617,0,8v341.535699844360352c0,4.41815185546875,3.581692695617676,8,8,8h1164c4.418334960939319,0,8-3.58184814453125,8-8V8c0-4.418153762817383-3.581665039060681-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(987.492034912109375 333.627176284790039)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="967.241037468234936" y="321.535699844360352" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(1086.073822021484375 333.627176284790039)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="1065.822818881091735" y="321.535699844360352" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <rect x="20.363876708768657" y="20.00006906611361" width="120" height="50" rx="3.999999999999999" ry="3.999999999999999" style="fill: #007884;"></rect>
        <text transform="translate(53.793609619140625 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
      </g>
      <g>
        <rect x="166.431938354383419" y="20.00006906611361" width="120" height="50" rx="3.999999999999998" ry="3.999999999999998" style="fill: #007884;"></rect>
        <text transform="translate(215.342155456542969 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
      </g>
      <g>
        <rect x="146.431938354383419" y="96.841128549261157" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(170.95904541015625 125.348001480102539)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
      </g>
      <g>
        <line x1="140.897903442382812" y1="45.000070571899414" x2="160.874542236328125" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="159.853256225585938 48.490503311157227 165.897903442382812 45.000070571899414 159.853256225585938 41.509637832641602 159.853256225585938 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="311.965969177188526" y="20.00006906611361" width="120" height="50" rx="3.999999999999995" ry="3.999999999999995" style="fill: #2a2aea;"></rect>
        <text transform="translate(329.262907028198242 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimetamux</tspan></text>
      </g>
      <g>
        <line x1="286.43194580078125" y1="45.000070571899414" x2="306.408569335938409" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="305.387298583984375 48.490503311157227 311.43194580078125 45.000070571899414 305.387298583984375 41.509637832641602 305.387298583984375 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="457.499999999996362" y="20.00006906611361" width="120" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(482.93756103515625 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtivoverlay</tspan></text>
      </g>
      <g>
        <line x1="431.965972900390625" y1="45.000070571899414" x2="451.942596435546875" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="450.92132568359375 48.490503311157227 456.965972900390625 45.000070571899414 450.92132568359375 41.509637832641602 450.92132568359375 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="603.034030822804198" y="20.00006906611361" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(617.510650634765625 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">v4l2h264enc</tspan></text>
      </g>
      <g>
        <line x1="577.5" y1="45.000070571899414" x2="597.47662353515625" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="596.455322265625 48.490503311157227 602.5 45.000070571899414 596.455322265625 41.509637832641602 596.455322265625 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="748.568061645612033" y="20.00006906611361" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
        <text transform="translate(770.775146484375 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">h264parse</tspan></text>
      </g>
      <g>
        <line x1="723.0340576171875" y1="45.000070571899414" x2="743.01068115234375" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="741.9893798828125 48.490503311157227 748.0340576171875 45.000070571899414 741.9893798828125 41.509637832641602 741.9893798828125 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="894.102092468419869" y="20.00006906611361" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
        <text transform="translate(922.7232666015625 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">mp4mux</tspan></text>
      </g>
      <g>
        <line x1="868.56805419921875" y1="45.000070571899414" x2="888.544677734375" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="887.5234375 48.490503311157227 893.56805419921875 45.000070571899414 887.5234375 41.509637832641602 887.5234375 48.490503311157227"></polygon>
      </g>
      <g>
        <rect x="1039.636123291229524" y="20.00006906611361" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(1074.20263671875 48.506771087646484)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">filesink</tspan></text>
      </g>
      <g>
        <line x1="1014.10205078125" y1="45.000070571899414" x2="1034.078735351560681" y2="45.000070571899414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="1033.057495117189319 48.490503311157227 1039.10205078125 45.000070571899414 1033.057495117189319 41.509637832641602 1033.057495117189319 48.490503311157227"></polygon>
      </g>
      <g>
        <line x1="226.43194580078125" y1="70.461648941040039" x2="226.43194580078125" y2="90.438287734985352" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.941513061523438 89.416994094848633 226.43194580078125 95.461648941040039 229.92236328125 89.416994094848633 222.941513061523438 89.416994094848633"></polygon>
      </g>
      <g>
        <rect x="146.431938354383419" y="173.776685994341278" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(192.252120971679688 203.452539443969727)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimltflite</tspan></text>
      </g>
      <g>
        <line x1="226.43194580078125" y1="147.397211074829102" x2="226.43194580078125" y2="167.373849868774414" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.941513061523438 166.352548599243164 226.43194580078125 172.397211074829102 229.92236328125 166.352548599243164 222.941513061523438 166.352548599243164"></polygon>
      </g>
      <g>
        <rect x="146.431938354383419" y="251.535640314990815" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(164.15435791015625 280.042520523071289)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
      </g>
      <g>
        <line x1="226.43194580078125" y1="225.156167984008789" x2="226.43194580078125" y2="245.132806777954102" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.941513061523438 244.111505508423761 226.43194580078125 250.156152725219727 229.92236328125 244.111505508423761 222.941513061523438 244.111505508423761"></polygon>
      </g>
      <g>
        <polyline points="306.43194580078125 276.535638809204102 371.965972900390625 276.535638809204102 371.965972900390625 75.485010147094727" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
        <polygon points="375.456390380859375 76.506303787231445 371.965972900390625 70.461648941040039 368.47552490234375 76.506303787231445 375.456390380859375 76.506303787231445"></polygon>
      </g>
    </g>
  </g>
</svg>

The following table lists the sequential processing stages of the pipeline execution:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | <ol class="ol"><li class="li">Collects the video stream from a camera source and creates two copies:<ul class="ul" id="single-camera-stream-with-image-classification-and-encode__ol_kh3_dyv_r1c"><li class="li">One stream is sent to the qtimetamux plugin to retain the video stream.</li><li class="li">The other stream is sent to an ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | **Preprocessing** |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode__ol_xsf_q5l_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-image-classification-and-encode__ul_ff2_twl_vbc"><li class="li">Color conversion</li><li class="li">Scaling down/up</li><li class="li">Normalization of the stream data when the model expects floating-point values as input</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The classification model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | **Inferencing** |
| [qtimltflite](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimltflite.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode__ol_bwn_s5l_vbc"><li class="li">Loads the classification model.</li><li class="li">Modifies the graph for the chosen delegate.</li><li class="li">Receives the tensor stream on its sink pad.</li><li class="li">Runs the inference and produces a tensor stream with the inference results on its source pad.</li></ol> |
| **Postprocessing** | **Postprocessing** |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode__ol_gr1_w5l_vbc"><li class="li">Receives the inference tensors from a classification model on its sink pad.</li><li class="li">Converts the tensors into formats such as video or text that the multimedia plugins can process later.</li><li class="li">Applies the threshold to the chosen number of results.</li><li class="li">Loads the corresponding modules of the classification models.<p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="single-camera-stream-with-image-classification-and-encode__ol_rrb_1xl_vbc"><li class="li">Loads the MobileNet-softmax submodule.</li><li class="li">Produces results as structures of text.</li><li class="li">Sends them to the sink pad of qtimetamux.</li></ol></li></ol> |
| [qtimetamux](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimetamux.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode__ol_ll3_x5l_vbc"><li class="li">Receives the video stream and the text stream with the corresponding classification results on its sink pads.</li><li class="li">Produces GST buffers with the contents of the video stream received on its sink pad.</li><li class="li">Adds the classification results from the data sink pad to the GST buffer meta (meta muxing) on its source pad.</li></ol> |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-70022-50/topic/qtioverlay.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode__ol_wst_y5l_vbc"><li class="li">Receives the multiplexed stream.</li><li class="li">Overlays the classification labels on the video frame using CL.</li><li class="li">Produces GST buffers with overlays on its source pad.</li></ol> |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | <ol class="ol"><li class="li">Applies parameters to each frame of the video stream that it receives on its sink pad.</li><li class="li">Encodes the stream into a bitstream and sends it over its source pad.</li></ol> |
| h264parse | Adds more information about the bitstream to the GStreamer buffer meta. |
| mp4mux | Receives these buffers and creates containers with format specification buffers. |
| **Output** | **Output** |
| filesink | Stores the resulting stream in the `/etc/media/video.mp4` file. |
| Playback | On the host computer, pull video.mp4 from the target device and play it on a media player:<br>`scp root@<IP address of target device>:/etc/media/video.mp4 <destination directory>` |
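The postprocessing stage applies a threshold to a chosen number of results, and the command sets `"confidence": 40.0` and `results=2`. The sketch below illustrates that idea only; it is not the plugin's implementation. It assumes that `confidence` is a percentage cutoff, that `results` caps how many labels are kept, and that the input is one `label score` pair per line with single-word labels. The function name `top_results` is hypothetical.

```shell
# Hypothetical illustration of "threshold, then keep the top N results".
# Reads "label score" lines on stdin, with the score given in percent.
# $1 = minimum score to keep, $2 = maximum number of results to print.
top_results() {
    threshold="$1"
    max_results="$2"
    # Keep lines whose score meets the threshold, sort by score descending,
    # then cut the list to the requested number of results.
    awk -v t="$threshold" '$2 + 0 >= t + 0' |
        sort -k2,2nr |
        head -n "$max_results"
}
```

With the values from the command, `printf 'cat 72.5\ndog 41.0\ncar 12.3\nbird 55.1\n' | top_results 40.0 2` keeps only the `cat` and `bird` lines: `car` falls below the cutoff, and `dog` passes the cutoff but ranks third.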

## Use qtivcomposer to mix original frame with classification mask

Run the use case on the target device:

    gst-launch-1.0 -e --gst-debug=2 qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split split. ! \
    queue ! qtivcomposer name=mixer sink_1::position="<30, 30>" sink_1::dimensions="<320, 180>" ! queue ! video/x-raw,format=NV12,width=1920,height=1080,interlace-mode=progressive,colorimetry=bt601 ! \
    v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 split. ! queue ! qtimlvconverter ! queue ! \
    qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options="QNNExternalDelegate,backend_type=htp;" \
    model=/etc/models/inception_v3_quantized.tflite ! queue ! qtimlpostprocess settings="{\"confidence\": 40.0}" results=2 module=mobilenet \
    labels=/etc/labels/classification.json ! video/x-raw,format=BGRA,width=640,height=360 ! queue ! mixer.

To stop the use case, use CTRL + C.
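The command places the classification frame with `sink_1::position="<30, 30>"` and `sink_1::dimensions="<320, 180>"` and requests 1920x1080 output caps. Assuming that the position is the top-left corner in output pixels and the dimensions are width and height (an interpretation that this page does not confirm), the following sketch checks whether such a rectangle stays inside the output frame. The helper name `fits_in_frame` is hypothetical.

```shell
# Hypothetical helper: check whether a rectangle placed at (x, y) with size
# w x h lies fully inside a frame of frame_w x frame_h pixels.
# Prints "fits" or "clipped".
fits_in_frame() {
    x="$1"; y="$2"; w="$3"; h="$4"; frame_w="$5"; frame_h="$6"
    if [ "$x" -ge 0 ] && [ "$y" -ge 0 ] &&
       [ $((x + w)) -le "$frame_w" ] && [ $((y + h)) -le "$frame_h" ]; then
        echo "fits"
    else
        echo "clipped"
    fi
}
```

With the values from the command, `fits_in_frame 30 30 320 180 1920 1080` prints `fits`. Moving the same rectangle to x = 1700 would print `clipped`, because 1700 + 320 exceeds the frame width of 1920.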

The following figure shows the flow of the use case execution:

1. Identify the stream coming through a camera source.
2. Compose the classification labels and video stream using qtivcomposer.
3. Encode the stream as an H.264 bitstream.
4. Multiplex the stream in an MP4 container and store it as an MP4 file.

Figure: Pipeline for classification and encode with qtivcomposer

<svg xmlns="http://www.w3.org/2000/svg" width="1053.934104919439051" height="355.535463333129883" viewbox="0 0 1053.934104919439051 355.535463333129883">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".500267028808594" y=".499940872192383" width="1052.93359375" height="354.53515625" rx="7.499999999999957" ry="7.499999999999957" style="fill: #fafafa;"></rect>
      <path d="M1045.934104919439051,1c3.85986328125,0,7,3.140132904052734,7,7v339.535463333129883c0,3.85986328125-3.14013671875,7-7,7H8c-3.859870910644531,0-7-3.14013671875-7-7V8c0-3.859867095947266,3.140129089355469-7,7-7h1037.934104919439051M1045.934104919439051,0H8C3.581733703613281,0,0,3.581731796264648,0,8v339.535463333129883c0,4.41827392578125,3.581733703613281,8,8,8h1037.934104919439051c4.418212890619543,0,8-3.58172607421875,8-8V8c0-4.418268203735352-3.581787109380457-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(864.426780700683594 331.62693977355957)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="844.175762375432896" y="319.535463333129883" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(963.008567810058594 331.62693977355957)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="942.757543788289695" y="319.535463333129883" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <rect x="20.000055258294196" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(53.429786682128906 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
      </g>
      <g>
        <rect x="166.068116903908958" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(214.97833251953125 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
      </g>
      <g>
        <rect x="146.068116903908958" y="96.841006478947747" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(170.595218658447266 125.347871780395508)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
      </g>
      <g>
        <line x1="140.534080505372003" y1="44.999948501586914" x2="160.510719299316406" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="159.489433288574219 48.490381240844727 165.534080505372003 44.999948501586914 159.489433288574219 41.509515762329102 159.489433288574219 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="312.064964669495566" y="19.999946995800201" width="140" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(333.205726623535156 49.675605297088623)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtivcomposer</tspan></text>
      </g>
      <g>
        <line x1="286.068107604980469" y1="44.999948501586914" x2="306.044761657714844" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="305.023460388183594 48.490381240844727 311.068107604980469 44.999948501586914 305.023460388183594 41.509515762329102 305.023460388183594 48.490381240844727"></polygon>
      </g>
      <g>
        <line x1="452.064964294433594" y1="44.999948501586914" x2="472.041587829589844" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="471.020286560058594 48.490381240844727 477.064964294433594 44.999948501586914 471.020286560058594 41.509515762329102 471.020286560058594 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="477.331980080897665" y="19.999946995800201" width="119.999999999999091" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(491.808616638183594 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">v4l2h264enc</tspan></text>
      </g>
      <g>
        <rect x="622.866010903704591" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(645.073081970214844 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">h264parse</tspan></text>
      </g>
      <g>
        <line x1="597.331993103027344" y1="44.999948501586914" x2="617.308616638183594" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="616.287315368652344 48.490381240844727 622.331993103027344 44.999948501586914 616.287315368652344 41.509515762329102 616.287315368652344 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="768.400041726512427" y="19.999946995800201" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
        <text transform="translate(797.021202087402344 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">mp4mux</tspan></text>
      </g>
      <g>
        <line x1="742.865989685058594" y1="44.999948501586914" x2="762.842674255371094" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="761.821372985839844 48.490381240844727 767.865989685058594 44.999948501586914 761.821372985839844 41.509515762329102 761.821372985839844 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="913.934072549323901" y="19.999946995800201" width="120.000000000005457" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(948.500633239746094 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">filesink</tspan></text>
      </g>
      <g>
        <line x1="888.400047302246094" y1="44.999948501586914" x2="908.376670837402344" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="907.355369567871094 48.490381240844727 913.400047302246094 44.999948501586914 907.355369567871094 41.509515762329102 907.355369567871094 48.490381240844727"></polygon>
      </g>
      <g>
        <line x1="226.068107604980469" y1="70.461526870727539" x2="226.068107604980469" y2="90.438165664671942" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 89.416872024536133 226.068107604980469 95.461526870727539 229.558555603027344 89.416872024536133 222.577690124511719 89.416872024536133"></polygon>
      </g>
      <g>
        <rect x="146.068116903908958" y="173.776563924028778" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(191.888187408447266 202.283449172973633)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimltflite</tspan></text>
      </g>
      <g>
        <line x1="226.068107604980469" y1="147.39708137512207" x2="226.068107604980469" y2="167.373720169067383" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 166.352434158325195 226.068107604980469 172.39708137512207 229.558555603027344 166.352434158325195 222.577690124511719 166.352434158325195"></polygon>
      </g>
      <g>
        <rect x="146.068116903908958" y="249.535518244678315" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(161.810062408447266 278.042390823364258)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess </tspan></text>
      </g>
      <g>
        <line x1="226.068107604980469" y1="223.776567459106445" x2="226.068107604980469" y2="243.753206253051758" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 242.731904983520508 226.068107604980469 248.776552200317383 229.558555603027344 242.731904983520508 222.577690124511719 242.731904983520508"></polygon>
      </g>
      <g>
        <polyline points="306.068107604980469 274.535524368286133 382.064964294433594 274.535524368286133 382.064964294433594 75.484888076782227" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
        <polygon points="385.555381774902344 76.506181716918945 382.064964294433594 70.461526870727539 378.574546813964844 76.506181716918945 385.555381774902344 76.506181716918945"></polygon>
      </g>
    </g>
  </g>
</svg>

The following table lists the sequential processing stages of the pipeline execution:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode__ol_l2f_zgm_vbc"><li class="li">Collects the video stream (source) from a camera and creates two copies of the source:<ul class="ul" id="single-camera-stream-with-image-classification-and-encode__ol_m2f_zgm_vbc"><li class="li">One stream is sent to the qtivcomposer plugin to retain the video stream.</li><li class="li">The other stream is sent to an ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | **Preprocessing** |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode__ol_t2f_zgm_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-image-classification-and-encode__ul_u2f_zgm_vbc"><li class="li">Color conversion</li><li class="li">Scaling down/up</li><li class="li">Normalization of the stream data when the model expects floating-point values as input</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The classification model uses the tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | **Inferencing** |
| [qtimltflite](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimltflite.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode__ol_v2f_zgm_vbc"><br>                                    <li class="li">Loads the classification model.</li><br><br>                                    <li class="li">Modifies the graph for the chosen delegate.</li><br><br>                                    <li class="li">Receives the tensor stream on its sink pad.</li><br><br>                                    <li class="li">Runs the inference and produces a tensor stream with the<br>                                        inference results on its source pad.</li><br><br>                                </ol> |
| **Postprocessing** | **Postprocessing** |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode__ol_w2f_zgm_vbc"><br>                                    <li class="li">Receives the inference tensors from a classification model<br>                                        on its sink pad.</li><br><br>                                    <li class="li">Converts the tensors into formats such as video or text that<br>                                        the multimedia plugins can process later.</li><br><br>                                    <li class="li">Applies the confidence threshold and keeps up to the chosen<br>                                        number of results.</li><br><br>                                    <li class="li">Loads the module that corresponds to the classification<br>                                        model. <p class="p">In this use case, qtimlpostprocess does the<br>                                            following:</p><ol class="ol" type="a" id="single-camera-stream-with-image-classification-and-encode__ol_x2f_zgm_vbc"><br>                                            <li class="li">Loads the mobilenet-softmax module.</li><br><br>                                            <li class="li">Produces results as video frames with classification<br>                                                labels.</li><br><br>                                            <li class="li">Sends the frames to a sink pad of qtivcomposer.</li><br><br>                                        </ol><br></li><br><br>                                </ol> |
| [qtivcomposer](https://docs.qualcomm.com/doc/80-70022-50/topic/qtivcomposer.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode__ol_nmc_lxl_vbc"><br>                                    <li class="li">Receives the original video stream and the classification<br>                                        results on its sink pads.</li><br><br>                                    <li class="li">On its source pad, produces GStreamer buffers whose contents<br>                                        are composed from the video streams on its sink pads.</li><br><br>                                </ol> |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | <ol class="ol" id="single-camera-stream-with-image-classification-and-encode__ol_aff_zgm_vbc"><br>                                    <li class="li">Applies the encoding parameters to each frame of the video<br>                                        stream that it receives on its sink pad.</li><br><br>                                    <li class="li">Encodes the frames into an H.264 bitstream and sends it over<br>                                        its source pad.</li><br><br>                                </ol> |
| h264parse | Adds more information about the bitstream to the GStreamer buffer<br>                                meta. |
| mp4mux | Receives the parsed buffers and multiplexes them into an MP4 container. |
| **Output** | **Output** |
| filesink | Stores the resulting stream in the /etc/media/video.mp4 file. |
| Playback | From the host computer, pull video.mp4 from the target device and play it on a media player:<br>`scp root@<IP address of target device>:/etc/media/video.mp4 <destination directory>` |

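The thresholding step in the qtimlpostprocess row can be pictured with a small shell sketch. It is not the plugin's implementation: the helper name `filter_top_results`, the `label,confidence` input format, and the inclusive comparison are assumptions made for illustration, and the scores are made-up values rather than model output. The arguments `40.0` and `2` mirror the `confidence` setting and the `results` property used in the overlay pipeline earlier in this topic.

```shell
#!/bin/sh
# Illustrative only: mimics "apply a confidence threshold, then keep at most
# N results" on plain text. The real work happens inside qtimlpostprocess.

# filter_top_results THRESHOLD MAX_RESULTS
# Reads "label,confidence" lines on stdin (hypothetical format).
# Keeps entries whose confidence is >= THRESHOLD (inclusive by assumption),
# sorts them by confidence in descending order, and prints at most MAX_RESULTS.
filter_top_results() {
    awk -F, -v t="$1" '$2 + 0 >= t + 0' |
        LC_ALL=C sort -t, -k2,2nr |
        head -n "$2"
}

# Made-up scores in percent, for demonstration.
printf '%s\n' \
    'tabby cat,62.5' \
    'lynx,12.3' \
    'tiger cat,41.0' \
    'Egyptian cat,40.0' |
    filter_top_results 40.0 2
```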
Last Published: Feb 20, 2026
