# Image segmentation and encode with LiteRT

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-image-segmentation-and-encode.html](https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-image-segmentation-and-encode.html)

This use case runs the `deeplabv3_resnet50` LiteRT model to produce semantic segmentation masks, composes the masks with the original video stream, encodes the composed stream, and then multiplexes it in an MP4 container.

Note: On Ubuntu Server, `sudo` access is required to write the encoded stream to the `/etc/media` folder.
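
Whether `sudo` is needed depends on the permissions of the output folder on a given image. The following is a minimal sketch, not part of the Qualcomm tooling, that reports whether the current user can write to a folder. The helper name `can_write_dir` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not a Qualcomm tool): report whether
# the current user can create files in the folder that filesink writes to.
can_write_dir() {
    dir=$1
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "writable"
    else
        echo "not-writable"
    fi
}
```

For example, `can_write_dir /etc/media` prints `writable` or `not-writable`. In the second case, prefix the `gst-launch-1.0` command with `sudo`.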

Run the use case on the target device:

    gst-launch-1.0 -e --gst-debug=2 \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtivcomposer name=mixer sink_1::dimensions="<1920,1080>" sink_1::alpha=0.5 ! queue ! \
    video/x-raw,format=NV12,width=1920,height=1080,interlace-mode=progressive,colorimetry=bt601 ! \
    v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so \
    external-delegate-options="QNNExternalDelegate,backend_type=htp;" model=/etc/models/deeplabv3_plus_mobilenet_quantized.tflite ! queue ! \
    qtimlpostprocess module=deeplab-argmax labels=/etc/labels/deeplabv3_resnet50.json ! video/x-raw,width=256,height=144 ! queue ! mixer.

To stop the use case, press CTRL + C.
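
The `-e` flag in the command asks `gst-launch-1.0` to send an end-of-stream event through the pipeline when it is interrupted, which gives `mp4mux` the chance to finalize the file. For unattended recordings, the interrupt can be delivered by a timer instead of the keyboard. The sketch below assumes that a `timeout` utility (for example, from GNU coreutils) is installed on the target; the helper name `run_for` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command and send it SIGINT after a fixed
# number of seconds. SIGINT is the signal that CTRL + C sends from a terminal.
# Assumes a `timeout` utility that accepts `-s SIGNAL DURATION COMMAND...`.
run_for() {
    seconds=$1
    shift
    timeout -s INT "$seconds" "$@"
}
```

For example, `run_for 30 gst-launch-1.0 -e` followed by the pipeline description records for about 30 seconds. With GNU `timeout`, an exit status of 124 means that the time limit was reached; other `timeout` implementations may report a different status.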

The following figure shows the flow of the use case execution:

1. Identify the scenes from a video stream coming through a camera source.
2. Compose the semantic segmentation and video stream using qtivcomposer.
3. Encode the stream as an H.264 bit stream and multiplex the stream in an MP4 container.

Figure: Pipeline for segmentation and encode with qtivcomposer

<svg xmlns="http://www.w3.org/2000/svg" width="1053.934104919439051" height="355.535463333129883" viewbox="0 0 1053.934104919439051 355.535463333129883">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".500267028808594" y=".499940872192383" width="1052.93359375" height="354.53515625" rx="7.499999999999957" ry="7.499999999999957" style="fill: #fafafa;"></rect>
      <path d="M1045.934104919439051,1c3.85986328125,0,7,3.140132904052734,7,7v339.535463333129883c0,3.85986328125-3.14013671875,7-7,7H8c-3.859870910644531,0-7-3.14013671875-7-7V8c0-3.859867095947266,3.140129089355469-7,7-7h1037.934104919439051M1045.934104919439051,0H8C3.581733703613281,0,0,3.581731796264648,0,8v339.535463333129883c0,4.41827392578125,3.581733703613281,8,8,8h1037.934104919439051c4.418212890619543,0,8-3.58172607421875,8-8V8c0-4.418268203735352-3.581787109380457-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(864.426780700683594 331.62693977355957)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="844.175762375432896" y="319.535463333129883" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(963.008567810058594 331.62693977355957)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="942.757543788289695" y="319.535463333129883" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <rect x="20.000055258294196" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(53.429786682128906 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
      </g>
      <g>
        <rect x="166.068116903908958" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(214.97833251953125 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
      </g>
      <g>
        <rect x="146.068116903908958" y="96.841006478947747" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(170.595218658447266 125.347871780395508)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
      </g>
      <g>
        <line x1="140.534080505372003" y1="44.999948501586914" x2="160.510719299316406" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="159.489433288574219 48.490381240844727 165.534080505372003 44.999948501586914 159.489433288574219 41.509515762329102 159.489433288574219 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="312.064964669495566" y="19.999946995800201" width="140" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(333.205726623535156 49.675605297088623)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtivcomposer</tspan></text>
      </g>
      <g>
        <line x1="286.068107604980469" y1="44.999948501586914" x2="306.044761657714844" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="305.023460388183594 48.490381240844727 311.068107604980469 44.999948501586914 305.023460388183594 41.509515762329102 305.023460388183594 48.490381240844727"></polygon>
      </g>
      <g>
        <line x1="452.064964294433594" y1="44.999948501586914" x2="472.041587829589844" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="471.020286560058594 48.490381240844727 477.064964294433594 44.999948501586914 471.020286560058594 41.509515762329102 471.020286560058594 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="477.331980080897665" y="19.999946995800201" width="119.999999999999091" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(491.808616638183594 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">v4l2h264enc</tspan></text>
      </g>
      <g>
        <rect x="622.866010903704591" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(645.073081970214844 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">h264parse</tspan></text>
      </g>
      <g>
        <line x1="597.331993103027344" y1="44.999948501586914" x2="617.308616638183594" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="616.287315368652344 48.490381240844727 622.331993103027344 44.999948501586914 616.287315368652344 41.509515762329102 616.287315368652344 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="768.400041726512427" y="19.999946995800201" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
        <text transform="translate(797.021202087402344 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">mp4mux</tspan></text>
      </g>
      <g>
        <line x1="742.865989685058594" y1="44.999948501586914" x2="762.842674255371094" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="761.821372985839844 48.490381240844727 767.865989685058594 44.999948501586914 761.821372985839844 41.509515762329102 761.821372985839844 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="913.934072549323901" y="19.999946995800201" width="120.000000000005457" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(948.500633239746094 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">filesink</tspan></text>
      </g>
      <g>
        <line x1="888.400047302246094" y1="44.999948501586914" x2="908.376670837402344" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="907.355369567871094 48.490381240844727 913.400047302246094 44.999948501586914 907.355369567871094 41.509515762329102 907.355369567871094 48.490381240844727"></polygon>
      </g>
      <g>
        <line x1="226.068107604980469" y1="70.461526870727539" x2="226.068107604980469" y2="90.438165664671942" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 89.416872024536133 226.068107604980469 95.461526870727539 229.558555603027344 89.416872024536133 222.577690124511719 89.416872024536133"></polygon>
      </g>
      <g>
        <rect x="146.068116903908958" y="173.776563924028778" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(191.888187408447266 202.283449172973633)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimltflite</tspan></text>
      </g>
      <g>
        <line x1="226.068107604980469" y1="147.39708137512207" x2="226.068107604980469" y2="167.373720169067383" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 166.352434158325195 226.068107604980469 172.39708137512207 229.558555603027344 166.352434158325195 222.577690124511719 166.352434158325195"></polygon>
      </g>
      <g>
        <rect x="146.068116903908958" y="249.535518244678315" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(161.810062408447266 278.042390823364258)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess </tspan></text>
      </g>
      <g>
        <line x1="226.068107604980469" y1="223.776567459106445" x2="226.068107604980469" y2="243.753206253051758" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 242.731904983520508 226.068107604980469 248.776552200317383 229.558555603027344 242.731904983520508 222.577690124511719 242.731904983520508"></polygon>
      </g>
      <g>
        <polyline points="306.068107604980469 274.535524368286133 382.064964294433594 274.535524368286133 382.064964294433594 75.484888076782227" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
        <polygon points="385.555381774902344 76.506181716918945 382.064964294433594 70.461526870727539 378.574546813964844 76.506181716918945 385.555381774902344 76.506181716918945"></polygon>
      </g>
    </g>
  </g>
</svg>
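
The figure maps onto the command as one source segment and two branches that start at the `tee` element. As an illustration only, the sketch below assembles the same pipeline description from three named parts. The function and variable names are hypothetical; the element names and properties are copied from the command in this section. It assumes that the paths contain no spaces or wildcard characters, because the result is meant to be word-split by the shell.

```shell
#!/bin/sh
# Sketch only: build_pipeline and its variable names are illustrative.
# Element names and properties are copied from the command in this section.
build_pipeline() {
    output=$1   # MP4 file written by filesink
    model=$2    # LiteRT model loaded by qtimltflite
    labels=$3   # labels file read by qtimlpostprocess

    # Camera source feeding the tee that duplicates the stream.
    src="qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split"

    # Branch 1: compose, encode, multiplex, and store.
    encode="split. ! queue ! qtivcomposer name=mixer sink_1::dimensions=<1920,1080> sink_1::alpha=0.5 ! queue ! video/x-raw,format=NV12,width=1920,height=1080,interlace-mode=progressive,colorimetry=bt601 ! v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=$output"

    # Branch 2: preprocess, infer, postprocess, and feed the mask to the mixer.
    infer="split. ! queue ! qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options=QNNExternalDelegate,backend_type=htp; model=$model ! queue ! qtimlpostprocess module=deeplab-argmax labels=$labels ! video/x-raw,width=256,height=144 ! queue ! mixer."

    printf '%s\n' "$src $encode $infer"
}
```

To launch the result, pass the output unquoted so that the shell splits it into arguments, for example `gst-launch-1.0 -e --gst-debug=2 $(build_pipeline /etc/media/video.mp4 /etc/models/deeplabv3_plus_mobilenet_quantized.tflite /etc/labels/deeplabv3_resnet50.json)`.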

The following table lists the sequential processing stages of the pipeline execution:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | <ol class="ol" id="single-camera-stream-with-image-segmentation-and-encode__ol_f5k_g5n_vbc"><li class="li">Collects the video stream (source) and creates two copies of the source:<ul class="ul" id="single-camera-stream-with-image-segmentation-and-encode__ul_n44_nwl_vbc"><li class="li">One stream is sent to the qtivcomposer plugin to retain the video stream.</li><li class="li">The other stream is sent to an ML inferencing branch in the pipeline.</li></ul></li></ol> |
| **Preprocessing** | **Preprocessing** |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-image-segmentation-and-encode__ol_xsf_q5l_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-image-segmentation-and-encode__ul_ff2_twl_vbc"><li class="li">Color conversion</li><li class="li">Scaling down/up</li><li class="li">Normalization of the stream data when the model expects floating-point values as input</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The segmentation model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | **Inferencing** |
| [qtimltflite](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimltflite.html) | <ol class="ol" id="single-camera-stream-with-image-segmentation-and-encode__ol_lfr_35n_vbc"><li class="li">Loads the segmentation model.</li><li class="li">Modifies the graph for the chosen delegate.</li><li class="li">Receives the tensor stream on its sink pad.</li><li class="li">Runs the inference and produces a tensor stream with the segmentation results on its source pad.</li></ol> |
| **Postprocessing** | **Postprocessing** |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-image-segmentation-and-encode__ol_mtr_k5n_vbc"><li class="li">Receives the inference tensors on its sink pad.</li><li class="li">Converts the inference tensors into video formats that the multimedia plugins can process later.</li><li class="li">Produces the semantic segmentations for the frame.</li><li class="li">Loads the corresponding modules for the segmentation models.<p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="single-camera-stream-with-image-segmentation-and-encode__ol_ntr_k5n_vbc"><li class="li">Loads the deeplab-argmax submodule.</li><li class="li">Produces video frames with segmentation masks.</li><li class="li">Sends them to the sink pad of qtivcomposer.</li></ol></li></ol> |
| [qtivcomposer](https://docs.qualcomm.com/doc/80-70022-50/topic/qtivcomposer.html) | <ol class="ol" id="single-camera-stream-with-image-segmentation-and-encode__ol_nmc_lxl_vbc"><li class="li">Receives the original video stream and the segmentation mask stream on its sink pads.</li><li class="li">Produces GStreamer buffers on its source pad, composed from the video streams on its sink pads.</li></ol> |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | <ol class="ol" id="single-camera-stream-with-image-segmentation-and-encode__ol_wsc_bsn_vbc"><li class="li">Applies parameters to each frame of the video stream that it receives on its sink pad.</li><li class="li">Encodes the stream into a bit stream and sends it over its source pad.</li></ol> |
| h264parse | Adds more information about the bit stream to the GStreamer buffer meta. |
| mp4mux | Receives these buffers and creates buffers that follow the MP4 container format specification. |
| **Output** | **Output** |
| filesink | Stores the resulting stream in the `/etc/media/video.mp4` file. |
| Playback | From the host computer, pull `video.mp4` from the target device and play it on a media player:<br>`scp root@<IP address of target device>:/etc/media/video.mp4 <destination directory>` |
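
Before copying the file for playback, it can help to confirm that the recording was finalized. In a regular (non-fragmented) MP4 file, the index is stored in a `moov` box, which a muxer typically writes when it receives end-of-stream. The following is a rough heuristic, not an MP4 parser: it only searches the file for the four bytes `moov`, so a chance match inside the video data is possible. The helper name `has_moov` is hypothetical.

```shell
#!/bin/sh
# Heuristic check (hypothetical helper): succeed if the file contains the
# byte sequence "moov" anywhere. This does not parse the MP4 box structure.
has_moov() {
    LC_ALL=C grep -q 'moov' "$1"
}
```

For example, `has_moov /etc/media/video.mp4 && echo "looks finalized"` prints the message only when the marker is found. If it is missing, the pipeline was probably stopped without an end-of-stream event.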

**Parent Topic:** [LiteRT use cases](https://docs.qualcomm.com/doc/80-70022-50/topic/tensorflow-lite-use-cases.html)

Last Published: Feb 20, 2026
