# Image segmentation and encode with LiteRT

This use case runs the `deeplabv3_resnet50` LiteRT model to produce semantic segmentation masks, composes the masks with the original video stream, encodes the composed stream, and multiplexes it into an MP4 container.

**Note:** For Ubuntu Server, `sudo` access is required to write the encoded stream to the `/etc/media` folder.
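Before you run the pipeline, a small pre-flight check can catch a missing model or labels file and tell you whether the output folder needs `sudo`. The following is a minimal bash sketch. The helper names `check_inputs` and `output_prefix` are hypothetical, and the paths are copied from the `gst-launch-1.0` command in this section. The sketch does not check `libQnnTFLiteDelegate.so`, because the command loads that library by name rather than by path.

```shell
#!/usr/bin/env bash
# Pre-flight sketch for the segmentation + encode use case.
# check_inputs and output_prefix are hypothetical helpers, not SDK tools.
# Paths are copied from the gst-launch-1.0 command in this section.

MODEL=/etc/models/deeplabv3_plus_mobilenet_quantized.tflite
LABELS=/etc/labels/deeplabv3_resnet50.json
OUTPUT_DIR=/etc/media

# Report every input file that is missing or unreadable.
# Returns 0 only if all files can be read.
check_inputs() {
  local f status=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Print "sudo" when the current user cannot write to the output directory,
# so the result can be used as a command prefix. Print nothing otherwise.
# Fails if the directory does not exist, because sudo alone would not help.
output_prefix() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "output directory does not exist: $dir" >&2
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "sudo"
  fi
}
```

For example, run `check_inputs "$MODEL" "$LABELS" && prefix=$(output_prefix "$OUTPUT_DIR")`, and then start the pipeline as `$prefix gst-launch-1.0 -e ...`. Leave `$prefix` unquoted so that an empty result adds no extra argument.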

Run the use case on the target device:

```shell
gst-launch-1.0 -e --gst-debug=2 \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtivcomposer name=mixer sink_1::dimensions="<1920,1080>" sink_1::alpha=0.5 ! queue ! \
    video/x-raw,format=NV12,width=1920,height=1080,interlace-mode=progressive,colorimetry=bt601 ! \
    v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/output_video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so \
    external-delegate-options="QNNExternalDelegate,backend_type=htp;" model=/etc/models/deeplabv3_plus_mobilenet_quantized.tflite ! queue ! \
    qtimlpostprocess module=deeplab-argmax labels=/etc/labels/deeplabv3_resnet50.json ! video/x-raw,width=256,height=144 ! queue ! mixer.
```
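If you want to change the model, labels, or output path without editing the long command, you can assemble the same pipeline description in a bash array. The following is a sketch under that assumption. The variable and function names (`MODEL`, `LABELS`, `OUTPUT`, `build_pipeline`, `run_pipeline`) are hypothetical, and the element properties are copied from the command above. An array keeps each argument intact, so the quoted `sink_1::dimensions` and `external-delegate-options` values reach `gst-launch-1.0` exactly as they do in the original command.

```shell
#!/usr/bin/env bash
# Sketch: the segmentation + encode pipeline as a bash array.
# MODEL, LABELS, OUTPUT, build_pipeline, and run_pipeline are hypothetical names.
# Defaults are copied from the documented gst-launch-1.0 command.

MODEL="${MODEL:-/etc/models/deeplabv3_plus_mobilenet_quantized.tflite}"
LABELS="${LABELS:-/etc/labels/deeplabv3_resnet50.json}"
OUTPUT="${OUTPUT:-/etc/media/output_video.mp4}"

build_pipeline() {
  # Camera source, split into two branches by tee.
  PIPELINE=(
    qtiqmmfsrc name=camsrc !
    video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 !
    queue ! tee name=split
  )
  # Branch 1: compose video and masks, encode as H.264, mux into MP4, write a file.
  PIPELINE+=(
    split. ! queue ! qtivcomposer name=mixer
    sink_1::dimensions="<1920,1080>" sink_1::alpha=0.5 ! queue !
    video/x-raw,format=NV12,width=1920,height=1080,interlace-mode=progressive,colorimetry=bt601 !
    v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue !
    mp4mux ! queue ! filesink "location=${OUTPUT}"
  )
  # Branch 2: preprocess, infer with the QNN external delegate (HTP backend),
  # postprocess into segmentation masks, and link to the next mixer sink pad.
  PIPELINE+=(
    split. ! queue ! qtimlvconverter ! queue !
    qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so
    external-delegate-options="QNNExternalDelegate,backend_type=htp;"
    "model=${MODEL}" ! queue !
    qtimlpostprocess module=deeplab-argmax "labels=${LABELS}" !
    video/x-raw,width=256,height=144 ! queue ! mixer.
  )
}

# Build the array and launch it with the same options as the original command.
run_pipeline() {
  build_pipeline
  gst-launch-1.0 -e --gst-debug=2 "${PIPELINE[@]}"
}
```

On the device, source the script and call `run_pipeline`. To write to a different file for one run, prefix the call with an assignment, for example `OUTPUT=/etc/media/run1.mp4 run_pipeline` (a hypothetical file name).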

To stop the use case, use **CTRL + C**.
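Pressing CTRL + C sends SIGINT to `gst-launch-1.0`. Because the command uses the `-e` option, `gst-launch-1.0` sends an end-of-stream (EOS) event through the pipeline before it shuts down, which lets `mp4mux` finish writing the MP4 file. To record for a fixed duration instead, you can wrap the command with GNU coreutils `timeout` and have it deliver the same signal. The following is a minimal sketch. `run_for` is a hypothetical helper name, and the 10-second kill grace period is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: stop a command automatically after a fixed duration, as CTRL + C would.
# run_for is a hypothetical helper; timeout(1) comes from GNU coreutils.
#   --signal=INT      send SIGINT (what CTRL + C sends) when time runs out
#   --kill-after=10   send SIGKILL if the command is still alive 10 s later
# Exit status 124 means the time limit was reached. Exit status 137 means
# SIGKILL was needed.
run_for() {
  local seconds="$1"
  shift
  timeout --signal=INT --kill-after=10 "$seconds" "$@"
}
```

For example, `run_for 60 gst-launch-1.0 -e --gst-debug=2 <pipeline>` records for about 60 seconds. Keep `-e` in the command, or the MP4 file might not be finalized when the signal arrives.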

The following figure shows the flow of the use case execution:

1. Run semantic segmentation on the video stream that comes from the camera source.
2. Compose the segmentation masks with the video stream using qtivcomposer.
3. Encode the composed stream as an H.264 bitstream and multiplex it into an MP4 container.

<svg xmlns="http://www.w3.org/2000/svg" width="1053.934104919439051" height="355.535463333129883" viewbox="0 0 1053.934104919439051 355.535463333129883" aria-label="../../_images/pipeline_segmentation_and_encode_with_qtivcomposer.svg">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".500267028808594" y=".499940872192383" width="1052.93359375" height="354.53515625" rx="7.499999999999957" ry="7.499999999999957" style="fill: #fafafa;"></rect>
      <path d="M1045.934104919439051,1c3.85986328125,0,7,3.140132904052734,7,7v339.535463333129883c0,3.85986328125-3.14013671875,7-7,7H8c-3.859870910644531,0-7-3.14013671875-7-7V8c0-3.859867095947266,3.140129089355469-7,7-7h1037.934104919439051M1045.934104919439051,0H8C3.581733703613281,0,0,3.581731796264648,0,8v339.535463333129883c0,4.41827392578125,3.581733703613281,8,8,8h1037.934104919439051c4.418212890619543,0,8-3.58172607421875,8-8V8c0-4.418268203735352-3.581787109380457-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(864.426780700683594 331.62693977355957)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="844.175762375432896" y="319.535463333129883" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(963.008567810058594 331.62693977355957)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="942.757543788289695" y="319.535463333129883" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <rect x="20.000055258294196" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(53.429786682128906 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
      </g>
      <g>
        <rect x="166.068116903908958" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(214.97833251953125 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
      </g>
      <g>
        <rect x="146.068116903908958" y="96.841006478947747" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(170.595218658447266 125.347871780395508)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
      </g>
      <g>
        <line x1="140.534080505372003" y1="44.999948501586914" x2="160.510719299316406" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="159.489433288574219 48.490381240844727 165.534080505372003 44.999948501586914 159.489433288574219 41.509515762329102 159.489433288574219 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="312.064964669495566" y="19.999946995800201" width="140" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(333.205726623535156 49.675605297088623)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtivcomposer</tspan></text>
      </g>
      <g>
        <line x1="286.068107604980469" y1="44.999948501586914" x2="306.044761657714844" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="305.023460388183594 48.490381240844727 311.068107604980469 44.999948501586914 305.023460388183594 41.509515762329102 305.023460388183594 48.490381240844727"></polygon>
      </g>
      <g>
        <line x1="452.064964294433594" y1="44.999948501586914" x2="472.041587829589844" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="471.020286560058594 48.490381240844727 477.064964294433594 44.999948501586914 471.020286560058594 41.509515762329102 471.020286560058594 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="477.331980080897665" y="19.999946995800201" width="119.999999999999091" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(491.808616638183594 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">v4l2h264enc</tspan></text>
      </g>
      <g>
        <rect x="622.866010903704591" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(645.073081970214844 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">h264parse</tspan></text>
      </g>
      <g>
        <line x1="597.331993103027344" y1="44.999948501586914" x2="617.308616638183594" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="616.287315368652344 48.490381240844727 622.331993103027344 44.999948501586914 616.287315368652344 41.509515762329102 616.287315368652344 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="768.400041726512427" y="19.999946995800201" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
        <text transform="translate(797.021202087402344 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">mp4mux</tspan></text>
      </g>
      <g>
        <line x1="742.865989685058594" y1="44.999948501586914" x2="762.842674255371094" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="761.821372985839844 48.490381240844727 767.865989685058594 44.999948501586914 761.821372985839844 41.509515762329102 761.821372985839844 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="913.934072549323901" y="19.999946995800201" width="120.000000000005457" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(948.500633239746094 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">filesink</tspan></text>
      </g>
      <g>
        <line x1="888.400047302246094" y1="44.999948501586914" x2="908.376670837402344" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="907.355369567871094 48.490381240844727 913.400047302246094 44.999948501586914 907.355369567871094 41.509515762329102 907.355369567871094 48.490381240844727"></polygon>
      </g>
      <g>
        <line x1="226.068107604980469" y1="70.461526870727539" x2="226.068107604980469" y2="90.438165664671942" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 89.416872024536133 226.068107604980469 95.461526870727539 229.558555603027344 89.416872024536133 222.577690124511719 89.416872024536133"></polygon>
      </g>
      <g>
        <rect x="146.068116903908958" y="173.776563924028778" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(191.888187408447266 202.283449172973633)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimltflite</tspan></text>
      </g>
      <g>
        <line x1="226.068107604980469" y1="147.39708137512207" x2="226.068107604980469" y2="167.373720169067383" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 166.352434158325195 226.068107604980469 172.39708137512207 229.558555603027344 166.352434158325195 222.577690124511719 166.352434158325195"></polygon>
      </g>
      <g>
        <rect x="146.068116903908958" y="249.535518244678315" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(161.810062408447266 278.042390823364258)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess </tspan></text>
      </g>
      <g>
        <line x1="226.068107604980469" y1="223.776567459106445" x2="226.068107604980469" y2="243.753206253051758" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 242.731904983520508 226.068107604980469 248.776552200317383 229.558555603027344 242.731904983520508 222.577690124511719 242.731904983520508"></polygon>
      </g>
      <g>
        <polyline points="306.068107604980469 274.535524368286133 382.064964294433594 274.535524368286133 382.064964294433594 75.484888076782227" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
        <polygon points="385.555381774902344 76.506181716918945 382.064964294433594 70.461526870727539 378.574546813964844 76.506181716918945 385.555381774902344 76.506181716918945"></polygon>
      </g>
    </g>
  </g>
</svg>
**Figure: Pipeline for segmentation and encode with qtivcomposer**

The following table describes the processing stages of the pipeline, in order:

Table: Pipeline processing stages for segmentation and encode with qtivcomposer

| Process | Description |
| --- | --- |
| qtiqmmfsrc | <ol class="arabic"><li><p>Captures the video stream from the camera (source). The tee element then creates two copies of the stream:</p><blockquote><div><ul class="simple"><li><p>One copy is sent to the qtivcomposer plugin to retain the original video.</p></li><li><p>The other copy is sent to the ML inferencing branch of the pipeline.</p></li></ul></div></blockquote></li></ol> |
| **Preprocessing** | |
| qtimlvconverter | <ol class="arabic"><li><p>Receives the video stream on its sink pad.</p></li><li><dl class="simple"><dt>Performs preprocessing:</dt><dd><ul class="simple"><li><p>Color conversion</p></li><li><p>Scaling (down or up) to the model input size</p></li><li><p>Normalization of the stream data when the model expects floating-point input values</p></li></ul></dd></dl></li><li><p>Converts the video stream to a tensor stream on its source pad.</p><p>The segmentation model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | |
| qtimltflite | <ol class="arabic simple"><li><p>Loads the segmentation model.</p></li><li><p>Modifies the model graph for the chosen delegate. In this use case, the delegate is the QNN external delegate (libQnnTFLiteDelegate.so) with the HTP backend.</p></li><li><p>Receives the tensor stream on its sink pad.</p></li><li><p>Runs the inference and produces a tensor stream with the segmentation results on its source pad.</p></li></ol> |
| **Postprocessing** | |
| qtimlpostprocess | <ol class="arabic"><li><p>Receives the inference tensors on its sink pad.</p></li><li><p>Converts the inference tensors into video frames that downstream multimedia plugins can process.</p></li><li><p>Produces the semantic segmentation for each frame.</p></li><li><p>Loads the postprocessing module that corresponds to the segmentation model.</p><p>In this use case, qtimlpostprocess does the following:</p><blockquote><div><ol class="loweralpha simple"><li><p>Loads the deeplab-argmax module and the labels file.</p></li><li><p>Produces 256 x 144 video frames with segmentation masks.</p></li><li><p>Sends them to the second sink pad of qtivcomposer.</p></li></ol></div></blockquote></li></ol> |
| qtivcomposer | <ol class="arabic simple"><li><p>Receives the original video stream and the segmentation masks on its sink pads. The sink_1::dimensions and sink_1::alpha properties scale the masks to 1920 x 1080 and blend them with an alpha of 0.5.</p></li><li><p>Produces GStreamer buffers on its source pad whose contents are composed from the video streams on its sink pads.</p></li></ol> |
| v4l2h264enc | <ol class="arabic simple"><li><p>Applies the encoding parameters to each frame of the video stream that it receives on its sink pad.</p></li><li><p>Encodes each frame into an H.264 bitstream and sends it out on its source pad.</p></li></ol> |
| h264parse | Parses the H.264 bitstream and adds information about it, such as the stream format and alignment, to the buffer caps and metadata that mp4mux requires. |
| mp4mux | Receives the parsed buffers and multiplexes them into the MP4 container format. |
| **Output** | |
| filesink | Writes the resulting stream to the `/etc/media/output_video.mp4` file. |
| Playback | On the host computer, pull `output_video.mp4` from the target device and play it in a media player: `scp root@<ip>:/etc/media/output_video.mp4 <destination>` |
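Before you pull the file to the host, you can check on the device whether the recording looks finalized. With its default (non-fragmented) settings, mp4mux writes the `moov` index box when the stream ends, so a file from a run that stopped without EOS typically lacks that box and does not play. The following is a heuristic bash sketch. `mp4_looks_finalized` is a hypothetical helper, and a byte-pattern search like this can report false positives.

```shell
#!/usr/bin/env bash
# Heuristic sketch: does an MP4 file look finalized by mp4mux?
# mp4_looks_finalized is a hypothetical helper, not a GStreamer tool.
# It checks that the file is non-empty and contains the "moov" box type,
# which mp4mux (default, non-fragmented mode) writes at end-of-stream.
mp4_looks_finalized() {
  local file="$1"
  # The file must exist and have a size greater than zero.
  [ -s "$file" ] || return 1
  # Byte-wise search for the moov box type; LC_ALL=C avoids locale issues.
  LC_ALL=C grep -q 'moov' "$file"
}
```

For example, `mp4_looks_finalized /etc/media/output_video.mp4 && echo "ready to copy"`. If GStreamer's base tools are installed, `gst-discoverer-1.0 /etc/media/output_video.mp4` gives a more thorough report of the container, duration, and streams.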

Last Published: Apr 02, 2026
