# Image segmentation and encode with Neural Processing SDK

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-image-segmentation-and-encode-with-deeplabv3-quantized.html](https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-image-segmentation-and-encode-with-deeplabv3-quantized.html)

The use case runs the DeepLab v3 model with the Qualcomm Neural Processing SDK runtime. It composes the semantic segmentation masks over the original video stream, encodes the composed stream, and then multiplexes it into an MP4 container.

Note: On Ubuntu Server, you need `sudo` access to write the encoded stream to the `/etc/media` folder.
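If you launch the use case from a script, a small helper can decide whether the command needs the `sudo` prefix. The following POSIX shell sketch is only an illustration. The `sudo_prefix` function is hypothetical, and it checks only whether the current user is root or can already write to the output folder.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SDK): print "sudo" when the current
# user cannot write to the output folder, and print nothing otherwise.
#   $1 output folder (default: /etc/media, as used by the filesink)
sudo_prefix() {
  out_dir=${1:-/etc/media}
  # root can write anywhere; otherwise test write permission on the folder.
  if [ "$(id -u)" -eq 0 ] || [ -w "$out_dir" ]; then
    return 0   # no prefix needed
  fi
  echo sudo
}
```

For example, `$(sudo_prefix /etc/media) gst-launch-1.0 -e ...` expands to `sudo gst-launch-1.0 -e ...` only when the current user cannot write to `/etc/media`.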

Run the use case on the target device:

    gst-launch-1.0 -e --gst-debug=2 \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1920,height=1080,framerate=30/1 ! queue ! tee name=split  \
    split. ! queue ! qtivcomposer name=mixer sink_1::dimensions="<1920,1080>" sink_1::alpha=0.5 ! queue ! v4l2h264enc capture-io-mode=4 output-io-mode=5 ! \
    h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp model=/etc/models/deeplabv3_plus_mobilenet.dlc ! queue ! \
    qtimlpostprocess module=deeplab-argmax labels=/etc/labels/deeplabv3_resnet50.json ! video/x-raw,width=640,height=360 ! queue ! mixer.
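The pipeline description has two branches that start at the `tee` named `split`. The first branch feeds the camera frames to `qtivcomposer`, which is named `mixer`. The second branch runs preprocessing, inference, and postprocessing, and it ends at `mixer.`, which links the mask stream to another sink pad of the same composer.

To reuse the command with a different model, label file, or output file, you can build the description in a shell function. This is a hypothetical sketch. The `build_pipeline` name and its positional parameters are assumptions, and its defaults repeat the values from the command above.

```shell
#!/bin/sh
# Hypothetical helper: print the pipeline description used by this use case.
# All arguments are optional; the defaults match the documented command.
#   $1 model file, $2 labels file, $3 output MP4 file
build_pipeline() {
  model=${1:-/etc/models/deeplabv3_plus_mobilenet.dlc}
  labels=${2:-/etc/labels/deeplabv3_resnet50.json}
  output=${3:-/etc/media/video.mp4}
  # Line 1: camera source, split into two branches by the tee named "split".
  # Lines 2-3: branch 1 composes, encodes, muxes, and writes the MP4 file.
  # Lines 4-5: branch 2 runs ML inference and feeds masks back into "mixer".
  printf '%s ' \
    "qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1920,height=1080,framerate=30/1 ! queue ! tee name=split" \
    "split. ! queue ! qtivcomposer name=mixer sink_1::dimensions=\"<1920,1080>\" sink_1::alpha=0.5 ! queue !" \
    "v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=$output" \
    "split. ! queue ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp model=$model ! queue !" \
    "qtimlpostprocess module=deeplab-argmax labels=$labels ! video/x-raw,width=640,height=360 ! queue ! mixer."
  echo
}
```

For example, `eval "gst-launch-1.0 -e --gst-debug=2 $(build_pipeline)"` should run the same pipeline as the command above. The sketch uses `eval` so that the shell strips the quotes around `<1920,1080>` in the same way as it does for the typed command. It also assumes that the paths contain no spaces.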

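To stop the use case, press CTRL + C. The command passes `-e`, so gst-launch-1.0 sends an end-of-stream (EOS) event through the pipeline before it shuts down. This lets mp4mux finalize the MP4 file.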
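Pressing CTRL + C sends SIGINT to gst-launch-1.0. To record a clip of fixed length without a keyboard, you can send the same signal after a timeout. This is a minimal sketch that assumes GNU coreutils `timeout` is installed on the target. The BusyBox variant may not accept `--preserve-status`. The `run_for` wrapper is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, then send it SIGINT (the same signal
# as CTRL + C) after the given number of seconds.
#   $1      duration in seconds
#   $2...   command to run, for example gst-launch-1.0 -e ...
run_for() {
  duration=$1
  shift
  # --preserve-status returns the command's own exit status instead of 124
  # when the timeout expires, so a clean EOS shutdown reports success.
  timeout --preserve-status -s INT "$duration" "$@"
}
```

Pass the duration in seconds, followed by the complete `gst-launch-1.0 -e` command shown above as separate arguments. Keep `-e` so that the interrupt still results in a finalized MP4 file.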

The following figure shows the flow of the use case execution:

1. Perform semantic segmentation on the video stream that comes from a camera source.
2. Compose the semantic segmentation masks and the video stream by using qtivcomposer.
3. Encode the composed stream as an H.264 bitstream and multiplex it into an MP4 container.
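The following worked example of step 2 uses the values from the command above. The `video/x-raw,width=640,height=360` caps set the size of the mask frames, and `sink_1::dimensions="<1920,1080>"` scales them to the camera frame size. The blend formula assumes that `sink_1::alpha` acts as a standard global alpha for the mask layer. Check the qtivcomposer reference for the exact behavior, especially if the mask frames also carry per-pixel alpha.

```latex
\begin{aligned}
s_x &= \frac{1920}{640} = 3, \qquad s_y = \frac{1080}{360} = 3 \\
O(x, y) &= \alpha\, M\!\left(\frac{x}{s_x}, \frac{y}{s_y}\right) + (1 - \alpha)\, V(x, y), \qquad \alpha = 0.5 \\
        &= 0.5\, M\!\left(\frac{x}{3}, \frac{y}{3}\right) + 0.5\, V(x, y)
\end{aligned}
```

Here V is the camera frame, M is the segmentation mask frame, and O is the composed frame that goes to v4l2h264enc. With an alpha value of 0.5, each output pixel is an equal mix of mask and camera pixels, up to the interpolation of the scaler.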

Figure: Pipeline for image segmentation and encode with qtivcomposer
            
<svg xmlns="http://www.w3.org/2000/svg" width="1053.934104919439051" height="357.535463333129883" viewbox="0 0 1053.934104919439051 357.535463333129883">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".500267028808594" y=".499940872192383" width="1052.93359375" height="356.53515625" rx="7.499999999999957" ry="7.499999999999957" style="fill: #fafafa;"></rect>
      <path d="M1045.934104919439051,1c3.85986328125,0,7,3.140132904052734,7,7v341.535463333129883c0,3.85986328125-3.14013671875,7-7,7H8c-3.859870910644531,0-7-3.14013671875-7-7V8c0-3.859867095947266,3.140129089355469-7,7-7h1037.934104919439051M1045.934104919439051,0H8C3.581733703613281,0,0,3.581731796264648,0,8v341.535463333129883c0,4.41827392578125,3.581733703613281,8,8,8h1037.934104919439051c4.418212890619543,0,8-3.58172607421875,8-8V8c0-4.418268203735352-3.581787109380457-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(859.426536560058594 333.62693977355957)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="839.175539116184154" y="321.535463333129883" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(958.008323669433594 333.62693977355957)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="937.757320529040953" y="321.535463333129883" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <rect x="20.000055258294196" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(53.429786682128906 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
      </g>
      <g>
        <rect x="166.068116903908958" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(214.97833251953125 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
      </g>
      <g>
        <rect x="146.068116903908958" y="96.841006478947747" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(170.595218658447266 125.347871780395508)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
      </g>
      <g>
        <line x1="140.534080505372003" y1="44.999948501586914" x2="160.510719299316406" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="159.489433288574219 48.490381240844727 165.534080505372003 44.999948501586914 159.489433288574219 41.509515762329102 159.489433288574219 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="312.064964669495566" y="19.999946995800201" width="140" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(333.205726623535156 49.675605297088623)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtivcomposer</tspan></text>
      </g>
      <g>
        <line x1="286.068107604980469" y1="44.999948501586914" x2="306.044761657714844" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="305.023460388183594 48.490381240844727 311.068107604980469 44.999948501586914 305.023460388183594 41.509515762329102 305.023460388183594 48.490381240844727"></polygon>
      </g>
      <g>
        <line x1="452.064964294433594" y1="44.999948501586914" x2="472.041587829589844" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="471.020286560058594 48.490381240844727 477.064964294433594 44.999948501586914 471.020286560058594 41.509515762329102 471.020286560058594 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="477.331980080897665" y="19.999946995800201" width="119.999999999999091" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(491.808616638183594 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">v4l2h264enc</tspan></text>
      </g>
      <g>
        <rect x="622.866010903704591" y="19.999946995800201" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(645.073081970214844 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">h264parse</tspan></text>
      </g>
      <g>
        <line x1="597.331993103027344" y1="44.999948501586914" x2="617.308616638183594" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="616.287315368652344 48.490381240844727 622.331993103027344 44.999948501586914 616.287315368652344 41.509515762329102 616.287315368652344 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="768.400041726512427" y="19.999946995800201" width="120" height="50" rx="3.999999999999991" ry="3.999999999999991" style="fill: #007884;"></rect>
        <text transform="translate(797.021202087402344 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">mp4mux</tspan></text>
      </g>
      <g>
        <line x1="742.865989685058594" y1="44.999948501586914" x2="762.842674255371094" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="761.821372985839844 48.490381240844727 767.865989685058594 44.999948501586914 761.821372985839844 41.509515762329102 761.821372985839844 48.490381240844727"></polygon>
      </g>
      <g>
        <rect x="913.934072549323901" y="19.999946995800201" width="120.000000000005457" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(948.500633239746094 48.506645202636719)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">filesink</tspan></text>
      </g>
      <g>
        <line x1="888.400047302246094" y1="44.999948501586914" x2="908.376670837402344" y2="44.999948501586914" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="907.355369567871094 48.490381240844727 913.400047302246094 44.999948501586914 907.355369567871094 41.509515762329102 907.355369567871094 48.490381240844727"></polygon>
      </g>
      <g>
        <line x1="226.068107604980469" y1="70.461526870727539" x2="226.068107604980469" y2="90.438165664671942" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 89.416872024536133 226.068107604980469 95.461526870727539 229.558555603027344 89.416872024536133 222.577690124511719 89.416872024536133"></polygon>
      </g>
      <g>
        <rect x="146.068116903908958" y="173.776563924028778" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(190.747562408447266 202.283449172973633)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlsnpe</tspan></text>
      </g>
      <g>
        <line x1="226.068107604980469" y1="147.39708137512207" x2="226.068107604980469" y2="167.373720169067383" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 166.352434158325195 226.068107604980469 172.39708137512207 229.558555603027344 166.352434158325195 222.577690124511719 166.352434158325195"></polygon>
      </g>
      <g>
        <rect x="146.068116903908958" y="251.535518244678315" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(163.790531158447266 280.042390823364258)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
      </g>
      <g>
        <line x1="226.068107604980469" y1="225.156038284301758" x2="226.068107604980469" y2="245.132692337036133" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="222.577690124511719 244.111391067505792 226.068107604980469 250.156038284301758 229.558555603027344 244.111391067505792 222.577690124511719 244.111391067505792"></polygon>
      </g>
      <g>
        <polyline points="306.068107604980469 276.535524368286133 371.602134704589844 276.535524368286133 371.602134704589844 75.484888076782227" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
        <polygon points="375.092582702636719 76.506181716918945 371.602134704589844 70.461526870727539 368.111717224122003 76.506181716918945 375.092582702636719 76.506181716918945"></polygon>
      </g>
    </g>
  </g>
</svg>
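Before you run the use case, you can check that GStreamer has registered every element in the figure. In the figure, `camsrc` is the `name` given to `qtiqmmfsrc` in the command. `gst-inspect-1.0 <element>` exits with a non-zero status when it cannot find the element, and this sketch relies on that behavior. The `check_elements` function and the `GST_INSPECT` override variable are hypothetical. The override exists only so that you can try the function without GStreamer.

```shell
#!/bin/sh
# Hypothetical helper: report pipeline elements that GStreamer cannot find.
# GST_INSPECT may name another command (assumption, for dry runs only).
check_elements() {
  inspect=${GST_INSPECT:-gst-inspect-1.0}
  missing=0
  for element in qtiqmmfsrc tee qtimlvconverter qtimlsnpe qtimlpostprocess \
                 qtivcomposer v4l2h264enc h264parse mp4mux filesink queue; do
    # Discard the element description; only the exit status matters here.
    if ! $inspect "$element" >/dev/null 2>&1; then
      echo "missing: $element"
      missing=$((missing + 1))
    fi
  done
  # Exit status is 0 only if every element was found.
  [ "$missing" -eq 0 ]
}
```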

The following table describes the processing stages of the pipeline in the order that they run:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | Captures the camera video stream (source). The `tee` element named `split` then creates two copies of the stream:<br>1. One copy goes to the qtivcomposer plugin, which retains the original video stream.<br>2. The other copy goes to the ML inferencing branch of the pipeline. |
| **Preprocessing** | |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlvconverter.html) | 1. Receives the video stream on its sink pad.<br>2. Performs preprocessing: color conversion, scaling up or down, and normalization of the stream data when the model expects floating-point input.<br>3. Converts the video stream to a tensor stream on its source pad. The segmentation model uses this tensor stream for inferencing. |
| **Inferencing** | |
| [qtimlsnpe](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlsnpe.html) | 1. Loads the segmentation model.<br>2. Modifies the graph for the chosen delegate (`delegate=dsp` in this use case).<br>3. Receives the tensor stream on its sink pad.<br>4. Runs the inference and produces a tensor stream with the segmentation results on its source pad. |
| **Postprocessing** | |
| qtimlpostprocess | 1. Receives the inference tensors on its sink pad.<br>2. Loads the postprocessing module that matches the segmentation model. In this use case, it loads the `deeplab-argmax` module.<br>3. Converts the inference tensors into video frames with semantic segmentation masks (640 × 360 in this use case) that the multimedia plugins can process.<br>4. Sends these frames to a sink pad of qtivcomposer. |
| [qtivcomposer](https://docs.qualcomm.com/doc/80-70022-50/topic/qtivcomposer.html) | 1. Receives the original video stream and the segmentation mask stream on its sink pads. In this use case, the mask stream on `sink_1` is scaled to 1920 × 1080 and blended with an alpha value of 0.5.<br>2. Produces GStreamer buffers on its source pad whose contents are composed from the video streams on its sink pads. |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | 1. Applies the encoder parameters to each frame of the video stream that it receives on its sink pad.<br>2. Encodes the frames into an H.264 bitstream and sends the bitstream out of its source pad. |
| h264parse | Parses the H.264 bitstream and adds information about it to the GStreamer buffer metadata. |
| mp4mux | Receives the parsed buffers and multiplexes them into buffers that follow the MP4 container format specification. |
| **Output** | |
| filesink | Stores the resulting stream in the `/etc/media/video.mp4` file. |
| Playback | From the host computer, pull `video.mp4` from the target device and play it in a media player:<br>`scp root@<IP address of target device>:/etc/media/video.mp4 <destination directory>` |
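You can wrap the playback step in a small host-side script. This sketch makes several assumptions. The `pull_clip` name and the `SCP` and `PLAYER` override variables are hypothetical. `gst-play-1.0` is only one possible player, and any media player works. Like the `scp` command in the table, the script assumes SSH access as root.

```shell
#!/bin/sh
# Hypothetical host-side helper: copy the recorded clip from the target and
# play it. SCP and PLAYER may be overridden (assumption, e.g. for dry runs).
#   $1 IP address of the target device
#   $2 destination directory on the host (default: current directory)
pull_clip() {
  target_ip=$1
  dest_dir=${2:-.}
  # Same copy command as in the Playback row of the table.
  ${SCP:-scp} "root@${target_ip}:/etc/media/video.mp4" "$dest_dir" || return 1
  # Play the copied file.
  ${PLAYER:-gst-play-1.0} "$dest_dir/video.mp4"
}
```

For example, run `pull_clip <IP address of target device> <destination directory>` on the host computer.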

Last Published: Feb 20, 2026
