# Image classification and encode with Neural Processing SDK

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1.html](https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-image-classification-and-encode-with-mobilenet-v1.html)

The use cases implement the InceptionV3 image classification model with Qualcomm Neural Processing SDK to classify scenes from a single camera stream and either overlay or compose the classification labels. The streams are then encoded.

You can use any publicly available classification model with LiteRT and convert it to `.dlc` format. For instructions, see [TensorFlow Model Conversion](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-2/model_conv_tensorflow.html).

Note: For Ubuntu Server, `sudo` access is necessary to write the encoded stream to the `/etc/media` folder.

## Use qtivoverlay plugin to apply classification overlay

Run the use case on the target device:

    gst-launch-1.0 -e \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! video/x-raw,format=NV12_Q08C,width=1280,height=720,interlace-mode=progressive,colorimetry=bt601 ! \
    v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp model=/etc/models/inceptionv3.dlc ! queue ! qtimlpostprocess \
    settings="{\"confidence\": 40.0}" results=2 module=mobilenet-softmax labels=/etc/labels/classification.json ! text/x-raw ! queue ! metamux.

To stop the use case, use CTRL + C.

The following figure shows the flow of the use case execution:

1. Classify scenes from a video stream coming through a camera source.
2. Overlay the classification labels using overlaylib.
3. Encode this stream as an H.264 bitstream.
4. Multiplex the stream in an MP4 container and store it as an MP4 file.

Figure: Pipeline for classification overlay and encode

The following table provides the sequential processing stages of the pipeline execution:

| Process | Description |
| --- | --- |
| Source | The video stream is collected from a camera source plugin and two copies are created: one stream is sent to the qtimetamux plugin to retain the video stream, and the other stream is sent to an ML inferencing pipeline. |
| qtimlvconverter | Receives the video stream on its sink pad. Performs preprocessing: color conversion, scaling down or up, and normalization of the stream data when the model expects floating-point values as input. Converts the video stream to a tensor stream on its source pad. The classification model uses this tensor stream for inferencing. |
| qtimlsnpe | Loads the model. Modifies the graph for the chosen delegate. Receives the tensor stream on its sink pad. Runs the inference and produces a tensor stream with the inference results on its source pad. |
| qtimlpostprocess | Receives the inference tensors from a classification model on its sink pad. Converts the tensors into formats such as video or text that the multimedia plugins can process later. Applies the threshold to the chosen number of results. Loads the corresponding modules of the classification models. In this use case, qtimlpostprocess loads the submodule of the model, produces results as text (`text/x-raw`) with classification labels, and sends them to the sink pad of qtimetamux. |
| [qtimetamux](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimetamux.html) | Receives the video stream and the text stream with the corresponding classification results on its sink pads. Produces GST buffers with the contents of the video stream received on its sink pad. Adds the classification results from the data sink pad to the GST buffer meta (meta muxing) on its source pad. |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-70022-50/topic/qtioverlay.html) | Receives the multiplexed stream. Overlays the classification labels on the video frame using CL. Produces GST buffers with overlays on its source pad. |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | Applies parameters to each frame of the video stream that it receives on its sink pad. Encodes the stream into a bitstream and sends it over its source pad. |
| h264parse | Adds more information about the bitstream to the GStreamer buffer meta. |
| mp4mux | Receives these buffers and creates containers with format specification buffers. |
| **Output** | |
| Filesink | Stores the resulting stream in the /etc/media/video.mp4 file. |
| Playback | From the host computer, pull video.mp4 from the target device and play it on a media player: `scp root@<target device>:/etc/media/video.mp4 <directory>` |

## Use qtivcomposer to mix original frame with classification mask

Run the use case on the target device:

    gst-launch-1.0 -e --gst-debug=2 \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtivcomposer name=mixer sink_1::position="<30, 30>" sink_1::dimensions="<320, 180>" ! queue ! video/x-raw,format=NV12,width=1920,height=1080,interlace-mode=progressive,colorimetry=bt601 ! \
    v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp model=/etc/models/inceptionv3.dlc ! queue ! qtimlpostprocess settings="{\"confidence\": 40.0}" \
    results=2 module=mobilenet-softmax labels=/etc/labels/classification.json ! video/x-raw,format=BGRA,width=640,height=360 ! queue ! mixer.

To stop the use case, use CTRL + C.

The following figure shows the flow of the use case execution:

- Classify scenes from a video stream coming through a camera source.
- Compose classification labels and video stream together using qtivcomposer.
- Encode this stream as an H.264 bitstream.
- Multiplex the stream in an MP4 container and store it as an MP4 file.

Figure: Pipeline for classification and encode with qtivcomposer

The following table provides the sequential processing stages of the pipeline execution:

| Process | Description |
| --- | --- |
| Source | The video stream is collected from a camera source plugin and two copies are created: one stream is sent to the qtivcomposer plugin to retain the video stream, and the other stream is sent to an ML inferencing pipeline. |
| qtimlvconverter | Receives the video stream on its sink pad. Performs preprocessing: color conversion, scaling down or up, and normalization of the stream data when the model expects floating-point values as input. Converts the video stream to a tensor stream on its source pad. The classification model uses this tensor stream for inferencing. |
| qtimlsnpe | Loads the model. Modifies the graph for the chosen delegate. Receives the tensor stream on its sink pad. Runs the inference and produces a tensor stream with the inference results on its source pad. |
| qtimlpostprocess | Receives the inference tensors from a classification model on its sink pad. Converts the tensors into formats such as video or text that the multimedia plugins can process later. Applies the threshold to the chosen number of results. Loads the corresponding modules of the classification models. In this use case, qtimlpostprocess loads the submodule of the model, produces results as video frames with classification labels, and sends them to the sink pad of qtivcomposer. |
| [qtivcomposer](https://docs.qualcomm.com/doc/80-70022-50/topic/qtivcomposer.html) | Receives the original video stream and the video stream with classification results on its sink pads. On its source pad, produces GST buffers composed of the video streams from its sink pads. |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | Applies parameters to each frame of the video stream that it receives on its sink pad. Encodes the stream into a bitstream and sends it over its source pad. |