# Pose estimation and encode with LiteRT

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-pose-estimation-and-encode.html](https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-pose-estimation-and-encode.html)

The use cases implement the HRNet LiteRT model to process a single camera stream with pose estimation and encode the stream as an H.264 bitstream.

Note: For Ubuntu Server, `sudo` access is necessary to write the encoded stream to the `/etc/media` folder.

## Use qtivoverlay plugin to apply pose estimation overlay

Run the use case on the target device:

    gst-launch-1.0 -e \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! v4l2h264enc capture-io-mode=4 output-io-mode=5 ! \
    h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so \
    external-delegate-options="QNNExternalDelegate,backend_type=htp;" model=/etc/models/hrnet_pose_quantized.tflite ! queue ! \
    qtimlpostprocess results=2 module=hrnet labels=/etc/labels/hrnet_pose.json settings=/etc/labels/hrnet_settings.json ! text/x-raw ! queue ! metamux.

To stop the use case, use CTRL + C.

The following figure shows the flow of the use case execution:

1. Identify poses of people in the scenes from the video stream coming through the camera source.
2. Overlay the available poses using overlaylib.
3. Encode the stream as an H.264 bitstream.
4. Multiplex the stream in an MP4 container and store it as an MP4 file.

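
Before launching the pipeline, it can help to confirm that the model, label, and settings files named in the command exist and that the output folder is writable. The following is a minimal sketch of such a check, not part of the Qualcomm tooling: the helper names `missing_files` and `can_write_dir` are hypothetical, and the paths are copied from the `gst-launch-1.0` command in this section.

```shell
#!/bin/sh
# Sketch of a pre-flight check for the pipeline inputs and output folder.
# missing_files and can_write_dir are hypothetical helpers; they are not
# provided by GStreamer or by the Qualcomm plugins.

# Print every argument that is not an existing regular file, one per line.
missing_files() {
    for f in "$@"; do
        [ -f "$f" ] || printf '%s\n' "$f"
    done
}

# Succeed only if the directory exists and the current user can write to it.
can_write_dir() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Paths copied from the gst-launch-1.0 command in this section.
MODEL=/etc/models/hrnet_pose_quantized.tflite
LABELS=/etc/labels/hrnet_pose.json
SETTINGS=/etc/labels/hrnet_settings.json
OUTPUT_DIR=/etc/media

missing=$(missing_files "$MODEL" "$LABELS" "$SETTINGS")
if [ -n "$missing" ]; then
    printf 'Missing input files:\n%s\n' "$missing"
fi

if ! can_write_dir "$OUTPUT_DIR"; then
    printf 'Cannot write to %s; on Ubuntu Server, run the pipeline with sudo.\n' "$OUTPUT_DIR"
fi
```

The script only reports problems and does not start the pipeline, so it can be run on the target device before the `gst-launch-1.0` command without side effects.
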
Figure: Pipeline for pose estimation and encode using qtivoverlay

The following table provides the sequential processing stages of the pipeline execution:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | Collects the video stream (source) and creates two copies of the source: one stream is sent to the qtimetamux plugin to retain the video stream, and the other stream is sent to an ML inferencing pipeline. |
| qtimlvconverter | Receives the video stream on its sink pad. Performs preprocessing: color conversion, scaling down or up, and normalization of the stream data when the model expects floating-point values as input. Converts the video stream to a tensor stream on its source pad. The HRNet model uses this tensor stream for inferencing. |
| qtimltflite | Loads the HRNet model. Modifies the graph for the chosen delegate. Receives the tensor stream on its sink pad. Runs the inference and produces a tensor stream with the pose estimation results on its source pad. |
| qtimlpostprocess | Receives the inference tensors from the HRNet model on its sink pad. Converts the tensors into formats such as video or text that the multimedia plugins can process later. Applies the threshold to the chosen number of results. Loads the corresponding modules of the pose estimation models. In this use case, qtimlpostprocess loads the HRNet submodule, produces results as structures of text, and sends them to the sink pad of qtimetamux. |
| [qtimetamux](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimetamux.html) | Receives the video stream and the text stream with the corresponding pose results on its sink pads. Produces GST buffers with the contents of the video stream on its sink pad. Adds poses from the data sink pad to the GST buffer meta (meta muxing) on its source pad. |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-70022-50/topic/qtioverlay.html) | Receives the multiplexed stream. Overlays the poses on the VideoFrame using CL. Produces GST buffers with overlays on its source pad. |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | Applies parameters to each frame of the video stream that it receives on its sink pad. Encodes the stream into a bitstream and sends it over its source pad. |
| h264parse | Adds more information about the bitstream to the GStreamer buffer meta. |
| mp4mux | Receives the buffers and creates buffers that follow the container format specification. |
| **Output** | **Output** |
| Filesink | Stores the resulting stream in the /etc/media/video.mp4 file. |
| Playback | From the host computer, pull video.mp4 from the target device and play it on a media player: |

`scp root@<target device>:/etc/media/video.mp4 <directory>`

## Use qtivcomposer to mix original frame with pose estimation mask

Run the use case on the target device:

    gst-launch-1.0 -e --gst-debug=2 \
    qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
    split. ! queue ! qtivcomposer name=mixer sink_1::dimensions="<1920,1080>" ! queue ! video/x-raw,format=NV12,width=1920,height=1080,interlace-mode=progressive,colorimetry=bt601 ! \
    v4l2h264enc capture-io-mode=4 output-io-mode=5 ! h264parse ! queue ! mp4mux ! queue ! filesink location=/etc/media/video.mp4 \
    split. ! queue ! qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options="QNNExternalDelegate,backend_type=htp;" \
    model=/etc/models/hrnet_pose_quantized.tflite ! queue ! qtimlpostprocess results=2 module=hrnet labels=/etc/labels/hrnet_pose.json settings=/etc/labels/hrnet_settings.json \
    ! video/x-raw,format=BGRA,width=640,height=360 ! queue ! mixer.

To stop the use case, use CTRL + C.

The following figure shows the flow of the use case execution:

1. Identify poses of people in the scenes from the video stream coming through the camera source.
2. Compose the poses and video stream together using qtivcomposer.
3. Encode this stream as an H.264 bitstream.
4. Multiplex the stream in an MP4 container and store it as an MP4 file.

Figure: Pipeline for pose estimation and encode using qtivcomposer

The following table provides the sequential processing stages of the pipeline execution:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | Collects the video stream (source) and creates two copies of the source: one stream is sent to the qtivcomposer plugin to retain the video stream, and the other stream is sent to an ML inferencing pipeline. |
| qtimlvconverter | Receives the video stream on its sink pad. Performs preprocessing: color conversion, scaling down or up, and normalization of the stream data when the model expects floating-point values as input. Converts the video stream to a tensor stream on its source pad. The HRNet model uses this tensor stream for inferencing. |
| qtimltflite | Loads the HRNet model. Modifies the graph for the chosen delegate. Receives the tensor stream on its sink pad. Runs the inference and produces a tensor stream with the pose estimation results on its source pad. |
| qtimlpostprocess | Receives the inference tensors from the HRNet model on its sink pad. Converts the tensors into formats such as video or text that the multimedia plugins can process later. Applies the threshold to the chosen number of results. Loads the corresponding modules of the pose estimation models. In this use case, qtimlpostprocess loads the HRNet submodule, produces results as video frames with poses drawn, and sends them to the sink pad of qtivcomposer. |
| qtivcomposer | Receives the original video stream and the video stream of poses on its sink pads. On its source pad, produces GST buffers with the contents composed of the video streams from its sink pads. |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-70022-50/topic/qtioverlay.html) | Receives the multiplexed stream. Overlays the poses on the VideoFrame using CL. Produces GST buffers with overlays on its source pad. |
| [v4l2h264enc](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264enc.html) | Applies parameters to each frame of the video stream that it receives on its sink pad. Encodes the stream into a bitstream and sends it over its source pad. |