# Pose estimation and display with LiteRT

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-pose-estimation-and-display.html](https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-pose-estimation-and-display.html)

The use cases in this topic run the HRNet LiteRT model on a single camera stream to perform pose estimation.

## Use the qtivoverlay plugin to apply a pose estimation overlay

Run the use case on the target device:

```shell
gst-launch-1.0 -e \
qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! waylandsink fullscreen=true sync=false \
split. ! queue ! qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so \
external-delegate-options="QNNExternalDelegate,backend_type=htp;" model=/etc/models/hrnet_pose_quantized.tflite ! queue ! \
qtimlpostprocess results=2 module=hrnet labels=/etc/labels/hrnet_pose.json settings=/etc/labels/hrnet_settings.json ! text/x-raw ! queue ! metamux.
```

To stop the use case, press Ctrl+C.

The following figure shows the flow of the use case execution:

1. Identify the poses of people in the scene from a video stream captured by the camera source.
2. Overlay the detected poses using overlaylib.
3. Display the results.

*Figure: Pipeline for pose estimation overlay*

The following table lists the sequential processing stages of the pipeline execution:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | Captures the video stream (source) from the camera. A `tee` element then creates two copies of the stream:<br>1. One copy is sent to the qtimetamux plugin, which retains the video stream.<br>2. The other copy is sent to the ML inference pipeline. |
| `qtimlvconverter` | Receives the video stream on its sink pad.<br>Performs preprocessing:<br>- Color conversion<br>- Scaling down or up<br>- Normalization of the stream data when the model expects floating-point values as input<br>Converts the video stream to a tensor stream on its source pad. The HRNet model uses this tensor stream for inference. |
| `qtimltflite` | Loads the HRNet model.<br>Modifies the graph for the chosen delegate (here, the QNN external delegate with the HTP backend).<br>Receives the tensor stream on its sink pad.<br>Runs the inference and produces a tensor stream with the pose estimation results on its source pad. |
| `qtimlpostprocess` | Receives the inference tensors from the HRNet model on its sink pad.<br>Converts the tensors into formats, such as video or text, that downstream multimedia plugins can process.<br>Applies the threshold and limits the output to the number of results set by the `results` property (`results=2` in this pipeline).<br>Loads the module that corresponds to the pose estimation model. In this use case, qtimlpostprocess does the following:<br>1. Loads the HRNet submodule (`module=hrnet`).<br>2. Produces the results as text structures (`text/x-raw`).<br>3. Sends them to the sink pad of qtimetamux. |
| [qtimetamux](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimetamux.html) | Receives, on its sink pads, the video stream and the text stream that carries the pose results for that video.<br>Produces GStreamer buffers with the contents of the video stream received on its video sink pad.<br>Adds the poses received on its data sink pad to the buffer metadata (meta muxing) and pushes the buffers on its source pad. |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-70022-50/topic/qtioverlay.html) | Receives the multiplexed stream.<br>Overlays the poses on the video frame using CL.<br>Produces GStreamer buffers with the overlays on its source pad. |
| `waylandsink` | Receives the video stream on its sink pad.<br>Submits the video stream to Weston.<br>Weston renders the following on a local display device:<br>1. The video stream captured from the camera.<br>2. The poses generated for the people in the scene. |

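In this use case, the `qtimlpostprocess` stage turns the raw HRNet output tensors into pose results before qtimetamux attaches them to the frames. The real decoding happens inside the `hrnet` module and is configured by `hrnet_settings.json`; the following is only a minimal sketch of how HRNet-style heatmaps are commonly decoded for one person, assuming that the model emits one 2D heatmap per keypoint, that a keypoint sits at its heatmap's peak (argmax) with the peak value as its score, and that the heatmap spans the whole camera frame with no letterbox padding. `decode_heatmaps` and `Keypoint` are hypothetical names, not part of the Qualcomm plugins, and the module's actual behavior (peak refinement, padding handling, threshold source) may differ.

```python
"""Hypothetical sketch: decode HRNet-style keypoint heatmaps into frame coordinates.

Assumptions (not taken from the qtimlpostprocess hrnet module):
- one 2D heatmap per keypoint, values are confidences;
- the keypoint is located at the heatmap peak (argmax);
- the heatmap covers the whole frame, with no letterbox padding.
"""
from dataclasses import dataclass
from typing import List, Optional, Sequence

Heatmap = Sequence[Sequence[float]]  # rows x columns of confidence values


@dataclass
class Keypoint:
    x: float      # horizontal position in frame pixels
    y: float      # vertical position in frame pixels
    score: float  # peak heatmap value


def decode_heatmaps(
    heatmaps: Sequence[Heatmap],
    frame_width: int,
    frame_height: int,
    threshold: float,
) -> List[Optional[Keypoint]]:
    """Return one Keypoint per heatmap, or None when its peak is below threshold."""
    keypoints: List[Optional[Keypoint]] = []
    for heatmap in heatmaps:
        rows, cols = len(heatmap), len(heatmap[0])
        # Find the peak (argmax) of this keypoint's heatmap.
        best_row, best_col, best = 0, 0, heatmap[0][0]
        for r, row in enumerate(heatmap):
            for c, value in enumerate(row):
                if value > best:
                    best_row, best_col, best = r, c, value
        if best < threshold:
            keypoints.append(None)  # keypoint not confidently detected
            continue
        # Map the centre of the peak cell back to camera-frame pixels.
        x = (best_col + 0.5) * frame_width / cols
        y = (best_row + 0.5) * frame_height / rows
        keypoints.append(Keypoint(x, y, best))
    return keypoints


if __name__ == "__main__":
    # Two toy 2x4 heatmaps for a 1280x720 frame: one clear peak, one weak map.
    maps = [
        [[0.0, 0.1, 0.2, 0.0], [0.1, 0.3, 0.9, 0.2]],
        [[0.1, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.1]],
    ]
    print(decode_heatmaps(maps, frame_width=1280, frame_height=720, threshold=0.5))
```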
## Use qtivcomposer to mix the original frame with a pose estimation mask

Run the use case on the target device:

```shell
gst-launch-1.0 -e --gst-debug=2 \
qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
split. ! queue ! qtivcomposer name=mixer ! queue ! waylandsink fullscreen=true sync=false \
split. ! queue ! qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options="QNNExternalDelegate,backend_type=htp;" \
model=/etc/models/hrnet_pose_quantized.tflite ! queue ! qtimlpostprocess results=2 module=hrnet labels=/etc/labels/hrnet_pose.json settings=/etc/labels/hrnet_settings.json \
! video/x-raw,format=BGRA,width=640,height=360 ! queue ! mixer.
```

To stop the use case, press Ctrl+C.

The following figure shows the flow of the use case execution:

1. Identify the poses of people in the scene from a video stream captured by the camera source.
2. Compose the poses and the video stream using qtivcomposer.
3. Display the results.

*Figure: Pipeline for pose estimation mask using qtivcomposer*

The following table lists the sequential processing stages of the pipeline execution:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) | Captures the video stream (source) from the camera. A `tee` element then creates two copies of the stream:<br>- One copy is sent to the qtivcomposer plugin, which retains the original video stream for composition.<br>- The other copy is sent to the ML inference pipeline. |
| `qtimlvconverter` | Receives the video stream on its sink pad.<br>Performs preprocessing:<br>- Color conversion<br>- Scaling down or up<br>- Normalization of the stream data when the model expects floating-point values as input<br>Converts the video stream to a tensor stream on its source pad. The HRNet model uses this tensor stream for inference. |
| `qtimltflite` | Loads the HRNet model.<br>Modifies the graph for the chosen delegate (here, the QNN external delegate with the HTP backend).<br>Receives the tensor stream on its sink pad.<br>Runs the inference and produces a tensor stream with the pose estimation results on its source pad. |
| `qtimlpostprocess` | Receives the inference tensors from the HRNet model on its sink pad.<br>Converts the tensors into formats, such as video or text, that downstream multimedia plugins can process.<br>Applies the threshold and limits the output to the number of results set by the `results` property (`results=2` in this pipeline).<br>Loads the module that corresponds to the pose estimation model. In this use case, qtimlpostprocess does the following:<br>1. Loads the HRNet submodule (`module=hrnet`).<br>2. Produces the results as video frames with the poses drawn (`video/x-raw,format=BGRA,width=640,height=360`).<br>3. Sends them to the sink pad of qtivcomposer. |
| [qtivcomposer](https://docs.qualcomm.com/doc/80-70022-50/topic/qtivcomposer.html) | Receives the original video stream and the video stream of poses on its sink pads.<br>Produces, on its source pad, GStreamer buffers whose contents are composed from the video streams on its sink pads. |
| `waylandsink` | Receives the video stream on its sink pad.<br>Submits the video stream to Weston.<br>Weston renders the following on a local display device:<br>1. The video stream captured from the camera.<br>2. The poses generated for the people in the scene. |

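In this use case, qtimlpostprocess emits 640x360 BGRA frames with the poses drawn, and qtivcomposer mixes them with the 1280x720 camera frames. The following is a conceptual sketch only, assuming that the composer scales the pose frame up to the camera resolution (nearest-neighbour here) and blends it with the standard "over" operator using the BGRA alpha channel, so that transparent pose pixels leave the camera image visible. The real plugin operates on NV12 and BGRA buffers in hardware, and its scaling and blending details may differ; `compose_over` is a hypothetical name, not a Qualcomm API.

```python
"""Hypothetical sketch of composing a BGRA pose frame over a camera frame.

Assumes nearest-neighbour upscaling and per-pixel alpha ("over") blending;
qtivcomposer's actual scaling and blending may differ.
"""
from typing import List, Tuple

BGR = Tuple[int, int, int]
BGRA = Tuple[int, int, int, int]


def compose_over(background: List[List[BGR]], overlay: List[List[BGRA]]) -> List[List[BGR]]:
    """Scale `overlay` to the size of `background` and alpha-blend it on top."""
    out_h, out_w = len(background), len(background[0])
    ov_h, ov_w = len(overlay), len(overlay[0])
    result: List[List[BGR]] = []
    for y in range(out_h):
        src_y = y * ov_h // out_h  # nearest-neighbour row in the overlay
        row: List[BGR] = []
        for x in range(out_w):
            src_x = x * ov_w // out_w  # nearest-neighbour column in the overlay
            b, g, r, a = overlay[src_y][src_x]
            alpha = a / 255.0  # 0 = fully transparent, 1 = fully opaque
            bg = background[y][x]
            # Per channel: alpha * overlay + (1 - alpha) * background.
            row.append(tuple(
                round(alpha * fg_c + (1.0 - alpha) * bg_c)
                for fg_c, bg_c in zip((b, g, r), bg)
            ))
        result.append(row)
    return result


if __name__ == "__main__":
    # A 4x2 grey "camera" frame and a 2x1 "pose" frame (2x upscale, as 640x360 -> 1280x720).
    camera = [[(100, 100, 100)] * 4 for _ in range(2)]
    poses = [[(0, 255, 0, 255), (0, 0, 0, 0)]]  # opaque green pixel, transparent pixel
    for line in compose_over(camera, poses):
        print(line)
```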
**Parent Topic:** [LiteRT use cases](https://docs.qualcomm.com/doc/80-70022-50/topic/tensorflow-lite-use-cases.html)

Last Published: Feb 20, 2026