# Object detection using USB camera source

This use case streams video from a USB webcam connected to the Qualcomm EVK, runs object detection on the stream, and previews the results on the display. The webcam must be accessible as a `/dev/videoX` device.

Note

For USB camera input, set the `video-format`, `resolution`, and `framerate` parameters in the configuration file to match the capabilities of the camera. To check the camera capabilities, see [Configure USB camera](https://docs.qualcomm.com/bundle/publicresource/topics/80-70029-8/usb.html#configure-usb-camera).
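
As a quick way to see which modes a camera advertises, the following minimal sketch wraps `v4l2-ctl`. It assumes that `v4l2-ctl` (from the v4l-utils package) is installed on the target device, which this topic does not confirm. The function name `show_camera_caps` is hypothetical, and `/dev/video2` is only the default because it is the node used in this topic's commands.

```shell
# Hypothetical helper: print the formats, resolutions, and frame rates that a
# V4L2 capture device advertises. Assumes v4l2-ctl (v4l-utils) is installed.
show_camera_caps() {
    dev="${1:-/dev/video2}"   # default node is an assumption; pass your own

    # A V4L2 device node is a character device; fail early if it is missing.
    if [ ! -c "$dev" ]; then
        echo "error: $dev is not a character device" >&2
        return 1
    fi

    # Report a distinct status when the tool itself is unavailable.
    if ! command -v v4l2-ctl >/dev/null 2>&1; then
        echo "error: v4l2-ctl not found" >&2
        return 2
    fi

    # List every pixel format with its supported frame sizes and intervals.
    v4l2-ctl -d "$dev" --list-formats-ext
}
```

Calling `show_camera_caps /dev/video2` prints the advertised modes, from which you can pick a format, resolution, and frame rate.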

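Run one of the following commands on the target device, depending on the video format that the camera provides. The commands use `/dev/video2` as the camera device; replace it with the device node of your camera if it differs: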

- MJPEG video format:

  ```shell
  gst-launch-1.0 -v -e --gst-debug=2 v4l2src device="/dev/video2" ! image/jpeg,width=1920,height=1080,framerate=30/1 ! jpegdec ! \
      videoconvert ! video/x-raw,format=NV12 ! qtivtransform ! queue ! tee name=split ! queue ! qtivcomposer name=mixer ! queue ! \
      fpsdisplaysink sync=true text-overlay=true video-sink="waylandsink sync=true fullscreen=true" split. ! queue ! qtimlvconverter ! queue ! \
      qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options="QNNExternalDelegate,backend_type=htp;" \
      model=/etc/models/yolox_quantized.tflite ! queue ! qtimlpostprocess settings="{\"confidence\": 75.0}" results=10 module=yolov8 labels=/etc/labels/yolox.json \
      ! video/x-raw,format=BGRA,width=640,height=360 ! queue ! mixer.
  ```
- YUY2 video format:

  ```shell
  gst-launch-1.0 -v -e --gst-debug=2 v4l2src io-mode=4 device="/dev/video2" ! \
      video/x-raw,format=YUY2,width=640,height=480,framerate=30/1 ! queue ! tee name=split ! queue ! \
      qtivcomposer name=mixer ! queue ! fpsdisplaysink sync=true text-overlay=true video-sink="waylandsink sync=true fullscreen=true" split. ! \
      queue ! qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so \
      external-delegate-options="QNNExternalDelegate,backend_type=htp;" model=/etc/models/yolox_quantized.tflite ! queue ! \
      qtimlpostprocess settings="{\"confidence\": 75.0}" results=10 module=yolov8 labels=/etc/labels/yolox.json \
      ! video/x-raw,format=BGRA,width=640,height=360 ! queue ! mixer.
  ```
- NV12 video format:

  ```shell
  gst-launch-1.0 -v -e --gst-debug=2 v4l2src io-mode=4 device="/dev/video2" ! video/x-raw,format=NV12,width=640,height=480,framerate=30/1 ! queue ! tee name=split split. ! \
      queue ! qtivcomposer name=mixer ! queue ! waylandsink fullscreen=true split. ! queue ! qtimlvconverter ! queue ! \
      qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options="QNNExternalDelegate,backend_type=htp;" \
      model=/etc/models/yolox_quantized.tflite ! queue ! qtimlpostprocess settings="{\"confidence\": 75.0}" results=10 module=yolov8 labels=/etc/labels/yolox.json \
      ! video/x-raw,format=BGRA,width=640,height=360 ! queue ! mixer.
  ```
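
Each command places a caps filter directly after `v4l2src`, and that filter must describe a mode the camera supports. As a minimal sketch, the hypothetical helper below (`usb_camera_caps` is an illustration, not part of the Qualcomm SDK) prints the caps string for a given format, width, height, and frame rate, following the patterns in the three commands above. It does not check the values against the camera; they are assumed to be valid for your device.

```shell
# Hypothetical helper (not part of the Qualcomm SDK): print the GStreamer caps
# string for a USB camera mode, following the three commands in this topic.
usb_camera_caps() {
    # $1 = format (MJPEG, YUY2, or NV12), $2 = width, $3 = height, $4 = frames per second
    case "$1" in
        MJPEG)
            # Compressed frames use the image/jpeg media type; the MJPEG
            # command decodes them with jpegdec further downstream.
            printf 'image/jpeg,width=%s,height=%s,framerate=%s/1\n' "$2" "$3" "$4"
            ;;
        YUY2|NV12)
            # Raw frames use video/x-raw with an explicit pixel format.
            printf 'video/x-raw,format=%s,width=%s,height=%s,framerate=%s/1\n' "$1" "$2" "$3" "$4"
            ;;
        *)
            echo "unsupported format: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `usb_camera_caps YUY2 640 480 30` prints the caps filter used in the YUY2 command; substitute the printed string for the caps filter when your camera needs a different resolution or frame rate.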

The following figure shows the pipeline that processes the input from the USB camera, runs object detection, and renders the result on the display.

<svg id="Layer_3" data-name="Layer 3" xmlns="http://www.w3.org/2000/svg" width="1281.08" height="147.76" viewbox="0 0 1281.08 147.76" aria-label="../../_images/gst_usb_single_camera_display-app_flow.svg">
  <g>
    <rect x=".5" y=".5" width="1280.08" height="146.76" rx="7.5" ry="7.5" style="fill: #fafafa;"></rect>
    <path d="M1273.08,1c3.86,0,7,3.14,7,7v131.76c0,3.86-3.14,7-7,7H8c-3.86,0-7-3.14-7-7V8c0-3.86,3.14-7,7-7h1265.08M1273.08,0H8C3.58,0,0,3.58,0,8v131.76c0,4.42,3.58,8,8,8h1265.08c4.42,0,8-3.58,8-8V8c0-4.42-3.58-8-8-8h0Z" style="fill: #d2d7e1;"></path>
  </g>
  <rect x="1220" y="41.09" width="40" height="28" rx="4" ry="4" style="fill: none; stroke: #000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 2px;"></rect>
  <line x1="1232" y1="77.09" x2="1248" y2="77.09" style="fill: none; stroke: #000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 2px;"></line>
  <line x1="1240" y1="69.09" x2="1240" y2="77.09" style="fill: none; stroke: #000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 2px;"></line>
  <rect x="75.8" y="20" width="90" height="75" rx="4" ry="4" style="fill: #007884;"></rect>
  <line x1="60.8" y1="57.5" x2="75.8" y2="57.5" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
  <line x1="1205" y1="57.5" x2="1220" y2="57.5" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
  <text transform="translate(94.98 62.18)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">v4l2src</tspan></text>
  <rect x="385.8" y="35" width="120" height="60" rx="4" ry="4" style="fill: #2a2aea;"></rect>
  <g>
    <line x1="365.8" y1="65" x2="380.06" y2="65" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="378.89 68.99 385.8 65 378.89 61.01 378.89 68.99"></polygon>
  </g>
  <text transform="translate(395.75 69.09)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">qtivtransform</tspan></text>
  <rect x="525.8" y="35" width="120" height="60" rx="4" ry="4" style="fill: #2a2aea;"></rect>
  <g>
    <line x1="505.8" y1="65" x2="520.06" y2="65" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="518.89 68.99 525.8 65 518.89 61.01 518.89 68.99"></polygon>
  </g>
  <text transform="translate(537.26 69.09)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
  <rect x="665.8" y="35" width="120" height="60" rx="4" ry="4" style="fill: #2a2aea;"></rect>
  <g>
    <line x1="645.8" y1="65" x2="660.06" y2="65" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="658.89 68.99 665.8 65 658.89 61.01 658.89 68.99"></polygon>
  </g>
  <text transform="translate(693 52.29)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">qtimltflite/</tspan><tspan x="-1" y="16.8">qtimlsnpe/</tspan><tspan x="5.29" y="33.6">qtimlqnn</tspan></text>
  <rect x="805.8" y="35" width="120" height="60" rx="4" ry="4" style="fill: #2a2aea;"></rect>
  <g>
    <line x1="785.8" y1="65" x2="800.05" y2="65" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="798.89 68.99 805.8 65 798.89 61.01 798.89 68.99"></polygon>
  </g>
  <text transform="translate(811.3 69.09)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
  <rect x="945.8" y="20" width="120" height="75" rx="4" ry="4" style="fill: #2a2aea;"></rect>
  <g>
    <line x1="925.8" y1="65" x2="940.05" y2="65" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="938.89 68.99 945.8 65 938.89 61.01 938.89 68.99"></polygon>
  </g>
  <text transform="translate(963.04 61.59)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">qtivcomposer</tspan></text>
  <rect x="1085.8" y="20" width="120" height="75" rx="4" ry="4" style="fill: #007884;"></rect>
  <g>
    <line x1="1065.8" y1="57.5" x2="1080.05" y2="57.5" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="1078.89 61.49 1085.8 57.5 1078.89 53.51 1078.89 61.49"></polygon>
  </g>
  <text transform="translate(1104.7 61.59)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Waylandsink </tspan></text>
  <rect x="185.8" y="20" width="90" height="75" rx="4" ry="4" style="fill: #007884;"></rect>
  <g>
    <line x1="165.8" y1="57.5" x2="180.06" y2="57.5" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="178.89 61.49 185.8 57.5 178.89 53.51 178.89 61.49"></polygon>
  </g>
  <text transform="translate(201.6 61.59)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">capsfilter</tspan></text>
  <rect x="295.8" y="20" width="70" height="75" rx="4" ry="4" style="fill: #007884;"></rect>
  <g>
    <line x1="275.8" y1="57.5" x2="290.06" y2="57.5" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="288.89 61.49 295.8 57.5 288.89 53.51 288.89 61.49"></polygon>
  </g>
  <text transform="translate(321.09 61.59)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">tee</tspan></text>
  <g>
    <line x1="365.75" y1="28.88" x2="940.05" y2="28.88" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="938.89 32.87 945.8 28.88 938.89 24.89 938.89 32.87"></polygon>
  </g>
  <circle cx="40" cy="54.96" r="20" style="fill: none; stroke: #000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 2px;"></circle>
  <circle cx="40" cy="54.96" r="7.5" style="fill: none; stroke: #000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 2px;"></circle>
  <path d="M27.5,84.96h25" style="fill: none; stroke: #000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 2px;"></path>
  <path d="M40,84.96v-10" style="fill: none; stroke: #000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 2px;"></path>
  <g>
    <g>
      <text transform="translate(1082.92 125.85)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
      <rect x="1062.67" y="113.76" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
    </g>
    <g>
      <text transform="translate(1181.51 125.85)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
      <rect x="1161.26" y="113.76" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
    </g>
  </g>
</svg>
**Figure: Object detection using USB camera source**

The following table provides the sequential processing stages of the pipeline execution:

| Pipeline | Description |
| --- | --- |
| USB camera and object detection on Wayland | <ol class="arabic simple"><li>v4l2src captures the live stream from the USB camera.</li><li>A capsfilter constrains the stream to the selected video format, resolution, and frame rate.</li><li>tee splits the stream into a display branch and an inference branch.</li><li>qtivtransform transforms the stream data.</li><li>qtimlvconverter preprocesses the video stream and converts it to a tensor stream, which is used for inferencing.</li><li>qtimltflite runs the inference on the tensor stream.</li><li>qtimlpostprocess handles the inference results from any object detection model and produces video frames.</li><li>qtivcomposer composes the video frames and shares them with waylandsink.</li><li>waylandsink submits the composed video stream to Weston, which renders it on the local display.</li></ol> |

Last Published: Apr 02, 2026
