# Audio classification decode and display with LiteRT

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/audio-classification-with-litert.html](https://docs.qualcomm.com/doc/80-70022-50/topic/audio-classification-with-litert.html)

These use cases run the YAMNet LiteRT model to classify and decode audio samples from a microphone and from a file source.

To stop a use case, press Ctrl+C.

## Audio classification on audio samples from microphone

Note: This use case isn't applicable for the current release.

Run this use case on the target device:

    gst-launch-1.0 -v pulsesrc ! audio/x-raw,format=S16LE ! audiobuffersplit output-buffer-size=31200 ! \
    qtimlaconverter sample-rate=16000 feature=lmfe params="params,nfft=96,nhop=160,nmels=64,chunklen=0.96;" ! queue ! \
    qtimltflite model=/etc/models/yamnet.tflite ! qtimlpostprocess module=yamnet labels=/etc/labels/yamnet.json ! \
    video/x-raw,width=640,height=360 ! queue ! waylandsink sync=false fullscreen=true

The following figure shows the flow of the use case execution:

Figure: Pipeline flow for audio classification and display

GST BIN: pulsesrc → audiobuffersplit → qtimlaconverter → qtimltflite → qtimlpostprocess → waylandsink

In the figure, qtimlaconverter, qtimltflite, and qtimlpostprocess are marked as Qualcomm elements; pulsesrc, audiobuffersplit, and waylandsink are marked as open source.

The following table lists the sequential processing stages of the pipeline execution:

Table: Pipeline processing stages for audio classification

| Process | Description |
| --- | --- |
| [pulsesrc](https://docs.qualcomm.com/doc/80-70022-50/topic/pulsesrc.html) | Collects the audio stream (source) from the microphone. |
| audiobuffersplit | Splits the incoming audio buffers into equal-sized chunks. |
| [qtimlaconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlaconverter.html) | Preprocesses the audio stream and converts it to a tensor stream.<br>The audio classification model uses this tensor stream for inferencing. |
| [qtimltflite](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimltflite.html) | 1. Loads the model.<br>2. Modifies the graph for the chosen delegate.<br>3. Receives the tensor stream on its sink pad.<br>4. Runs the inference and produces a tensor stream with the inference results on its source pad. |
| qtimlpostprocess | Handles the audio classification inference results:<br>1. Applies a threshold to the chosen number of results.<br>2. Creates a text overlay for the classes. |
| [waylandsink](https://docs.qualcomm.com/doc/80-70022-50/topic/waylandsink.html) | 1. Receives the video and audio streams on its sink pad.<br>2. Submits the streams to Weston.<br>3. Weston renders the video stream and the classified audio generated for that scene on a local display device. |
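
The sizes used in the command can be cross-checked with a little arithmetic. The sketch below assumes mono S16LE audio (2 bytes per sample) at the splitter, and it reads `nhop` as a hop length in samples and `chunklen` as a duration in seconds. None of these readings are stated on this page, so treat the result as a sanity check rather than a specification.

```shell
#!/bin/sh
# Values copied from the pipeline command.
buffer_bytes=31200      # audiobuffersplit output-buffer-size
sample_rate=16000       # qtimlaconverter sample-rate, in Hz
hop_samples=160         # nhop (assumed to be a hop length in samples)
chunk_ms=960            # chunklen=0.96 (assumed to be seconds), in ms

# Assumptions that are not stated in the source.
bytes_per_sample=2      # S16LE is 16 bits per sample
channels=1              # mono

# Samples in one buffer, and how long that buffer lasts.
samples=$((buffer_bytes / (bytes_per_sample * channels)))
buffer_ms=$((samples * 1000 / sample_rate))

# Hop duration, and the number of hops that fit in one chunk.
hop_ms=$((hop_samples * 1000 / sample_rate))
frames=$((chunk_ms / hop_ms))

echo "samples per buffer: $samples"
echo "buffer duration:    $buffer_ms ms"
echo "hop duration:       $hop_ms ms"
echo "frames per chunk:   $frames"
```

Under these assumptions, each 31200-byte buffer holds 15600 samples (975 ms at 16 kHz), and a 0.96 s chunk with a 10 ms hop yields 96 frames.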

## Audio classification and decode on audio samples from file source
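
Both commands in this section read fixed paths for the media, model, and label files. As a hedged pre-flight sketch, the script below lists any of those paths that are not readable and, when `gst-inspect-1.0` is installed, any pipeline elements it cannot find. The helper names `missing_files` and `missing_elements` are hypothetical and not part of the SDK.

```shell
#!/bin/sh
# Print each path from the argument list that is not a readable file.
missing_files() {
    for path in "$@"; do
        [ -r "$path" ] || printf '%s\n' "$path"
    done
}

# Print each element name that gst-inspect-1.0 reports as unavailable.
missing_elements() {
    for element in "$@"; do
        gst-inspect-1.0 --exists "$element" >/dev/null 2>&1 ||
            printf '%s\n' "$element"
    done
}

# Paths taken from the two commands in this section.
missing_files /etc/media/video-flac.mp4 /etc/media/video-mp3.mp4 \
    /etc/models/yamnet.tflite /etc/labels/yamnet.json

# Only query elements when the inspection tool is present.
if command -v gst-inspect-1.0 >/dev/null 2>&1; then
    missing_elements qtdemux h264parse v4l2h264dec qtivcomposer waylandsink \
        flacparse flacdec mpegaudioparse mpg123audiodec audioconvert \
        audioresample audiobuffersplit qtimlaconverter qtimltflite \
        qtimlpostprocess
fi
```

If the script prints nothing, every listed file is readable and every listed element was found. Any printed path or element name is worth resolving before launching the pipeline.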

- Run the use case on the target device using a FLAC decoder:

        gst-launch-1.0 -e --gst-debug=2 filesrc location=/etc/media/video-flac.mp4  ! qtdemux name=demux demux. ! queue ! h264parse ! \
        v4l2h264dec capture-io-mode=4 output-io-mode=4 ! video/x-raw, format=NV12 ! qtivcomposer name=mixer sink_1::position="<50, 50>" sink_1::dimensions="<368, 64>" ! \
        queue ! waylandsink fullscreen=true demux. ! queue ! flacparse ! flacdec ! queue ! audioconvert ! audioresample ! \
        audiobuffersplit output-buffer-size=31200 ! queue ! qtimlaconverter  sample-rate=16000 feature=lmfe params="params,nfft=96,nhop=160,nmels=64,chunklen=0.96;" ! \
        queue ! qtimltflite name=infeng model=/etc/models/yamnet.tflite ! qtimlpostprocess name=postproc settings="{\"confidence\": 10.0}" results=3 module=yamnet \
        labels=/etc/labels/yamnet.json ! video/x-raw,format=BGRA,width=368,height=64 ! queue ! mixer.

Figure: Pipeline flow for audio classification and decode (FLAC decoder)

GST BIN, video branch: filesrc → qtdemux → video capsfilter → h264parse → v4l2h264dec → qtivcomposer → waylandsink

GST BIN, audio branch: qtdemux → audio capsfilter → flacparse → flacdec → audioconvert → audioresample → audiobuffersplit → qtimlaconverter → qtimltflite → qtimlpostprocess → qtivcomposer

In the figure, qtivcomposer, qtimlaconverter, qtimltflite, and qtimlpostprocess are marked as Qualcomm elements; the others are marked as open source.

- Run the use case on the target device using the mpg123audiodec decoder:

        gst-launch-1.0 -e --gst-debug=2 \
        filesrc location=/etc/media/video-mp3.mp4 ! qtdemux name=demux demux. ! queue ! h264parse ! v4l2h264dec  capture-io-mode=4 output-io-mode=4 ! \
        video/x-raw, format=NV12 ! qtivcomposer name=mixer  sink_1::position="<50, 50>" sink_1::dimensions="<368, 64>" !  queue  ! \
        waylandsink fullscreen=true demux. ! queue !  mpegaudioparse ! mpg123audiodec ! audioconvert ! audioresample !  \
        audiobuffersplit output-buffer-size=31200 ! queue ! qtimlaconverter  sample-rate=16000  !  queue ! \
        qtimltflite name=infeng model=/etc/models/yamnet.tflite   !  qtimlpostprocess name=postproc settings="{\"confidence\": 10.0}"  \
        results=3 module=yamnet labels=/etc/labels/yamnet.json ! video/x-raw,format=BGRA,width=368,height=64 ! queue ! mixer.

Figure: Pipeline flow for audio classification and decode (mpg123audiodec decoder)

                        <?xml version="1.0" encoding="UTF-8"?>
<svg id="Layer_1" data-name="Layer 1" xmlns="http://www.w3.org/2000/svg" width="1280.000000000000909" height="247.548997986130416" viewbox="0 0 1280.000000000000909 247.548997986130416">
  <g>
    <rect x=".499877929688409" y=".500261413864791" width="1279.00048828125" height="246.548828125" rx="7.5" ry="7.5" style="fill: #fafafa;"></rect>
    <path d="M1272.000000000000909.999999987892807c3.85986328125,0,7,3.140166878700256,7,7.00000011920929v231.54899787902832c0,3.859832763672784-3.14013671875,7-7,7H8c-3.859832763670966,0-7-3.140167236327216-7-7V8.000000107102096C1,4.140166866593063,4.140167236329034.999999987892807,8,.999999987892807h1264.000000000000909M1272.000000000000909,0H8C3.581665039063409,0,0,3.581833231262863,0,8.000000107102096v231.54899787902832c0,4.418167114257812,3.581665039063409,8,8,8h1264.000000000000909c4.4183349609375,0,8-3.581832885742188,8-8V8.000000107102096c0-4.418166875839233-3.5816650390625-8.000000107102096-8-8.000000107102096h0Z" style="fill: #d2d7e1;"></path>
  </g>
  <text transform="translate(609.605392456055597 30.799913513474166)" style="font-family: Roboto-Bold, Roboto; font-size: 16px; font-weight: 700;"><tspan x="0" y="0">GST BIN</tspan></text>
  <g>
    <line x1="119.5" y1="79.078264343552291" x2="138.759017944336847" y2="79.078264343552291" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="137.591827392579034 83.06733142118901 144.5 79.078264343552291 137.591827392579034 75.089204895310104 137.591827392579034 83.06733142118901"></polygon>
  </g>
  <g>
    <line x1="245.000000000000909" y1="79.078264343552291" x2="264.259017944336847" y2="79.078264343552291" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="263.091827392579034 83.06733142118901 270.000000000000909 79.078264343552291 263.091827392579034 75.089204895310104 263.091827392579034 83.06733142118901"></polygon>
  </g>
  <g>
    <line x1="369.500003814698175" y1="79.078264343552291" x2="388.759017944336847" y2="79.078264343552291" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="387.591827392579034 83.06733142118901 394.500003814698175 79.078264343552291 387.591827392579034 75.089204895310104 387.591827392579034 83.06733142118901"></polygon>
  </g>
  <g>
    <line x1="515.000000000001819" y1="79.078264343552291" x2="534.259017944336847" y2="79.078264343552291" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="533.091827392579034 83.06733142118901 540.000000000001819 79.078264343552291 533.091827392579034 75.089204895310104 533.091827392579034 83.06733142118901"></polygon>
  </g>
  <g>
    <line x1="659.500000000000909" y1="79.078264343552291" x2="678.759033203125909" y2="79.078264343552291" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="677.591827392579034 83.06733142118901 684.500000000000909 79.078264343552291 677.591827392579034 75.089204895310104 677.591827392579034 83.06733142118901"></polygon>
  </g>
  <g>
    <line x1="804.000000000000909" y1="79.078264343552291" x2="823.259033203125909" y2="79.078264343552291" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="822.091827392579034 83.06733142118901 829.000000000000909 79.078264343552291 822.091827392579034 75.089204895310104 822.091827392579034 83.06733142118901"></polygon>
  </g>
  <g>
    <line x1="949.500000000000909" y1="79.078264343552291" x2="968.759033203125909" y2="79.078264343552291" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="967.591796875000909 83.06733142118901 974.500000000000909 79.078264343552291 967.591796875000909 75.089204895310104 967.591796875000909 83.06733142118901"></polygon>
  </g>
  <g>
    <line x1="1094.302001953125909" y1="79.078264343552291" x2="1113.561035156250909" y2="79.078264343552291" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="1112.393798828125909 83.06733142118901 1119.302001953125909 79.078264343552291 1112.393798828125909 75.089204895310104 1112.393798828125909 83.06733142118901"></polygon>
  </g>
  <rect x="20.000003373463187" y="52.843036719346856" width="100" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
  <text transform="translate(52.694671630860284 72.918596374802291)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">audio</tspan><tspan x="-11.89453125" y="16.7998046875">capsfilter</tspan></text>
  <rect x="145.000003373463187" y="52.843036719346856" width="100" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
  <text transform="translate(159.996612548829034 73.534319031052291)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">mpegaudio</tspan><tspan x="17.58203125" y="16.7998046875">parse</tspan></text>
  <rect x="270.000003373463187" y="52.843036719346856" width="100" height="50" rx="3.999999999999999" ry="3.999999999999999" style="fill: #007884;"></rect>
  <text transform="translate(294.218299865723566 73.534319031052291)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">mpg123</tspan><tspan x="-2.84375" y="16.7998046875">audiodec</tspan></text>
  <rect x="395.000003373463187" y="52.843036719346856" width="120" height="50" rx="3.999999999999999" ry="3.999999999999999" style="fill: #007884;"></rect>
  <text transform="translate(414.421913146973566 81.349748718552291)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">audioconvert</tspan></text>
  <rect x="540.000003373463187" y="52.843036719346856" width="119.999999999999091" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
  <text transform="translate(553.730972290039972 81.349748718552291)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">audioresample</tspan></text>
  <rect x="685.000003373462278" y="52.843036719346856" width="120" height="50" rx="3.999999999999995" ry="3.999999999999995" style="fill: #007884;"></rect>
  <text transform="translate(695.750518798829034 81.349748718552291)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">audiobuffersplit</tspan></text>
  <rect x="830.000003373462278" y="52.843036719346856" width="120" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
  <text transform="translate(841.044464111329034 81.349748718552291)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">qtimlaconverter</tspan></text>
  <rect x="975.000003373462278" y="52.843036719346856" width="120" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
  <text transform="translate(1005.092712402344659 81.349748718552291)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">qtimltflite</tspan></text>
  <rect x="1120.000003373462278" y="52.843036719346856" width="140" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
  <text transform="translate(1135.507385253907159 81.349748718552291)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
  <g>
    <line x1="119.5" y1="167.784181701950729" x2="138.759017944336847" y2="167.784181701950729" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="137.591827392579034 171.773241150192916 144.5 167.784181701950729 137.591827392579034 163.795122253708541 137.591827392579034 171.773241150192916"></polygon>
  </g>
  <g>
    <line x1="245.000000000000909" y1="167.784181701950729" x2="264.259017944336847" y2="167.784181701950729" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="263.091827392579034 171.773241150192916 270.000000000000909 167.784181701950729 263.091827392579034 163.795122253708541 263.091827392579034 171.773241150192916"></polygon>
  </g>
  <g>
    <line x1="369.500003814698175" y1="167.784181701950729" x2="388.759017944336847" y2="167.784181701950729" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="387.591827392579034 171.773241150192916 394.500003814698175 167.784181701950729 387.591827392579034 163.795122253708541 387.591827392579034 171.773241150192916"></polygon>
  </g>
  <g>
    <line x1="515.000000000001819" y1="167.784181701950729" x2="534.259017944336847" y2="167.784181701950729" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="533.091827392579034 171.773241150192916 540.000000000001819 167.784181701950729 533.091827392579034 163.795122253708541 533.091827392579034 171.773241150192916"></polygon>
  </g>
  <g>
    <line x1="659.500000000000909" y1="167.784181701950729" x2="678.759033203125909" y2="167.784181701950729" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="677.591827392579034 171.773241150192916 684.500000000000909 167.784181701950729 677.591827392579034 163.795122253708541 677.591827392579034 171.773241150192916"></polygon>
  </g>
  <g>
    <line x1="804.000000000000909" y1="167.784181701950729" x2="823.259033203125909" y2="167.784181701950729" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
    <polygon points="822.091827392579034 171.773241150192916 829.000000000000909 167.784181701950729 822.091827392579034 163.795122253708541 822.091827392579034 171.773241150192916"></polygon>
  </g>
  <rect x="20.000003373463187" y="141.548954072591187" width="100" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
  <text transform="translate(50.821624755860284 170.640179741196334)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">filesrc</tspan></text>
  <rect x="145.000003373463187" y="141.548954072591187" width="100" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
  <text transform="translate(167.615264892579034 170.05564986448735)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">qtdemux</tspan></text>
  <rect x="270.000003373463187" y="141.548954072591187" width="100" height="50" rx="3.999999999999999" ry="3.999999999999999" style="fill: #007884;"></rect>
  <text transform="translate(303.262245178223566 162.240280258469284)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">video</tspan><tspan x="-12.4619140625" y="16.7998046875">capsfilter</tspan></text>
  <rect x="395.000003373463187" y="141.548954072591187" width="120" height="50" rx="3.999999999999999" ry="3.999999999999999" style="fill: #007884;"></rect>
  <text transform="translate(421.931190490723566 170.05564986448735)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">h264parse</tspan></text>
  <rect x="540.000003373463187" y="141.548954072591187" width="119.999999999999091" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
  <text transform="translate(560.081558227539972 170.05564986448735)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">v4l2h264dec</tspan></text>
  <rect x="685.000003373462278" y="141.548954072591187" width="120" height="50" rx="3.999999999999995" ry="3.999999999999995" style="fill: #2a2aea;"></rect>
  <text transform="translate(702.248077392579034 170.05564986448735)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">qtivcomposer</tspan></text>
  <rect x="830.000003373462278" y="141.548954072591187" width="120" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
  <text transform="translate(851.585479736329034 170.05564986448735)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">waylandsink</tspan></text>
  <g>
    <polyline points="63.747772216797784 108.584016907028854 63.747772216797784 122.19599353056401 195 122.19599353056401 195 141.548952209763229" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
    <polygon points="59.758728027344659 109.751215088181198 63.747772216797784 102.843034851364791 67.736846923829034 109.751215088181198 59.758728027344659 109.751215088181198"></polygon>
  </g>
  <g>
    <polyline points="745.000000000000909 135.807970154099166 745.000000000000909 122.19599353056401 1193.843566894532159 122.19599353056401 1193.843566894532159 102.843034851364791" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
    <polygon points="748.989074707032159 134.640779602341354 745.000000000000909 141.548952209763229 741.010925292969659 134.640779602341354 748.989074707032159 134.640779602341354"></polygon>
  </g>
  <g>
    <g>
      <text transform="translate(1090.347747802735284 223.640489685349166)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
      <rect x="1070.096723915195071" y="211.548997986130416" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
    </g>
    <g>
      <text transform="translate(1188.929519653321222 223.640489685349166)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
      <rect x="1168.678505328048232" y="211.548997986130416" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
    </g>
  </g>
</svg>

The following table lists the sequential processing stages of the pipeline execution:

Table : Pipeline processing stages for audio classification using the flacdec and mpg123audiodec decoders

| Plugin | Description |
| --- | --- |
| File source: filesrc | Reads the video and audio stream from a file. qtdemux then demultiplexes the stream. |
| Audio and video capsfilter | Ensures that the video and audio streams are in the correct format. |
| <ul class="ul" id="audio-classification-with-litert__ul_asb_ww2_s2c"><li class="li">Audio: mpegaudioparse or flacparse</li><li class="li">Video: h264parse</li></ul> | Parses the audio stream and the H.264 video stream. |
| <ul class="ul" id="audio-classification-with-litert__ul_zyx_zw2_s2c"><li class="li">Audio: mpg123audiodec or flacdec</li><li class="li">Video: <a href="https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264dec.html">v4l2h264dec</a></li></ul> | Decodes the audio and video streams. |
| audioconvert | Converts the audio buffers between the supported raw audio formats. |
| audioresample | Resamples the audio buffers to a different sample rate. |
| audiobuffersplit | Splits the incoming audio buffers into equal-sized chunks. |
| [qtimlaconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlaconverter.html) | Preprocesses the audio stream and converts it to a tensor stream. The audio classification model uses this tensor stream for inference. |
| [qtimltflite](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimltflite.html) | Runs inference using the YAMNet model. |
| qtimlpostprocess | Handles the audio classification inference results:<ol class="ol" id="audio-classification-with-litert__ol_fz3_fw2_s2c"><li class="li">Applies a threshold to the chosen number of results.</li><li class="li">Creates a text overlay for the classes.</li></ol> |
| [qtivcomposer](https://docs.qualcomm.com/doc/80-70022-50/topic/qtivcomposer.html) | Combines the text overlay of the classification results with the video preview. |
| [waylandsink](https://docs.qualcomm.com/doc/80-70022-50/topic/waylandsink.html) | <ol class="ol" id="audio-classification-with-litert__ol_kjr_fvr_lbc"><li class="li">waylandsink submits the video stream received on its sink pad to Weston.</li><li class="li">Weston renders the video stream on a local display.</li></ol> |
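
The audiobuffersplit stage produces equal-sized chunks, and the microphone pipeline on this page sets `output-buffer-size=31200`. The following sketch shows one way that this byte count can be derived. It assumes mono S16LE audio at 16 kHz, a 25 ms analysis window, a 10 ms hop (matching `nhop=160`), and 96 feature frames per model input. The window length and frame count are assumptions based on the usual YAMNet front end, not values stated in this document.

```shell
#!/bin/sh
# Sketch: derive the audiobuffersplit chunk size in bytes.
# Assumed (not confirmed by this document): mono audio, 25 ms window,
# 96 feature frames per model input.
sample_rate=16000        # Hz, matches sample-rate=16000
bytes_per_sample=2       # S16LE is 16 bits per sample
hop=160                  # samples per hop, matches nhop=160 (10 ms)
window=400               # samples per window (25 ms at 16 kHz), assumed
frames=96                # feature frames per model input, assumed

# Samples needed so that exactly `frames` windows fit in one chunk
samples=$(( window + (frames - 1) * hop ))
chunk_bytes=$(( samples * bytes_per_sample ))
chunk_ms=$(( samples * 1000 / sample_rate ))

echo "samples per chunk: $samples"      # 15600
echo "bytes per chunk:   $chunk_bytes"  # 31200
echo "chunk duration:    $chunk_ms ms"  # 975 ms
```

Under these assumptions, the result matches the `output-buffer-size=31200` value. If the audio format or sample rate changes, recompute the chunk size the same way.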

## Related information

[Audio classification](https://docs.qualcomm.com/doc/80-70022-50/topic/audio-classification.html)

**Parent Topic:** [LiteRT use cases](https://docs.qualcomm.com/doc/80-70022-50/topic/tensorflow-lite-use-cases.html)

Last Published: Feb 20, 2026
