# Audio classification

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/audio-classification.html](https://docs.qualcomm.com/doc/80-70022-50/topic/audio-classification.html)

The **gst-ai-audio-classification** application shows audio classification using input from either a file source or a microphone. It displays both the classification results and a video preview.

The following figure shows the pipeline, which gets the input from a file or a microphone, preprocesses it, and runs inference on AI hardware. The results are displayed on the screen.

For information about the plugins used in the pipeline flow, see [Pipeline flow](https://docs.qualcomm.com/doc/80-70022-50/topic/audio-classification.html#audio-classification__section_fdn_fmz_n2c).

Figure: gst-ai-audio-classification pipeline

The figure shows two input paths that end at Waylandsink. Plugins marked *(Qualcomm)* are Qualcomm plugins; all other plugins are open source.

- Microphone input: pulsesrc → audiobuffersplit → qtimlaconverter *(Qualcomm)* → qtimltflite *(Qualcomm)* → qtimlpostprocess *(Qualcomm)* → Waylandsink
- File input: filesrc → qtdemux, which feeds two branches that meet at qtivcomposer *(Qualcomm)* → Waylandsink:
    - Video branch: h264parse → v4l2h264dec
    - Audio branch: mpegaudioparse or flacparse → mpg123audiodec or flacdec → audioconvert → audioresample → audiobuffersplit → qtimlaconverter *(Qualcomm)* → qtimltflite *(Qualcomm)* → qtimlpostprocess *(Qualcomm)*

## Sample model and label files

Table: Sample model and label files for gst-ai-audio-classification

| Runtime | Model files | Label files |
| :--- | :--- | :--- |
| LiteRT | `yamnet.tflite` | `yamnet.json` |

## Run the application on the target device

The sample application reads its input parameters from the `/etc/configs/config-audio-classification.json` file.

To create your own config JSON file, use [config-audio-classification.json](https://git.codelinaro.org/clo/le/platform/vendor/qcom-opensource/gst-plugins-qti-oss/-/blob/imsdk.lnx.2.0.0.r2-rel/gst-sample-apps/gst-ai-audio-classification/config-audio-classification.json?ref_type=heads) as a reference.

1. Ensure that you complete the [Prerequisites](https://docs.qualcomm.com/doc/80-70022-50/topic/download-model-and-label-files.html).
2. Update the config JSON file based on the model, input stream, and other properties. For more information, see [Config JSON field description](https://docs.qualcomm.com/doc/80-70022-50/topic/audio-classification.html#audio-classification__section_lcw_2zj_32c).
3. Use the following format for the `config-audio-classification.json` file:

        {
          "file-path": "<path to video file>",
          "model": "<path to model file>",
          "labels": "<path to label file>",
          "threshold": <integer between 1 and 100>,
          "runtime": "<cpu or gpu>",
          "codec": "<mp3 or flac>"
        }

    For example, the following config runs the application using input from a file, MP3 encoding, and the CPU runtime:

        {
          "file-path": "/etc/media/video-mp3.mp4",
          "model": "/etc/models/yamnet.tflite",
          "labels": "/etc/labels/yamnet.json",
          "runtime": "cpu",
          "threshold": 20,
          "codec": "mp3"
        }
4. Run the gst-ai-audio-classification application:

        gst-ai-audio-classification --config-file=/etc/configs/config-audio-classification.json
5. To display the available help options, run the following command in the SSH shell:

        gst-ai-audio-classification -h
6. To stop the use case, press Ctrl+C.
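The config and run steps above can also be scripted. The following is a minimal Python sketch, not part of the sample application: the helper names `build_config`, `write_config`, and `launch_command` are hypothetical, and the default threshold of 20 is borrowed from the example config rather than from any documented application default. Only the field names, the allowed values, and the `--config-file` option come from this page.

```python
import json
from pathlib import Path

# Allowed values taken from the config format shown on this page.
ALLOWED_RUNTIMES = ("cpu", "gpu")
ALLOWED_CODECS = ("mp3", "flac")


def build_config(file_path, model, labels, threshold=20, runtime="cpu", codec="mp3"):
    """Validate the inputs and return a dict that uses the documented field names."""
    if runtime not in ALLOWED_RUNTIMES:
        raise ValueError(f"runtime must be one of {ALLOWED_RUNTIMES}, got {runtime!r}")
    if codec not in ALLOWED_CODECS:
        raise ValueError(f"codec must be one of {ALLOWED_CODECS}, got {codec!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, int):
        raise ValueError("threshold must be an integer")
    if not 1 <= threshold <= 100:
        raise ValueError("threshold must be between 1 and 100")
    return {
        "file-path": file_path,
        "model": model,
        "labels": labels,
        "threshold": threshold,
        "runtime": runtime,
        "codec": codec,
    }


def write_config(config, path):
    """Write the config dict to `path` as indented JSON."""
    Path(path).write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")


def launch_command(config_path):
    """Return the argument list for the documented --config-file invocation."""
    return ["gst-ai-audio-classification", f"--config-file={config_path}"]
```

On the target device, one possible use is to call `write_config(build_config(...), "/tmp/my-config.json")` (a hypothetical path) and then pass `launch_command("/tmp/my-config.json")` to `subprocess.run`. Validating before launch catches an out-of-range threshold or a misspelled codec without starting the pipeline.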

## Expected output

The video preview is displayed on the screen with the audio classification results overlaid as text.

## Pipeline flow

The following table lists the plugins used in the audio classification pipeline:

| Plugin | Description |
| --- | --- |
| File source: filesrc | <ul class="ul" id="audio-classification__ul_z1z_x4f_w1c"><li class="li">Captures the video stream using filesrc, followed by qtdemux, which demultiplexes the stream.</li><li class="li">Uses tee to split the stream for inferencing.</li></ul> |
| h264parse | Parses the H.264 video. |
| [v4l2h264dec](https://docs.qualcomm.com/doc/80-70022-50/topic/v4l2h264dec.html) | Decodes the video. |
| mpegaudioparse or flacparse | Parses the audio (MP3 or FLAC). |
| mpg123audiodec or flacdec | Decodes the audio (MP3 or FLAC). |
| audioconvert | Converts the audio buffers between various possible formats. |
| audioresample | Resamples the audio buffers to different sample rates. |
| [pulsesrc](https://docs.qualcomm.com/doc/80-70022-50/topic/pulsesrc.html) | Reads the audio from the microphone. |
| audiobuffersplit | Splits the incoming audio buffers into equal-sized chunks. |
| [qtimlaconverter](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimlaconverter.html) | Performs preprocessing on the audio stream and converts the stream to a tensor stream.<br>The audio classification model uses this tensor stream for inferencing. |
| [qtimltflite](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimltflite.html) | Performs inferencing using the YAMNet model. |
| qtimlpostprocess | Uses the yamnet module to handle the audio classification inference results:<ol class="ol" id="audio-classification__ol_ol3_dky_kbc"><li class="li">Applies a threshold to the chosen number of results.</li><li class="li">Creates a text overlay for the classes.</li></ol> |
| [qtivcomposer](https://docs.qualcomm.com/doc/80-70022-50/topic/qtivcomposer.html) | Combines the text overlay for the classification results with the video preview. |
| [Waylandsink](https://docs.qualcomm.com/doc/80-70022-50/topic/waylandsink.html) | <ol class="ol" id="audio-classification__ol_kjr_fvr_lbc"><li class="li">Waylandsink submits the video stream received on its sink pad to Weston.</li><li class="li">Weston renders the video stream on a local display.</li></ol> |
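To make the ordering of the plugins easier to follow, the sketch below lists the element names per input source and codec, following the pipeline figure and the table above. It is an illustration only: the function `pipeline_elements` and its return layout are hypothetical, element properties, caps, and the tee element are omitted, and the result is not the pipeline string that the sample application builds.

```python
# Parser and decoder element per codec, as listed in the plugin table.
AUDIO_DECODE = {
    "mp3": ["mpegaudioparse", "mpg123audiodec"],
    "flac": ["flacparse", "flacdec"],
}

# Elements shared by both input paths: chunking, tensor conversion,
# inference, and postprocessing.
INFERENCE = ["audiobuffersplit", "qtimlaconverter", "qtimltflite", "qtimlpostprocess"]


def pipeline_elements(source="file", codec="mp3"):
    """Return the ordered element names for each branch of the pipeline."""
    if source == "microphone":
        # The microphone path has no video branch and no composer in the figure.
        return {"audio": ["pulsesrc"] + INFERENCE, "sink": ["waylandsink"]}
    if source != "file":
        raise ValueError(f"unknown source: {source!r}")
    if codec not in AUDIO_DECODE:
        raise ValueError(f"unknown codec: {codec!r}")
    return {
        "demux": ["filesrc", "qtdemux"],
        "video": ["h264parse", "v4l2h264dec"],
        "audio": AUDIO_DECODE[codec] + ["audioconvert", "audioresample"] + INFERENCE,
        "sink": ["qtivcomposer", "waylandsink"],
    }
```

For example, `pipeline_elements("file", "flac")["audio"]` starts with `flacparse` and `flacdec`, while the microphone path starts at `pulsesrc` and skips the parse, decode, convert, and resample stages.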

## Config JSON field description

The config JSON file supports the following fields:

| Field | Values/description |
| :--- | :--- |
| **runtime** | Use one of the following runtimes:<ul class="ul" id="audio-classification__ul_mry_nck_32c"><li class="li"><code class="ph codeph">cpu</code></li><li class="li"><code class="ph codeph">gpu</code></li></ul> |
| **Input source** | Use one of the following input sources:<ul class="ul" id="audio-classification__ul_xym_rck_32c"><li class="li"><code class="ph codeph">file-path</code>: The path to the video file.</li><li class="li">Microphone</li></ul> |
| **threshold** | Use any integer between 1 and 100. |
| **codec** | The audio codec of the input video:<ul class="ul" id="audio-classification__ul_q3j_qqz_n2c"><li class="li">MP3 (default)</li><li class="li">FLAC</li></ul> |
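This page does not define how the `threshold` value is applied. The Python sketch below shows one plausible reading, stated as an assumption: the value is treated as a minimum confidence in percent, and only the highest-scoring classes at or above it are kept. The function `filter_by_threshold`, the `max_results` parameter, and the sample scores are all hypothetical and do not describe the actual qtimlpostprocess implementation.

```python
def filter_by_threshold(scores, threshold, max_results=3):
    """Keep up to `max_results` classes whose confidence meets the threshold.

    `scores` maps a class label to a confidence in percent (0 to 100).
    Assumption: `threshold` is a minimum confidence in percent.
    """
    if not 1 <= threshold <= 100:
        raise ValueError("threshold must be between 1 and 100")
    kept = [(label, score) for label, score in scores.items() if score >= threshold]
    # Highest confidence first, so the overlay would show the best match on top.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_results]


# Hypothetical scores for three classes.
sample = {"Speech": 72.5, "Music": 18.0, "Dog": 35.1}
print(filter_by_threshold(sample, threshold=20))
# prints [('Speech', 72.5), ('Dog', 35.1)]
```

Under this reading, raising the threshold reduces the number of labels in the overlay, and lowering it admits less certain classes.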

## Related information

[Audio classification decode and display with LiteRT](https://docs.qualcomm.com/doc/80-70022-50/topic/audio-classification-with-litert.html)

**Parent Topic:** [Run AI/ML sample applications](https://docs.qualcomm.com/doc/80-70022-50/topic/ai-ml-sample-applications.html)

Last Published: Feb 20, 2026
