# Audio classification

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/audio-classification.html](https://docs.qualcomm.com/doc/80-70022-50/topic/audio-classification.html)

The **gst-ai-audio-classification** application shows audio classification using input from either a file source or a microphone. It displays both the classification results and a video preview.

The following figure shows the pipeline, which gets the input from a file or a microphone, preprocesses it, and runs inferences on AI hardware. The results are displayed on the screen.

For information about the plugins used in the pipeline flow, see Pipeline flow.

Figure: gst-ai-audio-classification pipeline

- Microphone input: pulsesrc → audiobuffersplit → qtimlaconverter → qtimltflite → qtimlpostprocess → Waylandsink
- File input, video branch: filesrc → qtdemux → h264parse → v4l2h264dec → qtivcomposer → Waylandsink
- File input, audio branch: filesrc → qtdemux → mpegaudioparse or flacparse → mpg123audiodec or flacdec → audioconvert → audioresample → audiobuffersplit → qtimlaconverter → qtimltflite → qtimlpostprocess → qtivcomposer → Waylandsink

Legend: qtimlaconverter, qtimltflite, qtimlpostprocess, and qtivcomposer are Qualcomm plugins. All other elements are open source.
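
The branches in the figure can also be written down as ordered element lists. The following Python sketch only records the element order taken from the figure and the Pipeline flow table. It omits every element property, caps filter, and helper element (model path, labels, tee, queues), so the printed strings are not working `gst-launch-1.0` commands. The names `MIC_BRANCH`, `FILE_VIDEO_BRANCH`, `file_audio_branch`, and `describe` are hypothetical.

```python
# Element order as read from the pipeline figure; properties and caps are omitted.
MIC_BRANCH = [
    "pulsesrc", "audiobuffersplit", "qtimlaconverter",
    "qtimltflite", "qtimlpostprocess", "waylandsink",
]

FILE_VIDEO_BRANCH = [
    "filesrc", "qtdemux", "h264parse", "v4l2h264dec",
    "qtivcomposer", "waylandsink",
]

# Parser and decoder pair for each codec value listed in the config description.
AUDIO_CODEC_ELEMENTS = {
    "mp3": ("mpegaudioparse", "mpg123audiodec"),
    "flac": ("flacparse", "flacdec"),
}


def file_audio_branch(codec):
    """Return the audio branch of the file pipeline for the given codec."""
    if codec not in AUDIO_CODEC_ELEMENTS:
        raise ValueError(f"unsupported codec: {codec!r}")
    parser, decoder = AUDIO_CODEC_ELEMENTS[codec]
    return [
        "filesrc", "qtdemux", parser, decoder,
        "audioconvert", "audioresample", "audiobuffersplit",
        "qtimlaconverter", "qtimltflite", "qtimlpostprocess",
        "qtivcomposer", "waylandsink",
    ]


def describe(elements):
    """Join element names in gst-launch style, for display only."""
    return " ! ".join(elements)


if __name__ == "__main__":
    print("microphone:", describe(MIC_BRANCH))
    print("file video:", describe(FILE_VIDEO_BRANCH))
    print("file audio:", describe(file_audio_branch("flac")))
```

Both file branches start at the same filesrc and qtdemux instances and meet again at qtivcomposer, which is why those elements appear in both lists.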

## Sample model and label files

Table: Sample model and label files for gst-ai-audio-classification

| Runtime | Model files | Label files |
| :--- | :--- | :--- |
| LiteRT | `yamnet.tflite` | `yamnet.json` |

## Run the application on the target device

The sample application uses the /etc/configs/config-audio-classification.json file to read the input parameters.

To create your own config JSON file, use [config-audio-classification.json](https://git.codelinaro.org/clo/le/platform/vendor/qcom-opensource/gst-plugins-qti-oss/-/blob/imsdk.lnx.2.0.0.r2-rel/gst-sample-apps/gst-ai-audio-classification/config-audio-classification.json?ref_type=heads) as a reference.

1. Ensure that you complete the Prerequisites.
2. Update the config JSON file based on the model, input stream, and other properties. For more information, see Config JSON field description.
3. Use the following format for the config-audio-classification.json file:

        {
          "file-path": "<path to video file>",
          "model": "<path to model file>",
          "labels": "<path to label file>",
          "threshold": <integer between 1 and 100>,
          "runtime": "<cpu or gpu>",
          "codec": "<mp3 or flac>"
        }

   For example, to run the application using input from a file, MP3 encoding, and the CPU runtime:

        {
          "file-path": "/etc/media/video-mp3.mp4",
          "model": "/etc/models/yamnet.tflite",
          "labels": "/etc/labels/yamnet.json",
          "runtime": "cpu",
          "threshold": 20,
          "codec": "mp3"
        }
4. Run the gst-ai-audio-classification application:

        gst-ai-audio-classification --config-file=/etc/configs/config-audio-classification.json
5. To display the available help options, run the following command in the SSH shell:

        gst-ai-audio-classification -h
6. To stop the use case, press Ctrl+C.
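
The steps above can be scripted. The following Python sketch writes a config file with the fields from step 3 and assembles the command from step 4 without running it. The helper names `write_config` and `build_command`, the default values, and the temporary output location are assumptions made for illustration. Only the field names, the sample paths, and the `--config-file` option come from this topic.

```python
import json
import tempfile
from pathlib import Path


def write_config(path, *, file_path, model, labels,
                 runtime="cpu", threshold=20, codec="mp3"):
    """Write a config JSON file using the field names shown in step 3."""
    config = {
        "file-path": file_path,
        "model": model,
        "labels": labels,
        "runtime": runtime,
        "threshold": threshold,
        "codec": codec,
    }
    Path(path).write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


def build_command(config_path):
    """Return the argument list for the command shown in step 4."""
    return ["gst-ai-audio-classification", f"--config-file={config_path}"]


if __name__ == "__main__":
    # Hypothetical output location; the sample application reads
    # /etc/configs/config-audio-classification.json unless told otherwise.
    out = Path(tempfile.gettempdir()) / "config-audio-classification.json"
    write_config(
        out,
        file_path="/etc/media/video-mp3.mp4",
        model="/etc/models/yamnet.tflite",
        labels="/etc/labels/yamnet.json",
    )
    print(" ".join(build_command(out)))
```

On the target device, the printed command can be pasted into the SSH shell, or the argument list can be passed to `subprocess.run`.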

## Expected output

The screen shows the video preview with the audio classification results overlaid as text.

## Pipeline flow

The following table lists the plugins used in the audio classification pipeline:

| Plugin | Description |
| --- | --- |
| File source: filesrc | <ul><li>Captures the video stream using filesrc, followed by qtdemux, which demultiplexes the stream.</li><li>Uses tee to split the stream for inferencing.</li></ul> |
| h264parse | Parses the H.264 video. |
| v4l2h264dec | Decodes the video. |
| mpegaudioparse or flacparse | Parses the audio (MP3 or FLAC). |
| mpg123audiodec or flacdec | Decodes the audio (MP3 or FLAC). |
| audioconvert | Converts the audio buffers between various possible formats. |
| audioresample | Resamples the audio buffers to different sample rates. |
| pulsesrc | Reads the audio from the microphone. |
| audiobuffersplit | Splits the incoming audio buffers into equal-sized chunks. |
| qtimlaconverter | Performs preprocessing on the audio stream and converts the stream to a tensor stream.<br>The audio classification model uses this tensor stream for inferencing. |
| qtimltflite | Performs inferencing using the YAMNet model. |
| qtimlpostprocess | Uses the yamnet module to handle the audio classification inference results:<ol><li>Applies a threshold to the chosen number of results.</li><li>Creates text overlay for classes.</li></ol> |
| qtivcomposer | Combines the text overlay for classification results and video preview. |
| Waylandsink | <ol><li>Waylandsink submits the video stream received on its sink pad to Weston.</li><li>Weston renders the video stream on a local display.</li></ol> |
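
The thresholding step described for qtimlpostprocess can be illustrated in isolation. The following Python sketch is not the plugin's implementation. It assumes that the model emits one score between 0 and 1 per label, that the threshold is a confidence percentage, and that a fixed number of top results is chosen before the threshold is applied. The function `top_results`, the parameter `max_results`, and the sample labels and scores are hypothetical.

```python
def top_results(scores, labels, threshold_percent, max_results=3):
    """Pick the highest-scoring labels, then drop those below the threshold.

    scores: one confidence value in the range 0..1 per label (assumed layout).
    threshold_percent: minimum confidence in percent, from 1 to 100.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Rank all classes by score, highest first.
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    # Keep the chosen number of results, then apply the threshold to them.
    return [
        (label, score)
        for label, score in ranked[:max_results]
        if score * 100.0 >= threshold_percent
    ]


if __name__ == "__main__":
    sample_labels = ["Speech", "Music", "Dog", "Silence"]
    sample_scores = [0.62, 0.25, 0.08, 0.05]
    for label, score in top_results(sample_scores, sample_labels, threshold_percent=20):
        # Each surviving pair would become one line of the text overlay.
        print(f"{label}: {score * 100:.0f}%")
```

With a threshold of 20, only classes whose confidence reaches 20% survive, so a higher threshold produces a shorter overlay.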

## Config JSON field description

The config JSON file supports the following fields:

| Field | Values/description |
| :--- | :--- |
| **runtime** | Use one of the following runtimes:<ul><li><code>cpu</code></li><li><code>gpu</code></li></ul> |
| **Input source** | Use one of the following input sources:<ul><li><code>file-path</code>: The path to the video file.</li><li>Microphone</li></ul> |
| **threshold** | Use any integer between 1 and 100. |
| **codec** | The audio codec of the input video:<ul><li><code>mp3</code> (default)</li><li><code>flac</code></li></ul> |
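
The constraints in this table can be checked before the application is launched. The following Python sketch is a hypothetical pre-flight check, not part of the sample application. It covers only the values documented in this topic, treats a missing `codec` as the MP3 default, and assumes that `runtime` and `threshold` are always present. The application itself may accept or require other fields.

```python
import json

ALLOWED_RUNTIMES = ("cpu", "gpu")
ALLOWED_CODECS = ("mp3", "flac")


def validate_config(config):
    """Return a list of problems found in a parsed config; empty means valid."""
    errors = []

    runtime = config.get("runtime")
    if runtime not in ALLOWED_RUNTIMES:
        errors.append(f"runtime must be one of {ALLOWED_RUNTIMES}, got {runtime!r}")

    threshold = config.get("threshold")
    # bool is a subclass of int in Python, so reject it explicitly.
    is_integer = isinstance(threshold, int) and not isinstance(threshold, bool)
    if not is_integer or not 1 <= threshold <= 100:
        errors.append(f"threshold must be an integer from 1 to 100, got {threshold!r}")

    codec = config.get("codec", "mp3")  # MP3 is the documented default
    if codec not in ALLOWED_CODECS:
        errors.append(f"codec must be one of {ALLOWED_CODECS}, got {codec!r}")

    return errors


if __name__ == "__main__":
    text = """
    {
      "file-path": "/etc/media/video-mp3.mp4",
      "model": "/etc/models/yamnet.tflite",
      "labels": "/etc/labels/yamnet.json",
      "runtime": "cpu",
      "threshold": 20,
      "codec": "mp3"
    }
    """
    problems = validate_config(json.loads(text))
    print("config is valid" if not problems else "\n".join(problems))
```

Returning a list of messages instead of raising on the first problem lets a caller report every invalid field in one pass.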

## Related information

Audio classification decode and display with LiteRT

**Parent Topic:** Run AI/ML sample applications

Last Published: Feb 20, 2026
