# Overview

Source: [https://docs.qualcomm.com/doc/80-70014-12/topic/Debug-overview.html](https://docs.qualcomm.com/doc/80-70014-12/topic/Debug-overview.html)

A subsystem is a processor that has its own independent execution environment on the Qualcomm® Linux device (SoC). This guide describes the tools, sample logs, and procedures used to troubleshoot the affected subsystems. The Qualcomm SoC includes the following subsystems.

Table: Subsystems on SoC

| Subsystem | Description |
| --- | --- |
| Application processor subsystem (APSS) | This is the master subsystem. It executes the Linux kernel as the high-level OS (HLOS). |
| aDSP/LPASS | The application digital signal processor (aDSP) is also known as the low-power audio subsystem (LPASS). |
| mDSP/MPSS | Modem processor subsystem |
| cDSP | Compute DSP |
| WPSS | WLAN processor subsystem |
| TrustZone (TZ) | This is the Qualcomm Trusted Execution Environment. For more information on the TrustZone subsystem, see [Qualcomm Linux Security Guide](https://docs.qualcomm.com/bundle/publicresource/topics/80-70014-11). |
| Always On Processor (AOP) | This subsystem regulates the power on the device. |

Software for all subsystems other than the application processor is also known as non-HLOS software.

To understand the boot flow of the subsystems, see [Qualcomm Linux Boot Guide](https://docs.qualcomm.com/bundle/publicresource/topics/80-70014-4).

For the chipset specifications and the functional block diagram, see [QCS6490/QCS5430 Data Sheet](https://docs.qualcomm.com/bundle/publicresource/topics/80-23889-1/device-description.html).

For definitions of commonly used terms such as NoC and xPU, see [Acronyms and terms](https://docs.qualcomm.com/doc/80-70014-12/topic/references.html#acronyms_and_terms).

Note: The Qualcomm Linux platform allows you to develop applications for QCS6490 and QCS5430.

## Debug workflow

Because the SoC has multiple subsystems, first identify the subsystem on which the issue occurred so that you debug the right one.

The application processor is the master processor, and it can detect when another subsystem crashes. For example, if the modem processor subsystem crashes, the kernel log captures the subsystem restart (SSR) crash error log. Therefore, to identify which subsystems have to be debugged, first check the kernel debug messages.

The following figure shows the debug workflow to identify the subsystem to be debugged:

Figure: Debug workflow to identify the subsystem to be debugged

- Start: is there a panic or bug in dmesg?
    - Yes: does a subsystem restart log exist in dmesg?
        - Yes: debug the specific subsystem.
        - No: debug the application processor.
    - No: is there an error in TrustZone\*?
        - Non-secure WD bite, AHB timeout, XPU error, SMMU error, or no error: debug the application processor.
        - AoP error: debug the Always-on processor.
        - NoC error: debug the NoC.

\*The debug features to troubleshoot errors in TrustZone are currently available for users who have full access to the proprietary software shipped with Qualcomm Linux.

When there are no panic signatures in the kernel dmesg logs, check the TrustZone diagnostic (diag) logs for errors such as a non-secure watchdog bite or a NoC error. The following sections show example kernel messages that indicate panics, bugs, and subsystem crashes.
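
The decision flow in the figure can be written down as a small function. This is a minimal sketch, not part of any Qualcomm tool: the function name and the `tz_error` labels (`"noc"`, `"aop"`) are hypothetical, and it assumes you have already determined from the logs whether a panic or bug, an SSR log, or a TrustZone error is present.

```python
from typing import Optional

# Outcomes named after the boxes in the debug workflow figure.
DEBUG_SUBSYSTEM = "Debug specific subsystem"
DEBUG_APSS = "Debug application processor"
DEBUG_AOP = "Debug Always-on processor"
DEBUG_NOC = "Debug NoC"


def choose_debug_target(panic_or_bug_in_dmesg: bool,
                        ssr_log_in_dmesg: bool,
                        tz_error: Optional[str] = None) -> str:
    """Return which component to debug, following the workflow figure.

    tz_error is a hypothetical label for what the TrustZone diag log shows:
    "noc", "aop", or anything else (None means no error was found).
    """
    if panic_or_bug_in_dmesg:
        # A panic or bug accompanied by an SSR log points at a subsystem.
        return DEBUG_SUBSYSTEM if ssr_log_in_dmesg else DEBUG_APSS
    if tz_error == "noc":
        return DEBUG_NOC
    if tz_error == "aop":
        return DEBUG_AOP
    # Non-secure WD bite, AHB timeout, xPU error, SMMU error, or no error.
    return DEBUG_APSS
```

For example, `choose_debug_target(True, True)` selects the specific subsystem, while `choose_debug_target(False, False, "noc")` selects the NoC.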

### Identify kernel panic and bugs

From the kernel dmesg logs, you can determine whether the reset was caused by a kernel panic or a bug. The following example logs indicate kernel panics or bugs:

- Panic (pattern 1: general error)

        18.800936: <6> Kernel panic - not syncing: Fatal exception
        18.800938: <6> SMP: stopping secondary CPUs
        18.800947: <6> CPU0: stopping
        Kernel panic - not syncing: Apps watchdog Bark received!
- Bug (pattern 1)

        12.899532: <6> BUG kmalloc-128 (Not tainted): Redzone overwritten
        12.905418: <6> -----------------------------------------------------------------------------
- Bug (pattern 2)

        320.510769: <6> ------------[ cut here ]------------
        320.510781: <6> kernel BUG at /local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_platform_manifest_refs_tags/drivers/platform/msm/ipa/ipa_v3/ipa_qmi_service.c:955!
- Bug (pattern 3)

        [ 180.993861] BUG: spinlock lockup suspected on CPU#0, swapper/0/0
        [ 180.993883] lock: stop_lock+0x0/0x18, .magic: dead4ead, .owner: swapper/6/0, .owner_cpu: 6
        [ 181.015629] Causing a watchdog bite!

Note: Most bugs are followed by a kernel panic.
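
Scanning a saved dmesg log for these signatures can be automated. The following is a minimal sketch that covers only the patterns shown in the sample logs above; the function name and the pattern list are illustrative assumptions, and real logs may contain other signatures.

```python
import re

# Signatures taken from the sample logs above; this list is not exhaustive.
SIGNATURES = [
    ("panic", re.compile(r"Kernel panic - not syncing: (.*)")),
    ("bug", re.compile(r"kernel BUG at (.*)")),
    ("bug", re.compile(r"\bBUG:? (.*)")),
]


def find_panic_or_bug(dmesg_text):
    """Return a list of (kind, detail) tuples for each matching log line."""
    hits = []
    for line in dmesg_text.splitlines():
        for kind, pattern in SIGNATURES:
            match = pattern.search(line)
            if match:
                hits.append((kind, match.group(1).strip()))
                break  # report each line at most once
    return hits
```

An empty result means none of these signatures were found, which is the case where the workflow moves on to the TrustZone diag logs.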

### Identify subsystem crash issues

To determine whether the restart was due to a subsystem crash, check the kernel dmesg logs. For example, the following log indicates that the aDSP subsystem crashed. The device address 3000000.remoteproc in the log corresponds to the aDSP node in qcm6490.dtsi (node: remoteproc\_adsp: remoteproc@3000000).

    0x000000000A27652C |   5198.790423:   qcom_q6v5_pas 3000000.remoteproc: fatal error received: err_inject_crash.c:413:Crash injected via Diag
    0x000000000A276689 |   5198.801061:   remoteproc remoteproc2: crash detected in 3000000.remoteproc: type fatal error
    0x000000000A2767A1 |   5198.809602:   remoteproc remoteproc2: handling crash #1 in 3000000.remoteproc
    0x000000000A27688E |   5198.816837:   remoteproc remoteproc2: recovering 3000000.remoteproc
    0x000000000A276971 |   5198.823784:   qcom_q6v5_pas 8a00000.remoteproc: subsystem event rejected
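
The "crash detected" line carries both the remoteproc instance and the device address, so it can be extracted mechanically. The sketch below is illustrative only: the function name is hypothetical, and the address-to-subsystem table contains just the one mapping stated above (3000000.remoteproc is the aDSP). Any other address is reported as unknown and should be looked up in the device tree.

```python
import re

# Matches the "crash detected" line printed by the remoteproc framework.
CRASH_RE = re.compile(
    r"remoteproc (?P<rproc>remoteproc\d+): crash detected in "
    r"(?P<device>\S+): type (?P<type>.+)"
)

# Only the mapping given in this document; extend it from your device tree.
KNOWN_DEVICES = {"3000000.remoteproc": "aDSP"}


def find_subsystem_crashes(dmesg_text):
    """Return one dict per 'crash detected' line found in the log text."""
    crashes = []
    for line in dmesg_text.splitlines():
        match = CRASH_RE.search(line)
        if match:
            device = match["device"]
            crashes.append({
                "rproc": match["rproc"],
                "device": device,
                "type": match["type"].strip(),
                "subsystem": KNOWN_DEVICES.get(device, "unknown"),
            })
    return crashes
```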

### Identify system issues from TrustZone logs

This feature is currently available for users who have full access to the proprietary
        software shipped with Qualcomm Linux. For sample logs indicating errors in the TrustZone,
        see [Qualcomm Linux Debug Guide - Addendum](https://docs.qualcomm.com/bundle/resource/topics/80-70014-12A/identify_subsystem_to_be_debugged.html#identify_and_debug_errors_in_tz).

If there is no error in the TrustZone diag logs, the issue is most probably a secure watchdog bite. For more information on watchdog issues, see [Hardware reset](https://docs.qualcomm.com/doc/80-70014-12/topic/general_system_debugging.html#reset_hardware).

For information on debugging issues in TrustZone, see [Qualcomm Linux Security Guide](https://docs.qualcomm.com/bundle/publicresource/topics/80-70014-11/debug.html).

## Debugging features

There are two approaches to debugging:

- **On-target debugging**

    On-target debugging is a basic way of debugging software issues. This approach is powerful because most of the information is accessible from the live device. However, it requires that the issue is easily reproducible, and it also requires additional hardware components such as a host PC and a debugger.

    SSH is the primary tool to connect the host with the device. To use SSH, you must enable SELinux Permissive mode by following the steps mentioned [here](https://docs.qualcomm.com/bundle/publicresource/topics/80-70014-254/how_to.html#how-to-ssh-).
- **Off-target debugging**

    This approach works from logs instead of an actual device, which makes it a convenient and efficient method of debugging. Debugging is done using memory dumps or logging tools. Various types of logs can be used for offline debugging, including log files saved in the embedded file system. Because a RAM dump captures most of the memory regions but only limited hardware register information, debugging a hardware-related issue with this approach is a challenge.

## On-target debug features

The following debug features are available on the device and can be used at runtime.

### Perf utility

Perf is a powerful utility in the Linux ecosystem that facilitates performance analysis and profiling. This utility is included in the Linux kernel source in the tools/perf directory.

The Perf utility can help debug various aspects of system behavior using CPU performance counters, tracepoints, and kprobes and uprobes (for dynamic tracing). The key features of the Perf utility are as follows:
- **CPU performance counters**: These are the CPU hardware registers used to track events like executed instructions, cache misses, and branch mispredictions. These events form the basis for profiling applications and identifying performance bottlenecks.
- **Tracepoints**: These are instrumentation points placed at logical locations in the code, for example, system calls, network events, and file system operations. Tracepoints provide information like timestamps and stack traces with minimal overhead.
- **Dynamic tracing**: Perf can dynamically create tracepoints using the kprobes and uprobes frameworks. This allows tracing in both kernel space and user space.

For more information on the Perf utility, see the following:
- [https://docs.kernel.org/trace/coresight/coresight-perf.html](https://docs.kernel.org/trace/coresight/coresight-perf.html)
- [https://github.com/Linaro/OpenCSD/blob/master/HOWTO.md](https://github.com/Linaro/OpenCSD/blob/master/HOWTO.md)
- [https://docs.kernel.org/trace/coresight/index.html](https://docs.kernel.org/trace/coresight/index.html)
- [https://perf.wiki.kernel.org/index.php/Tutorial](https://perf.wiki.kernel.org/index.php/Tutorial)
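
As an illustration of the CPU performance counters feature, the following sketch wraps `perf stat` from Python. It assumes that the `perf` binary is installed on the device and that the generic event names `instructions`, `cache-misses`, and `branch-misses` are supported on the target CPU; the function names are hypothetical.

```python
import shutil
import subprocess

# Generic perf event names for the hardware counters mentioned above.
DEFAULT_EVENTS = ("instructions", "cache-misses", "branch-misses")


def build_perf_stat_cmd(workload, events=DEFAULT_EVENTS):
    """Build a `perf stat` command line that counts events while workload runs."""
    return ["perf", "stat", "-e", ",".join(events), "--", *workload]


def run_perf_stat(workload, events=DEFAULT_EVENTS):
    """Run workload under `perf stat`; return the report, or None if perf is missing."""
    if shutil.which("perf") is None:
        return None
    done = subprocess.run(build_perf_stat_cmd(workload, events),
                          capture_output=True, text=True, check=False)
    # perf stat prints its counter summary to stderr by default.
    return done.stderr
```

For example, `run_perf_stat(["sleep", "1"])` returns the counter summary for a one-second sleep, or `None` when `perf` is not on the path.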

### Subsystem restart

The slave subsystems can restart independently when they crash, without requiring a device reboot. This feature is known as subsystem restart (SSR). It is recommended to use the SSR feature only on commercial devices and not during the development phase. SSR is implemented using the remoteproc framework available in the upstream Linux kernel.

By default, SSR is disabled so that a full memory dump can be captured for debugging issues. For more information on the SSR feature, see [Remoteproc subsystem](https://docs.qualcomm.com/bundle/publicresource/topics/80-70014-3/features.html#Toc187817603).

The SSR dump can be enabled as part of the SSR operation. When it is enabled, an SSR dump is generated each time a subsystem restarts. You may prefer the SSR dump for debugging because it is smaller than the full memory dump. However, certain subsystem crashes can only be debugged with the entire RAM dump.

For information on how to enable and capture Subsystem dumps, see [Subsystem dumps](https://docs.qualcomm.com/doc/80-70014-12/topic/debugging_linux_kernel.html#subsystem_ram_dumps).
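
Because SSR is built on the upstream remoteproc framework, the state of each remote processor can usually be inspected through sysfs. The sketch below assumes the standard upstream attributes `name`, `state`, `recovery`, and `coredump` under `/sys/class/remoteproc`. Whether all of them are present depends on the kernel version and configuration, so a missing attribute is reported as `None`. The function name is hypothetical.

```python
from pathlib import Path

# Standard upstream remoteproc sysfs attributes; some may be absent on a given kernel.
ATTRIBUTES = ("name", "state", "recovery", "coredump")


def read_remoteprocs(sysfs_root="/sys/class/remoteproc"):
    """Return {instance: {attribute: value or None}} for each remoteproc instance."""
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result  # remoteproc class not available on this system
    for entry in sorted(root.iterdir()):
        info = {}
        for attr in ATTRIBUTES:
            try:
                info[attr] = (entry / attr).read_text().strip()
            except OSError:
                info[attr] = None  # attribute missing or not readable
        result[entry.name] = info
    return result
```

On a device, printing the result of `read_remoteprocs()` gives a quick view of which subsystems are running and whether recovery and coredump are enabled for each of them.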

### Force subsystem reset

This feature can be used in debugging use cases that require you to restart a subsystem.

This feature is currently available for users who have full access to the proprietary software shipped with Qualcomm Linux. To force reset the subsystem, you can run the diag command using the QXDM tool. For information on QXDM tool commands and their usage, see [Qualcomm Linux Debug Guide - Addendum](https://docs.qualcomm.com/bundle/resource/topics/80-70014-12A/debug-non-hlos-subsystems.html#qxdm_professional).

## Off-target debug features

The following features and tools are available for offline debugging.

### RAM dump

A RAM dump is a snapshot of the entire memory (RAM) at the time of failure. A RAM dump can be analyzed using various tools, including QCAP, RAM parser, and the TRACE32 simulator. For information on how to collect and parse RAM dumps, see [Collect and parse RAM dumps](https://docs.qualcomm.com/doc/80-70014-12/topic/debugging_linux_kernel.html#collect_and_parse_ram_dumps).

### QXDM Professional™

The QXDM Professional tool can be used to debug various subsystems.

This tool is currently available for users who have full access to the proprietary software shipped with Qualcomm Linux. For information on QXDM tool commands and their usage, see [Qualcomm Linux Debug Guide - Addendum](https://docs.qualcomm.com/bundle/resource/topics/80-70014-12A/debug-non-hlos-subsystems.html#qxdm_professional).

## Common issues

A system malfunction can be categorized into one of the following:

- [Linux kernel space issues](https://docs.qualcomm.com/doc/80-70014-12/topic/debugging_linux_kernel.html)
- [System issues](https://docs.qualcomm.com/doc/80-70014-12/topic/general_system_debugging.html)
    - [Watchdog issues](https://docs.qualcomm.com/doc/80-70014-12/topic/general_system_debugging.html#watchdog_timeout)
    - [Bus hang and timeout issues](https://docs.qualcomm.com/doc/80-70014-12/topic/general_system_debugging.html#erroneous_transaction_on_bus_error_and_timeout)
    - [Hardware reset issues](https://docs.qualcomm.com/doc/80-70014-12/topic/general_system_debugging.html#reset_hardware)

## Miscellaneous issues

The following are miscellaneous issues that you may encounter:
- Device freeze issues due to software bugs
- Linux application-related issues
- Random resets due to hardware issues related to the PCB

To debug Linux kernel-related issues, see [Debug Linux kernel space](https://docs.qualcomm.com/doc/80-70014-12/topic/debugging_linux_kernel.html).

To debug Linux application-related issues, see [Debug Linux user space](https://docs.qualcomm.com/doc/80-70014-12/topic/using_open_source_debug_tools.html).

Last Published: Jul 12, 2024
