# Architecture

## Overview

The Qualcomm Snapdragon X CPU is a high-performance, ARMv8-compliant 64-bit CPU.
It is a custom processor based on the ARMv8 Instruction Set Architecture (ISA) specification.
The Snapdragon X processor supports only AArch64 instructions; AArch32 instructions are not supported.
It implements the NEON instructions for vectorized operations.
The Snapdragon X CPU family comes in multiple core configurations.

| Processor | Configuration |
| --- | --- |
| Snapdragon X Elite | 12 performance cores |

## General primer on the ARM architecture

The ARM architecture is a Reduced Instruction Set Computer (RISC) based architecture with the following features:

- Low-cycle execution time
- Heavy pipelining of instructions
- Low-power execution
- High performance

The ARM architecture is a load/store architecture: load instructions fetch data from memory into internal registers, and store instructions write register data back to memory.
Almost all operations are performed on register values, with registers serving as both source and destination.
Because ARM is a RISC architecture, each instruction tends to perform a single operation, which simplifies instruction decode and, in turn, reduces the overall complexity of the chip.

There are three ARM architecture profiles:

- **A-profile (Applications)** – Designed for high performance systems and made to run complex operating systems such as Windows and Linux.
- **R-profile (Real-time)** – Designed for systems with real-time requirements and used in networking and embedded control systems.
- **M-profile (Microcontroller)** – Designed for microcontrollers and found in many IoT devices.

Snapdragon X is a custom-designed CPU conforming to the ARM A-profile.

Within a profile, there are various versions of the architectural specification, which change over time as features are improved or new features are added.
The Snapdragon X CPU implements version 8 of the ARM A-profile specification, referred to as ARMv8-A.

Within a version of a profile, minor versions may be released to add new features.
These are referred to as .x extensions. For example, the ARMv8.1-A extension adds atomic memory instructions.
Each extension includes mandatory features and optional features.
A processor that is compliant at a particular extension level implements all the mandatory features of that extension level and the mandatory features of all previous extensions.

The Snapdragon X processor is ARMv8.7-A compliant, which means it implements the mandatory features of all extensions from ARMv8.0-A (the base specification) to ARMv8.7-A.
Notable features added by these extensions include atomic instructions, CRC instructions, and floating point to integer instructions.
Notable features NOT implemented include AArch32 support, big-endian support, and SVE instructions.

## Introduction to ARM Assembly

ARMv8 processors are 64-bit processors with 64-bit data and addresses, 31 64-bit general-purpose registers, and a set of 32 128-bit vector registers.
They are designed to run complex operating systems such as Windows or Linux. ARM processors are the dominant processors in cellphones and mobile devices.

Example instructions include loading a register from memory, writing a memory location from a register, or adding two register values together and placing the result in a register.
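General-purpose registers in ARM64 are numbered from R0 to R30. Register number 31 is not a general-purpose register: depending on the instruction, it refers to either the zero register (XZR/WZR), which always reads as 0 and discards writes, or the stack pointer (SP).
The full 64 bits of a register can be accessed in assembly as *Xn*, or the lower 32 bits can be accessed as *Wn*.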
Most instructions fall into one of the following categories: load/store instructions, data processing instructions, and branch instructions.
There are other instructions that perform more specialized functions, such as setting system bits, but this primer focuses on basic instructions that fall into the three categories listed above.
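
The relationship between the *Xn* and *Wn* views of a register can be modeled in C. The following is a minimal sketch, not actual register access: the function names `read_w` and `write_w` are hypothetical, and a `uint64_t` stands in for the register. It relies on the AArch64 rule that a write through *Wn* zero-extends into the full 64-bit register.

```c
#include <stdint.h>

/* Models reading a register through its 32-bit Wn view:
   only the lower 32 bits of the 64-bit Xn value are visible. */
uint32_t read_w(uint64_t x_reg) {
    return (uint32_t)x_reg;
}

/* Models writing a 32-bit value through the Wn view.
   On AArch64 a write to Wn clears the upper 32 bits of Xn,
   so the previous register contents do not survive. */
uint64_t write_w(uint64_t old_x_reg, uint32_t value) {
    (void)old_x_reg;           /* upper bits are not preserved */
    return (uint64_t)value;    /* zero-extended to 64 bits */
}
```

For example, `read_w(0x1122334455667788)` yields `0x55667788`, and writing 7 through the 32-bit view of a register that held all ones leaves the 64-bit value 7.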

Values must be moved into registers before any operation can be performed on them.
Load instructions move data from memory into registers, and store instructions move data from registers into memory.
Operations (e.g., add) are performed on data that resides in the registers.
In ARM assembly, the destination register is typically listed first in the instruction.

For example, the following instruction adds two numbers together (the number stored in x3 and the number stored in x2) and stores the result in another register (x4):

    add         x4, x3, x2

An immediate value can be used in many instructions with the prefix #.
Modifying the case above to add the number 17 to what is in x3, the instruction would be:

    add         x4, x3, #17

To load a register with data from memory, one of the load instructions could be used:

    ldr         x1, [x7]

This instruction treats the data in x7 as an address to load from (a pointer).
This will load the data from that memory address into register x1.
A store instruction is similar:

    str         x1, [x7]

Note that in the case of a store instruction, the source register is the first parameter, and the destination address is the second parameter.
Putting this all together to load two 64-bit numbers, add them together, and store the result, the assembly would look like:

    ldr         x8, [x8]
    ldr         x9, [x9]
    add         x8, x8, x9
    str         x8, [x10]

And, likewise, doing this operation using only 32-bit numbers would look like:

    ldr         w8, [x8]
    ldr         w9, [x9]
    add         w8, w8, w9
    str         w8, [x10]
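
For readers more familiar with C, the two load/add/store sequences above correspond roughly to the functions below. This is an illustrative sketch: the function names `add_u64` and `add_u32` are hypothetical, and the instruction annotations show one plausible mapping rather than guaranteed compiler output.

```c
#include <stdint.h>

/* 64-bit version: uses the Xn view of the registers. */
void add_u64(const uint64_t *a, const uint64_t *b, uint64_t *result) {
    uint64_t lhs = *a;      /* ldr x8, [x8]                    */
    uint64_t rhs = *b;      /* ldr x9, [x9]                    */
    *result = lhs + rhs;    /* add x8, x8, x9 ; str x8, [x10]  */
}

/* 32-bit version: uses the Wn view, so the sum wraps modulo 2^32. */
void add_u32(const uint32_t *a, const uint32_t *b, uint32_t *result) {
    uint32_t lhs = *a;      /* ldr w8, [x8]                    */
    uint32_t rhs = *b;      /* ldr w9, [x9]                    */
    *result = lhs + rhs;    /* add w8, w8, w9 ; str w8, [x10]  */
}
```

Note that the addresses are still 64-bit in both versions; only the width of the data being loaded, added, and stored changes.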

Unconditional branching can be achieved with the branch instruction:

    b           label

Here, label is the assembly label of the location to branch to.
The assembler and linker resolve this label to an actual address, which the instruction encodes as an offset relative to the current instruction.

An example of a conditional branch would be:

    ldr         w8, [x9]
    cmp         w8, #10
    ble         label

This sequence loads a 32-bit value into w8, compares it with 10, and branches to the address at the label if the value is less than or equal to 10 (signed comparison).
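
The `ble` condition is a signed less-than-or-equal test, so the compare-and-branch sequence behaves like the C sketch below. The function name `branch_taken` is hypothetical; it simply reports whether the branch to the label would be taken.

```c
#include <stdint.h>

/* Returns 1 when the branch to `label` would be taken, 0 otherwise. */
int branch_taken(const int32_t *p) {
    int32_t value = *p;     /* ldr w8, [x9]            */
    if (value <= 10) {      /* cmp w8, #10 ; ble label */
        return 1;           /* execution continues at label */
    }
    return 0;               /* execution falls through */
}
```

Because the comparison is signed, a negative value such as -1 also takes the branch.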

In addition to the 31 general purpose registers, Snapdragon X has 32 128-bit vector registers to support special vector operations.
These vector operations are supported by advanced Single Instruction Multiple Data (SIMD) instructions.
The SIMD instructions support both floating point (single and double precision) and integer operations and can operate on multiple lanes of data simultaneously.
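
As a rough illustration of operating on multiple lanes at once, the sketch below adds two float arrays four lanes at a time using NEON intrinsics, with a scalar loop for any remaining elements. The function name `add_floats` and the `HAVE_NEON` macro are hypothetical, and the header name and guard macros are assumptions about the toolchain; when NEON is not detected, the code falls back to the scalar loop only.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(_M_ARM64)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Adds two float arrays element by element: out[i] = a[i] + b[i]. */
void add_floats(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
#if HAVE_NEON
    /* Four single-precision lanes fit in one 128-bit vector register. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats  */
        float32x4_t vb = vld1q_f32(b + i);      /* load 4 floats  */
        vst1q_f32(out + i, vaddq_f32(va, vb));  /* add and store  */
    }
#endif
    /* Scalar loop: handles the tail, or everything without NEON. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```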

## ARM64 and ARM64EC

The basic application binary interface (ABI) for Windows code compiled for and run on ARMv8 processors in 64-bit mode follows, for the most part, the ARM standard AArch64 EABI.
The ABI defines the calling convention, stack usage, and data alignment used for executables running under Windows.
Windows uses the term ARM64 to refer to the use of this ABI as the calling convention within an executable.

ARM64EC (*Emulation Compatible*) is a new ABI for building apps for Windows 11 on ARM.
It is a Windows 11 feature that requires the use of the Windows 11 SDK and is not available on Windows 10 on ARM.
ARM64EC enables natively compiled ARM64 code to interoperate with x64 code in the same binary.
The operating system emulates the x64 portion of the binary.

Code built as ARM64EC is interoperable with x64 code running under emulation within the same process.
The ARM64EC code in the process runs with native performance, while any x64 code runs using emulation that comes built-in with Windows 11.
ARM64EC guarantees interoperability with x64 by using the same calling conventions, stack usage, and preprocessor definitions as x64.
Code using the ARM64EC ABI is not interoperable with code written using the ARM64 ABI, as the stack layout and calling conventions are different.
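
Because ARM64EC shares preprocessor definitions with x64, code that needs to distinguish the targets has to test for ARM64EC first. The sketch below assumes the MSVC predefined macros `_M_ARM64EC`, `_M_ARM64`, and `_M_X64` (with `__aarch64__` and `__x86_64__` added as an assumption for GCC and Clang); the function name `target_abi` is hypothetical.

```c
/* Returns a label for the ABI this translation unit was compiled for. */
const char *target_abi(void) {
#if defined(_M_ARM64EC)
    /* Checked first: an ARM64EC build also defines _M_X64. */
    return "ARM64EC";
#elif defined(_M_ARM64) || defined(__aarch64__)
    return "ARM64";
#elif defined(_M_X64) || defined(__x86_64__)
    return "x64";
#else
    return "other";
#endif
}
```

If the `_M_X64` check came first, an ARM64EC build would be reported as x64, which is the intended behavior for most existing x64 source code but not for code that selects ARM-specific paths.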

It is important to note that ARM64EC code is native ARM64 code.
It is not emulated code, though it can interoperate with emulated x64 code running on a Windows 11 ARM64 machine.
ARM64EC was designed to deliver native-level functionality and performance, while providing transparent and direct interoperability with x64 code running under emulation.

## Differences from x64

The ARM architecture is a RISC-based architecture, as opposed to x64’s Complex Instruction Set Computer (CISC) based architecture.
A RISC system focuses on doing fewer operations per instruction while keeping the number of clock cycles per instruction low.
In comparison, a CISC system focuses on completing more operations per instruction, generally at a higher number of clock cycles per instruction.
For example, a single instruction in a CISC-based CPU may be able to read a value from memory, increment it, and store the result back to memory.
A RISC-based CPU, however, would use three separate instructions, one for the load, one for the increment, and one for the store.
However, this does not necessarily translate to three times the clock cycles compared with a CISC machine.
Both systems must fetch the value from memory, perform the operation, and store the value back to memory.
A CISC system simply breaks the single instruction down into micro-operations that perform the required steps.
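
The read-increment-store example can be seen from a single line of C. The sketch below uses a hypothetical function name, `increment`, and the instruction sequences in the comments are typical compiler output given as an illustration, not a guarantee.

```c
#include <stdint.h>

/* Increments the 32-bit value that `counter` points to. */
void increment(uint32_t *counter) {
    /* x64 (typical):   add dword ptr [rcx], 1   -- one instruction
       ARM64 (typical): ldr w8, [x0]             -- load
                        add w8, w8, #1           -- increment
                        str w8, [x0]             -- store            */
    *counter += 1;
}
```

In both cases the hardware performs a load, an add, and a store; the difference is whether those steps are visible as separate instructions.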

x64 systems typically have fewer general-purpose registers than ARM64: x64 has 16 64-bit general-purpose registers, while ARM64 has 31.
x64 systems have 16 128-bit floating point registers (SSE), 16 256-bit floating point registers (AVX and AVX2), or 32 512-bit floating point registers (AVX-512), depending on the extensions the processor supports.
ARM64 has 32 128-bit floating point registers.
In both architectures, these registers can be used in floating point operations and Single Instruction Multiple Data (SIMD) operations.

Snapdragon X ARM64 processors do not employ simultaneous multithreading (SMT), which is referred to as Hyper-Threading in x86-64 cores.
They do employ out-of-order execution, which can provide a significant performance improvement.
Beyond the RISC versus CISC difference in instruction complexity, there is a more subtle difference in memory ordering: weakly ordered versus strongly ordered.
The ARM architecture is referred to as a weakly ordered system, while the Intel architecture is referred to as a strongly ordered system.
On a weakly ordered system, other cores may observe a core's memory accesses in an order different from program order unless the code uses barriers or acquire/release operations.
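
Code that shares data between threads without explicit ordering may appear to work on a strongly ordered system and then fail on a weakly ordered one. The sketch below shows one common way to state the required ordering in portable C11, using release and acquire operations from `<stdatomic.h>`. The names `payload`, `ready`, `publish`, and `try_consume` are hypothetical, and availability of `<stdatomic.h>` depends on the compiler and its settings.

```c
#include <stdatomic.h>
#include <stdint.h>

static uint32_t payload;     /* ordinary data being handed over     */
static atomic_int ready;     /* flag guarding payload; starts at 0  */

/* Producer: write the data, then set the flag with release ordering
   so the data write cannot become visible after the flag write. */
void publish(uint32_t value) {
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: returns 1 and fills *out once the data is published.
   The acquire load keeps the payload read from moving ahead of
   the flag read. */
int try_consume(uint32_t *out) {
    if (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        return 0;            /* nothing published yet */
    }
    *out = payload;
    return 1;
}
```

In a real program, `publish` and `try_consume` would run on different threads. With a plain `int` flag instead of the atomic operations, nothing would prevent a consumer on a weakly ordered system from seeing the flag set before the payload write is visible.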

Last Published: Mar 03, 2026
