# Registers

HVX is a load-store architecture where compute operands originate
from registers and load/store instructions move data between memory
and registers.

The vector registers do not hold addresses or control information;
they hold intermediate results of vector computation. They are
accessible only through HVX compute or load/store instructions.

The vector predicate registers hold one decision bit for each
byte (8-bit quantity) of a vector data register.

## Vector data registers

The HVX coprocessor contains 32 vector registers (named V0 through
V31). These registers store operand data for the vector instructions.

For example:

    V1 = vmem(R0)              // Load a vector of data
                               // from address R0

    V4.w = vadd(V2.w, V3.w)    // Add each word in V2
                               // to corresponding word in V3
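
As a rough scalar model of the word add above, the sketch below treats a vector register as a Python list of 32-bit lanes. It is not HVX code: the helper name `vadd_w` is hypothetical, the four-lane vectors are far shorter than a real HVX vector, and wraparound on overflow is an assumption about the non-saturating form of the instruction.

```python
MASK32 = 0xFFFFFFFF  # keep each lane within 32 bits

def vadd_w(vu, vv):
    """Lane-wise 32-bit add (scalar model; wraparound on overflow is assumed)."""
    if len(vu) != len(vv):
        raise ValueError("source vectors must have the same number of lanes")
    return [(a + b) & MASK32 for a, b in zip(vu, vv)]

# Four illustrative lanes stand in for a full vector register.
v2 = [1, 2, 3, MASK32]
v3 = [10, 20, 30, 1]
v4 = vadd_w(v2, v3)   # models V4.w = vadd(V2.w, V3.w)
```

Each lane is computed independently of its neighbors, which is the property the `.w` suffix expresses in the assembly syntax.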

The vector data registers can be specified as register pairs
representing a double-vector of data. For example:

    V5:4.w = vadd(V3:2.w, V1:0.w)    // Add each word in V1:0 to
                                     // corresponding word in V3:2

### Reversed vector pairs

Reversed pairs are supported for vector pair operands, both destination and input.

The example below flips the register numbers of the destination pair and of the
first source pair.

Original instructions

    v7:6.b = vadd(v3:2.b, v4:5.b)      // Add vector pairs of bytes
    v1:0.h = vadd(v13:12.h, v5:4.h)    // Add vector pairs of halfwords

Reversed instructions

    v6:7.b = vadd(v2:3.b, v4:5.b)      // Add vector pairs of bytes
    v0:1.h = vadd(v12:13.h, v5:4.h)    // Add vector pairs of halfwords

Reversing a pair changes which registers are combined in each half of the
operation. In the examples above, the original byte add computes:

    V7.b = V3.b + V4.b
    V6.b = V2.b + V5.b

The reversed byte add computes:

    V6.b = V2.b + V4.b
    V7.b = V3.b + V5.b
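
The pairing rule implied by the two results above can be sketched as a small scalar model: the register named first in each pair is combined with the registers named first in the other pairs, and likewise for the second. This is an illustration only; `vadd_pair_b`, `fresh_regs`, the four-lane registers, and the initial register contents are all hypothetical, and byte wraparound is assumed.

```python
LANES = 4  # illustrative lane count, far shorter than a real vector

def fresh_regs():
    """Hypothetical register file: every byte lane of Vn holds 2**n."""
    return {n: [1 << n] * LANES for n in range(8)}

def vadd_pair_b(regs, dst, src1, src2):
    """Model Vdd.b = vadd(Vuu.b, Vvv.b); each pair is given in the order written."""
    # Read all sources before writing so overlapping operands behave sensibly.
    results = [
        [(a + b) & 0xFF for a, b in zip(regs[u], regs[v])]
        for u, v in zip(src1, src2)
    ]
    for d, r in zip(dst, results):
        regs[d] = r

original = fresh_regs()
vadd_pair_b(original, dst=(7, 6), src1=(3, 2), src2=(4, 5))  # v7:6.b = vadd(v3:2.b, v4:5.b)

swapped = fresh_regs()
vadd_pair_b(swapped, dst=(6, 7), src1=(2, 3), src2=(4, 5))   # v6:7.b = vadd(v2:3.b, v4:5.b)
```

With these starting values, `original[7]` holds V3 + V4 and `original[6]` holds V2 + V5, while `swapped[6]` holds V2 + V4 and `swapped[7]` holds V3 + V5, matching the two listings above.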

### VRF to GRF transfers

[VRF to GRF transfer instructions](https://docs.qualcomm.com/doc/80-N2040-61/topic/registers.html#v79-tbl-vrf-to-grf-transfer-instructions)
lists the Hexagon instructions that transfer values between the vector register
file (VRF) and the general register file (GRF).

A packet can contain up to two insert instructions or one extract
instruction. The extract instruction incurs a long-latency stall and
is primarily meant for debug purposes.

VRF to GRF transfer instructions

| **Syntax** | **Behavior** | **Description** |
| --- | --- | --- |
| Rd.w=extractw(Vu,Rs) | Rd = Vu.uw[Rs&0xF]; | Extract word from a vector into Rd with location specified by Rs. Primarily meant for debug. |
| Vx.w=insertw(Rss) | Vx.uw[Rss.w[1]&0xF] = Rss.w[0]; | Insert word into vector at specified location. The low word in Rss specifies the data to insert, and the upper word specifies the location. |
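
The Behavior column can be read as the following scalar sketch. It is a model of the table entries, not device code: a vector is represented as a Python list of unsigned 32-bit words, the 64-bit pair `Rss` is represented as a single integer, and the 16-word vector length is an assumption chosen to match the `0xF` index mask shown in the table.

```python
INDEX_MASK = 0xF        # index mask taken from the Behavior column
MASK32 = 0xFFFFFFFF

def extractw(vu, rs):
    """Model of Rd = Vu.uw[Rs & 0xF]."""
    return vu[rs & INDEX_MASK]

def insertw(vx, rss):
    """Model of Vx.uw[Rss.w[1] & 0xF] = Rss.w[0]; rss is a 64-bit integer."""
    data = rss & MASK32             # low word: value to insert
    index = (rss >> 32) & MASK32    # upper word: location
    vx[index & INDEX_MASK] = data

v0 = [0] * 16                            # assumed 16-word vector
insertw(v0, (5 << 32) | 0xDEADBEEF)      # write word 5
r1 = extractw(v0, 5)                     # read word 5 back
```

Because the index is masked, out-of-range locations wrap around in this model: `extractw(v0, 21)` reads the same word as `extractw(v0, 5)`.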

## Vector predicate registers

Vector predicate registers hold the result of vector compare
instructions, for example:

    Q3 = vcmp.eq(V2.w, V5.w)

This example compares each 32-bit field of V2 with the corresponding
field of V5 and writes the result to the corresponding 4-bit field of
predicate register Q3. For halfword operations, two bits are set per
halfword. For byte operations, one bit is set per byte.
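
The bit layout described above can be sketched with a scalar model in which the predicate is a list holding one bit per vector byte. The helper name `vcmp_eq` and its `elem_bytes` parameter are hypothetical, and the four-element vectors are much shorter than real ones.

```python
def vcmp_eq(vu, vv, elem_bytes):
    """Element-wise equality compare; returns one predicate bit per byte."""
    q = []
    for a, b in zip(vu, vv):
        bit = 1 if a == b else 0
        q.extend([bit] * elem_bytes)   # replicate the result across the element's bytes
    return q

v2 = [1, 2, 3, 4]
v5 = [1, 0, 3, 0]
q3 = vcmp_eq(v2, v5, elem_bytes=4)     # models Q3 = vcmp.eq(V2.w, V5.w)
```

Passing `elem_bytes=2` or `elem_bytes=1` gives the halfword and byte cases, where each comparison result covers two bits or one bit respectively.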

The vmux instruction is a frequent consumer of vector predicate
registers. For each bit in the predicate register, it selects the
corresponding byte from either the first or the second source and
places it in the corresponding byte of the destination.

    V4 = vmux(Q2, V5, V6)
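
A minimal scalar sketch of this per-byte selection follows. It assumes that a set predicate bit selects the byte from the first source and a clear bit selects the byte from the second; the text above does not state the polarity, so treat that as an assumption. The function is a Python model, not an HVX intrinsic, and the four-byte vectors are illustrative.

```python
def vmux(q, vu, vv):
    """Per-byte select: vu[i] where q[i] is set (assumed polarity), else vv[i]."""
    return [a if bit else b for bit, a, b in zip(q, vu, vv)]

q2 = [1, 0, 0, 1]                     # one predicate bit per byte
v5 = [0x11, 0x22, 0x33, 0x44]
v6 = [0xAA, 0xBB, 0xCC, 0xDD]
v4 = vmux(q2, v5, v6)                 # models V4 = vmux(Q2, V5, V6)
```

Combined with a compare that produces the predicate, this pattern expresses a branch-free, per-element conditional.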

Last Published: Jan 16, 2025
