# Program flow

The Hexagon processor supports the following program flow facilities.

## Conditional instructions

Many Hexagon processor instructions can execute conditionally. For example:

```
if (P0) R0 = memw(R2)      // Conditionally load word if P0
if (!P1) jump label        // Conditionally jump if not P1
```

The following instructions can be specified as conditional:

- Jumps and calls
- Many load and store instructions
- Logical instructions (including AND/OR/XOR)
- Shift halfword
- 32-bit add/subtract by register or short immediate
- Sign and zero extend
- 32-bit register transfer and 64-bit combine word
- Register transfer immediate
- Deallocate frame and return

For more information, see [conditional load/stores](https://docs.qualcomm.com/doc/80-N2040-60/topic/memory.html#v79-prm-conditional-loadstores) and [Conditional execution](https://docs.qualcomm.com/doc/80-N2040-60/topic/conditional-execution.html).

## Hardware loops

The Hexagon processor includes hardware loop instructions that perform loop branches with zero overhead. For example:

```
loop0(start,#3)            // loop 3 times
start:
    { R0 = mpyi(R0,R0) } :endloop0
```

Two sets of hardware loop instructions (loop0 and loop1) are provided, so hardware loops can be nested one level deep. For example:

```
// Sum the rows of a 100x200 matrix.
    loop1(outer_start,#100)
outer_start:
    R0 = #0
    loop0(inner_start,#200)
inner_start:
    R3 = memw(R1++#4)
    { R0 = add(R0,R3) }:endloop0
    { memw(R2++#4) = R0 }:endloop1
```

Use the hardware loop instructions as follows:

- For non-nested loops, use loop0.
- For nested loops, use loop0 for the inner loop and loop1 for the outer loop.

**Note:** If a program must create loops nested more than one level deep, the two innermost loops can be implemented as hardware loops, with the remaining outer loops implemented as software branches.
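
The nested row-sum example can be modeled in C to show how the two loop counters drive execution. This is a behavioral sketch under stated assumptions, not a description of the hardware implementation. The helper endloop() is hypothetical and applies the loop end test documented for :endloopN (if LCn > 1, branch to SAn and decrement LCn). The matrix is assumed to be stored row-major as 32-bit words, matching memw with a post-increment of 4. Unsigned arithmetic is used so that additions wrap like the 32-bit add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROWS 100  /* loop1(outer_start,#100) */
#define COLS 200  /* loop0(inner_start,#200) */

/* Hypothetical model of the :endloopN test:
   if (LCn > 1) { PC = SAn; LCn = LCn - 1 }
   Returns 1 when the loop branches back to its start address. */
static int endloop(uint32_t *lc)
{
    if (*lc > 1) {
        *lc -= 1;
        return 1;
    }
    return 0;
}

/* Behavioral model of the nested row-sum example.
   in  stands in for R1 (row-major matrix, read with memw(R1++#4)),
   out stands in for R2 (row sums, written with memw(R2++#4)). */
void sum_rows(const uint32_t *in, uint32_t *out)
{
    assert(in != NULL && out != NULL);
    uint32_t lc1 = ROWS;               /* LC1 = 100, set by loop1 */
    do {                               /* outer_start: */
        uint32_t r0 = 0;               /* R0 = #0 */
        uint32_t lc0 = COLS;           /* LC0 = 200, set by loop0 */
        do {                           /* inner_start: */
            uint32_t r3 = *in++;       /* R3 = memw(R1++#4) */
            r0 += r3;                  /* R0 = add(R0,R3) */
        } while (endloop(&lc0));       /* :endloop0 */
        *out++ = r0;                   /* memw(R2++#4) = R0 */
    } while (endloop(&lc1));           /* :endloop1 */
}
```

The test runs after the body, so a loop count of 0 or 1 still executes the body once. This matches the behavior described under Loop execution. With a count of 3, endloop() reports a branch twice and then falls through, so the body runs three times.
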

Each hardware loop is associated with a pair of dedicated loop registers:

- The loop start address register SAn is set to the address of the first instruction in the loop (typically expressed in assembly language as a label).
- The loop count register LCn is set to a 32-bit unsigned value that specifies the number of loop iterations to perform. When the PC reaches the end of the loop, LCn is examined to determine whether to repeat or exit the loop.

The hardware loop setup instruction sets both of these registers at the same time, so typically there is no need to set them individually. However, because the loop registers completely specify the hardware loop state, saving and restoring them (either automatically by a processor interrupt or manually by the programmer) enables a suspended hardware loop to resume normally when its loop registers are reloaded with the saved values.

The Hexagon processor provides two sets of loop registers for the two hardware loops:

- SA0 and LC0 are used by loop0.
- SA1 and LC1 are used by loop1.

**Hardware loop instructions**

| **Syntax** | **Description** |
| --- | --- |
| loopN(start, Rs) | Hardware loop with register loop count. Sets registers SAn and LCn for hardware loop N: SAn is assigned the specified start address of the loop, and LCn is assigned the value of general register Rs. Note: The loop start operand is encoded as a PC-relative immediate value. |
| loopN(start, #count) | Hardware loop with immediate loop count. Sets registers SAn and LCn for hardware loop N: SAn is assigned the specified start address of the loop, and LCn is assigned the specified immediate value (0 to 1023). Note: The loop start operand is encoded as a PC-relative immediate value. |
| :endloopN | Hardware loop end instruction. Performs the following operation: if (LCn > 1) {PC = SAn; LCn = LCn-1}. Note: This instruction appears in assembly as a suffix appended to the last packet in the loop. It is encoded in the last packet. |
| SAn = Rs | Set loop start address to general register Rs. |
| LCn = Rs | Set loop count to general register Rs. |

**Note:** The loop instructions are assigned to instruction class CR.

### Loop setup

To set up a hardware loop, the loop registers SAn and LCn must be set to the proper values. This can be done in two ways:

- A loopN instruction
- Register transfers to SAn and LCn

The loopN instruction performs all the work of setting SAn and LCn. For example:

```
loop0(start,#3)            // SA0=&start, LC0=3
start:
    { R0 = mpyi(R0,R0) } :endloop0
```

In this example, the hardware loop (consisting of a single multiply instruction) executes three times. The loop0 instruction sets register SA0 to the address of label start, and LC0 to 3.

Loop counts are limited to the range 0 to 1023 when they are expressed as immediate values in loopN. If the intended loop count exceeds this range, it must be specified as a register value. For example:

Using loopN:

```
R1 = #20000;
loop0(start,R1)            // LC0=20000, SA0=&start
start:
    { R0 = mpyi(R0,R0) } :endloop0
```

Using register transfers:

```
R1 = #20000
LC0 = R1                   // LC0=20000
R1 = #start
SA0 = R1                   // SA0=&start
start:
    { R0 = mpyi(R0,R0) } :endloop0
```

If a loopN instruction is located too far from its loop start address, the PC-relative offset that specifies the start address can exceed the range of the instruction's start address operand. If this occurs, either move the loopN instruction closer to the loop start, or specify the loop start address as a 32-bit constant ([constant extenders](https://docs.qualcomm.com/doc/80-N2040-60/topic/instruction-encoding.html#v79-prm-constant-extenders)). For example, using 32-bit constants:

```
R1 = #20000;
loop0(##start,R1)          // LC0=20000, SA0=&start
...
```

### Loop end

The loop end instruction indicates the last packet in a hardware loop.
It is expressed in assembly language by appending the symbol :endloopN to the packet, where N specifies the hardware loop (0 or 1). For example:

```
loop0(start,#3)
start:
    { R0 = mpyi(R0,R0) } :endloop0   // last packet in loop
```

The last instruction in the loop must always be expressed in assembly language as a packet (using curly braces), even if it is the only instruction in the packet.

Nested hardware loops can specify the same instruction as the end of both the inner and outer loops. For example:

```
// Sum the rows of a 100x200 matrix.
// Software pipeline the outer loop.
    p0 = cmp.gt(R0,R0)               // p0 = false
    loop1(outer_start,#100)
outer_start:
    { if (p0) memw(R2++#4) = R0
      p0 = cmp.eq(R0,R0)             // p0 = true
      R0 = #0
      loop0(inner_start,#200)
    }
inner_start:
    R3 = memw(R1++#4)
    { R0 = add(R0,R3) }:endloop0:endloop1
    memw(R2++#4) = R0
```

Although endloopN behaves like a regular instruction (it implements the loop test and branch), it does not execute in any instruction slot and does not count as an instruction in the packet. Therefore, a single instruction packet that is marked as a loop end can perform up to six operations:

- Four regular instructions (the normal limit for an instruction packet)
- The endloop0 test and branch
- The endloop1 test and branch

**Note:** The endloopN instruction is encoded in the instruction packet ([loop packets](https://docs.qualcomm.com/doc/80-N2040-60/topic/instruction-encoding.html#v79-prm-loop-packets)).

### Loop execution

After a hardware loop is set up, the loop body always executes at least once, regardless of the specified loop count, because the loop count is not examined until the last instruction in the loop. Therefore, if a loop may need to execute zero times, it must be preceded by an explicit conditional branch.
For example:

```
    loop0(start,R1)
    P0 = cmp.eq(R1,#0)
    if (P0) jump skip
start:
    { R0 = mpyi(R0,R0) } :endloop0
skip:
```

In this example, a hardware loop is set up with the loop count in R1, but if the value in R1 is zero, a software branch skips over the loop body.

After the loop end instruction of a hardware loop executes, the Hexagon processor examines the value in the corresponding loop count register:

- If the value is greater than 1, the processor decrements the loop count register and performs a zero-cycle branch to the loop start address.
- If the value is less than or equal to 1, the processor resumes program execution at the instruction immediately following the loop end instruction.

**Note:** Because nested hardware loops can share a loop end instruction, the processor can examine both loop count registers in a single operation.

### Pipelined hardware loops

Software pipelined loops are common for VLIW architectures such as the Hexagon processor. They increase code performance by overlapping multiple loop iterations.
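
As a rough illustration of how iterations overlap, the following C sketch hand-pipelines the result[i] = A[i]\*A[i] loop used in the software pipelined loop example in this section. It models that assembly schedule under three assumptions: the name squares\_pipelined, unsigned 32-bit data to mirror the wrap-around of mpyi, and n >= 2. It is not compiler output. Inside a packet, every instruction reads the register values from before the packet. The C statements are therefore ordered store, multiply, load, so each stage consumes the previous stage's value before it is overwritten.

```c
#include <assert.h>
#include <stdint.h>

/* Hand-pipelined model of result[i] = a[i] * a[i] for n >= 2.
   loaded plays the role of R1, product the role of R2. */
void squares_pipelined(const uint32_t *a, uint32_t *result, int n)
{
    assert(n >= 2);
    uint32_t loaded, product;

    loaded = a[0];                    /* prologue stage 1: load 0 */
    product = loaded * loaded;        /* prologue stage 2: multiply 0 ... */
    loaded = a[1];                    /* ... and load 1 */

    for (int i = 0; i < n - 2; i++) { /* kernel: n-2 iterations (98 for n = 100) */
        result[i] = product;          /* store for iteration i */
        product = loaded * loaded;    /* multiply for iteration i+1 */
        loaded = a[i + 2];            /* load for iteration i+2 */
    }

    result[n - 2] = product;          /* epilogue stage 1: store n-2 ... */
    product = loaded * loaded;        /* ... and multiply n-1 */
    result[n - 1] = product;          /* epilogue stage 2: store n-1 */
}
```

Each kernel iteration works on three different array elements at once. The prologue sets up this overlap and the epilogue drains it.
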

A software pipeline has three sections:

- A prologue in which the loop is primed
- A kernel (or steady-state) portion
- An epilogue that drains the pipeline

[Software pipelined loop](https://docs.qualcomm.com/doc/80-N2040-60/topic/program-flow.html#v79-prm-software-pipelined-loop) shows an example where the kernel section of the pipelined loop performs three iterations of the loop in parallel:

- The load for iteration N+2
- The multiply for iteration N+1
- The store for iteration N

**Software pipelined loop**

```c
void foo(int *A, int *result)
{
    int i;
    for (i = 0; i < 100; i++) {
        result[i] = A[i] * A[i];
    }
}
```

```
foo:
    { R3 = R1
      loop0(.kernel,#98)             // Decrease loop count by 2
    }
    R1 = memw(R0++#4)                // First prologue stage
    { R1 = memw(R0++#4)              // Second prologue stage
      R2 = mpyi(R1,R1)
    }
    .falign
.kernel:
    { R1 = memw(R0++#4)              // Kernel
      R2 = mpyi(R1,R1)
      memw(R3++#4) = R2
    }:endloop0
    { R2 = mpyi(R1,R1)               // First epilogue stage
      memw(R3++#4) = R2
    }
    memw(R3++#4) = R2                // Second epilogue stage
    jumpr lr
```

A drawback of software pipelining is the extra code needed for the prologue and epilogue sections of a pipelined loop.

To address this issue, the Hexagon processor provides the spNloop0 instruction, where the “N” in the instruction name is a digit in the range 1 to 3. For example:

```
P3 = sp2loop0(start,#10)             // Set up pipelined loop
```

The spNloop0 instruction is a variant of the loop0 instruction: it sets up a normal hardware loop using SA0 and LC0, but also performs the following additional operations:

- When the spNloop0 instruction executes, it assigns the truth value false to the predicate register P3.
- After the associated loop executes N times, P3 is automatically set to true.

This feature (known as automatic predicate control) enables the store instructions in the kernel section of a pipelined loop to execute conditionally on P3. Because of the way spNloop0 controls P3, those stores do not execute during the pipeline warm-up.
This can reduce the code size of software pipelined loops by eliminating the need for prologue code.

The spNloop0 instruction cannot be used to eliminate the epilogue code from a pipelined loop; however, in some cases this can be done using programming techniques.

Typically, the issue affecting the removal of epilogue code is load safety. Epilogue code is unnecessary if the kernel section of a pipelined loop can safely access past the end of its arrays, either because this is known to be safe or because the arrays have been padded at the end. However, if load safety cannot be ensured, explicit epilogue code is required to drain the software pipeline.

[Software pipelined loop (using spNloop0)](https://docs.qualcomm.com/doc/80-N2040-60/topic/program-flow.html#v79-prm-software-pipelined-loop-using-spnloop0) shows how spNloop0 and load safety simplify the code shown in [software pipelined loop](https://docs.qualcomm.com/doc/80-N2040-60/topic/program-flow.html#v79-prm-software-pipelined-loop).

**Software pipelined loop (using spNloop0)**

```c
void foo(int *A, int *result)
{
    int i;
    for (i = 0; i < 100; i++) {
        result[i] = A[i] * A[i];
    }
}
```

```
foo:
    { // load safety assumed
      P3 = sp2loop0(.kernel,#102)    // Set up pipelined loop
      R3 = R1
    }
    .falign
.kernel:
    { R1 = memw(R0++#4)              // Kernel
      R2 = mpyi(R1,R1)
      if (P3) memw(R3++#4) = R2
    }:endloop0
    jumpr lr
```

**Note:** The count value that spNloop0 uses to control the P3 setting is stored in the user status register field USR.LPCFG.

### Loop restrictions

Hardware loops have the following restrictions:

- The loop setup packet in loopN or spNloop0 ([pipelined hardware loops](https://docs.qualcomm.com/doc/80-N2040-60/topic/program-flow.html#v79-prm-pipelined-hardware-loops)) cannot contain a speculative indirect jump, new-value compare jump, or dealloc\_return.
- The last packet in a hardware loop cannot contain any program flow instructions (including jumps or calls).
- The loop end packet in loop0 cannot contain any instruction that changes SA0 or LC0. Similarly, the loop end packet in loop1 cannot contain any instruction that changes SA1 or LC1.
- The loop end packet in spNloop0 cannot contain any instruction that changes P3.

**Note:** SA1 and LC1 can be changed at the end of loop0, and SA0 and LC0 can be changed at the end of loop1.

## Software branches

Unlike hardware loops, software branches use an explicit instruction to perform a branch operation. Software branches include jumps, calls, and returns.

The target address of a branch instruction is specified either by a register (register indirect) or by a PC-relative offset. PC-relative offsets are normally encoded in fewer than 32 bits, but can be specified as 32 bits by using the appropriate syntax in the target operand ([extended branches](https://docs.qualcomm.com/doc/80-N2040-60/topic/program-flow.html#v79-prm-extended-branches)).

Branch instructions can be unconditional or conditional. The execution of conditional branches is controlled by a predicate expression.

**Software branch instructions**

| **Syntax** | **Operation** |
| --- | --- |
| [if (pred\_expr)] jump label, [if (pred\_expr)] jumpr Rs | Branch to the address specified by register Rs or by a PC-relative offset. Can be conditionally executed. |
| [if (pred\_expr)] call label, [if (pred\_expr)] callr Rs | Branch to the address specified by register Rs or by a PC-relative offset, and store the subroutine return address in link register LR. Can be conditionally executed. |
| [if (pred\_expr)] jumpr LR | Branch to the subroutine return address contained in link register LR. Can be conditionally executed. |

### Jumps

Jump instructions change the program flow to a target address, which is specified by either a register or a PC-relative immediate value. Jump instructions can be conditional, based on the value of a predicate expression.

**Jump instructions**

| **Syntax** | **Operation** |
| --- | --- |
| jump label | Direct jump. Branch to the address specified by label. Label is encoded as a PC-relative signed immediate value. |
| jumpr Rs | Indirect jump. Branch to the address contained in general register Rs. |
| if ([!]Ps) jump label, if ([!]Ps) jumpr Rs | Conditional jump. Perform the jump if the predicate expression evaluates to true. |

**Note:** Conditional jumps can be specified as [speculative](https://docs.qualcomm.com/doc/80-N2040-60/topic/program-flow.html#v79-prm-speculative-jumps).

### Calls

Call instructions jump to subroutines. A call performs a jump to the target address and also stores the return address in the link register LR.

Calls are functionally similar to jump instructions. They come in PC-relative and register indirect forms, and each form can be unconditional or conditional.

**Call instructions**

| **Syntax** | **Operation** |
| --- | --- |
| call label | Direct subroutine call. Branch to the address specified by label, and store the return address in register LR. Label is encoded as a PC-relative signed immediate value. |
| callr Rs | Indirect subroutine call. Branch to the address contained in general register Rs, and store the return address in register LR. |
| if ([!]Ps) call label, if ([!]Ps) callr Rs | Conditional call. If the predicate expression evaluates to true, perform a subroutine call to the specified target address. |

### Returns

Return instructions return from a subroutine. A return performs an indirect jump to the subroutine return address stored in link register LR. Returns are implemented as jump register indirect instructions, and support both unconditional and conditional forms.

**Return instructions**

| **Syntax** | **Operation** |
| --- | --- |
| jumpr LR | Subroutine return. Branch to the subroutine return address contained in link register LR. |
| if ([!]Ps) jumpr LR | Conditional subroutine return. If the predicate expression evaluates to true, perform a subroutine return to the address contained in link register LR. |
| dealloc\_return | Subroutine return with stack frame deallocate. Perform a deallocframe operation ([stack instructions](https://docs.qualcomm.com/doc/80-N2040-60/topic/software-stack.html#v79-prm-stack-instructions)), and then perform a subroutine return to the return address that deallocframe loads into the link register. |
| if ([!]Ps) dealloc\_return | Conditional subroutine return with stack frame deallocate. If the predicate expression evaluates to true, perform deallocframe and a subroutine return to the return address that deallocframe loads into the link register. |

**Note:** The link register LR is an alias of general register R31. Therefore, subroutine returns can be performed with the instruction jumpr R31.

The conditional subroutine returns (including dealloc\_return) can be specified as [speculative](https://docs.qualcomm.com/doc/80-N2040-60/topic/program-flow.html#v79-prm-speculative-jumps).

### Extended branches

When a jump or call instruction specifies a PC-relative offset as the branch target, the offset value is normally encoded in significantly fewer than 32 bits. This can limit the ability of programs to specify “long” branches, which span a large range of the processor's memory address space.

To support long branches, the jump and call instructions have special versions that encode a full 32-bit value as the PC-relative offset.

**Note:** Such instructions use an extra word to store the 32-bit offset ([constant extenders](https://docs.qualcomm.com/doc/80-N2040-60/topic/instruction-encoding.html#v79-prm-constant-extenders)).

The size of a PC-relative branch offset is expressed in assembly language by optionally prefixing the target label with the symbol “##” or “#”:

- “##” specifies that the assembler *must* use a 32-bit offset.
- “#” specifies that the assembler must *not* use a 32-bit offset.
- With no “#” prefix, the assembler uses a 32-bit offset only if necessary.

For example:

```
jump ##label        // 32-bit offset
call #label         // Non 32-bit offset
jump label          // Offset size determined by assembler
```

### Branches to and from packets

Instruction packets are atomic: even if they contain multiple instructions, they are referenced only by the address of the first instruction in the packet. Therefore, branches to a packet can target only the first instruction of the packet.

Packets can contain up to two branches ([dual-jumps](https://docs.qualcomm.com/doc/80-N2040-60/topic/program-flow.html#v79-prm-dual-jumps)).
The branch destination can target the current packet or the beginning of another packet.

A branch does not interrupt the execution of the current packet: all of the instructions in the packet execute, even if they appear in the assembly source after the branch instruction.

If a packet is at the end of a hardware loop, it cannot contain a branch instruction.

## Speculative jumps

Conditional instructions normally depend on predicates that are generated in a previous instruction packet. However, [dot-new predicates](https://docs.qualcomm.com/doc/80-N2040-60/topic/conditional-execution.html#v79-prm-dot-new-predicates) enable conditional instructions to use a predicate generated in the same packet that contains the conditional instruction.

When dot-new predicates are used with a conditional jump, the resulting instruction is called a speculative jump. For example:

```
{
    P0 = cmp.eq(R9,#16)        // single-packet compare-and-jump
    IF (P0.new) jumpr:t R11    // ... enabled by use of P0.new
}
```

Speculative jumps require the programmer to specify a direction hint in the jump instruction, indicating whether the conditional jump is expected to be taken. The hint initializes the dynamic branch predictor of the Hexagon processor. Whenever the predictor is wrong, the speculative jump instruction takes two cycles to execute instead of one (due to a pipeline stall).

Hints can improve program performance by indicating how speculative jumps are expected to execute over the course of a program. The more often the specified hint matches how the instruction actually executes, the better the performance.

Hints are expressed in assembly language by appending the suffix “:t” or “:nt” to the jump instruction symbol. For example:

- jump:t: the jump instruction is most often taken.
- jump:nt: the jump instruction is most often not taken.

In addition to dot-new predicates, speculative jumps also accept conditional arithmetic expressions (=0, !=0, >=0, <=0) involving the general register Rs.
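
The effect of the hint on cycle count can be shown with a deliberately simplified model. This model assumes that the prediction always equals the static hint. That is not how the dynamic predictor works, since the hint only initializes it. The model then charges one cycle for a correct prediction and two for a wrong one, using the figures above. The names jump\_hint and speculative\_jump\_cycles are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { HINT_NT, HINT_T } jump_hint;  /* :nt and :t */

/* Simplified cycle model (assumption): prediction == static hint,
   1 cycle when the prediction matches the outcome, 2 cycles otherwise. */
unsigned speculative_jump_cycles(const bool *taken, size_t n, jump_hint hint)
{
    assert(taken != NULL || n == 0);
    const bool predict_taken = (hint == HINT_T);
    unsigned cycles = 0;
    for (size_t i = 0; i < n; i++)
        cycles += (taken[i] == predict_taken) ? 1u : 2u;
    return cycles;
}
```

For a jump taken three times out of four, the model gives 5 cycles with :t and 7 with :nt.
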

**Speculative jump instructions**

| **Syntax** | **Operation** |
| --- | --- |
| if ([!]Ps.new) jump:t label, if ([!]Ps.new) jump:nt label | Speculative direct jump. If the predicate expression evaluates to true, jump to the address specified by label. |
| if ([!]Ps.new) jumpr:t Rs, if ([!]Ps.new) jumpr:nt Rs | Speculative indirect jump. If the predicate expression evaluates to true, jump to the address in register Rs. |
| if (Rs == #0) jump:t label, if (Rs == #0) jump:nt label | Speculative direct jump. If Rs == 0 is true, jump to the address specified by label. |
| if (Rs != #0) jump:t label, if (Rs != #0) jump:nt label | Speculative direct jump. If Rs != 0 is true, jump to the address specified by label. |
| if (Rs >= #0) jump:t label, if (Rs >= #0) jump:nt label | Speculative direct jump. If Rs >= 0 is true, jump to the address specified by label. |
| if (Rs <= #0) jump:t label, if (Rs <= #0) jump:nt label | Speculative direct jump. If Rs <= 0 is true, jump to the address specified by label. |

**Note:** The hints :t and :nt interact with the predicate value to determine the instruction cycle count.

Speculative indirect jumps are not supported with register Rs predicates.

## Compare jumps

To reduce code size, the Hexagon processor supports a compound instruction that combines a compare with a speculative jump in a single 32-bit instruction. For example:

```
{
    p0 = cmp.eq (R2,R5)          // Single-instr compare-and-jump
    if (p0.new) jump:nt target   // Enabled by compound instr
}
```

The register operands used in a compare jump are limited to R0 through R7 or R16 through R23 ([Sub-instructions](https://docs.qualcomm.com/doc/80-N2040-60/topic/instruction-encoding.html#v79-tbl-sub-instructions)).

The compare and jump instructions that are used in a compare jump are limited to the instructions listed in [Compare jump instructions](https://docs.qualcomm.com/doc/80-N2040-60/topic/program-flow.html#v79-prm-compare-jump-instructions). The compare can use predicate P0 or P1, while the jump must specify the same predicate that is set in the compare.

A compare jump instruction is expressed in assembly source as two independent compare and jump instructions in a packet. The assembler translates the two instructions into a single compound instruction.

**Compare jump instructions**

Compare instruction:

```
Pd = cmp.eq (Rs, Rt)
Pd = cmp.gt (Rs, Rt)
Pd = cmp.gtu (Rs, Rt)
Pd = cmp.eq (Rs,#u5)
Pd = cmp.gt (Rs,#u5)
Pd = cmp.gtu (Rs,#u5)
Pd = cmp.eq (Rs,#-1)
Pd = cmp.gt (Rs,#-1)
Pd = tstbit (Rs, #0)
```

Jump instruction:

```
IF (Pd.new) jump:t label
IF (Pd.new) jump:nt label
IF (!Pd.new) jump:t label
IF (!Pd.new) jump:nt label
```

### New-value compare jumps

A compare jump instruction can access a register that is assigned a new value in the same instruction packet ([instruction packets](https://docs.qualcomm.com/doc/80-N2040-60/topic/instructions.html#v79-prm-instruction-packets)).
This feature is expressed in assembly language by the following changes:

- Appending the suffix “.new” to the new-value register in the compare
- Rewriting the compare jump so that its constituent compare and jump operations appear as a single conditional instruction

For example:

```
// load-compare-and-jump packet enabled by new-value compare jump
{
    R0 = memw(R2+#8)
    if (cmp.eq(R0.new,#0)) jump:nt target
}
```

New-value compare jump instructions have the following restrictions:

- They are limited to the instruction forms listed in [New-value compare jump instructions](https://docs.qualcomm.com/doc/80-N2040-60/topic/program-flow.html#v79-prm-new-value-compare-jump-instructions).
- They cannot be combined with another jump instruction in the same packet.
- If an instruction produces a 64-bit result or performs a floating-point operation, its result registers cannot be used as the new-value register.
- If an instruction uses auto-increment or absolute-set addressing mode ([addressing modes](https://docs.qualcomm.com/doc/80-N2040-60/topic/memory.html#v79-prm-addressing-modes)), its address register cannot be used as the new-value register.
- If the instruction that sets a new-value register is conditional ([consuming scalar predicates](https://docs.qualcomm.com/doc/80-N2040-60/topic/conditional-execution.html#v79-prm-consuming-scalar-predicates)), it must always execute.

If the specified jump direction hint is wrong ([speculative jumps](https://docs.qualcomm.com/doc/80-N2040-60/topic/program-flow.html#v79-prm-speculative-jumps)), a new-value compare jump takes three cycles to execute instead of one. This penalty is one cycle longer than for a regular speculative jump. Even so, the overall performance is still better than using a regular speculative jump, which must execute an extra packet in all cases.

**Note:** New-value compare jump instructions are assigned to instruction class NV, which can execute only in Slot 0. The instruction that assigns the new value must execute in Slot 1, 2, or 3.
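
The performance claim above can be checked with simple arithmetic. This C sketch counts cycles for n executions of a load-compare-and-jump, m of which have a wrong direction hint. It uses the cycle figures stated above plus one assumption of its own: the extra packet that the regular form must execute costs one cycle. The function names are hypothetical.

```c
#include <assert.h>

/* New-value compare jump (assumed model): one packet, 1 cycle when the
   hint is right, 3 cycles when it is wrong. */
unsigned long nv_compare_jump_cycles(unsigned long n, unsigned long m)
{
    assert(m <= n);
    return n + 2 * m;             /* n * 1 + m * (3 - 1) */
}

/* Regular speculative jump (assumed model): an extra packet every time
   (assumed 1 cycle), plus a jump costing 1 cycle, or 2 when mispredicted. */
unsigned long regular_compare_jump_cycles(unsigned long n, unsigned long m)
{
    assert(m <= n);
    return n + (n + m);           /* extra packets + jump cycles */
}
```

In this model the new-value form costs n + 2m cycles and the regular form 2n + m, so the difference is n - m. The new-value form is never slower, and it ties only when every hint is wrong.
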

**New-value compare jump instructions**

```
if ([!]cmp.eq (Rs.new, Rt)) jump:[hint] label
if ([!]cmp.gt (Rs.new, Rt)) jump:[hint] label
if ([!]cmp.gtu (Rs.new, Rt)) jump:[hint] label
if ([!]cmp.gt (Rs, Rt.new)) jump:[hint] label
if ([!]cmp.gtu (Rs, Rt.new)) jump:[hint] label
if ([!]cmp.eq (Rs.new, #u5)) jump:[hint] label
if ([!]cmp.gt (Rs.new, #u5)) jump:[hint] label
if ([!]cmp.gtu (Rs.new, #u5)) jump:[hint] label
if ([!]cmp.eq (Rs.new, #-1)) jump:[hint] label
if ([!]cmp.gt (Rs.new, #-1)) jump:[hint] label
if ([!]tstbit (Rs.new, #0)) jump:[hint] label
```

## Register transfer jumps

To reduce code size, the Hexagon processor supports a compound instruction that combines a register transfer with an unconditional jump in a single 32-bit instruction. For example:

```
{
    jump target    // Jump to label “target”
    R1 = R2        // Assign contents of reg R2 to R1
}
```

The source and target register operands in the register transfer are limited to R0 through R7 or R16 through R23 ([Sub-instructions](https://docs.qualcomm.com/doc/80-N2040-60/topic/instruction-encoding.html#v79-tbl-sub-instructions)).

The target address in the jump is a scaled 9-bit PC-relative address value (as opposed to the 22-bit value in the regular unconditional jump instruction).

A register transfer jump instruction is expressed in assembly source as two independent instructions in a packet. The assembler translates the instructions into a single compound instruction.

**Register transfer jump instructions**

| **Syntax** | **Operation** |
| --- | --- |
| jump label; Rd=Rs | Register transfer jump. Perform the register transfer and branch to the address specified by label. Label is encoded as a PC-relative 9-bit signed immediate value. |
| jump label; Rd=#u6 | Register transfer immediate jump. Perform the register transfer (of a 6-bit unsigned immediate value) and branch to the address specified by label. Label is encoded as a PC-relative 9-bit signed immediate value. |

## Dual jumps

Two software branch instructions (referred to here as “jumps”) can appear in the same instruction packet, under the conditions listed in [Dual jump instructions](https://docs.qualcomm.com/doc/80-N2040-60/topic/program-flow.html#v79-tbl-dual-jump-instructions).

The first jump is defined as the jump instruction at the lower address, and the second jump as the jump instruction at the higher address.

Unlike most packetized operations, dual jumps are not executed in parallel ([packet execution semantics](https://docs.qualcomm.com/doc/80-N2040-60/topic/instructions.html#v79-prm-packet-execution-semantics)). Instead, the two jumps are processed in a well-defined order in a packet:

1. The predicate in the first jump is evaluated.
2. If the first jump is taken, the second jump is ignored.
3. If the first jump is not taken, the second jump is performed.

**Dual jump instructions**

| **Instruction** | **Description** | **First jump in packet?** | **Second jump in packet?** |
| --- | --- | --- | --- |
| jump | Direct jump | No | Yes |
| if ([!]Ps[.new]) jump | Conditional jump | Yes | Yes |
| call, if ([!]Ps) call | Direct calls | No | Yes |
| Pd=cmp.xx ; if ([!]Pd.new) jump | Compare jump | Yes | Yes |
| if ([!]cmp.xx(Rs.new, Rt)) jump | New-value compare jump | No | No |
| jumpr, if ([!]Ps[.new]) jumpr | Indirect jumps | No | No |
| callr, if ([!]Ps) callr | Indirect calls | No | No |
| dealloc\_return, if ([!]Ps[.new]) dealloc\_return | dealloc\_return | No | No |
| endloopN | Hardware loop end | No | No |

**Note:** If a call is ignored in a dual jump, the link register LR is not changed.

## Hint indirect jump target

Because it obtains the jump target address from a register, the jumpr instruction ([jumps](https://docs.qualcomm.com/doc/80-N2040-60/topic/program-flow.html#v79-prm-jumps)) normally causes the processor to stall for one cycle.

To avoid the stall penalty caused by a jumpr instruction, the Hexagon processor supports the hintjr jump hint instruction, which can be specified before the jumpr instruction.

The hintjr instruction indicates that the program is about to execute a jumpr to the address contained in the specified register.

**Speculative jump hint instruction**

| **Syntax** | **Operation** |
| --- | --- |
| hintjr(Rs) | Informs the processor that the jumpr (Rs) instruction is about to be performed. |

**Note:** To prevent a stall, the hintjr instruction must execute at least 2 packets before the corresponding jumpr instruction.

The hintjr instruction is not needed for jumpr instructions used as returns ([returns](https://docs.qualcomm.com/doc/80-N2040-60/topic/program-flow.html#v79-prm-returns)), because in this case the Hexagon processor automatically predicts the jump targets based on the most recent nested call instructions.

## Pauses

Pauses suspend the execution of a program for a period of time and put it into low-power mode. The program remains suspended for the duration specified in the instruction.

The pause instruction accepts an unsigned 8-bit immediate operand that specifies the pause duration in cycles. The maximum possible duration is 263 cycles (255+8).

Hexagon processor interrupts cause a program to exit the paused state before its specified duration has elapsed.

The pause instruction is useful for implementing user-level low-power synchronization operations (such as spin locks).

**Pause instruction**

| **Syntax** | **Operation** |
| --- | --- |
| pause(#u8) | Suspend program in low-power mode for the specified cycle duration. |

## Exceptions

Exceptions are internally generated disruptions to the program flow.

The Hexagon processor OS handles fatal exceptions by terminating the execution of the application system. The user is responsible for fixing the problem and recompiling their applications.

The error messages generated by exceptions include the following information to help locate the problem:

- Cause code: Hexadecimal value indicating the type of exception
- User IP: PC value indicating the instruction executed when the exception occurred
- Bad VA: Virtual address indicating the data accessed when the exception occurred

**Note:** The cause code, user IP, and Bad VA values are stored in the Hexagon processor system control registers SSR[7:0], ELR, and BADVA, respectively.
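
As an illustration of how the reported values relate to the registers named in the note, the following C sketch builds a report from raw SSR, ELR, and BADVA values. It also applies the priority rule stated in the next paragraph: the lowest cause code wins. It assumes the three register values have already been captured by some other means. The type and function names are hypothetical, and the slot ordering rule for loads and stores is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical report built from captured register values. */
typedef struct {
    uint8_t  cause;    /* Cause code: SSR[7:0] */
    uint32_t user_ip;  /* User IP: ELR */
    uint32_t bad_va;   /* Bad VA: BADVA */
} exception_report;

exception_report make_report(uint32_t ssr, uint32_t elr, uint32_t badva)
{
    exception_report r;
    r.cause = (uint8_t)(ssr & 0xFFu);  /* keep only bits 7:0 */
    r.user_ip = elr;
    r.bad_va = badva;
    return r;
}

/* Among simultaneous exceptions, the lowest cause code has the highest
   priority. Returns the index of the cause to handle first. */
size_t highest_priority(const uint8_t *causes, size_t count)
{
    assert(causes != NULL && count > 0);
    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (causes[i] < causes[best])
            best = i;
    }
    return best;
}
```
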

If multiple exceptions occur simultaneously, the exception with the lowest cause code value has the highest priority.

If a packet contains multiple loads, or a load and a store, and both operations have an exception of any type, all slot 1 exceptions are processed before any slot 0 exception is processed.

**v79 exceptions**

| **Cause code** | **Event type** | **Event description** | **Notes** |
| --- | --- | --- | --- |
| 0x0 | Reset | Software thread reset. | Non-maskable, highest priority |
| 0x01 | Precise, unrecoverable | Unrecoverable BIU error (bus error, timeout, L2 parity error, and so on). | Non-maskable |
| 0x03 | Precise, unrecoverable | Double exception (exception occurs while SSR[EX]=1). | Non-maskable |
| 0x11 | Precise | Privilege violation: User/Guest mode execute to page with no execute permissions (X=0). | Non-maskable |
| 0x12 | Precise | Privilege violation: User mode execute to a page with no user permissions (X=1, U=0). | Non-maskable |
| 0x15 | Precise | Invalid packet. | Non-maskable |
| 0x16 | Precise | Illegal execution of coprocessor instruction. | Non-maskable |
| 0x17 | Precise | Instruction cache error. | Non-maskable |
| 0x1A | Precise | Privilege violation: Executing a guest mode instruction in user mode. | Non-maskable |
| 0x1B | Precise | Privilege violation: Executing a supervisor instruction in User/Guest mode. | Non-maskable |
| 0x1D | Precise, unrecoverable | Packet with multiple writes to the same destination register. | Non-maskable |
| 0x1E | Precise, unrecoverable | Program counter values that are not properly aligned. | Non-maskable |
| 0x20 | Precise | Load to misaligned address. | Non-maskable |
| 0x21 | Precise | Store to misaligned address. | Non-maskable |
| 0x22 | Precise | Privilege violation: User/Guest mode read to page with no read permission (R=0). | Non-maskable |
| 0x23 | Precise | Privilege violation: User/Guest mode write to page with no write permissions (W=0). | Non-maskable |
| 0x24 | Precise | Privilege violation: User mode read to page with no user permission (R=1, U=0). | Non-maskable |
| 0x25 | Precise | Privilege violation: User mode write to page with no user permissions (W=1, U=0). | Non-maskable |
| 0x26 | Precise | Coprocessor VMEM address error. | Non-maskable |
| 0x27 | Precise | Stack overflow: Allocframe instruction exceeded FRAMELIMIT. | Non-maskable |
| 0x28 | Precise | DMA error | Non-maskable |
| 0x29 | Precise | VTCM window miss | Non-maskable, unrecoverable |
| 0x42 | Imprecise | Data abort. | Non-maskable |
| 0x43 | Imprecise | NMI | Non-maskable |
| 0x44 | Imprecise | Multiple TLB match. | Non-maskable |
| 0x45 | Imprecise | Livelock exception. | Non-maskable |
| 0x60 | TLB miss-X | Missing fetch address on PC-page. | Non-maskable |
| 0x61 | TLB miss-X | Missing fetch on second page from packet that spans pages. | Non-maskable |
| 0x62 | TLB miss-X | icinva. | Non-maskable |
| | Reserved | | |
| 0x70 | TLB miss-RW | Memory read. | Non-maskable |
| 0x71 | TLB miss-RW | Memory write. | Non-maskable |
| #u8 | Trap0 | Software Trap0 instruction. | Non-maskable |
| #u8 | Trap1 | Software Trap1 instruction. | Non-maskable |
| 0x72 | Precise | DMA read to page privilege violation. | Non-maskable |
| 0x73 | Precise | DMA write to page privilege violation. | Non-maskable |
| 0x74 | Precise | DMA user read to page privilege violation. | Non-maskable |
| 0x75 | Precise | DMA user write to page privilege violation. | Non-maskable |
| 0x76 | TLB miss-RW | TLB miss on DMA memory read. | Non-maskable |
| 0x77 | TLB miss-RW | TLB miss on DMA memory write. | Non-maskable |
| | Reserved | | |
| 0x79 | TLB miss-RW | TLB VTCM window miss. | Non-maskable, unrecoverable |
| 0x80 | Debug | Single-step debug exception. | |
| | Reserved | | |
| 0xBF | Floating-point | Execution of a floating-point instruction resulted in an
exception. | Non-maskable | | 0xC0 | Interrupt0 | General external interrupt. | Maskable, highest priority general interrupt | | 0xC1 | Interrupt 1 | General external interrupt | Maskable | | 0xC2 | Interrupt 2 | General external interrupt | VIC0 interface | | 0xC3 | Interrupt 3 | General external interrupt | VIC1 interface | | 0xC4 | Interrupt 4 | General external interrupt | VIC2 interface | | 0xC5 | Interrupt 5 | General external interrupt | VIC3 interface | | 0xC6 | Interrupt 6 | General external interrupt | | | 0xC7 | Interrupt 7 | General external interrupt | Lowest-priority interrupt | Last Published: Jan 16, 2025 [Previous Topic Software stack](https://docs.qualcomm.com/bundle/publicresource/80-N2040-60/topics/software-stack.md) [Next Topic PMU events](https://docs.qualcomm.com/bundle/publicresource/80-N2040-60/topics/pmu-events.md)