# Qualcomm<sup>®</sup> Hexagon™ V71 # Programmer's Reference Manual 80-N2040-51 Rev. AB August 17, 2023 All Qualcomm products mentioned herein are products of Qualcomm Technologies, Inc. and/or its subsidiaries. Qualcomm and Hexagon are trademarks or registered trademarks of Qualcomm Incorporated. Other product and brand names may be trademarks or registered trademarks of their respective owners. This technical data may be subject to U.S. and international export, re-export, or transfer ("export") laws. Diversion contrary to U.S. and international law is strictly prohibited. Qualcomm Technologies, Inc. 5775 Morehouse Drive San Diego, CA 92121 U.S.A. # Contents | Figures | 13 | |-------------------------------------|----| | Tables | 14 | | | | | 1 Introduction | 16 | | 1.1 Conventions | 16 | | 1.2 Technical assistance | 16 | | 2 Registers | 17 | | 2.1 Register operands | 17 | | 2.2 General registers | 19 | | 2.3 Control registers | | | 2.3.1 Program counter | 23 | | 2.3.2 Loop registers | 23 | | 2.3.3 User status register | 24 | | 2.3.4 Modifier registers | 26 | | 2.3.5 Predicate registers | 27 | | 2.3.6 Circular start registers | 28 | | 2.3.7 User general pointer register | 28 | | 2.3.8 Global pointer | 28 | | 2.3.9 Cycle count registers | 29 | | 2.3.10 Frame limit register | 29 | | 2.3.11 Frame key register | 29 | | 2.3.12 Packet count registers | 30 | | 2.3.13 Qtimer registers | 30 | | 3 Instructions | 32 | | 3.1 Instruction syntax | 32 | | 3.2 Instruction classes | 35 | | 3.3 Instruction packets | 36 | | 3.3.1 Packet execution semantics | 37 | | 3.3.2 Sequencing semantics | 37 | | 3.3.3 Resource constraints | 38 | | 3.3.4 Grouping constraints | 39 | | 3.3.5 Dependency constraints | 40 | | 3.3.6 Ordering constraints | 40 | | | 3.3.7 Alignment constraints | 41 | |----------|------------------------------------------|----| | 3.4 | Instruction intrinsics | 41 | | 3.5 | Compound instructions | 42 | | 3.6 | Duplex instructions | 42 | | | | | | 4 | Data processing | 43 | | 4.1 | Data types | 43 | | | 4.1.1 Fixed-point data | 43 | | | 4.1.2 Floating-point data | 43 | | | 4.1.3 Complex data | 43 | | | 4.1.4 Vector data | 44 | | 4.2 | Instruction options | 45 | | | 4.2.1 Fractional scaling | 45 | | | 4.2.2 Saturation | 46 | | | 4.2.3 Arithmetic rounding | 46 | | | 4.2.4 Convergent rounding | 46 | | | 4.2.5 Scaling for divide and square-root | 47 | | 4.3 | XTYPE operations | 47 | | | 4.3.1 Floating point | 47 | | | 4.3.2 Multiply | 48 | | | 4.3.3 Shift | 50 | | 4.4 | ALU32 operations | 51 | | 4.5 | Vector operations | 51 | | 4.6 | CR operations | 52 | | 4.7 | Compound operations | 52 | | 4.8 | Special operations | 53 | | | 4.8.1 H.264 CABAC processing | 53 | | | 4.8.1.1 CABAC implementation | 54 | | | 4.8.1.2 Code example | 55 | | | 4.8.2 IP Internet checksum | 56 | | | 4.8.2.1 Code example | 56 | | | 4.8.3 Software-defined radio | 57 | | | 4.8.3.1 Rake despreading | 57 | | | 4.8.3.2 Polynomial operations | 58 | | | | | | <b>5</b> | Memory | 60 | | 5.1 | Memory model for the Hexagon processor | 60 | | | 5.1.1 Address space | 60 | | | 5.1.2 Byte order | | | | 5.1.3 Alignment | 61 | | 5.2 | Memory loads | | | 5.3 | Memory stores | 62 | | 5.4 | Dual stores | .63 | |------|-------------------------------------------------|-----| | 5.5 | Slot 1 store with slot 0 load | .63 | | 5.6 | New-value stores | .63 | | 5.7 | Mem-ops | .64 | | 5.8 | Addressing modes | .64 | | | 5.8.1 Absolute | .65 | | | 5.8.2 Absolute-set | .65 | | | 5.8.3 Absolute with register offset | .65 | | | 5.8.4 Global pointer relative | .66 | | | 5.8.5 Indirect | .66 | | | 5.8.6 Indirect with offset | .67 | | | 5.8.7 Indirect with register offset | .67 | | | 5.8.8 Indirect with autoincrement immediate | .67 | | | 5.8.9 Indirect with autoincrement register | .68 | | | 5.8.10 Circular with autoincrement immediate | .68 | | | 5.8.11 Circular with autoincrement register | .70 | | | 5.8.12 Bit-reversed with autoincrement register | .71 | | 5.9 | Conditional load/stores | .72 | | 5.10 | Cache memory | .73 | | | 5.10.1 Uncached memory | .74 | | | 5.10.2 Tightly coupled memory | .74 | | | 5.10.3 Cache maintenance operations | .74 | | | 5.10.4 L2 cache operations | .75 | | | 5.10.5 Cache line zero | .75 | | | 5.10.6 Cache prefetch | .76 | | 5.1 | 1 Memory ordering | .79 | | 5.1 | 2 Atomic operations | .80 | | | | | | | Conditional execution | | | 6.1 | Scalar predicates | | | | 6.1.1 Generating scalar predicates | | | | 6.1.2 Consuming scalar predicates | | | | 6.1.3 Auto-AND predicates | | | | 6.1.4 Dot-new predicates | .86 | | | 6.1.5 Dependency constraints | | | 6.2 | Vector predicates | .87 | | | 6.2.1 Vector compare | .88 | | | 6.2.2 Vector mux instruction | | | | 6.2.3 Using vector conditionals | .89 | | 6.3 | Predicate operations | .90 | | 7 Software stack | 91 | |------------------------------------|-----| | 7.1 Stack structure | 91 | | 7.2 Stack frames | 92 | | 7.3 Stack protection | 92 | | 7.3.1 Stack bounds checking | 92 | | 7.3.2 Stack smashing protection | 93 | | 7.4 Stack registers | 93 | | 7.5 Stack instructions | 93 | | 8 Program flow | 95 | | 8.1 Conditional instructions | 95 | | 8.2 Hardware loops | 96 | | 8.2.1 Loop setup | 97 | | 8.2.2 Loop end | 98 | | 8.2.3 Loop execution | 99 | | 8.2.4 Pipelined hardware loops | 99 | | 8.2.5 Loop restrictions | 102 | | 8.3 Software branches | 102 | | 8.3.1 Jumps | 103 | | 8.3.2 Calls | 103 | | 8.3.3 Returns | 104 | | 8.3.4 Extended branches | 105 | | 8.3.5 Branches to and from packets | 105 | | 8.4 Speculative jumps | 105 | | 8.5 Compare jumps | 107 | | 8.5.1 New-value compare jumps | 107 | | 8.6 Register transfer jumps | 109 | | 8.7 Dual jumps | 109 | | 8.8 Hint indirect jump target | 110 | | 8.9 Pauses | | | 8.10 Exceptions | 111 | | 9 PMU events | 114 | | 9.1 V71 processor event symbols | 114 | | 10 Instruction encoding | 122 | | 10.1 Instructions | | | 10.2 Sub-instructions | | | 10.3 Duplexes | | | 10.4 Instruction classes | | | 10.5 Instruction packets | | | 10.6 Loop packets | | | 10.7 Immediate values | 130 | |----------------------------------|-----| | 10.8 Scaled immediate values | 131 | | 10.9 Constant extenders | 131 | | 10.10 New-value operands | 135 | | 10.11 Instruction mapping | 135 | | 11 Instruction set | 136 | | 11.1 ALU32 | | | 11.1.1 ALU32 ALU | | | Add | | | Logical operations | | | Negate | | | Nop | | | Subtract | | | Sign extend | | | Transfer immediate | | | Transfer register | | | Vector add halfwords | | | Vector average halfwords | | | Vector subtract halfwords | | | Zero extend | 151 | | 11.1.2 ALU32 PERM | 152 | | Combine words into doubleword | 152 | | Mux | 154 | | Shift word by 16 | 156 | | Pack high and low halfwords | 157 | | 11.1.3 ALU32 PRED | 158 | | Conditional add | 158 | | Conditional shift halfword | 160 | | Conditional combine | 162 | | Conditional logical operations | 163 | | Conditional subtract | 165 | | Conditional sign extend | 166 | | Conditional transfer | 167 | | Conditional zero extend | 168 | | Compare | 169 | | Compare to general register | 171 | | 11.2 CR | 172 | | End loop instructions | 172 | | Corner detection acceleration | 173 | | Logical reductions on predicates | 174 | | Looping instructions | 175 | | Add to PC | 177 | | | Pipelined loop instructions | 178 | | |---------|-------------------------------------------------|-----|-----| | | Logical operations on predicates | 180 | | | | User control register transfer | 182 | | | 11.3 JR | | | 183 | | | Call subroutine from register | 183 | | | | Hint an indirect jump address | 184 | | | | Jump to address from register | 185 | | | 11.4 J | | | 186 | | | Call subroutine | 186 | | | | Compare and jump | 188 | | | | Jump to address | 192 | | | | Jump to address conditioned on new predicate | 193 | | | | Jump to address condition on register value | 194 | | | | Transfer and jump | 196 | | | 11.5 LD | | | 197 | | | Load doubleword | 197 | | | | Load-acquire doubleword | 199 | | | | Load doubleword conditionally | 200 | | | | Load byte | 202 | | | | Load byte conditionally | 204 | | | | Load byte into shifted vector | 206 | | | | Load half into shifted vector | 209 | | | | Load halfword | 212 | | | | Load halfword conditionally | 214 | | | | Memory copy | 216 | | | | Load unsigned byte | 217 | | | | Load unsigned byte conditionally | 219 | | | | Load unsigned halfword | 221 | | | | Load unsigned halfword conditionally | 223 | | | | Load word | 225 | | | | Load-acquire word | 227 | | | | Load word conditionally | 228 | | | | Deallocate stack frame | 230 | | | | Deallocate frame and return | 232 | | | | Load and unpack bytes to halfwords | 234 | | | 11.6 ME | MOP | | 242 | | | Operation on memory byte | 242 | | | | Operation on memory halfword | 244 | | | | Operation on memory word | 245 | | | 11.7 NV | · | | 246 | | 11.7 | .1 NV J | | 246 | | | Jump to address condition on new register value | 246 | | | 11.7 | .2 NV ST | | 250 | | Store new-value byte | 250 | |-----------------------------------------------|-----| | Store new-value byte conditionally | 252 | | Store new-value halfword | 254 | | Store new-value halfword conditionally | 256 | | Store new-value word | 258 | | Store new-value word conditionally | 260 | | 11.8 ST | 262 | | Store doubleword | 262 | | Store-release doubleword | 264 | | Store doubleword conditionally | 265 | | Store byte | 267 | | Store byte conditionally | 269 | | Store halfword | 271 | | Store halfword conditionally | 274 | | Release | 277 | | Store word | 278 | | Store-release word | 280 | | Store word conditionally | 281 | | Allocate stack frame | 283 | | 11.9 SYSTEM | 285 | | 11.9.1 SYSTEM USER | 285 | | Load locked | 285 | | Store conditional | 286 | | Zero a cache line | 287 | | Memory barrier | 288 | | Breakpoint | | | Data cache prefetch | 290 | | Data cache maintenance user operations | 291 | | Send value to DIAG trace | 293 | | Instruction cache maintenance user operations | 294 | | Instruction synchronization | 295 | | L2 cache prefetch | 296 | | Pause | 298 | | Memory thread synchronization | 299 | | Send value to ETM trace | 300 | | Trap | 301 | | 11.10 XTYPE | 302 | | 11.10.1 XTYPE ALU | | | Absolute value doubleword | 302 | | Absolute value word | 303 | | Add and accumulate | 304 | | Add doublewords | 306 | | Add halfword | 308 | | Add or subtract doublewords with carry | 310 | |---------------------------------------------------------------|-----| | Clip to unsigned | 311 | | Logical doublewords | 312 | | Logical-logical doublewords | 313 | | Logical-logical words | 314 | | Maximum words | 316 | | Maximum doublewords | 317 | | Minimum words | 318 | | Minimum doublewords | 319 | | Modulo wrap | 320 | | Negate | 321 | | Round | 322 | | Subtract doublewords | 325 | | Subtract and accumulate words | 326 | | Subtract halfword | 327 | | Sign extend word to doubleword | 329 | | Vector absolute value halfwords | 330 | | Vector absolute value words | 331 | | Vector absolute difference bytes | 332 | | Vector absolute difference halfwords | 333 | | Vector absolute difference words | 334 | | Vector add compare and select maximum bytes | 335 | | Vector add compare and select maximum halfwords | 336 | | Vector add halfwords | 338 | | Vector add halfwords with saturate and pack to unsigned bytes | 339 | | Vector reduce add unsigned bytes | 340 | | Vector reduce add halfwords | 342 | | Vector add bytes | 344 | | Vector add words | 345 | | Vector average halfwords | 346 | | Vector average unsigned bytes | 348 | | Vector average words | 349 | | Vector clip to unsigned | 351 | | Vector conditional negate | 352 | | Vector maximum bytes | 353 | | Vector maximum halfwords | 354 | | Vector reduce maximum halfwords | 355 | | Vector reduce maximum words | 357 | | Vector maximum words | 358 | | Vector minimum bytes | 359 | | Vector minimum halfwords | 360 | | Vector reduce minimum halfwords | 361 | | Vector reduce minimum words | 363 | | Vector minimum words | 365 | | |--------------------------------------------------------------|-----|-----| | Vector sum of absolute differences unsigned bytes | 366 | | | Vector subtract halfwords | 368 | | | Vector subtract bytes | 369 | | | Vector subtract words | 370 | | | 11.10.2 XTYPE BIT | | 371 | | Count leading | 371 | | | Count population | 373 | | | Count trailing | 374 | | | Extract bit field | 375 | | | Insert bit field | 378 | | | Interleave/deinterleave | 380 | | | Linear feedback-shift iteration | 381 | | | Masked parity | 382 | | | Bit reverse | 383 | | | Set/clear/toggle bit | 384 | | | Split bit field | 386 | | | Table index | 388 | | | 11.10.3 XTYPE COMPLEX | | 390 | | Complex add/sub halfwords | 390 | | | Complex add/sub words | 393 | | | Complex multiply | 395 | | | Complex multiply real or imaginary | 398 | | | Complex multiply with round and pack | 400 | | | Complex multiply 32 x 16 | 402 | | | Complex multiply real or imaginary 32-bit | 404 | | | Vector complex multiply real or imaginary | 408 | | | Vector complex conjugate | 411 | | | Vector complex rotate | 412 | | | Vector reduce complex multiply by scalar | 414 | | | Vector reduce complex multiply by scalar with round and pack | 417 | | | Vector reduce complex rotate | 419 | | | 11.10.4 XTYPE FP | | 422 | | Floating point addition | 422 | | | Classify floating-point value | 423 | | | Compare floating-point value | 424 | | | Convert floating-point value to other format | 426 | | | Convert integer to floating-point value | 427 | | | Convert floating-point value to integer | | | | Floating point extreme value assistance | 431 | | | Floating point fused multiply-add | 432 | | | Floating point fused multiply-add with scaling | 433 | | | Floating point reciprocal square root approximation | 434 | | | Floating point fused multiply-add for library routines | 435 | |--------------------------------------------------------|-----| | Create floating-point constant | 437 | | Floating point maximum | 438 | | Floating point minimum | 439 | | Floating point multiply | 440 | | Floating point reciprocal approximation | 441 | | Floating point subtraction | 442 | | 11.10.5 XTYPE MPY | 443 | | Multiply and use lower result | 443 | | Vector multiply word by signed half (32×16) | 446 | | Vector multiply word by unsigned half (32×16) | 450 | | Multiply signed halfwords | 454 | | Multiply unsigned halfwords | 461 | | Polynomial multiply words | 465 | | Vector reduce multiply word by signed half (32×16) | 467 | | Multiply and use upper result | 469 | | Multiply and use full result | 471 | | Vector dual multiply | 473 | | Vector dual multiply with round and pack | 475 | | Vector reduce multiply bytes | 477 | | Vector dual multiply signed by unsigned bytes | 479 | | Vector multiply even halfwords | 481 | | Vector multiply halfwords | 483 | | Vector multiply halfwords with round and pack | 485 | | Vector multiply halfwords, signed by unsigned | 487 | | Vector reduce multiply halfwords | 488 | | Vector multiply bytes | 490 | | Vector polynomial multiply halfwords | 492 | | 11.10.6 XTYPE PERM | 494 | | CABAC decode bin | 494 | | Saturate | 496 | | Swizzle bytes | 498 | | Vector align | 499 | | Vector round and pack | 500 | | Vector saturate and pack | 502 | | Vector saturate without pack | 504 | | Vector shuffle | 506 | | Vector splat bytes | 508 | | Vector splat halfwords | 509 | | Vector splice | 510 | | Vector sign extend | 511 | | Vector truncate | 513 | | Vector zero extend | 515 | | 11.10 | .7 XTYPE PRED | | .517 | |-------|----------------------------------------------------------|-----|------| | | Bounds check | 517 | | | | Compare byte | 518 | | | | Compare half | 520 | | | | Compare doublewords | 522 | | | | Compare bit mask | 523 | | | | Mask generate from predicate | 524 | | | | Check for TLB match | 525 | | | | Predicate transfer | 526 | | | | Test bit | 527 | | | | Vector compare halfwords | 528 | | | | Vector compare bytes for any match | 530 | | | | Vector compare bytes | 531 | | | | Vector compare words | 533 | | | | Viterbi pack even and odd predicate bits | 535 | | | | Vector mux | 536 | | | 11.10 | .8 XTYPE SHIFT | | .537 | | | Mask generate from immediate | 537 | | | | Shift by immediate | 538 | | | | Shift by immediate and accumulate | 540 | | | | Shift by immediate and add | 543 | | | | Shift by immediate and logical | 544 | | | | Shift right by immediate with rounding | 548 | | | | Shift left by immediate with saturation | 550 | | | | Shift by register | 551 | | | | Shift by register and accumulate | 553 | | | | Shift by register and logical | 556 | | | | Shift by register with saturation | 559 | | | | Vector shift halfwords by immediate | 560 | | | | Vector arithmetic shift halfwords with round | 561 | | | | Vector arithmetic shift halfwords with saturate and pack | 562 | | | | Vector shift halfwords by register | 564 | | | | Vector shift words by immediate | 566 | | | | Vector shift words by register | 567 | | | | Vector shift words with truncate and pack | 569 | | 12 # **Figures** | Figure 1-1 | Hexagon V71 processor architecture | 19 | |-------------|--------------------------------------|-----| | Figure 1-2 | Vector instruction example | 23 | | Figure 1-3 | Instruction classes and combinations | | | Figure 1-4 | Register field symbols | 29 | | Figure 2-1 | General registers | | | Figure 2-2 | Control registers | | | Figure 3-1 | Packet grouping combinations | 51 | | Figure 4-1 | Vector byte operation | 58 | | Figure 4-2 | Vector halfword operation | 58 | | Figure 4-3 | Vector word operation | 58 | | Figure 4-4 | 64-bit shift and add/sub/logical | 67 | | Figure 4-5 | Vector halfword shift right | 70 | | Figure 5-1 | Hexagon processor byte order | 80 | | Figure 5-2 | L2fetch instruction | 99 | | Figure 6-1 | Vector byte compare | 110 | | Figure 6-2 | Vector halfword compare | 110 | | Figure 6-3 | Vector mux instruction. | 111 | | Figure 7-1 | Stack structure | 115 | | Figure 10-1 | Instruction packet encoding | 158 | # **Tables** | Table 1-1 | Register symbols | . 28 | |------------|---------------------------------------------------------|------| | Table 1-2 | Register bit field symbols | . 29 | | Table 1-3 | Instruction operands | .30 | | Table 1-4 | Data symbols. | .31 | | Table 2-1 | General register aliases | .34 | | Table 2-2 | General register pairs | .34 | | Table 2-3 | Aliased control registers | .36 | | Table 2-4 | Control register pairs | .37 | | Table 2-5 | Loop registers | .38 | | Table 2-6 | User status register | .39 | | Table 2-7 | Modifier registers (indirect auto-increment addressing) | .41 | | Table 2-8 | Modifier registers (circular addressing) | .41 | | Table 2-9 | Modifier registers (bit-reversed addressing) | .42 | | Table 2-10 | Predicate registers | .42 | | Table 2-11 | Circular start registers | .43 | | Table 2-12 | User general pointer register | .43 | | Table 2-13 | Global pointer register | .43 | | Table 2-14 | Cycle count registers | .44 | | Table 2-15 | Frame limit register | .44 | | Table 2-16 | Frame key register | .45 | | Table 2-17 | Packet count registers | .45 | | Table 2-18 | Qtimer registers | .46 | | Table 3-1 | Instruction symbols | .47 | | Table 3-2 | Instruction classes | .48 | | Table 4-1 | Single-precision multiply options | .65 | | Table 4-2 | Double precision multiply options | .65 | | Table 4-3 | Control register transfer instructions | .71 | | Table 5-1 | Memory alignment restrictions | .81 | | Table 5-2 | Load instructions | .81 | | Table 5-3 | Store instructions | .82 | | Table 5-4 | Mem-ops | .84 | | Table 5-5 | Addressing modes | .84 | | Table 5-6 | Offset ranges (global pointer relative) | .86 | | Table 5-7 | Offset ranges (indirect with offset) | .87 | | Table 5-8 | Increment ranges (indirect with auto-inc immediate) | .88 | | Table 5-9 | Increment ranges (circular with auto-inc immediate) | .89 | | Table 5-10 | Increment ranges (circular with auto-inc register) | .91 | | Table 5-11 | Addressing modes (conditional load/store) | .93 | | Table 5-12 | Conditional offset ranges (indirect with offset) | .94 | | Table 5-13 | Cache instructions (user-level) | .96 | | Table 5-14 | Memory ordering instructions | 100 | | Table 5-15 | Atomic instructions | 101 | |-------------|----------------------------------------------|-----| | Table 6-1 | Scalar predicate-generating instructions | 104 | | Table 6-2 | Vector mux instruction | 111 | | Table 6-3 | Predicate register instructions | 113 | | Table 7-1 | Stack registers | 117 | | Table 7-2 | Stack instructions | 118 | | Table 8-1 | Loop instructions | 121 | | Table 8-2 | Software pipelined loop | 125 | | Table 8-3 | Software pipelined loop (using spNloop0) | 126 | | Table 8-4 | Software branch instructions | 127 | | Table 8-5 | Jump instructions | 128 | | Table 8-6 | Call instructions | 128 | | Table 8-7 | Return instructions | 129 | | Table 8-8 | Speculative jump instructions | 131 | | Table 8-9 | Compare jump instructions | 133 | | Table 8-10 | New-value compare jump instructions | 134 | | Table 8-11 | Register transfer jump instructions | 135 | | Table 8-12 | Dual jump instructions | 135 | | Table 8-13 | Jump hint instruction | 136 | | Table 8-14 | Pause instruction | 137 | | Table 8-15 | V71 exceptions | 138 | | Table 9-1 | V71 processor events symbols | 141 | | Table 10-1 | Instruction fields | 151 | | Table 10-2 | Sub-instructions | 153 | | Table 10-3 | Sub-instruction registers | 154 | | Table 10-4 | Duplex instruction | 155 | | Table 10-5 | Duplex ICLASS field | 155 | | Table 10-6 | Instruction class encoding | 157 | | Table 10-7 | Loop packet encoding | 159 | | Table 10-8 | Scaled immediate encoding (indirect offsets) | 160 | | Table 10-9 | Constant extender encoding | 161 | | Table 10-10 | Constant extender instructions | 162 | | Table 10-11 | Instruction mapping | 165 | | Table 11-1 | Instruction operand symbols | 166 | | Table 11-2 | Instruction behavior symbols | 167 | # 1 Introduction The Qualcomm Hexagon™ processor is a general-purpose digital signal processor designed for high performance and low power across a wide variety of multimedia and modem applications. V71 is a member of the sixth generation of the Hexagon processor architecture. ### 1.1 Conventions Courier new font is used for computer text and code samples, for example, hexagon <function name>(). The following notation is used to define command syntax: - Square brackets enclose optional items, for example, [label]. - **Bold** indicates literal symbols for example, [comment]. - The vertical bar character, |, indicates a choice of items. - Parentheses enclose a choice of items for example, (add|del). - An ellipsis, . . . , follows items that can appear more than once. ### 1.2 Technical assistance For assistance or clarification on information in this document, submit a case to Qualcomm Technologies, Inc. (QTI) at https://createpoint.qti.qualco For assistance or clarification on information in this document, open a technical support case at https://support.qualcomm.com/. You will need to register for a Qualcomm ID account and your company must have support enabled to access our Case system. Other systems and support resources are listed on https://qualcomm.com/support. If you need further assistance, you can send an email to qualcomm.support@qti.qualcomm.com. # 2 Registers The Hexagon processor has two sets of registers: - General registers are for general-purpose computation including address generation and scalar and vector arithmetic. - Control registers support special-purpose processor features such as hardware loops and predicates. # 2.1 Register operands The following notation describes register operands in the syntax and behavior of instructions: ``` Rds[.elst] ``` The ds field indicates the register operand type and bit size (as defined in Table 2-1). Table 2-1 Register symbols | Symbol | Operand type | Size (in bits) | |--------|------------------------|----------------| | d | Destination | 32 | | dd | | 64 | | s | First source | 32 | | ss | | 64 | | t | Second source | 32 | | tt | | 64 | | u | Third source | 32 | | uu | | 64 | | х | Source and destination | 32 | | xx | | 64 | Examples of ds field (describing instruction syntax): Examples of ds field (describing instruction behavior): ``` Rdd = Rss + Rtt // Rdd, Rss, Rtt -> 64-bit registers ``` The optional elst field (short for element size and type) specifies parts of a register when the register is used as a vector. It can specify the following values: - A signed or unsigned byte, halfword, or word within the register (as defined in Figure 2-1) - A bit field within the register (as defined in Table 2-2). Figure 2-1 Register field symbols Table 2-2 Register bit field symbols | Symbol | Meaning | |--------|-----------------------------------------------------------------------------------------------------------| | .sN | Bits [N-1:0] are treated as a N-bit signed number. | | | For example, R0.s16 means that the least significant 16-bits of R0 are treated as a 16-bit signed number. | | .uN | Bits [N-1:0] are treated as a N-bit unsigned number. | | .Н | The most-significant 16 bits of a 32-bit register. | | .L | The least-significant 16 bits of a 32-bit register. | #### Examples of *elst* field: ``` EA = Rt.h[1] // .h[1] -> bit field 31:16 in Rt Pd = (Rss.u64 > Rtt.u64) // .u64 -> unsigned 64-bit value Rd = mpyu(Rs.L,Rt.H) // .L/.H -> low/high 16-bit fields ``` The control and predicate registers use the same notation as the general registers, but are written as Cx and Px (respectively) instead of Rx. ## 2.2 General registers The Hexagon processor has thirty-two 32-bit general-purpose registers (named R0 through R31), which can be accessed either as single registers or as aligned 64-bit register pairs. The general registers contain pointer, scalar, vector, and accumulator data. These registers store operands in virtually all the instructions: - Memory addresses for load/store instructions - Data operands for arithmetic/logic instructions - Vector operands for vector instructions #### For example: ``` R1 = memh(R0) // Load from address R0 R4 = add(R2,R3) // Add R28 = vaddh(R11,R10) // Vector add halfword ``` Figure 2-2 General registers #### Aliased registers Three of the general registers – R29 through R31 – support subroutines (Section 8.3.2) and the software stack (Chapter 7). The subroutine and stack instructions implicitly modify these registers They have symbol aliases that indicate when these registers are accessed as subroutine and stack registers. #### For example: ``` SP = add(SP, \#-8) // SP is alias of R29 allocframe // Modifies SP (R29) and FP (R30) call init // Modifies LR (R31) ``` #### Register pairs The general registers can be specified as register pairs that represent a single 64-bit register. For example: Table 2-3 General register aliases | Register | Alias | Name | Description | |----------|-------|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------| | R29 | SP | Stack pointer | Points to the top element of stack in memory. | | R30 | FP | Frame pointer | Points to the current procedure frame on stack. Used by external debuggers to examine the stack and determine call sequence, parameters, local variables, and so on. | | R31 | LR | Link register | Stores return address of a subroutine call. | NOTE: The first register in a register pair must always be odd-numbered, and the second must be the next lower register. Table 2-4 General register pairs | Register | Register pair | |----------|----------------| | R0 | R1:0 | | R1 | | | R2 | R3:2 | | R3 | | | R4 | R5:4 | | R5 | | | R6 | R7:6 | | R7 | | | | | | R24 | R25:24 | | R25 | | | R26 | R27:26 | | R27 | | | R28 | R29:28 | | R29 (SP) | | | R30 (FP) | R31:30 (LR:FP) | | R31 (LR) | | # 2.3 Control registers The Hexagon processor includes a set of 32-bit control registers that provide access to processor features such as the program counter, hardware loops, and vector predicates. Unlike general registers, control registers care instruction operands only in the following cases: - Instructions that require a specific control register as an operand - Register transfer instructions #### For example: NOTE: When a control register is used in a register transfer, the other operand must be a general register. Figure 2-3 Control registers ### **Aliased registers** The control registers have numeric aliases (C0 through C31). Table 2-5 Aliased control registers | Register | Alias | Name | |------------|--------|-------------------------------| | SA0 | C0 | Loop start address register 0 | | LC0 | C1 | Loop count register 0 | | SA1 | C2 | Loop start address register 1 | | LC1 | C3 | Loop count register 1 | | P3:0 | C4 | Predicate registers 3:0 | | reserved | C5 | _ | | MO | C6 | Modifier register 0 | | M1 | C7 | Modifier register 1 | | USR | C8 | User status register | | PC | C9 | Program counter | | UGP | C10 | User general pointer | | GP | C11 | Global pointer | | CS0 | C12 | Circular start register 0 | | CS1 | C13 | Circular start register 1 | | UPCYCLELO | C14 | Cycle count register (low) | | UPCYCLEHI | C15 | Cycle count register (high) | | UPCYCLE | C15:14 | Cycle count register | | FRAMELIMIT | C16 | Frame limit register | | FRAMEKEY | C17 | Frame key register | | PKTCOUNTLO | C18 | Packet count register (low) | | PKTCOUNTHI | C19 | Packet count register (high) | | PKTCOUNT | C19:18 | Packet count register | | reserved | C20-29 | - | | UTIMERLO | C30 | Qtimer register (low) | | UTIMERHI | C31 | Qtimer register (high) | | UTIMER | C31:30 | Qtimer register | NOTE: The control register numbers (0 through 31) specify the control registers in instruction encodings (Chapter 10). #### Register pairs The control registers can be specified as register pairs that represent a single 64-bit register. Control registers specified as pairs must use their numeric aliases. For example: ``` C1:0 = R5:4 // C1:0 specifies the LCO/SAO register pair ``` NOTE: The first register in a control register pair must always be odd-numbered, and the second must be the next lower register. Table 2-6 Control register pairs | Register | Register pair | |----------|---------------| | C0 | C1:0 | | C1 | | | C2 | C3:2 | | C3 | | | C4 | C5:4 | | C5 | | | C6 | C7:6 | | C7 | | | | | | C30 | C31:30 | | C31 | | ## 2.3.1 Program counter The program counter (PC) register points to the next instruction packet to execute. It is modified implicitly by instruction execution, but can be read directly. For example: ``` R7 = PC // Get program counter ``` NOTE: The PC register is read-only: writing to it has no effect. # 2.3.2 Loop registers The Hexagon processor includes two sets of loop registers to support nested hardware loops (Section 8.2). Each hardware loop is implemented with a pair of registers containing the loop count and loop start address. The loop registers are modified implicitly by the loop instruction, but are also accessed directly. For example: ``` loop0(start, R4) // Modifies LC0 and SA0 (LC0=R4, SA0=&start) LC1 = R22 // Set loop1 count R9 = SA1 // Get loop1 start address ``` Table 2-7 Loop registers | Register | Name | Description | |----------|--------------------|---------------------------------------| | LCO, LC1 | Loop count | Number of loop iterations to execute. | | SAO, SA1 | Loop start address | Address of first instruction in loop. | ### 2.3.3 User status register The user status register (USR) stores processor status and control bits that are accessible by user programs. The status bits contain the status results of certain instructions, while the control bits contain user-settable processor modes for hardware prefetching. For example: USR stores the following status and control values: - Cache prefetch enable (Section 5.10.6) - Cache prefetch status (Section 5.10.6) - Floating point modes (Section 4.3.1) - Floating point status (Section 4.3.1) - Hardware loop configuration (Section 8.2) - Sticky saturation overflow (Section 4.2.2) NOTE: A user control register transfer to USR cannot be grouped in an instruction packet with a floating point instruction (Section 4.3.1). Whenever a transfer to USR changes the enable trap bits [29:25], an isync instruction (Section 5.11) must execute before the new exception programming can take effect. Table 2-8 User status register | RW | Bits | Field | Description | |----|------|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------| | | 32 | | User status register | | R | 31 | PFA | L2 prefetch active. | | | | | 1: I2fetch instruction in progress 0: I2fetch finished (or inactive) | | | | | Set when nonblocking l2fetch instruction is prefetching requested data. | | | | | Remains set until l2fetch prefetch operation completes (or inactive). | | R | 30 | reserved | Return 0 if read. | | | | | Reserved for future expansion. To remain compatible with future processor versions, software should always write this field with the same value read from the field. | | RW | 29 | FPINEE | Enable trap on IEEE inexact. | | RW | 28 | FPUNFE | Enable trap on IEEE underflow. | | RW | 27 | FPOVFE | Enable trap on IEEE overflow. | Table 2-8 User status register (cont.) | RW | Bits | Field | Description | | |----|-------|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|--| | RW | 26 | FPDBZE | Enable trap on IEEE divide-by-zero. | | | RW | 25 | FPINVE | Enable trap on IEEE invalid. | | | R | 24 | reserved | Reserved | | | RW | 23:22 | FPRND | Rounding mode for floating-point instructions. | | | | | | 00: Round to nearest, ties to even (default) 01: Toward zero 10: Downward (toward negative infinity) 11: Upward (toward positive infinity) | | | R | 21:20 | reserved | Return 0 if read. | | | | | | Reserved for future expansion. To remain compatible with future processor versions, software should always write this field with the same value read from the field. | | | R | 19:18 | reserved | Reserved | | | R | 17 | reserved | Return 0 if read. | | | | | | Reserved for future expansion. To remain compatible with future processor versions, software should always write this field with the same value read from the field. | | | RW | 16:15 | HFI | L1 instruction prefetch. | | | | | | 00: Disable 01: Enable (1 line) 10: Enable (2 lines) | | | RW | 14:13 | HFD | L1 data cache prefetch. | | | | | | Four levels are defined from disabled to aggressive. It is implementation-defined how these levels should be interpreted. 00: disable | | | | | | 01: conservative<br>10: moderate | | | | | | 11: aggressive | | | RW | 12 | PCMME | Enable packet counting in Monitor mode. | | | RW | 11 | PCGME | Enable packet counting in Guest mode. | | | RW | 10 | PCUME | Enable packet counting in User mode. | | | RW | 9:8 | LPCFGE | Hardware loop configuration. | | | | | | Number of loop iterations (0-3) remaining before pipeline predicate should be set. | | | R | 7:6 | reserved | Return 0 if read. | | | | | | Reserved for future expansion. To remain compatible with future processor versions, software should always write this field with the same value read from the field. | | | RW | 5 | FPINPF | Floating-point IEEE inexact sticky flag. | | | RW | 4 | FPUNFF | Floating-point IEEE underflow sticky flag. | | | RW | 3 | FPOVFF | Floating-point IEEE overflow sticky flag. | | | RW | 2 | FPDBZF | Floating-point IEEE divide-by-zero sticky flag. | | Table 2-8 User status register (cont.) | RW | Bits | Field | Description | |----|------|--------|--------------------------------------------------------------------------------------------| | RW | 1 | FPINVF | Floating-point IEEE invalid sticky flag. | | RW | 0 | OVF | Sticky saturation overflow. | | | | | Saturation occurred No saturation | | | | | Set when saturation occurs while executing instruction that specifies optional saturation. | | | | | Remains set until explicitly cleared by a USR = Rs instruction. | ## 2.3.4 Modifier registers The following addressing modes use modifier registers (M0 and M1). #### Indirect autoincrement register addressing In indirect autoincrement register addressing (Section 5.8.9), the modifier registers store a signed 32-bit value that specifies the increment (or decrement) value. For example: ``` M1 = R0 // Set modifier register R3 = memw(R2++M1) // Load word ``` Table 2-9 Modifier registers used in indirect auto-increment addressing | Register | Name | Description | |----------|-----------|------------------------------| | M0, M1 | Increment | Signed auto-increment value. | #### Circular addressing In circular addressing (Section 5.8.10) the modifier registers store the circular buffer length and related "I" values. For example: Table 2-10 Modifier registers as used in circular addressing | Name | RW | Bits | Field | Description | |--------|----|-------|---------|------------------------------------| | M0, M1 | | 32 | | Circular buffer specifier. | | | RW | 31:28 | I[10:7] | I value (MSB - see Section 5.8.11) | | | RW | 27:24 | | 0x0 | | | RW | 23:17 | I[6:0] | I value (LSB) | | | RW | 16:0 | Length | Circular buffer length | #### Bit-reversed addressing In bit-reversed addressing (Section 5.8.12) the modifier registers store a signed 32-bit value that specifies the increment (or decrement) value. For example: Table 2-11 Modifier registers as used in bit-reversed addressing | Register | Name | Description | |----------|-----------|-----------------------------| | M0, M1 | Increment | Signed autoincrement value. | ### 2.3.5 Predicate registers The predicate registers (P0 through P3) store the status results of the scalar and vector compare instructions (Chapter 6). For example: ``` P1 = cmp.eq(R2, R3) // Scalar compare if (P1) jump end // Jump to address (conditional) R8 = P1 // Get compare status (P1 only) P3:0 = R4 // Set compare status (P0-P3) ``` The four predicate registers can be specified as a register quadruple (P3:0), which represents a single 32-bit register. **Table 2-12 Predicate registers** | Register | Bits | Description | |----------------|-------|-------------------------| | PO, P1, P2, P3 | 8 | Compare status results. | | P3:0 | 32 | Compare status results. | | | 31:24 | P3 register | | | 23:16 | P2 register | | | 15:8 | P1 register | | | 7:0 | P0 register | NOTE: Unlike the other control registers, the predicate registers are only eight bits wide because vector compares return a maximum of eight status results. ### 2.3.6 Circular start registers The circular start registers (CS0 through CS1) store the start address of a circular buffer in circular addressing (Section 5.8.10). For example: Table 2-13 Circular start registers | Register | Name | Description | |----------|----------------|--------------------------------| | CSO, CS1 | Circular start | Circular buffer start address. | ## 2.3.7 User general pointer register The user general pointer (UGP) register is a general-purpose control register. For example: ``` R9 = UGP // Get UGP UGP = R3 // Set UGP ``` NOTE: UGP typically stores the address of thread local storage. Table 2-14 User general pointer register | Register | Name | Description | |----------|----------------------|-----------------------------------| | UGP | User general pointer | General-purpose control register. | # 2.3.8 Global pointer The global pointer (GP) is used in GP-relative addressing. For example: ``` GP = R7 // Set GP R2 = memw(GP+#200) // GP-relative load ``` Table 2-15 Global pointer register | Name | RW | Bits | Field | Description | |------|----|------|----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | GP | | 32 | | Global pointer register | | | RW | 31:6 | GDP | Global data pointer (Section 5.8.4). | | | R | 5:0 | reserved | Return 0 if read. | | | | | | Reserved for future expansion. To remain forward-compatible with future processor versions, software should always write this field with the same value read from the field. | ### 2.3.9 Cycle count registers The cycle count registers (UPCYCLELO through UPCYCLEHI) store a 64-bit value containing the number of executed processor cycles since the Hexagon processor was last reset. For example: ``` R5 = UPCYCLEHI // Get cycle count (high) R4 = UPCYCLELO // Get cycle count (low) R5:4 = UPCYCLE // Get cycle count ``` NOTE: The RTOS must grant permission to access these registers. Without this permission, reading these registers from user code always returns zero. Table 2-16 Cycle count registers | Register | Name | Description | |-----------|--------------------|--------------------------------------| | UPCYCLELO | Cycle count (low) | Processor cycle count (low 32 bits) | | UPCYCLEHI | Cycle count (high) | Processor cycle count (high 32 bits) | | UPCYCLE | Cycle count | Processor cycle count (64 bits) | ## 2.3.10 Frame limit register The frame limit register (FRAMELIMIT) stores the low address of the memory area reserved for the software stack (Section 7.3.1). For example: Table 2-17 Frame limit register | Register | Name | Description | |------------|-------------|-------------------------------------| | FRAMELIMIT | Frame limit | Low address of software stack area. | # 2.3.11 Frame key register The frame key register (FRAMEKEY) stores the key value that XOR-scrambles return addresses when they are stored on the software stack (Section 7.3.2). For example: ``` R2 = FRAMEKEY // Get frame key register FRAMEKEY = R1 // Set frame key register ``` Table 2-18 Frame key register | Register | Name | Description | |----------|-----------|------------------------------------------------------------| | FRAMEKEY | Frame key | Key to scramble return addresses stored on software stack. | ### 2.3.12 Packet count registers The packet count registers (PKTCOUNTLO through PKTCOUNTHI) store a 64-bit value containing the current number of instruction packets executed since a PKTCOUNT register was last written to. For example: ``` R9 = PKTCOUNTHI // Get packet count (high) R8 = PKTCOUNTLO // Get packet count (low) R9:8 = PKTCOUNT // Get packet count ``` Packet counting can be configured to operate only in specific sets of processor modes (for example, User mode only, or Guest and Monitor modes only). Bits [12:10] in the user status register control the configuration for each mode (Section 2.3.3). Packets with exceptions are not counted as committed packets. NOTE: Each hardware thread has its own set of packet count registers. The RTOS must grant permission to access these registers. Without this permission, reading these registers from user code always returns zero. When a value is written to a PKTCOUNT register, the 64-bit packet count value is incremented before the value is stored in the register. Table 2-19 Packet count registers | Register | Name | Description | |------------|---------------------|---------------------------------------| | PKTCOUNTLO | Packet count (low) | Processor packet count (low 32 bits) | | PKTCOUNTHI | Packet count (high) | Processor packet count (high 32 bits) | | PKTCOUNT | Cycle count | Processor packet count (64 bits) | # 2.3.13 Qtimer registers The Qtimer registers (UTIMERLO through UTIMERHI) provide access to the Qtimer global reference count value. They enable Hexagon software to read the 64-bit time value without having to perform an expensive AHB load. For example: ``` R5 = UTIMERHI // Get Qtimer reference count (high) R4 = UTIMERLO // Get Qtimer reference count (low) R5:4 = UTIMER // Get Qtimer reference count ``` These registers are read-only – they are automatically updated by hardware to always contain the current Qtimer value. NOTE: The RTOS must grant permission to access these registers. Without this permission, reading these registers from user code always returns zero. Table 2-20 Qtimer registers | Register | Name | Description | |----------|---------------|----------------------------------------------| | UTIMERLO | Qtimer (low) | Qtimer global reference count (low 32 bits) | | UTIMERHI | Qtimer (high) | Qtimer global reference count (high 32 bits) | | UTIMER | Qtimer | Qtimer global reference count (64 bits) | # 3 Instructions Instructions can be grouped into very long instruction word (VLIW) packets for parallel execution, with each packet containing from one to four instructions. Vector instructions operate on single instruction multiple data (SIMD) vectors. Instruction encoding is described in Chapter 10. For detailed descriptions of the Hexagon processor instructions see Chapter 11. # 3.1 Instruction syntax Most Hexagon processor instructions have the following syntax: ``` dest = instr_name(source1, source2,...)[:option1][:option2]... ``` The item specified on the left-hand side (LHS) of the equation is assigned the value specified by the right-hand side (RHS). For example: ``` R2 = add(R3,R1) // Add R3 and R1, assign result to R2 ``` | Symbol | Meaning | Example | Min | Max | |--------|---------------------------------------------------------------------------|-----------------------------|------------------------|---------------------| | = | Assignment of RHS to LHS | R2 = R3; | _ | _ | | , | Marks the end of an instruction or group of instructions | R2 = R3; | - | _ | | { } | Instruction packet delimiter; indicates a group of parallel instructions. | {R2 = R3; R5 = R6;} | _ | _ | | ( ) | Source list delimiter | R2 = memw(R0 + #100) | | | | 0x | Indicates hexadecimal number | R2 = #0x1fe; | _ | _ | | MEMxx | Access memory. xx specifies the size and type of access. | R2 = MEMxx(R3) | - | _ | | :rnd | Perform optional rounding | R2 = mpy(r1.h,<br>r2.h):rnd | - | _ | | :<<16 | Shift left by a halfword | R2 = add(r1.1,r2.1): << 16 | - | _ | | # | Immediate constant value | #100 | _ | _ | | #uN | Unsigned N-bit immediate value | R2 = #u16 | 0 | 2 <sup>N</sup> -1 | | #sN | Signed N-bit immediate value | R2 = add(R3, #s16) | -2 <sup>N-1</sup> | 2 <sup>N-1</sup> -1 | | #mN | Signed N-bit immediate value | Rd = mpyi(Rs, #m9) | -(2 <sup>N-1</sup> -1) | 2 <sup>N-1</sup> -1 | | Symbol | Meaning | Example | Min | Max | |-------------------|-----------------------------------------------------------------------------------------------------|--------------------------------------------------|---------------------------------------|----------------------------------------| | #uN:S | Unsigned N-bit immediate value representing integral multiples of 2 <sup>S</sup> in specified range | R2 = memh(#u16:1) | 0 | $(2^{N}-1)\times 2^{S}$ | | #sN:S | Signed N-bit immediate value representing integral multiples of 2 <sup>S</sup> in specified range | Rd = memw(Rs++#s4:2) | (-2 <sup>N-1</sup> ) × 2 <sup>S</sup> | (2 <sup>N-1</sup> -1) × 2 <sup>S</sup> | | #rN:S | Same as #sN:S, but value is offset from PC of current packet | call #r22:2 | $(-2^{N-1}) \times 2^S$ | (2 <sup>N-1</sup> -1) × 2 <sup>S</sup> | | ## | 32-bit immediate constant value. Same as #, but associated value (u, s, m, r) is 32 bits | ##2147483647 | - | _ | | usat <sub>N</sub> | Saturate value to unsigned N-bit number | usat <sub>16</sub> (Rs) | 0 | 2 <sup>N</sup> -1 | | sat <sub>N</sub> | Saturate value to signed N-bit number | sat <sub>16</sub> (Rs) | -2 <sup>N-1</sup> | 2 <sup>N-1</sup> -1 | | sxt x->y | Sign-extend value from x to y bits | sxt32->64(Rs) | _ | _ | | zxt x->y | Zero-extend value from x to y bits | zxt32->64(Rs) | _ | _ | | >>> | Logical right shift | Rss >>> offset | _ | _ | | :endloopX | Loop end X specifies loop instruction (0 or 1) | :endloop0 | | | | :t | Direction hint (jump taken) | <pre>if (P0.new) jump:t target</pre> | | | | :nt | Direction hint (jump not taken) | <pre>if (!P1.new) jump:nt target</pre> | | | | :carry | Predicate used as carry input and output | R5:4 = add(R1:0,<br>R3:2, P1):carry | | | | :<<16 | Shift result left by halfword | R2 = add(R1.L,<br>R2.L): << 16 | | | | :mem_noshuf | Inhibit load/store reordering (Section 5.5) | {memw(R5) = R2;<br>R3 = memh(R6)}<br>:mem_noshuf | | | #uN, #sN, and #mN specify immediate operands in instructions. The # symbol appears in the actual instruction to indicate the immediate operand. #rN specifies loop and branch destinations in instructions. In this case, the # symbol does not appear in the actual instruction; instead, the entire #rN symbol (including its :S suffix) is expressed as a loop or branch symbol whose numeric value is determined by the assembler and linker. For example: ``` call my_proc // Instruction example ``` The :S suffix indicates that the S least-significant bits in a value are implied zero bits and therefore not encoded in the instruction. The implied zero bits are called scale bits. For example, #s4:2 denotes a signed immediate operand represented by four bits encoded in the instruction, and two scale bits. The possible values for this operand are -32, -28, -24, -20, -16, -12, -8, -4, 0, 4, 8, 12, 16, 20, 24, and 28. ## specifies a 32-bit immediate operand in an instruction (including a loop or branch destination). The ## symbol appears in the actual instruction to indicate the operand. Examples of operand symbols: When an instruction contains more than one immediate operand, the operand symbols are specified in upper and lower case (for example, #uNi and #uN) to indicate where they appear in the instruction encodings. Table 3-1 Data symbols in Hexagon processor instruction names for supported data types | Size | Symbol | Туре | |--------|--------|--------------------| | 8-bit | В | Byte | | 8-bit | UB | Unsigned byte | | 16-bit | Н | Half word | | 16-bit | UH | Unsigned half word | | 32-bit | W | Word | | 32-bit | UW | Unsigned word | | 64-bit | D | Double word | ### 3.2 Instruction classes The Hexagon processor instructions are assigned to specific instruction classes. Classes determine what combinations of instructions can be written in parallel. Instruction classes logically correspond with instruction types. For instance, the ALU32 class contains ALU instructions that operate on 32-bit operands. Table 3-2 Instruction classes and subclasses | Class | Subclass | Description | |------------------------------------------------|---------------|-------------------------------------------------------------| | ALU32: 32-bit ALU operations | ALU32 ALU | Arithmetic and logical | | | ALU32 PERM | Permute | | | ALU32 PRED | Predicate operations | | CR: Control register access, loops | <u>'</u> | | | JR: Jump from register (register indirect addr | ressing mode) | | | J: Jumps (PC-relative addressing mode) | | | | LD: Memory load operations | | | | MEMOP: Memory operations | | | | NV: New-value operations | NV J | New-value jumps | | | NV ST | New-value stores | | ST: Memory store operations; allocate stack | frame | | | SYSTEM: Operating system access | SYSTEM USER | Application-level access | | XTYPE: 32-bit and 64-bit operations | ALU32 ALU | 64-bit arithmetic and logical operations | | | XTYPE BIT | Bit operations | | | XTYPE COMPLEX | Complex math (using real and imaginary numbers) | | | XTYPE FP | Floating point operations | | | XTYPE MPY | Multiply operations | | | XTYPE PERM | Vector permute and format conversion (pack, splat, swizzle) | | | XTYPE PRED | Predicate operations | | | XTYPE SHIFT | Shift operations | | | | (with optional ALU operations) | # 3.3 Instruction packets Instructions can be grouped into packets of independent instructions for parallel execution, with a packet containing from one to four instructions. Packets of varying length can be freely mixed in a program. Instruction packets must be explicitly specified in software. They are expressed in assembly language by enclosing groups of instructions in curly braces. Brace characters delimit the start and end of an instruction packet. For example: ``` { R8 = memh(R3++#2) R12 = memw(R1++#4) R = mpy(R10,R6):<<1:sat R7 = add(R9,#2) }</pre> ``` Packets have restrictions on the allowable instruction combinations. The primary restriction is determined by the instruction class of the instructions in a packet. Rules and restrictions exist on what types of instructions can be grouped together, and in what order they can appear in the packet. In particular, packet formation is subject to the following constraints: - Resource constraints determine how many instructions of a specific type can appear in a packet. The Hexagon processor has a fixed number of execution units: each instruction executes on a particular type of unit, and each unit can process at most one instruction at a time. For example, because the Hexagon processor contains only two load units, an instruction packet with three load instructions is invalid. - Grouping constraints are a small set of rules that apply above and beyond the resource constraints. - Dependency constraints ensure that no write-after-write hazards exist in a packet. - Ordering constraints dictate the ordering of instructions within a packet. - Alignment constraints dictate the placement of packets in memory. NOTE: The Hexagon processor executes individual instructions (which are not explicitly grouped in packets) as packets containing a single instruction. ### 3.3.1 Packet execution semantics Packets are defined to have parallel execution semantics. The execution behavior of a packet is defined as follows: - First, instructions in the packet read their source registers in parallel. - Next, instructions in the packet execute. - Finally, instructions in the packet write their destination registers in parallel. For example, consider the following packet: ``` \{ R2 = R3; R3 = R2; \} ``` In the first phase, registers R3 and R2 are read from the register file. Aafter execution, R2 is written with the old value of R3 and R3 is written with the old value of R2. The result of this packet is that the values of R2 and R3 are swapped. NOTE: Dual stores, Dual jumps, New-value stores, New-value compare jumps, and Dot-new predicates have non-parallel execution semantics. ## 3.3.2 Sequencing semantics Packets of any length can freely mix in code. A packet is considered an atomic unit: in essence, a single large instruction. From the program perspective, a packet either executes to completion or not at all; it never executes only partially. For example, if a packet causes a memory exception, the exception point is established before the packet. A packet containing multiple load/store instructions can require service from the external system. For instance, consider a packet that performs two load operations that both miss in the cache. The packet requires the memory system to supply the data: - From the memory system perspective the two resulting load requests process serially. - From the program perspective, however, both load operations must complete before the packet can complete. Thus, the packet is atomic from the program perspective. Packets have a single PC address, which is the address of the start of the packet. Branches cannot be performed into the middle of a packet. Architecturally, packets execute to completion – including updating all registers and memory – before the next packet begins. As a result, application programs are not exposed to any pipeline artifacts. ### 3.3.3 Resource constraints A packet cannot use more hardware resources than are physically available on the processor. For instance, because the Hexagon processor has only two load units, a packet with three load instructions is invalid. The behavior of such a packet is undefined. The assembler automatically rejects packets that oversubscribe the hardware resources. The processor supports up to four parallel instructions. The instructions execute in four parallel pipelines, which are referred to as slots. The four slots are named Slot 0, Slot 1, Slot 2, and Slot 3. **NOTE:** endloopN instructions (Section 8.2.2) do not use any slots. Each instruction belongs to specific Instruction classes. For example, jumps belong to instruction class J, while loads belong to instruction class LD. The class of an instruction determines which slot it can execute in. Figure 3-1 shows which instruction classes can be assigned to each of the four slots. | Slot 0 | Slot 1 | Slot 2 | Slot 3 | |---------------------|---------------------|--------------------|--------------------| | LD instructions | LD instructions | XTYPE instructions | XTYPE instructions | | ST instructions | ST instructions | ALU32 instructions | ALU32 instructions | | ALU32 instructions | ALU32 instructions | J instructions | J instructions | | MEMOP instructions | Some J instructions | JR instructions | CR instructions | | NV instructions | | Some system | | | System instructions | | instructions | | | Some J instructions | | | | #### XTYPE instructions (32/64-bit) Arithmetic, logical, bit manipulation Multiply (integer, fractional, complex) Floating-point operations Permute/vector permute operations Predicate operations Shift / shift with add/sub/logical Vector byte ALU Vector halfword (ALU, shift, multiply) Vector word (ALU, shift) #### **ALU32 instructions** Arithmetic/logical (32- bit) Vector halfword #### **CR** instructions Control register transfers Hardware loop setup Predicate logicals and reductions #### **NV** instructions New-value jumps New-value stores **J** instructions Jump/call PC-relative #### JR instructions Jump/call register #### **LD** instructions Loads (8/16/32/64-bit) Deallocframe #### **ST instructions** Stores (8/16/32/64-bit) Allocframe #### **MEMOP** instructions Operation on memory (8/16/32-bit) ### **SYSTEM instructions** Prefetch Cache maintenance Bus operations Figure 3-1 Packet grouping combinations ### 3.3.4 Grouping constraints A small number of restrictions determines what constitutes a valid packet. The assembler ensures that packets follow valid grouping rules. If a packet executes that violates a grouping rule, the behavior is undefined. The following rules must be followed: - Dot-new conditional instructions (Section 6.1.4) must be grouped in a packet with an instruction that generates dot-new predicates. - ST-class instructions can be placed in Slot 1. In this case Slot 0 normally must contain a second ST-class instruction (Section 5.4). - J-class instructions can be placed in Slots 2 or 3. However, only certain combinations of program flow instructions (J or JR) can be grouped together in a packet (Section 8.7). Otherwise, at most one program flow instruction is allowed in a packet. Some jump and compare-jump instructions can execute on slots 0 or 1, excluding calls, such as the following: - Instructions of the form Pd = cmp.xx(); if(Pd.new)jump:hint <target> - Instructions of the form If (Pd[.new]) jump[:hint] <target> - □ The jump<target> instruction - JR-class instructions can be placed in Slot 2. However, when encoded in a duplex, jumpr R31 can be placed in Slot 0 (Section 10.3). - Restrictions exist that limit the instructions that can appear in a packet at the set up or end of a hardware loop (Section 8.2.4). - A user control register transfer to the control register USR cannot be grouped with a floating point instruction (Section 2.3.3). - The SYSTEM-class instructions include prefetch, cache operations, bus operations, load locked, and store conditional instructions (Section 5.10). These instructions have the following grouping rules: - □ brkpt, trap, pause, icinva, isync, and syncht are solo instructions. They must not be grouped with other instructions in a packet. - memw\_locked, memd\_locked, 12fetch, and trace must execute on Slot 0. They must be grouped only with ALU32 or (non-FP) XTYPE instructions. - dccleana, dcinva, dccleaninva, and dczeroa must execute on Slot 0. Slot 1 must be empty or an ALU32 instruction. ### 3.3.5 Dependency constraints Instructions in a packet cannot write to the same destination register. The assembler automatically flags such packets as invalid. If the processor executes a packet with two writes to the same general register, an error exception is raised. If the processor executes a packet that performs multiple writes to the same predicate or control register, the behavior is undefined. Three special cases exist for this rule: - Conditional writes are allowed to target the same destination register only if at most one of the writes is actually performed (Section 6.1.5). - The overflow flag in the status register has defined behavior when multiple instructions write to it. Do not group instructions that write to the entire User status register (for example, USR=R2) in a packet with any instruction that writes to a bit in the user status register. - Multiple compare instructions are allowed to target the same predicate register to perform a logical AND of the results (Section 6.1.3). ## 3.3.6 Ordering constraints In assembly code, instructions can appear in a packet in any order (with the exception of Dual jumps. The assembler automatically encodes instructions in the packet in the proper order. In the binary encoding of a packet, the instructions must be ordered from Slot 3 down to Slot 0. If the packet contains less than four instructions, any unused slot is skipped – a NOP is unnecessary as the hardware handles the proper spacing of the instructions. In memory, instructions in a packet must appear in decreasing slot order. Additionally, if an instruction can go in a higher-numbered slot, and that slot is empty, it must be moved into the higher-numbered slot. For example, if a packet contains three instructions and Slot 1 is not used, the instructions should be encoded in the packet as follows: - Slot 3 instruction at lowest address - Slot 2 instruction follows Slot 3 instruction - Slot 0 instructions at the last (highest) address If a packet contains a single load or store instruction, that instruction must go in Slot 0, which is the highest address. As an example, a packet containing both LD and ALU32 instructions must be ordered so the LD is in Slot 0 and the ALU32 in another slot. ### 3.3.7 Alignment constraints Packets have the following constraints on their placement or alignment in memory: - Packets must be word-aligned (32-bit). If the processor executes an improperly aligned packet, it raises an error exception (Section 8.10). - Packets should not wrap the 4 GB address space. If address wraparound occurs, the processor behavior is undefined. No other core-based restrictions exist for code placement or alignment. If the processor branches to a packet that crosses a 16-byte address boundary, the resulting instruction fetch stalls for one cycle. Packets that are jump targets or loop body entries can be explicitly aligned to ensure this does not occur (Section 8.3.5). ### 3.4 Instruction intrinsics "Rdd = vminh(Rtt, Rss)": To support efficient coding of the time-critical sections of a program (without resorting to assembly language), the C compilers support intrinsics that directly express Hexagon processor instructions from within C code. The following example shows how to use an instruction intrinsic to express the XTYPE instruction ``` #include <hexagon_protos.h> int main() { long long v1 = 0xFFFF0000FFFF0000LL; long long v2 = 0x0000FFFF0000FFFFLL; long long result; // Find the minimum for each half-word in 64-bit vector result = Q6_P_vminh_PP(v1,v2); } ``` Intrinsics are provided for instructions in the following classes: - ALU32 - XTYPE - CR (predicate operations only) - SYSTEM (dcfetch only) For more information on intrinsics see Chapter 11. # 3.5 Compound instructions The Hexagon processor supports compound instructions, which encode pairs of commonly-used operations in a single instruction. For example, each of the following is a single compound instruction: ``` dealloc return // Deallocate frame and return R2 \&= and(R1, R0) // And and and // Subtract and add R7 = add(R4, sub(#15, R3)) R3 = sub(#20, asl(R3, #16)) // Shift and subtract R5 = add(R2, mpyi(#8, R4)) // Multiply and add // Compare and jump P0 = cmp.eq (R2, R5) if (P0.new) jump:nt target // Register transfer and jump R2 = #15 jump target ``` Using compound instructions reduces code size and improves code performance. NOTE: Compound instructions (with the exception of X-and-jump, as shown above) have distinct assembly syntax from the instructions they are composed of. # 3.6 Duplex instructions To reduce code size the Hexagon processor supports duplex instructions, which encode pairs of commonly-used instructions in a 32-bit instruction container. Unlike Compound instructions, duplex instructions do not have distinctive syntax – in assembly code they appear identical to the instructions they are composed of. The assembler is responsible for recognizing when a pair of instructions can be encoded as a single duplex rather than a pair of regular instruction words. To fit two instructions into a single 32-bit word, duplexes are limited to a subset of the most common instructions (load, store, branch, ALU), and the most common register operands. For more information on duplexes, see Section 10.2 and Section 10.3. # 4 Data processing The Hexagon processor provides a rich set of operations for processing scalar and vector data. This chapter presents an overview of the operations provided by the following Hexagon processor instruction classes: - XTYPE General-purpose data operations - ALU32 Arithmetic/logical operations on 32-bit data NOTE: For detailed descriptions of these instruction classes see Chapter 11. # 4.1 Data types The Hexagon processor provides operations for processing the following data types. # 4.1.1 Fixed-point data The Hexagon processor provides operations to process 8-, 16-, 32-, or 64-bit fixed-point data. The data is either integer or fractional, and in signed or unsigned format. # 4.1.2 Floating-point data The Hexagon processor provides operations to process 32-bit floating-point numbers. The numbers are stored in IEEE single-precision floating-point format. Per the IEEE standard, certain floating-point values are defined to represent positive or negative infinity, as well as Not-a-Number (NaN), which represents values that have no mathematical meaning. Floating-point numbers can be held in a general register. # 4.1.3 Complex data The Hexagon processor provides operations to process 32- or 64-bit complex data. Complex numbers include a signed real portion and a signed imaginary portion. Given two complex numbers (a + bi) and (c + di), the complex multiply operations computes both the real portion (ac - bd) and the imaginary portion (ad + bc) in a single instruction. Complex numbers can be packed in a general register or register pair. When packed, the imaginary portion occupies the most-significant portion of the register or register pair. ### 4.1.4 Vector data The Hexagon processor provides operations to process 64-bit vector data. Vector data types pack multiple data items – bytes, halfwords, or words – into 64-bit registers. Vector data operations are common in video and image processing. Eight 8-bit bytes can be packed into a 64-bit register. NOTE: Certain vector operations support automatic scaling, saturation, and rounding. For example, the following instruction performs a vector operation: ``` R1:0 += vrmpyh(R3:2,R5:4) ``` It is defined to perform the following operations in one cycle: ``` R1:0 += ((R2.L * R4.L) + (R2.H * R4.H) + (R3.L * R5.L) + (R3.H * R5.H) ``` Figure 4-1 Vector byte operation Four 16-bit halfword values can be packed in a single 64-bit register pair. Figure 4-2 Vector halfword operation Two 32-bit word values can be packed in a single 64-bit register pair. Figure 4-3 Vector word operation # 4.2 Instruction options Some instructions support optional scaling, saturation, and rounding. There are no mode bits controlling these options – instead, they are explicitly specified as part of the instruction name. The options are described in this section. # 4.2.1 Fractional scaling In fractional data format, data is treated as fixed-point fractional values whose range is determined by the word length and radix point position. Fractional scaling is specified in an instruction by adding the :<<1 specifier. For example: ``` R3:2 = cmpy(R0,R1): << 1:sat ``` When two fractional numbers are multiplied, the product must be scaled to restore the original fractional data format. The Hexagon processor allows specification of the fractional scaling of the product in the instruction for shifts of 0 and 1. Perform a shift of 1 for Q1.15 numbers, perform a shift of 0 for integer multiplication. ### 4.2.2 Saturation Certain instructions are available in saturating form. If a saturating arithmetic instruction has a result that is smaller than the minimum value, the result is set to the minimum value. Similarly, if the operation has a result that is greater than the maximum value, the result is set to the maximum value. Saturation is specified in an instruction by adding the :sat specifier. For example: ``` R2 = abs(R1):sat ``` The OVF bit in the User status register is set whenever a saturating operation saturates to the maximum or minimum value. It remains set until explicitly cleared by a control register transfer to USR. For vector-type saturating operations, if any of the individual elements of the vector saturate, OVF is set. # 4.2.3 Arithmetic rounding Certain signed multiply instructions support optional arithmetic rounding (also known as biased rounding). The arithmetic rounding operation takes a double precision fractional value and adds 0x8000 to the low 16-bits (least significant 16-bit halfword). Rounding is specified in an instruction by adding the :rnd specifier. For example: ``` R2 = mpy(R1.h, R2.h):rnd ``` NOTE: Arithmetic rounding can accumulate numerical errors, especially when the number to round is exactly 0.5. This happens most frequently when dividing by 2 or averaging. # 4.2.4 Convergent rounding To address the problem of error accumulation in Arithmetic rounding, the Hexagon processor includes four instructions that support positive and negative averaging with a convergent rounding option. These instructions work as follows: - 1. Compute (A+B) or (A-B) for AVG and NAVG respectively. - 2. Based on the two least-significant bits of the result, add a rounding constant as follows: - ☐ If the two LSBs are 00, add 0 - ☐ If the two LSBs are 01, add 0 - ☐ If the two LSBs are 10, add 0 - ☐ If the two LSBs are 11, add 1 - 3. Shift the result right by one bit. # 4.2.5 Scaling for divide and square-root On the Hexagon processor, floating point divide and square-root operations are implemented in software using library functions. To enable the efficient implementation of these operations, the processor supports special variants of the multiply-accumulate instruction. These are named scale FMA. Scale FMA supports optional scaling of the product generated by the floating-point fused multiply-add instruction. Scaling is specified in the instruction by adding the :scale specifier and a predicate register operand. For example: ``` R3 += sfmpy(R0,R1,P2):scale ``` For single precision, the scaling factor is two raised to the power specified by the contents of the predicate register (which is treated as an 8-bit two's complement value). For double precision, the predicate register value is doubled before use as a power of two. NOTE: Do not use Scale FMA instructions outside of divide and square-root library routines. No guarantee is provided that future versions of the Hexagon processor will implement these instructions using the same semantics. Future versions assume only that compatibility for scale FMA is limited to the needs of divide and square-root library routines. # 4.3 XTYPE operations The XTYPE instruction class includes most of the data-processing operations performed by the Hexagon processor. These operations are categorized by their operation type. # 4.3.1 Floating point Floating-point operations manipulate single-precision floating point numbers. The Hexagon floating-point operations are defined to support the IEEE floating-point standard. However, certain IEEE-required operations – such as divide and square root – are not supported directly. Instead, special instructions are defined to support the implementation of the required operations as library routines. These instructions include: - A special version of the fused multiply-add instruction (designed specifically for use in library routines) - Reciprocal/square root approximations (which compute the approximate initial values used in reciprocal and reciprocal-square-root routines) - Extreme value assistance (which adjusts input values if they cannot produce correct results using convergence algorithms) For more information see Section 11.10.4. NOTE: The special floating-point instructions are not intended for use directly in user code – they should be used only in the floating point library. #### Format conversion The floating-point conversion instructions sfmake and dfmake convert an unsigned 10-bit immediate value into the corresponding floating-point value. The immediate value must be encoded so bits [5:0] contain the significand, and bits [9:6] the exponent. The exponent value is added to the initial exponent value (bias - 6). For example, to generate the single-precision floating point value 2.0, bits [5:0] must be set to 0, and bits [9:6] set to 7. Performing sfmake on this immediate value yields the floating point value 0x40000000, which is 2.0. NOTE: The conversion instructions are designed to handle common floating point values, including most integers and many basic fractions (1/2, 3/4, and so on). ### Rounding The Hexagon User status register includes the FPRND field, which specifies the IEEE-defined floating-point rounding mode. ### **Exceptions** The Hexagon User status register includes five status fields, which work as sticky flags for the five IEEE-defined exception conditions: inexact, overflow, underflow, divide by zero, and invalid. A sticky flag is set when the corresponding exception occurs, and remains set until explicitly cleared. The user status register also includes five mode fields that specify whether to perform an operating system trap if one of the floating-point exceptions occur. For every instruction packet that contains a floating point operation, if a floating point sticky flag and the corresponding trapenable bit are both set, a floating-point trap is generated. After the packet commits, the Hexagon processor automatically traps to the operating system. NOTE: Non-floating-point instructions never generate a floating point trap, regardless of the state of the sticky flag and trap-enable bits. # 4.3.2 Multiply Multiply operations support fixed-point multiplication, including both single- and double-precision multiplication, and polynomial multiplication. ### Single precision In single-precision arithmetic, a 16-bit value is multiplied by another 16-bit value. These operands can come from the high portion or low portion of any register. Depending on the instruction, the result of the $16 \times 16$ operation can optionally be accumulated, saturated, rounded, or shifted left by 0 to 1 bits. The instruction set supports operations on signed $\times$ signed, unsigned $\times$ unsigned, and signed $\times$ unsigned data. Table 4-1 shows the options available for $16 \times 16$ single precision multiplications. The symbols used in the table are as follows: - SS Perform signed × signed multiply - UU Perform unsigned × unsigned multiply - SU Perform signed × unsigned multiply - A+ Result added to accumulator - A- Result subtracted from accumulator - 0 Result not added to accumulator Table 4-1 Single-precision multiply options | Multiply | Result | Sign | Accumulate | Sat | Rnd | Scale | |----------|--------|------|------------|-----|-----|-------| | 16 ×16 | 32 | SS | A+, A- | Yes | No | 0-1 | | 16 ×16 | 32 | SS | 0 | Yes | Yes | 0-1 | | 16 ×16 | 64 | SS | A+, A- | No | No | 0-1 | | 16 ×16 | 64 | SS | 0 | No | Yes | 0-1 | | 16 ×16 | 32 | UU | A+, A-, 0 | No | No | 0-1 | | 16 ×16 | 64 | UU | A+, A-, 0 | No | No | 0-1 | | 116 ×16 | 32 | SU | A+, 0 | Yes | No | 0-1 | ### **Double precision** Double precision instructions are available for both $32 \times 32$ and $32 \times 16$ multiplication: - For $32 \times 32$ multiplication, the result is either 64 or 32 bits. The 32-bit result is either the high or low portion of the 64-bit product. - For $32 \times 16$ multiplication the result is always taken as the upper 32 bits. The operands are either signed or unsigned. Table 4-2 Double precision multiply options | Multiply | Result | Sign | Accumulate | Sat | Rnd | Scale | |----------|------------|--------|------------|-----|-----|-------| | 32 × 32 | 64 | SS, UU | A+, A-, 0 | No | No | 0 | | 32 × 32 | 32 (upper) | SS, UU | 0 | No | Yes | 0 | | 32 × 32 | 32 (low) | SS, UU | A+, 0 | No | No | 0 | | 32 × 16 | 32 (upper) | SS, UU | A+, 0 | Yes | Yes | 0-1 | | 32 × 32 | 32 (upper) | SU | 0 | No | No | 0 | ### **Polynomial** Polynomial multiply instructions are available for both words and vector halfwords. These instructions are useful for many algorithms including scramble code generation, cryptographic algorithms, convolutional, and Reed Solomon code. For more information on multiply operations, see Section 11.10.5. ### 4.3.3 Shift Scalar shift operations perform a variety of 32 and 64-bit shifts followed by an optional add/sub or logical operation. Figure 4-4 shows the general operation. Figure 4-4 64-bit shift and add/sub/logical Four shift types are supported: - ASR Arithmetic shift right - ASL Arithmetic shift left - LSR Logical shift right - LSL Logical shift left In register-based shifts, the Rt register is a signed two's-complement number. If this value is positive, the instruction opcode tells the direction of shift (right or left). If this value is negative, the shift direction indicated by the opcode is reversed. When arithmetic right shifts are performed, the sign bit is shifted in, whereas logical right shifts shift in zeros. Left shifts always shift in zeros. Some shifts are available with saturation and rounding options. For more information see Section 11.10.8. # 4.4 ALU32 operations The ALU32 instruction class includes general arithmetic/logical operations on 32-bit data. NOTE: ALU32 instructions can execute on any slot (Section 3.3.3). Chapter 6 describes the conditional execution and compare instructions. # 4.5 Vector operations Vector operations support arithmetic operations on vectors of bytes, halfwords, and words. The vector operations belong to the XTYPE instruction class (except for vector add, subtract, and average halfwords, which are ALU32). ### Vector byte operations The vector byte operations process packed vectors of signed or unsigned bytes. Vector halfword operations The vector halfword operations process packed 16-bit halfwords. Vector shift halfwords For example, Figure 4-5 shows the operation of the vector arithmetic shift right halfword (vasrh) instruction. In this instruction, each 16-bit half-word is shifted right by the same amount that is specified in a register or with an immediate value. Because the shift is arithmetic, the bits shifted in are copies of the sign bit. Figure 4-5 Vector halfword shift right ### **Vector word operations** The vector word operations process packed vectors of two words. For more information on vector operations, see Section 11.1.1 and Section 11.10.1. # 4.6 CR operations The CR instruction class includes operations that access the Control registers. Table 4-3 lists the instructions that access the control registers. Table 4-3 Control register transfer instructions | Syntax | Operation | |-----------|---------------------------------------------------------------| | Rd = Cs | Move control register to / from a general register. | | Cd = Rs | | | | NOTE: PC is not a valid destination register. | | Rdd = Css | Move control register pair to / from a general register pair. | | Cdd = Rss | | | | NOTE: PC is not a valid destination register. | NOTE: In register-pair transfers, Control registers must be specified using their numeric alias names. For more information see Section 11.2. # 4.7 Compound operations The instruction set includes some instructions that perform multiple logical or arithmetic operations in a single instruction. They include the following operations: - AND/OR with inverted input - Compound logical register - Compound logical predicate - Compound add-subtract with immediates - Compound shift-operation with immediates (arithmetic or logical) - Multiply-add with immediates For more information see Section 11.10.1. # 4.8 Special operations The instruction set includes some special-purpose instructions to support specific applications. # 4.8.1 H.264 CABAC processing H.264 or advanced video coding (AVC) is used in a diverse range of multimedia applications, for example, full HD video and audio. Context adaptive binary arithmetic coding (CABAC) is one of the two alternative entropy coding methods specified in the H.264 main profile. CABAC offers superior coding efficiency at the expense of greater computational complexity. The Hexagon processor includes a dedicated instruction (decbin) to support CABAC decoding. Binary arithmetic coding is based on the principle of recursive interval subdivision, and its state is characterized by two quantities: - The current interval range - The current offset in the current code interval The offset is read from the encoded bit stream. When decoding a bin, the interval range is subdivided in two intervals based on the estimation of the probability $_{pLPS}$ of LPS: one interval with width of rLPS = range x pLPS, and another with width of rMPS = range x pMPS = range -rLPS, where LPS stands for least probable symbol, and MPS for most probable symbol. Depending on the subinterval that the offset falls into, the decoder decides whether the bin is decoded as MPS or LPS, after which the two quantities are iteratively updated, as shown in Figure 4-1. Figure 4-1 Arithmetic decoding for one bin ### 4.8.1.1 CABAC implementation In H.264 range is a 9-bit quantity, and offset is 9 bits in regular mode and 10 bits in bypass mode during the whole decoding process. The calculation of rLPS is approximated by a $64 \times 4$ table of 256 bytes, where the range and the context state (selected for the bin to decode) address the lookup table. To maintain the precision of the whole decoding process, the new range must renormalize to ensure that the most significant bit is always 1, and that the offset is synchronously refilled from the bit stream. To simplify the renormalization/refilling process, the decoding scheme shown in Figure 4-2 significantly reduces the frequency of renormalization and refilling bits from the bit-stream, while also being suitable for DSP implementation. Figure 4-2 CABAC decoding engine for regular bin The Hexagon processor can use the decbin instruction to decode one regular bin in two cycles (not counting the bin refilling process). For more information on the decbin instruction see Section 11.10.6. For example: ``` Rdd = decbin(Rss,Rtt) INPUT: Rss and Rtt register pairs as: Rtt.w1[5:0] = state Rtt.w1[8] = valMPS Rtt.w0[4:0] = bitpos Rss.w0 = range Rss.w1 = offset OUTPUT: Rdd register pair is packed as Rdd.w0[5:0] = state Rdd.w0[8] = valMPS Rdd.w0[8] = valMPS Rdd.w0[22:16] = '0' Rdd.w1 = offset (normalized) OUTPUT: P0 P0 = (bin) ``` ### 4.8.1.2 Code example ``` H264CabacGetBinNC: /********************* * Nonconventional call: * Input: R1:0 = offset : range, R2 = dep, R3 = ctxIdx, R4 = (*ctxIdx), R5 = bitpos * Return: R1: 0 - offset : range P0 - (bin) ********************** // Cycle #1 { R1:0= decbin(R1:0,R5:4) // Decoding one bin R6 = asl(R22, R5) // where R22 = 0x100 } // Cycle #2 memb(R3) = R0 // Save context to *ctxIdx R1:0 = vlsrw(R1:0,R5) // Realign range and offset P1 = cmp.gtu(R6,R1) // Need refill? i.e., P1= (range<0x100) \{ \text{ memb (R3)} = \text{R0} IF (!P1.new) jumpr:t LR // Return RENORM REFILL: ``` ### 4.8.2 IP Internet checksum The key features of the Internet checksum<sup>1</sup> include: - The checksum can be summed in any order - Carries can be accumulated using an accumulator larger than size being added, and added back in at any time Using standard data-processing instructions, the Internet checksum can be computed at 8 bytes per cycle in the main loop, by loading words and accumulating into doublewords. After the loop, the upper word is added to the lower word; then the upper halfword is added to the lower halfword, and any carries are added back in. The Hexagon processor supports a dedicated instruction (vradduh) that computes the Internet checksum at a rate of 16 bytes per cycle. The vradduh instruction accepts the halfwords of the two input vectors, adds them all together, and places the result in a 32-bit destination register. This operation can both compute the sum of 16 bytes of input while preserving the carries, and accumulate carries at the end of computation. For more information on the vradduh instruction, see Vector reduce add halfwords. NOTE: This operation utilizes the maximum load bandwidth available in the Hexagon processor. ### 4.8.2.1 Code example ``` .global fast ip check // Assumes data is 8-byte aligned // Assumes data is padded at least 16 bytes afterwords with 0's. // Input R0 points to data // Input R1 is length of data // Returns IP checksum in R0 fast ip check: R1 = lsr(R1, #4) // 16-byte chunks, rounded down, +1 R9:8 = combine(#0,#0) R3:2 = combine(#0,#0) } loop0(1f,R1) R7:6 = memd(R0+#8) R5:4 = memd(R0++#16) } .falign 1: R7:6 = memd(R0+#8) R5:4 = memd(R0++#16) R2 = vradduh(R5:4,R7:6) // Accumulate 8 halfwords ``` <sup>&</sup>lt;sup>1</sup> See RFC 1071 (http://www.faqs.org/rfcs/rfc1071.html) ``` R8 = vradduh(R3:2,R9:8) // Accumulate carries }:endloop0 // Drain pipeline R2 = vradduh(R5:4,R7:6) R8 = vradduh(R3:2,R9:8) R5:4 = combine(#0,#0) } { R8 = vradduh(R3:2,R9:8) R1 = #0 // May have some carries to add back in R0 = vradduh(R5:4,R9:8) // Possible for one more to pop out R0 = vradduh(R5:4,R1:0) R0 = not(R0) jumpr LR ``` ### 4.8.3 Software-defined radio The Hexagon processor includes six special-purpose instructions that support the implementation of software-defined radio. The instructions greatly accelerate the following algorithms: - Rake despreading - Scramble code generation - Polynomial field processing ### 4.8.3.1 Rake despreading A fundamental operation in despreading is the PN multiply operation. This operation compares the received complex chips against a pseudo-random sequence of QAM constellation points and accumulated. Figure 4-3 shows the vrcrotate instruction that performs this operation. The products are summed to form a soft 32-bit complex symbol. The instruction has both accumulating and non-accumulating versions. Figure 4-3 Vector reduce complex rotate For more information on the vrcrotate instruction, see Vector reduce complex rotate. NOTE: Using this instruction, the Hexagon processor can process 5.3 chips per cycle, and a 12-finger WCDMA user requires only 15 MHz. ### 4.8.3.2 Polynomial operations The polynomial multiply instructions support the following operations: - Scramble code generation (at a rate of 8 symbols per cycle for WCDMA) - Cryptographic algorithms (such as Elliptic Curve) - CRC checks (at a rate of 21 bits per cycle) - Convolutional encoding - Reed Solomon codes The versions of this instruction support $32 \times 32$ and vector $16 \times 16$ multiplication, both with and without accumulation, as shown in Figure 4-4. Figure 4-4 Polynomial multiply For more information on the pmpy instructions, see Polynomial multiply words and Vector polynomial multiply halfwords. # 5 Memory The Hexagon processor features a load/store architecture, where numeric and logical instructions operate on registers. Explicit load instructions move operands from memory to registers, while store instructions move operands from registers to memory. A small number of instructions (known as mem-ops) perform numeric and logical operations directly on memory. The address space is unified: all accesses target the same linear address space, which contains both instructions and data. # 5.1 Memory model for the Hexagon processor # 5.1.1 Address space The Hexagon processor has a 32-bit byte-addressable memory address space. The entire 4G linear address space is addressable by the user application. A virtual-to-physical address translation mechanism is handled by a resident OS. Virtual memory supports the implementation of memory management and memory protection in a hardware-independent manner. # 5.1.2 Byte order The Hexagon processor is a little-endian machine: the lowest address byte in memory is held in the least significant byte of a register, as shown in Figure 5-1. Figure 5-1 Hexagon processor byte order # 5.1.3 Alignment Even though the Hexagon processor memory is byte-addressable, instructions and data must align in memory on specific address boundaries: - Instructions and instruction packets must be 32-bit aligned - Data must be aligned to its native access size. Any unaligned memory access causes a memory-alignment exception. Use the permute instructions in applications that must reference unaligned vector data. The loads and stores still must be memory-aligned; however, the permute instructions enable easy rearrangement of the data in registers. Table 5-1 Memory alignment restrictions | Data type | Size (bits) | Exception when | |-----------------------------------|-------------|--------------------------| | Byte unsigned byte | 8 | Never | | Halfword unsigned halfword | 16 | LSB[0] != 0 <sup>1</sup> | | Word unsigned word | 32 | LSB[1:0] != 00 | | Doubleword | 64 | LSB[2:0] != 000 | | Instruction<br>Instruction packet | 32 | LSB[1:0] != 00 | <sup>&</sup>lt;sup>1</sup> LSB = Least significant bits of address # 5.2 Memory loads Memory is loaded in byte, halfword, word, or doubleword sizes. The data types supported are signed or unsigned. The syntax used is memXX, where XX denotes the data type. Table 5-2 Load instructions | Syntax | Source size (bits) | Destination size (bits) | Data<br>placement | Comment | |------------------|--------------------|-------------------------|-------------------|-----------------------------------| | Rd = memub(Rs) | 8 | 32 | Low 8 bits | Zero-extend 8 to 32 bits | | Rd = memb(Rs) | 8 | 32 | Low 8 bits | Sign-extend 8 to 32 bits | | Rd = memuh(Rs) | 16 | 32 | Low 16 bits | Zero-extend 16 to 32 bits | | Rd = memh(Rs) | 16 | 32 | Low 16 bits | Sign-extend 16 to 32 bits | | Rd = memubh(Rs) | 16 | 32 | Bytes 0 and 2 | Bytes 1 and 3 zeroed <sup>1</sup> | | Rd = membh(Rs) | 16 | 32 | Bytes 0 and 2 | Bytes 1 and 3 sign-extended | | Rd = memw(Rs) | 32 | 32 | All 32 bits | Load word | | Rdd = memubh(Rs) | 32 | 64 | Bytes 0,2,4,6 | Bytes 1,3,5,7 zeroed | | Rdd = membh(Rs) | 32 | 64 | Bytes 0,2,4,6 | Bytes 1,3,5,7 sign-extended | Table 5-2 Load instructions | Syntax | Source size (bits) | Destination size (bits) | Data<br>placement | Comment | |---------------------|--------------------|-------------------------|-------------------|--------------------------------| | Rdd = memd(Rs) | 64 | 64 | All 64 bits | Load doubleword | | Ryy = memh_fifo(Rs) | 16 | 64 | High 16 bits | Shift vector and load halfword | | deallocframe | 64 | 64 | All 64 bits | See Chapter 7 | | dealloc_return | 64 | 64 | All 64 bits | See Chapter 7 | <sup>1</sup> The memubh and membh instructions load contiguous bytes from memory (either 2 or 4 bytes) and unpack these bytes into a vector of halfwords. The instructions are useful when bytes are used as input into halfword vector operations, which is common in video and image processing.. NOTE: The memory load instructions belong to instruction class LD, and execute only in Slots 0 or 1. # **5.3 Memory stores** Memory is stored in byte, halfword, word, or doubleword sizes. The syntax used is memX, where X denotes the data type. Table 5-3 Store instructions | Syntax | Source<br>size<br>(bits) | Destination<br>size<br>(bits) | Comment | |------------------|--------------------------|-------------------------------|-------------------------------| | memb(Rs) = Rt | 32 | 8 | Store byte (bits 7:0) | | memb(Rs) = #s8 | 8 | 8 | Store byte | | memh(Rs) = Rt | 32 | 16 | Store lower half (bits 15:0) | | memh(Rs) = Rt.H | 32 | 16 | Store upper half (bits 31:16) | | memh(Rs) = #s8 | 8 | 16 | Sign-extend 8 to 16 bits | | memw(Rs) = Rt | 32 | 32 | Store word | | memw(Rs) = #s8 | 8 | 32 | Sign-extend 8 to 32 bits | | memd(Rs) = Rtt | 64 | 64 | Store doubleword | | allocframe(#u11) | 64 | 64 | See Chapter 7 | NOTE: The memory store instructions belong to instruction class ST, and execute only in slot 0 or when part of a dual store (Section 5.4) slot 1. ### 5.4 Dual stores Two memory store instructions can appear in the same instruction packet. The resulting operation is considered a dual store. For example: Unlike most packetized operations, dual stores do not execute in parallel (Section 3.3.1). Instead, the store instruction in Slot 1 effectively executes first, followed by the store instruction in Slot 0. NOTE: The store instructions in a dual store must belong to instruction class ST (Section 5.3), and execute only in Slots 0 and 1. ### 5.5 Slot 1 store with slot 0 load A slot 1 store operation with a slot 0 load operation can appear in a packet. The packet attribute :mem\_noshuf inhibits the instruction reordering that would otherwise be done by the assembler. For example: ``` { memw(R5) = R2 // Slot 1 store R3 = memh(R6) // Slot 0 load }:mem_noshuf ``` Unlike most packetized operations, these memory operations do not execute in parallel (Section 3.3.1). Instead, the store instruction in slot 1 executes first, followed by the load instruction in Slot 0. If the addresses of the two operations overlap, the load receives the newly stored data. This feature is supported in processor versions V65 or greater. # 5.6 New-value stores A memory store instruction can store a register that is assigned a new value in the same instruction packet (Section 3.3). This feature is expressed in assembly language by appending the suffix <code>.new</code> to the source register. In many cases, a predicate or general register are both generated and used in the same instruction packet. This feature is expressed in assembly language by appending the suffix <code>.new</code> to the specified register. For example: New-value store instructions have the following restrictions: - If an instruction uses auto-increment or absolute-set Addressing modes, its address register cannot be the new-value register. - If an instruction produces a 64-bit result, its result registers cannot be the new-value register. ■ If the instruction that sets a new-value register is conditional (Section 6.1.2), it must always execute. NOTE: The new-value store instructions belong to instruction class NV, and execute only in Slot 0. # 5.7 Mem-ops Mem-ops perform basic arithmetic, logical, and bit operations directly on memory operands, without the need for a separate load or store. Mem-ops are performed on byte, halfword, or word sizes. Table 5-4 Mem-ops | Syntax | Operation | |-----------------------------|------------------------------| | memXX(Rs+#u6) [+- &] = Rt | Arithmetic/logical on memory | | memXX(Rs+#u6) [+-] = #u5 | Arithmetic on memory | | memXX(Rs+#u6) = clrbit(#u5) | Clear bit in memory | | memXX(Rs+#u6) = setbit(#u5) | Set bit in memory | NOTE: The mem-op instructions belong to instruction class MEMOP, and execute only in Slot 0. # 5.8 Addressing modes Table 5-5 Addressing modes supported by the Hexagon processor | Mode | Syntax | Operation <sup>1</sup> | |---------------------------------------|----------------------------------------|--------------------------------------| | 32-bit absolute | memXX(##address) | EA = address | | 32-bit absolute-set | memXX(Re=##address) | EA = address<br>Re = address | | Absolute with register offset | memXX(Ru<<#u2+##U32) | EA = imm + (Ru << #u2) | | Global-pointer-relative | memXX(GP+#immediate) memXX(#immediate) | EA = GP + immediate | | Indirect | memXX(Rs) | EA = Rs | | Indirect with offset | memXX(Rs+#s11) | EA = Rs + imm | | Indirect with register offset | memXX(Rs+Ru<<#u2) | EA = Rs + (Ru << #u2) | | Indirect with autoincrement immediate | memXX(Rx++#s4) | EA = Rx;<br>Rx += (imm) | | Indirect with autoincrement register | memXX(Rx++Mu) | EA = Rx;<br>Rx += Mu | | Circular with autoincrement immediate | memXX(Rx++#s4:circ(Mu)) | EA = Rx;<br>Rx = circ_add(Rx,imm,Mu) | Table 5-5 Addressing modes supported by the Hexagon processor | Mode | Syntax | Operation <sup>1</sup> | | | |------------------------------------------|-----------------------|------------------------------------------------|--|--| | Circular with autoincrement register | memXX(Rx++I:circ(Mu)) | <pre>EA = Rx;<br/>Rx = circ_add(Rx,I,Mu)</pre> | | | | Bit-reversed with autoincrement register | memXX(Rx++Mu:brev) | EA = Rx.H + bit_reverse(Rx.L) Rx += Mu | | | <sup>&</sup>lt;sup>1</sup> EA (Effective Address) is equivalent to VA (Virtual Address). ### 5.8.1 Absolute The absolute addressing mode uses a 32-bit constant value as the effective memory address. For example: ``` R2 = memw(\#\#100000) // Load R2 with word from addr 100000 memw(\#\#200000) = R4 // Store R4 to word at addr 200000 ``` ### 5.8.2 Absolute-set The absolute-set addressing mode assigns a 32-bit constant value to the specified general register, then uses the assigned value as the effective memory address. For example: ``` R2 = memw (R1=\#400000) \\ // Load R2 with word from addr 400000 \\ // and load R1 with value 400000 \\ memw (R3=\#600000) = R4 \\ // Store R4 to word at addr 600000 \\ // and load R3 with value 600000 \\ ``` # 5.8.3 Absolute with register offset The absolute with register offset addressing mode performs an arithmetic left shift of a 32-bit general register value by the amount specified in a 2-bit unsigned immediate value, and then adds the shifted result to an unsigned 32-bit constant value to create the 32-bit effective memory address. For example: ``` R2 = memh(R3 << #3 + ##100000) // Load R2 with signed halfword // from addr [100000 + (R3 << 3)] ``` The 32-bit constant value is the base address, and the shifted result is the byte offset. NOTE: This addressing mode is useful for loading an element from a global table, where the immediate value is the name of the table, and the register holds the index of the element. ### 5.8.4 Global pointer relative The global pointer relative addressing mode adds an unsigned offset value to the Hexagon processor global data pointer GP to create the 32-bit effective memory address. This addressing mode accesses global and static data in C. Global pointer relative addresses can be expressed two ways in assembly language: - By explicitly adding an unsigned offset value to register GP - By specifying only an immediate value as the instruction operand #### For example: Specifying only an immediate value causes the assembler and linker to automatically subtract the value of the special symbol <code>\_SDA\_BASE\_</code> from the immediate value, and use the result as the effective offset from <code>GP</code>. The global data pointer is programmed in the GDP field of register GP (Section 2.3.8). This field contains an unsigned 26-bit value that specifies the most significant 26 bits of the 32-bit global data pointer. The least significant 6 bits of the pointer are always defined as zero. The memory area referenced by the global data pointer is known as the *global data area*. It can be up to 512 kB in length, and – because of the way the global data pointer is defined – must align to a 64-byte boundary in virtual memory. When expressed in assembly language, the offset values used in global pointer relative addressing always specify byte offsets from the global data pointer. The offsets must be integral multiples of the size of the instruction data type. | Table E C | Officet reneree | /alabal | | "alativa\ | |-----------|-----------------|---------|--------|-----------| | rable 5-6 | Offset ranges | tulobai | bomter | relative | | Data type | Offset range | Offset must be multiple of | |------------|--------------|----------------------------| | doubleword | 0 524280 | 8 | | word | 0 262140 | 4 | | halfword | 0 131070 | 2 | | byte | 0 65535 | 1 | NOTE: When using global pointer relative addressing, the immediate operand should be a symbol in the .sdata or .sbss section to ensure that the offset is valid. ### 5.8.5 Indirect The indirect addressing mode uses a 32-bit value stored in a general register as the effective memory address. For example: ``` R2 = memub(R1) // load R2 with unsigned byte from addr R1 ``` ### 5.8.6 Indirect with offset The indirect with offset addressing mode adds a signed offset value to a general register value to create the 32-bit effective memory address. For example: ``` R2 = memh(R3 + #100) // load R2 with signed halfword // from [R3 + 100 bytes] ``` When expressed in assembly language, the offset values always specify byte offsets from the general register value. The offsets must be integral multiples of the size of the instruction data type. Table 5-7 Offset ranges (indirect with offset) | Data type | Offset range | Offset must be multiple of | |------------|--------------|----------------------------| | doubleword | -8192 8184 | 8 | | word | -4096 4092 | 4 | | halfword | -2048 2046 | 2 | | byte | -1024 1023 | 1 | NOTE: The offset range is smaller for conditional instructions (Section 5.9). ## 5.8.7 Indirect with register offset The indirect with register offset addressing mode adds a 32-bit general register value to the result created by performing an arithmetic left shift of a second 32-bit general register value by the amount specified in a 2-bit unsigned immediate value, forming the 32-bit effective memory address. For example: ``` R2 = memh(R3+R4<<\#1) // load R2 with signed halfword // from [R3 + (R4 << 1)] ``` The register values always specify byte addresses. ### 5.8.8 Indirect with autoincrement immediate The indirect with autoincrement immediate addressing mode uses a 32-bit value stored in a general register to specify the effective memory address. However, after the address is accessed, a signed value (known as the increment) is added to the register so it specifies a different memory address (which is accessed in a subsequent instruction). For example: ``` R2 = memw(R3++#4) // R3 contains the effective address // R3 is then incremented by 4 ``` When expressed in assembly language, the increment values always specify byte offsets from the general register value. The offsets must be integral multiples of the size of the instruction data type. | Data type | Increment range | Increment must be multiple of | |------------|-----------------|-------------------------------| | doubleword | -64 56 | 8 | | word | -32 28 | 4 | | halfword | -16 14 | 2 | | byte | -8 7 | 1 | Table 5-8 Increment ranges (indirect with auto-increment immediate) ### 5.8.9 Indirect with autoincrement register The indirect with autoincrement register addressing mode is functionally equivalent to indirect with autoincrement immediate, but uses one of the two dedicated address-modify register M0 and M1 (which are part of the control registers Section 2.3.4) instead of an immediate value to hold the increment. For example: ``` R2 = memw(R0++M1) // The effective addr is the value of R0. // Next, M1 is added to R0 and the result // is stored in R0. ``` When autoincrementing with a modifier register, the increment is a signed 32-bit value that is added to the general register. This offers two advantages over auto-increment immediate: - A larger increment range - Variable increments (because the modifier register can be programmed at runtime) The increment value always specifies a byte offset from the general register value. NOTE: The signed 32-bit increment range is identical for all instruction data types (doubleword, word, halfword, byte). ### 5.8.10 Circular with autoincrement immediate The circular with autoincrement immediate addressing mode is a variant of indirect with autoincrement addressing – it accesses data buffers in a modulo wrap-around fashion. Circular addressing is common in data stream processing. Circular addressing is expressed in assembly language with the address modifier :circ(Mx), where Mx specifies a modifier register that is programmed to specify the circular buffer (Section 2.3.4). ### For example: ``` R0 = memb(R2++#4:circ(M0)) // Load from R2 in circ buf specified // by M0 memw(R2++#8:circ(M1)) = R0 // Store to R2 in circ buf specified // by M1 ``` Circular addressing is set up by programming the following elements: - The Length field of the Mx register is set to the length (in bytes) of the circular buffer to access. A circular buffer is from 4 to (128K-1) bytes long. - Bits 27:24 of the Mx register are always set to 0. - The circular start register CSx that corresponds to Mx (CSO for MO, CS1 for M1) is set to the start address of the circular buffer. In circular addressing, after memory is accessed at the address specified in the general register, the general register is incremented by the immediate increment value and then modulo'd by the circular buffer length to implement wrap-around access of the buffer. When expressed in assembly language, the increment values always specify byte offsets from the general register value. The offsets must be integral multiples of the size of the instruction data type. | Data type Increment range | | Increment must be multiple of | |---------------------------|--------|-------------------------------| | doubleword | -64 56 | 8 | | word | -32 28 | 4 | | halfword | -16 14 | 2 | | byte | -8 7 | 1 | Table 5-9 Increment ranges (circular with auto-increment immediate addressing) When programming a circular buffer, the following rules apply: - The start address must align to the native access size of the buffer elements. - ABS(Increment) < Length. The absolute value of the increment must be less than the buffer length. - Access size < (Length-1). The memory access size (1 for byte, 2 for halfword, 4 for word, 8 for doubleword) must be less than (Length-1). - Buffers must not wrap around in the 32-bit address space. NOTE: If any of these rules are not followed, the execution result is undefined. For example, a 150-byte circular buffer can be set up and accessed as follows: The following C function describes the behavior of the circular add function: ``` unsigned int fcircadd(unsigned int pointer, int offset, unsigned int M reg, unsigned int CS reg) ``` ``` unsigned int length; int new_pointer, start_addr, end_addr; length = (M_reg&0x01ffff); // lower 17-bits gives buffer size new_pointer = pointer+offset; start_addr = CS_reg; end_addr = CS_reg + lenth; if (new_pointer >= end_addr) { new_pointer -= length; } else if (new_pointer < start_addr) { new_pointer += length; } return (new_pointer); }</pre> ``` # 5.8.11 Circular with autoincrement register The circular with autoincrement register addressing mode is functionally equivalent to circular with autoincrement immediate, but uses a register instead of an immediate value to hold the increment. Register increments are specified in circular addressing instructions by using the symbol I as the increment (instead of an immediate value). For example: When autoincrementing with a register, the increment is a signed 11-bit value that is added to the general register. This offers two advantages over circular addressing with immediate increments: - Larger increment ranges - Variable increments (because the increment register can be programmed at runtime) The circular register increment value is programmed in the $\[mu]$ field of the modifier register $\[mu]$ x (Section 2.3.4) as part of setting up the circular data access. This register field holds the signed 11-bit increment value. Increment values are expressed in units of the buffer element data type, and automatically scale at runtime to the proper data access size. | Table 5-10 Increment ra | anges (circu | lar with autoincrem | ent register a | iddressing) | |-------------------------|--------------|---------------------|----------------|-------------| |-------------------------|--------------|---------------------|----------------|-------------| | Data type | Increment range | Increment must be multiple of | |------------|-----------------|-------------------------------| | doubleword | -8192 8184 | 8 | | word | -4096 4092 | 4 | | halfword | -2048 2046 | 2 | | byte | -1024 1023 | 1 | When programming a circular buffer (with either a register or immediate increment), the rules that apply to circular addressing must be followed – for details see Section 5.8.10. NOTE: If any of these rules are not followed, the execution result is undefined. ### 5.8.12 Bit-reversed with autoincrement register The bit-reversed with autoincrement register addressing mode is a variant of indirect with autoincrement addressing – it accesses data buffers using an address value that is the bit-wise reversal of the value stored in the general register. Bit-reversed addressing is used in fast Fourier transforms (FFT) and Viterbi encoding. The bit-wise reversal of a 32-bit address value is defined as follows: - The lower 16 bits are transformed by exchanging bit 0 with bit 15, bit 1 with bit 14, and so on. - The upper 16 bits remain unchanged. Bit-reversed addressing is expressed in assembly language with the address modifier : brev. #### For example: ``` R2 = memub(R0++M1:brev) // The address is (R0.H | bitrev(R0.L)) // The original R0 (not reversed) is added // to M1 and written back to R0 ``` The initial values for the address and increment must be set in bit-reversed form, with the hardware bit-reversing the bit-reversed address value to form the effective address. The buffer length for a bit-reversed buffer must be an integral power of 2, with a maximum length of 64 kB. To support bit-reversed addressing, buffers must be properly aligned in memory. A bit-reversed buffer is properly aligned when its starting byte address is aligned to a power of 2 greater than or equal to the buffer size (in bytes). For example: ``` int bitrev buf[256] attribute ((aligned(1024))); ``` The bit-reversed buffer declared above is aligned to 1024 bytes because the buffer size is 1024 bytes (256 integer words $\times$ 4 bytes), and 1024 is an integral power of 2. The buffer location pointer for a bit-reversed buffer must be initialized so the least-significant 16 bits of the address value are bit-reversed. The increment value must be initialized to the following value: ``` bitreverse(buffer_size_in_bytes/2) ``` where bitreverse is defined as bit-reversing the least-significant 16 bits while leaving the remaining bits unchanged. NOTE: To simplify the initialization of the bit-reversed pointer, bit-reversed buffers can be aligned to a 64 kB boundary. This initializes the bit-reversed pointer to the base address of the bit-reversed buffer, with no bit-reversing required for the least-significant 16 bits of the pointer value (which are all set to 0 by the 64 kB alignment). In most cases, because buffers allocated on the stack only have an alignment of eight bytes or less, bit-reversed buffers should not be declared on the stack. After a bit-reversed memory access is complete, the general register is incremented by the register increment value. The value in the general register is never affected by the bit-reversal that is performed as part of the memory access. NOTE: The Hexagon processor supports only register increments for bit-reversed addressing – it does not support immediate increments. ### 5.9 Conditional load/stores Some load and store instructions can execute conditionally based on predicate values that were set in a previous instruction. The compiler generates conditional loads and stores to increase instruction-level parallelism. Conditional loads and stores are expressed in assembly language with the instruction prefix "if (pred\_expr)", where pred\_expr specifies a predicate register expression (Section 6.1). For example: ``` if (P0) R0 = memw(R2) // conditional load if (!P2) memh(R3 + \#100) = R1 // conditional store if (P1.new) R3 = memw(R3++\#4) // conditional load ``` Not all addressing modes are supported in conditional loads and stores. Table 5-11 shows supported modes. | <b>Table 5-11</b> | Addressing | modes | (conditional | load/store) | |-------------------|------------|-------|--------------|-------------| | | | | | | | Addressing mode | Conditional | |------------------------------------------|-------------| | Absolute | Yes | | Absolute-set | No | | Absolute with register offset | No | | Global pointer relative | No | | Indirect | Yes | | Indirect with offset | Yes | | Indirect with register offset | Yes | | Indirect with autoincrement immediate | Yes | | Indirect with autoincrement register | No | | Circular with autoincrement immediate | No | | Circular with autoincrement register | No | | Bit-reversed with autoincrement register | No | When a conditional load or store instruction uses indirect-with-offset addressing mode, the offset range is smaller than the range normally defined for indirect-with-offset addressing (Section 5.8.6). Table 5-12 Conditional and normal offset ranges (indirect with offset addressing) | Data type | Offset range (conditional) | Offset range<br>(normal) | Offset must be multiple of | |------------|----------------------------|--------------------------|----------------------------| | doubleword | 0 504 | -8192 8184 | 8 | | word | 0 252 | -4096 4092 | 4 | | halfword | 0 126 | -2048 2046 | 2 | | byte | 0 63 | -1024 1023 | 1 | NOTE: For more information on conditional execution, see Chapter 6. # 5.10 Cache memory Memory accesses can be cached or uncached. Separate L1 instruction and data caches exist for program code and data. A unified L2 cache can be partly or wholly configured as tightly-coupled memory (TCM) The Hexagon processor has a cache-based memory architecture: - A level 1 instruction cache holds recently-fetched instructions. - A level 1 data cache holds recently-accessed data memory. Load/store operations that access memory through the level 1 caches are referred to as cached accesses. Load/stores that bypass the level 1 caches are referred to as uncached accesses. Specific memory areas can be configured so they perform cached or uncached accesses. This configuration is performed by the Hexagon processor's memory management unit (MMU). The operating system is responsible for programming the MMU. Two types of caching are supported (as cache modes): - Write-through caching keeps the cache data consistent with external memory by always writing to the memory any data that is stored in the cache. - Write-back caching stores data in the cache without being immediately written to external memory. Cached data that is inconsistent with external memory is referred to as dirty. The Hexagon processor includes dedicated cache maintenance instructions that push dirty data out to external memory. # 5.10.1 Uncached memory In some cases load/store operations must bypass the cache memories and be serviced externally (for example, when accessing memory-mapped I/O, registers, and peripheral devices, or other system defined entities). The operating system is responsible for configuring the MMU to generate uncached memory accesses. Uncached memory is categorized into two distinct types: - Device-type is for accessing memory that has side-effects (such as a memory-mapped FIFO peripheral). The hardware ensures that interrupts do not cancel a pending device access. The hardware does not reorder device accesses. Peripheral control registers should be marked as device-type. - Uncached-type is for memory-like memory. No side effects are associated with an access. The hardware can load from uncached memory multiple times. The hardware can reorder uncached accesses. For instruction accesses, device-type memory is functionally identical to uncached-type memory. For data accesses, they are different. Code can execute directly from the L2 cache, bypassing the L1 cache. # 5.10.2 Tightly coupled memory The Hexagon processor supports tightly-coupled instruction memory at Level 1, which is defined as memory with similar access properties to the instruction cache. Tightly-coupled memory is also supported at level 2, which is defined as backing store to the primary caches. For more information see Chapter 9. # 5.10.3 Cache maintenance operations The Hexagon processor includes dedicated cache maintenance instructions that invalidate cache data or push dirty data out to external memory. The cache maintenance instructions operate on specific memory addresses. If the instruction causes an address error (due to a privilege violation), the processor raises an exception. NOTE: The exception to this rule is defetch, which never causes a processor exception. Whenever maintenance operations are performed on the instruction cache, the isync instruction (Section 5.11) must execute immediately afterwards. This instruction ensures that the maintenance operations are observed by subsequent instructions. Table 5-13 Cache instructions (user-level) | Syntax | Permitted In packet | Operation | |-----------------|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------| | icinva(Rs) | Solo <sup>1</sup> | Instruction cache invalidate. | | | | Look up instruction cache at address Rs. If the address is in the cache, invalidate it. | | dccleaninva(Rs) | Slot 1 | Data cache clean and invalidate. | | | empty or | Look up data cache at address Rs. | | | ALU32 only | If the address is in the cache and has dirty data, flush that data out to memory. The cache line is then invalidated, whether or not dirty data was written. | | dccleana(Rs) | Slot 1 | Data cache clean. | | | empty or | Look up data cache at address Rs. | | | ALU32 only | If the address is in the cache and has dirty data, flush that data out to memory. | | dcinva(Rs) | Slot 1 | Equivalent to dccleaninva (Rs). | | | empty or | | | | ALU32 only | | | dcfetch(Rs) | Normal <sup>2</sup> | Data cache prefetch. | | | | Prefetch data at address Rs into the data cache. | | | | NOTE - This instruction does not cause an exception. | | 12fetch(Rs,Rt) | ALU32 or | L2 cache prefetch. | | | XTYPE only | Prefetch data from memory specified by Rs and Rt into L2 cache. | <sup>&</sup>lt;sup>1</sup> Solo means that the instruction must not be grouped with other instructions in a packet. # 5.10.4 L2 cache operations The cache maintenance operations (Section 5.10.3) operate on both the L1 and L2 caches. The data cache coherency operations (including clean, invalidate, and clean and invalidate) affect both the L1 and L2 caches, and ensure that the memory hierarchy remains coherent. However, the instruction cache invalidate operation affects only the L1 cache. Therefore, invalidating instructions that might be in the L1 or L2 caches requires a two-step procedure: - 1. Use icinva to invalidate instructions from the L1 cache. - 2. Use dcinva separately to invalidate instructions from the L2 cache. #### 5.10.5 Cache line zero The Hexagon processor includes the dczeroa instruction. This instruction allocates a line in the L1 data cache and clears it (by storing all zeros). The behavior is as follows: ■ The Rs register value must be 32-byte aligned. If it is unaligned, the processor raises an unaligned error exception. <sup>&</sup>lt;sup>2</sup> Normal means that the normal instruction-grouping constraints apply. - For a cache hit, the specified cache line is cleared (written with all zeros) and made dirty. - For a cache miss, the specified cache line is not fetched from external memory. Instead, the line is allocated in the data cache, cleared, and made dirty. This instruction is useful in optimizing write-only data. It allows for the use of write-back pages – the most power and performance efficient – without the need to initially fetch the line to write. This removes unnecessary read bandwidth and latency. NOTE: dczeroa has the same exception behavior as write-back stores. A packet with dozeroa must have Slot 1 either empty or containing an ALU32 instruction. # 5.10.6 Cache prefetch The Hexagon processor supports the following types of cache prefetching: - Hardware-based instruction cache prefetching - Software-based data cache prefetching - Software-based L2fetch - Hardware-based data cache prefetching #### Hardware-based instruction cache prefetching L1 and L2 instruction cache prefetching can be enabled or disabled on a per-thread basis – this is done by setting the HFI field in the user status register (Section 2.3.3). Software-based data cache prefetching The Hexagon processor includes the instruction defetch. This instruction queries the L1 data cache based on the address specified in the instruction: - If the address is present in the cache, no action is taken. - If the cache line for the address is missing, the processor attempts to fill the cache line from the next level of memory. The thread does not stall, but rather continues executing while the cache line fill occurs in the background. - If the address is invalid, no exception is generated and the defetch instruction is treated as a NOP. #### Software-based L2fetch More powerful L2 prefetching of data or instructions is provided by the l2fetch instruction, which specifies an area of memory that is prefetched by the hardware prefetch engine of the Hexagon processor. l2fetch specifies two registers (Rs and Rt) as operands. Rs contains the 32-bit virtual start address of the memory area to prefetch. Rt contains three bit fields that further specify the memory area: - Rt[15:8] Width, specifies the width (in bytes) of a block of memory to fetch. - Rt[7:0] Height, specifies the number of Width-sized blocks to fetch. ■ Rt[31:16] — Stride, specifies an unsigned byte offset that increments the pointer after each Width-sized block is fetched. The 12fetch instruction is nonblocking: it initiates a prefetch operation that is performed in the background by the prefetch engine while the thread continues to execute Hexagon processor instructions. The prefetch engine requests all lines in the specified memory area. If the line(s) of interest are already resident in the L2 cache, the prefetch engine performs no action. If the lines are not in the L2 cache, the prefetch engine attempts to fetch them. The prefetch engine makes a best effort to prefetch the requested data, and attempts to perform prefetching at a lower priority than demand fetches. This prevents the prefetch engine from adding bus traffic when the system is under a heavy load. If a program executes an 12fetch instruction while the prefetch operation from a previous 12fetch is still active, the prefetch engine halts the current prefetch operation. NOTE: Executing 12 fetch with any bit field operand programmed to zero cancels prefetch activity. The status of the current prefetch operation is maintained in the PFA field of the user status register (Section 2.3.3). This field can determine whether a prefetch operation has completed. With respect to MMU permissions and error checking, the 12fetch instruction behaves similarly to a load instruction. If the virtual address causes a processor exception, the exception is taken. This differs from the defetch instruction, which is treated as a NOP in the presence of a translation/protection error. NOTE: Prefetches are dropped when the generated prefetch address resides on a different page than the start address. The programmer must use sufficiently large pages to ensure this does not occur. Figure 5-2 shows two examples of using the 12fetch instruction. The first shows a box prefetch, where a 2D range of memory is defined within a larger frame. The second example shows a prefetch for a large linear memory area of size (Lines \* 128). Figure 5-2 L2fetch instruction #### Hardware-based data cache prefetching L1 data cache prefetching can be enabled or disabled on a per-thread basis – this is done by setting the HFD field in the User status register. When data cache prefetching is enabled, the Hexagon processor observes patterns of data cache misses, and attempts to predict future misses based on any recurring patterns of misses where the addresses are separated by a constant stride. If such patterns are found, the processor attempts to automatically prefetch future cache lines. Data cache prefetching is user-enabled at four levels of aggressiveness: - HFD = 00: No prefetching - HFD = 01: Prefetch up to four lines for misses originating from a load, with a post-update addressing mode that occurs within a hardware loop - HFD = 10: Prefetch up to four lines for misses originating from loads that occur within a hardware loop - HFD = 11: Prefetch up to eight lines for misses originating from loads # 5.11 Memory ordering Some devices might require synchronization of stores and loads when they are accessed. In this case, a set of processor instructions enable programmer control of the synchronization and ordering of memory accesses. Table 5-14 Memory ordering instructions | Syntax | Operation | |---------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | isync | Instruction synchronize. | | | This instruction should be executed after any instruction cache maintenance operation. | | syncht | Synchronize transactions. | | | Perform "heavyweight" synchronization. Ensure that all previous program transactions (for example, memw_locked, cached and uncached load/store) complete before execution resumes past this instruction. | | | The syncht operation ensures that outstanding memory operations from threads are complete before the syncht instruction is committed. | | barrier | Set memory barrier. | | | Ensure proper ordering between the program accesses performed before the instruction and those performed after the instruction. | | | Accesses before the barrier are globally observable before any access occurring after the barrier can be observed. | | | The barrier operation ensures that outstanding memory operations from the thread executing the barrier are complete before the instruction is committed. | Data memory accesses and program memory accesses are treated separately and held in separate caches. Software should ensure coherency between data and program code if necessary. For example, with generated or self-modified code, the modified code is placed in the data cache and can be inconsistent with program cache. The software must explicitly force modified data cache lines to memory (either by using a write-through policy, or through explicit cache clean instructions). Use a barrier instruction to ensure completion of the stores. Finally, invalidate relevant instruction cache contents so the new instructions can be refetched. Here is the recommended code sequence to change and then execute an instruction: ``` ICINVA(R1) // Clear code from instruction cache ISYNC // Ensure that ICINVA is finished MEMW(R1)=R0 // write the new instruction DCCLEANINVA(R1) // force data out of data cache SYNCHT // Ensure that it's in memory JUMPR R1 // Can now execute code at R1 ``` NOTE: The memory-ordering instructions must not be grouped with other instructions in a packet, otherwise the behavior is undefined. This code sequence differs from the one used in previous processor versions. # 5.12 Atomic operations Atomic memory operations (load locked/store conditional) are supported to implement multithread synchronization. The Hexagon processor includes an LL/SC (load locked / store conditional) mechanism to provide the atomic read-modify-write operation that is necessary to implement synchronization primitives such as semaphores and mutexes. These primitives synchronize the execution of different software programs running concurrently on the Hexagon processor. They also provide atomic memory support between the Hexagon processor and external blocks. **Table 5-15 Atomic instructions** | Syntax | Description | |--------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Rd = memw_locked(Rs) | Load locked word. | | | Reserve lock on word at address Rs. | | memw_locked(Rs,Pd) = Rt | Store conditional word. | | | If no other atomic operation has been performed at the address (that is, atomicity is ensured), perform the store to the word at address Rs and return TRUE in Pd; otherwise return FALSE. | | | TRUE indicates that the LL and SC operations have performed atomically. | | Rdd = memd_locked(Rs) | Load locked doubleword. | | | Reserve lock on doubleword at address Rs. | | memd_locked(Rs,Pd) = Rtt | Store conditional doubleword. | | | If no other atomic operation has been performed at the address (that is, atomicity is ensured), perform the store to the doubleword at address Rs and return TRUE in Pd; otherwise return FALSE. | | | TRUE indicates that the LL and SC operations have performed atomically. | #### Here is the recommended code sequence to acquire a mutex: #### Here is the recommended code sequence to release a mutex: ``` // Assume mutex address is held in R0 // Assume R1 is scratch ``` $$R1 = #0$$ memw(R0) = R1 Atomic memX\_locked operations are supported for external accesses that use the AXI bus and support atomic operations. To perform load-locked operations with external memory, the operating system must define the memory page as uncacheable, otherwise the processor behavior is undefined. If a load locked operation is performed on an address that does not support atomic operations, the behavior is undefined. For atomic operations on cacheable memory, the page attributes must be set to cacheable and write-back, otherwise the behavior is undefined. Cacheable memory must be used when threads need to synchronize with each other. NOTE: External memX\_locked operations are not supported on the AHB. If they are performed on the AHB, the behavior is undefined. # 6 Conditional execution The Hexagon processor uses a conditional execution model based on compare instructions that set predicate bits in one of four 8-bit predicate registers (P0 through P3). These predicate bits can conditionally execute certain instructions. Conditional scalar operations examine only the least-significant bit in a predicate register, while conditional vector operations examine multiple bits in the register. Branch instructions are the main consumers of the predicate registers. # 6.1 Scalar predicates Scalar predicates are 8-bit values in conditional instructions to represent truth values: - 0xFF represents true - 0x00 represents false The Hexagon processor provides the four 8-bit predicate registers P0-P3 to hold scalar predicates (Section 2.3.5). These registers are assigned values by the predicate-generating instructions, and examined by the predicate-consuming instructions. # 6.1.1 Generating scalar predicates The following instructions generate scalar predicates: - Compare byte, halfword, word, doubleword - Compare single- and double-precision floating point - Classify floating-point value - Compare bitmask - Bounds check - TLB match - Store conditional Table 6-1 Scalar predicate-generating instructions | Syntax | Operation | |-------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------| | Pd = cmpb.eq(Rs, {Rt, #u8}) Pd = cmph.eq(Rs, {Rt, #s8}) | Equal (signed). | | Pd = [!]cmp.eq(Rs, {Rt, #s10}) Pd = cmp.eq(Rss, Rtt) Pd = cmp.eq(Rss, Rtt) Pd = sfcmp.eq(Rss, Rtt) Pd = dfcmp.eq(Rss, Rtt) | Compare register Rs to Rt or a signed immediate for equality. Assign Pd the resulting truth value. | | Pd = cmpb.gt(Rs, {Rt, #s8} | Greater than (signed). | | <pre>Pd = cmph.gt(Rs, {Rt, #s8}) Pd = [!]cmp.gt(Rs, {Rt, #s10}) Pd = cmp.gt(Rss, Rtt) Pd = sfcmp.gt(Rs, Rt) Pd = dfcmp.gt(Rss, Rtt)</pre> | Compare register Rs to Rt or a signed immediate for signed greater than. Assign Pd the resulting truth value. | | Pd = cmpb.gtu(Rs, {Rt, #u7}) | Greater than (unsigned). | | <pre>Pd = cmph.gtu(Rs, {Rt, #u7}) Pd = [!]cmp.gtu(Rs, {Rt, #u9}) Pd = cmp.gtu(Rss, Rtt)</pre> | Compare register Rs to Rt or an unsigned immediate for unsigned greater than. Assign Pd the resulting truth value. | | Pd = cmp.ge(Rs, #s8)<br>Pd = sfcmp.ge(Rs, Rt) | Greater than or equal (signed). | | Pd = dfcmp.ge(Rss,Rtt) | Compare register Rs to Rt or a signed immediate for signed greater than or equal. Assign Pd the resulting truth value. | | Pd = cmp.geu(Rs, #u8) | Greater than or equal (unsigned). | | | Compare register Rs to an unsigned immediate for unsigned greater than or equal. Assign Pd the resulting truth value. | | Pd = cmp.lt(Rs,Rt) | Less than (signed). | | | Compare register Rs to Rt for signed less than. Assign Pd the resulting truth value. | | Pd = cmp.ltu(Rs,Rt) | Less than (unsigned). | | | Compare register Rs to Rt for unsigned less than. Assign Pd the resulting truth value. | | Pd = sfcmp.uo(Rs,Rt)<br>Pd = dfcmp.uo(Rss,Rtt) | Unordered (signed). | | ru - dremp. uo (NSS, NCC) | Determine if register Rs or Rt is set to the value NaN.<br>Assign Pd the resulting truth value. | | Pd=sfclass(Rs,#u5)<br>Pd=dfclass(Rss,#u5) | Classify value (signed). | | ru-urcrass (NSS, #uS) | Determine if register Rs is set to any of the specified classes. Assign Pd the resulting truth value. | | Pd = [!]tstbit(Rs, {Rt, #u5}) | Test if bit set. | | | Rt or an unsigned immediate specifies a bit position. | | | Test if the bit in Rs that is specified by the bit position is set. Assign Pd the resulting truth value. | | Pd = [!]bitsclr(Rs, {Rt, #u6}) | Test if bits clear. | | | Rt or an unsigned immediate specifies a bitmask. | | | Test if the bits in Rs that are specified by the bitmask are all clear. Assign Pd the resulting truth value. | | Pd = [!]bitsset(Rs,Rt) | Test if bits set. | | | Rt specifies a bitmask. Test if the bits in Rs that are specified by the bitmask | | | are all set. Assign Pd the resulting truth value. | | Table 6-1 Scalar predicate-generating instructions ( | |------------------------------------------------------| |------------------------------------------------------| | memw_locked(Rs,Pd) = Rt | Store conditional. | |--------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | memd_locked(Rs,Pd) = Rtt | If no other atomic operation has been performed at<br>the address (i.e., atomicity is ensured), perform the<br>store to the word at address Rs. Assign Pd the<br>resulting truth value. | | Pd = boundscheck(Rs,Rtt) | Bounds check. | | | Determine if Rs falls in the numeric range defined by Rtt. Assign Pd the resulting truth value. | | Pd = tlbmatch(Rss,Rt) | Determine if the TLB entry in Rss matches the ASID:PPN specified in Rt. Assign Pd the resulting truth value. | NOTE: One of the compare instructions (cmp.eq) includes a variant that stores a binary predicate value (0 or 1) in a general register not a predicate register. # 6.1.2 Consuming scalar predicates Certain instructions can conditionally execute based on the value of a scalar predicate (or alternatively specify a scalar predicate as an input to their operation). The conditional instructions that consume scalar predicates examine only the least-significant bit of the predicate value. In the simplest case, this bit value directly determines whether the instruction executes: - 1 indicates that the instruction executes - 0 indicates that the instruction does not execute If a conditional instruction includes the operator! in its predicate expression, the logical negation of the bit value determines whether the instruction executes. Conditional instructions are expressed in assembly language with the instruction prefix if (pred expr), where pred expr specifies the predicate expression. For example: The following instructions can be conditional instructions: - Jumps and calls (Section 8.3) - Many load and store instructions (Section 5.9) - Logical instructions (including AND/OR/XOR) - Shift halfword - 32-bit add/subtract by register or short immediate - Sign and zero extend - 32-bit register transfer and 64-bit combine word - Register transfer immediate - Deallocate frame and return When a conditional load or store executes and the predicate expression is false, the instruction is canceled (including any exceptions that might occur). For example, if a conditional load uses an address with a memory permission violation, and the predicate expression is false, the load does not execute and the exception is not raised. The mux instruction accepts a predicate as one of its basic operands: ``` Rd = mux(Ps,Rs,Rt) ``` The mux instruction elects either Rs or Rt based on the least significant bit in Ps. If the least-significant bit in Ps is a 1, Rd is set to Rs, otherwise it is set to Rt. ## 6.1.3 Auto-AND predicates If multiple compare instructions in a packet write to the same predicate register, the result is the logical AND of the individual compare results. For example: To perform the corresponding OR operation, the following instructions can compute the negation of an existing compare (using De Morgan's law): ``` Pd = !cmp.{eq,gt}(Rs, {#s10,Rt}) Pd = !cmp.gtu(Rs, {#u9,Rt}) Pd = !tstbit(Rs, {#u5,Rt}) Pd = !bitsclr(Rs, {#u6,Rt}) Pd = !bitsset(Rs,Rt) ``` Auto-AND predicates have the following restrictions: - If a packet contains endloopN, it cannot perform an auto-AND with predicate register P3. - If a packet contains a register transfer from a general register to a predicate register, no other instruction in the packet can write to the same predicate register. As a result, a register transfer to P3:0 or C5:4 cannot be grouped with any other predicate-writing instruction. - The instructions spNloop0, decbin, tlbmatch, memw\_locked, memd\_locked, add:carry, sub:carry, sfcmp, and dfcmp cannot be grouped with another instruction that sets the same predicate register. NOTE: A register transfer from a predicate register to a predicate register has the same auto-AND behavior as a compare instruction. # 6.1.4 Dot-new predicates The Hexagon processor can generate and use a scalar predicate in the same instruction packet (Section 3.3). This feature is expressed in assembly language by appending the suffix .new to the specified predicate register. For example: ``` if (P0.new) R3 = memw(R4) ``` To see how to use dot-new predicates, consider the following C statement and the corresponding assembly code that is generated from it by the compiler: #### C statement ``` if (R2 == 4) R3 = *R4; else R5 = 5; ``` #### **Assembly code** ``` { P0 = cmp.eq(R2,#4) if (P0.new) R3 = memw(R4) if (!P0.new) R5 = #5 } ``` In the assembly code, a scalar predicate is generated and then consumed twice within the same instruction packet. The following conditions apply to using dot-new predicates: - The predicate must be generated by an instruction in the same packet. The assembler normally enforces this restriction, but if the processor executes a packet that violates this restriction, the execution result is undefined. - A single packet can contain both the dot-new and normal forms of predicates. The normal form examines the old value in the predicate register, rather than the newly-generated value. For example: # 6.1.5 Dependency constraints Two instructions in an instruction packet should not write to the same destination register (Section 3.3.5). An exception to this rule is when the two instructions are conditional, and only one of them has the predicate expression value true when the packet is executed. For example, the following packet is valid as long as P2 and P3 never both evaluate to true when the packet is executed: Because predicate values change at runtime, the programmer must ensure that such packets are always valid during program execution. If they are invalid, the processor takes the following actions: - When writing to general registers, an error exception is raised. - When writing to predicate or control registers, the result is undefined. # 6.2 Vector predicates The predicate registers are also used for conditional vector operations. Unlike scalar predicates, vector predicates contain multiple truth values that are generated by vector predicate-generating operations. For example, a vector compare instruction compares each element of a vector and assigns the compare results to a predicate register. Each bit in the predicate vector contains a truth value indicating the outcome of a separate compare performed by the vector instruction. The vector mux instruction uses a vector predicate to selectively merge elements from two separate vectors into a single destination vector. This operation is useful for enabling the vectorization of loops with control flow (for example, branches). The vector instructions that use predicates are described in the following sections. # 6.2.1 Vector compare A vector compare instruction inputs two 64-bit vectors, performs separate compares for each pair of vector elements, and generates a predicate value that contains a bit vector of truth values. In Figure 6-1 two 64-bit vectors of bytes (contained in Rss and Rtt) are being compared. The result is assigned as a vector predicate to the destination register Pd. In the example vector predicate shown in Figure 6-1, every other compare result in the predicate is true (that is, 1). Figure 6-1 Vector byte compare Figure 6-2 shows comparison of two 64-bit vectors of halfwords. The result is assigned as a vector predicate to the destination register Pd. Because a vector halfword compare yields only four truth values, each truth value is encoded as two bits in the generated vector predicate. Figure 6-2 Vector halfword compare generating a vector predicate #### 6.2.2 Vector mux instruction A vector mux instruction conditionally selects the elements from two vectors. The instruction takes as input two source vectors and a predicate register. For each byte in the vector, the corresponding bit in the predicate register is used to choose from one of the two input vectors. The combined result is written to the destination register. Figure 6-3 Vector mux instruction Table 6-2 Vector mux instruction | Syntax | Operation | |------------------------|-------------------------------| | Rdd = vmux(Ps,Rss,Rtt) | Select bytes from Rss and Rtt | Changing the order of the source operands in a mux instruction enables formation of both senses of the result. For example: ``` R1:0 = vmux(P0,R3:2,R5:4) // Choose bytes from R3:2 if true R1:0 = vmux(P0,R5:4,R3:2) // Choose bytes from R3:2 if false ``` NOTE: By replicating the predicate bits generated by word or halfword compares, the vector mux instruction can select words or halfwords. # 6.2.3 Using vector conditionals Vector conditional support is used to vectorize loops with conditional statements. Consider the following C statement: ``` for (i=0; i<8; i++) { if (A[i]) { B[i] = C[i]; } }</pre> ``` Assuming arrays of bytes, this code can be vectorized as follows: # 6.3 Predicate operations The Hexagon processor provides a set of operations to manipulate and move predicate registers. Table 6-3 Predicate register instructions | Syntax | Operation | |----------------------------|------------------------------------------------------| | Pd = Ps | Transfer predicate Ps to Pd | | Pd = Rs | Transfer register Rs to predicate Pd | | Rd = Ps | Transfer predicate Ps to register Rd | | Pd = and(Ps,[!]Pt) | Set Pd to bitwise AND of Ps and [NOT] Pt | | Pd = or(Ps,[!]Pt) | Set Pd to bitwise OR of Ps and [NOT] Pt | | Pd = and(Ps, and(Pt,[!]Pu) | Set Pd to AND of Ps and (AND of Pt and [NOT] Pu) | | Pd = and(Ps, or(Pt,[!]Pu) | Set Pd to AND of Ps and (OR of Pt and [NOT] Pu) | | Pd = or(Ps, and(Pt,[!]Pu) | Set Pd to OR of Ps and (AND of Pt and [NOT] Pu) | | Pd = or(Ps, or(Pt,[!]Pu) | Set Pd to OR of Ps and (OR of Pt and [NOT] Pu) | | Pd = not(Ps) | Set Pd to bitwise inversion of Ps | | Pd = xor(Ps,Pt) | Set Pd to bitwise exclusive OR of Ps and Pt | | Pd = any8(Ps) | Set Pd to 0xFF if any bit in Ps is 1, 0x00 otherwise | | Pd = all8(Ps) | Set Pd to 0x00 if any bit in Ps is 0, 0xFF otherwise | NOTE: These instructions belong to instruction class CR. Predicate registers can transfer to and from the general registers either individually or as register quadruples (Section 2.3.5). # 7 Software stack The Hexagon processor includes dedicated registers and instructions to support a call stack for subroutine execution. The stack structure follows standard C conventions. # 7.1 Stack structure The stack is defined to grow from high addresses to low addresses. The stack pointer register SP points to the data element that is on the top of the stack. # Higher address Procedure local data on stack Saved LR Saved LR Saved FP FP register Procedure local data on stack Saved FP Procedure local data on stack SP register Unallocated stack Stack in memory Figure 7-1 Stack structure NOTE: The Hexagon processor supports three dedicated stack instructions: allocframe, deallocframe, and dealloc return (Section 7.5). The SP address must always remain 8-byte aligned for the stack instructions to work properly. #### 7.2 Stack frames The stack stores stack frames, which are data structures that store state information on the active subroutines in a program (that is, those that were called but have not yet returned). Each stack frame corresponds to an active subroutine in the program. A stack frame contains the following elements: - The local variables and data used by the subroutine - The return address for the subroutine call (pushed from the link register LR) - The address of the previous stack frame allocated on the stack (pushed from the frame pointer register FP) The frame pointer register FP always contains the address of the saved frame pointer in the current stack frame. It facilitates debugging by enabling a debugger to examine the stack in memory and easily determine the call sequence, function parameters, and so on. NOTE: For leaf functions, it is often unnecessary to save FP and LR. In this case FP contains the frame pointer of the calling function, not the current function. # 7.3 Stack protection The Hexagon V71 processor supports the following features to protect the integrity of the software stack. # 7.3.1 Stack bounds checking Stack bounds checking prevents a stack frame from being allocated past the lower boundary of the software stack. FRAMELIMIT is a 32-bit control register that stores a memory address that specifies the lower bound of the memory area reserved for the software stack. When the allocframe instruction allocates a new stack frame, it compares the new stack pointer value in SP with the stack bound value in FRAMELIMIT. If SP is less than FRAMELIMIT, the Hexagon processor raises exception 0x27 (Section 8.10). NOTE: Stack bounds checking is performed when the processor is in User and Guest modes, but not in Monitor mode. # 7.3.2 Stack smashing protection Stack smashing is a technique malicious code uses to gain control over an executing program. Malicious code causes buffer overflows in the local data of a procedure, with the goal of modifying the subroutine return address stored in a stack frame so it points to the malicious code instead of the intended return code. Stack smashing protection prevents this from happening by scrambling the subroutine return address when a new stack frame is allocated, and then unscrambling the return address when the frame is deallocated. Because the value in FRAMEKEY changes regularly and varies from device to device, it becomes difficult to precalculate a malicious return address. FRAMEKEY is a 32-bit control register that scrambles return addresses stored on the stack: - In the allocframe instruction, the 32-bit return address in link register LR is XOR-scrambled with the value in FRAMEKEY before it is stored in the new stack frame. - In deallocframe and dealloc\_return, the return address loaded from the stack frame is unscrambled with the value in FRAMEKEY before it is stored in LR. After a processor reset, the default value of FRAMEKEY is 0. If this value is not changed, stack smashing protection is effectively disabled. NOTE: Each hardware thread has its own instance of the FRAMEKEY register. # 7.4 Stack registers Table 7-1 Stack registers | Register | Name | Description | Alias | |------------|----------------------|----------------------------------------------|-------| | SP | Stack pointer | Points to topmost stack element in memory | R29 | | FP | Frame pointer | Points to previous stack frame on stack | R30 | | LR | Link register | Contains return address of subroutine call | R31 | | FRAMELIMIT | Frame limit register | Contains lowest address of stack area | C16 | | FRAMEKEY | Frame key register | Contains scrambling key for return addresses | C17 | NOTE: SP, FP, and LR are aliases of three general registers (Section 2-1). These general registers are conventionally dedicated for use as stack registers. # 7.5 Stack instructions The Hexagon processor includes the instructions allocframe and deallocframe to efficiently allocate and deallocate stack frames on the call stack. **Table 7-2 Stack instructions** | Syntax | Operation | |--------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | allocframe(#u11:3) | Allocate stack frame is used after a call. It first XORs the values in LR and FRAMEKEY, and pushes the resulting scrambled return address and FP to the top of stack. | | | Next, it subtracts an unsigned immediate from SP to allocate room for local variables. If the resulting SP is less than FRAMELIMIT, the processor raises exception 0x27. Otherwise, SP is set to the new value, and FP is set to the address of the old frame pointer on the stack. | | | The immediate operand as expressed in assembly syntax specifies the byte offset. This value must be 8-byte aligned. The valid range is from 0 to 16 kB. | | deallocframe | Deallocate stack frame | | | Use this instruction before a return to free a stack frame. It first loads the saved FP and LR values from the address at FP, and XORs the restored LR with the value in FRAMEKEY to unscramble the return address. SP is then pointed back to the previous frame. | | dealloc_return | Deallocate frame and return is a subroutine return with stack frame deallocate. | | | Perform deallocframe operation, and then perform subroutine return (Section 8.3.3) to the target address loaded from LR by deallocframe. | NOTE: allocframe and deallocframe load and store the LR and FP registers on the stack as a single aligned 64-bit register pair (that is, LR:FP). # **8** Program flow The Hexagon processor supports the following program flow facilities: - Conditional instructions - Hardware loops - Software branches - Pauses - Exceptions # 8.1 Conditional instructions Many Hexagon processor instructions can conditionally execute. For example: ``` if (P0) R0 = memw(R2) // Conditionally load word if P0 if (!P1) jump label // Conditionally jump if not P1 ``` The following instructions can be specified as conditional: - Jumps and calls - Many load and store instructions - Logical instructions (including AND/OR/XOR) - Shift halfword - 32-bit add/subtract by register or short immediate - Sign and zero extend - 32-bit register transfer and 64-bit combine word - Register transfer immediate - Deallocate frame and return For more information, see Section 5.9 and Chapter 6. # 8.2 Hardware loops The Hexagon processor includes hardware loop instructions that perform loop branches with zero overhead. For example: The loop instructions support nestable loops, with few restrictions on their use. Two sets of hardware loop instructions, - loop0 and loop1 -, nest hardware loops one level deep. For example: ``` // Sum the rows of a 100x200 matrix. loop1(outer_start,#100) outer_start: R0 = #0 loop0(inner_start,#200) inner_start: R3 = memw(R1++#4) { R0 = add(R0,R3) }:endloop0 { memw(R2++#4) = R0 }:endloop1 ``` Use the hardware loop instructions as follows: - Use loop0 for non-nested loops. - Use loop0 for the inner loop and loop1 for the outer loop for nested loops. NOTE: If a program must create loops nested more than one level deep, the two innermost loops can be implemented as hardware loops, with the remaining outer loops implemented as software branches. Each hardware loop is associated with a pair of dedicated loop registers: - The loop start address register SAn is set to the address of the first instruction in the loop (which is typically expressed in assembly language as a label). - The loop count register LCn is set to a 32-bit unsigned value, which specifies the number of loop iterations to perform. When the PC reaches the end of the loop, LCn is examined to determine whether the loop should repeat or exit. The hardware loop setup instruction sets both of these registers at the same time — typically there is no need to set them individually. However, because the loop registers completely specify the hardware loop state, saving and restoring the registers (either automatically by a processor interrupt or manually by the programmer) enables a suspended hardware loop to resume normally after its loop registers reload with the saved values. The Hexagon processor provides two sets of loop registers for the two hardware loops: - SA0 and LC0 are used by loop0 - SA1 and LC1 are used by loop1 Table 8-1 Hardware loop instructions | Syntax | Description | |------------------|------------------------------------------------------------------------------| | loopN(start, Rs) | Hardware loop with register loop count. | | | Set registers SAn and LCn for hardware loop N: | | | <ul> <li>SAn is assigned the specified start address of the loop.</li> </ul> | | | ■ LCn is assigned the value of general register Rs. | | | The loop start operand is encoded as a PC-relative immediate value. | | loopN(start, | Hardware loop with immediate loop count. | | #count) | Set registers SAn and LCn for hardware loop N: | | | <ul> <li>SAn is assigned the specified start address of the loop.</li> </ul> | | | ■ LCn is assigned the specified immediate value (0-1023). | | | The loop start operand is encoded as a PC-relative immediate value. | | :endloopN | Hardware loop end instruction. | | | Performs the following operation: | | | if (LCn > 1) {PC = SAn; LCn = LCn-1} | | | This instruction appears in assembly as a suffix appended to the last | | | packet in the loop. It is encoded in the last packet. | | SAn = Rs | Set loop start address to general register Rs | | LCn = Rs | Set loop count to general register Rs | NOTE: The loop instructions are assigned to instruction class CR. # 8.2.1 Loop setup To set up a hardware loop, the loop registers SAn and LCn must be set to the proper values. This can be done in two ways: - A loopN instruction - Register transfers to SAn and LCn The loopN instruction performs the work of setting SAn and LCn. For example: In this example, the hardware loop (consisting of a single multiply instruction) executes three times. The <code>loop0</code> instruction sets register SAO to the address value of label start, and LCO to 3. Loop counts are limited to the range 0 to 1023 when they are expressed as immediate values in loopN. If the desired loop count exceeds this range, it must be specified as a register value. For example: If a loopN instruction is located too far from its loop start address, the PC-relative offset value that specifies the start address can exceed the maximum range of the instruction's start-address operand. If this occurs, either move the loopN instruction closer to the loop start, or specify the loop start address as a 32-bit constant (Section 10.9). For example, using 32-bit constants: ``` R1 = #20000; loop0(##start,R1) // LC0=20000, SA0=&start ``` # 8.2.2 Loop end The loop end instruction indicates the last packet in a hardware loop. It is expressed in assembly language by appending the packet with the symbol ":endloopN", where N specifies the hardware loop (0 or 1). For example: ``` loop0(start,#3) start: { R0 = mpyi(R0,R0) } :endloop0 // Last packet in loop ``` The last instruction in the loop must always be expressed in assembly language as a packet (using curly braces), even if it is the only instruction in the packet. Nested hardware loops can specify the same instruction as the end of both the inner and outer loops. For example: Though <code>endloopN</code> behaves like a regular instruction (by implementing the loop test and branch), it does not execute in any instruction slot, and does not count as an instruction in the packet. Therefore a single instruction packet that is marked as a loop end can perform up to six operations: - Four regular instructions (the normal limit for an instruction packet) - The endloop0 test and branch - The endloop1 test and branch NOTE: The endloopN instruction is encoded in the instruction packet (Section 10.6). # 8.2.3 Loop execution After a hardware loop is set up, the loop body always executes at least once regardless of the specified loop count (because the loop count is not examined until the last instruction in the loop). Therefore, if a loop must optionally execute zero times, it must be preceded with an explicit conditional branch. For example: ``` loop0(start,R1) P0 = cmp.eq(R1,#0) if (P0) jump skip start: { R0 = mpyi(R0,R0) } :endloop0 skip: ``` In this example, a hardware loop is set up with the loop count in R1, but if the value in R1 is zero a software branch skips over the loop body. After the loop end instruction of a hardware loop executes, the Hexagon processor examines the value in the corresponding loop count register: - If the value is greater than 1, the processor decrements the loop count register and performs a zero-cycle branch to the loop start address. - If the value is less than or equal to 1, the processor resumes program execution at the instruction immediately following the loop end instruction. NOTE: Because nested hardware loops can share the same loop end instruction, the processor can examine both loop count registers in a single operation. # 8.2.4 Pipelined hardware loops Software pipelined loops are common for VLIW architectures such as the Hexagon processor. They offer increased code performance in loops by overlapping multiple loop iterations. Pipeline hazards are resolved by the hardware: instruction scheduling is not constrained by pipeline restrictions. A software pipeline has three sections: - A prologue in which the loop is primed - A kernel (or steady-state) portion An epilogue that drains the pipeline #### Table 8-2 Software pipelined loop ``` int foo(int *A, int *result) int i; for (i=0; i<100; i++) { result[i] = A[i] *A[i]; } } foo: R3 = R1 loop0(.kernel, #98) // Decrease loop count by 2 R1 = memw(R0++#4) // First prologue stage R1 = memw(R0++#4) // Second prologue stage R2 = mpyi(R1,R1) .falign .kernel: // Kernel R1 = memw(R0++#4) R2 = mpyi(R1,R1) memw(R3++#4) = R2 }:endloop0 R2 = mpyi(R1,R1) // First epilogue stage memw(R3++#4) = R2 memw(R3++#4) = R2 // Second epilogue stage jumpr lr ``` In Table 8-2 the kernel section of the pipelined loop performs three iterations of the loop in parallel: - The load for iteration N+2 - The multiply for iteration N+1 - The store for iteration N A drawback to software pipelining is the extra code necessary for the prologue and epilogue sections of a pipelined loop. The Hexagon processor provides the $\mathfrak{spNloop0}$ instruction to address this issue, where the "N" in the instruction name indicates a digit in the range 1 to 3. For example: ``` P3 = sp2loop0(start, #10) // Set up pipelined loop ``` The spNloop0 instruction is a variant of the loop0 instruction: it sets up a normal hardware loop using SAO and LCO, but also performs the following additional operations: • When the spNloop0 instruction executes, it assigns the truth value false to the predicate register P3. ■ After the associated loop executes N times, P3 is automatically set to true. This feature is known as automatic predicate control. It enables the store instructions in the kernel section of a pipelined loop to conditionally execute by P3 and thus – because of the way ${\tt spNloop0}$ controls P3 – not execute during the pipeline warm-up. This can reduce the code size of software pipelined loops by eliminating the need for prologue code. The spNloop0 instruction cannot eliminate the epilogue code from a pipelined loop; however, in some cases it is possible to do this through the use of programming techniques. Typically, the issue affecting the removal of epilogue code is *load safety*. If the kernel section of a pipelined loop can safely access past the end of its arrays – either because it is known as safe, or because the arrays are padded at the end – epilogue code is unnecessary. However, if load safety cannot be ensured, explicit epilogue code is required to drain the software pipeline. Table 8-3 shows how spNloop0 and load safety simplify the code shown in Table 8-2. Table 8-3 Software pipelined loop (using spNloop0) ``` int foo(int *A, int *result) { int i; for (i=0; i<100; i++) { result[i] = A[i] *A[i]; } foo: { // load safety assumed P3 = sp2loop0(.kernel, #102) // Set up pipelined loop R3 = R1 .falign .kernel: // Kernel R1 = memw(R0++#4) R2 = mpyi(R1,R1) if (P3) memw(R3++#4) = R2 }:endloop0 jumpr lr ``` NOTE: The count value that spNloop0 uses to control the P3 setting is stored in the user status register USR.LPCFG. # 8.2.5 Loop restrictions Hardware loops have the following restrictions: - The loop setup packet in loopN or spNloop0 (Section 8.2.4) cannot contain a speculative indirect jump, new-value compare jump, or dealloc return. - The last packet in a hardware loop cannot contain any program flow instructions (including jumps or calls). - The loop end packet in loop0 cannot contain any instruction that changes SAO or LCO. Similarly, the loop end packet in loop1 cannot contain any instruction that changes SA1 or LC1. - The loop end packet in spNloop0 cannot contain any instruction that changes P3. NOTE: SA1 and LC1 can be changed at the end of loop0, while SA0 and LC0 can be changed at the end of loop1. #### 8.3 Software branches Unlike hardware loops, software branches use an explicit instruction to perform a branch operation. Software branches include jumps, calls, and returns. Several types of jumps are supported: - Speculative jumps - Compare jumps - Register transfer jumps - Dual jumps The target address for branch instructions can be specified as register indirect or PC-relative offsets. PC-relative offsets are normally less than 32 bits, but can be specified as 32 bits by using the appropriate syntax in the target operand (Section 8.3.4). Branch instructions are unconditional or conditional, with the execution of conditional instructions controlled by a predicate expression. Explicit compare instructions generate a predicate bit, which is then tested by conditional branch instructions. For example: ``` P1 = cmp.eq(R2, R3) if (P1) jump end ``` Jumps and subroutine calls are conditional or unconditional, and support both PC-relative and register indirect addressing modes. For example: ``` jump end jumpr R1 call function callr R2 ``` The subroutine call instructions store the return address in register R31. Subroutine returns are performed using a jump indirect instruction through this register. For example: ``` jumpr R31 // Subroutine return ``` Table 8-4 Software branch instructions | Syntax | Operation | |------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------| | <pre>[if (pred_expr)] jump label [if (pred_expr)] jumpr Rs</pre> | Branch to address specified by register Rs or PC-relative offset. Can conditionally execute. | | <pre>[if (pred_expr)] call label [if (pred_expr)] callr Rs</pre> | Branch to address specified by register Rs or PC-relative offset. Store subroutine return address in link register LR. Can conditionally execute. | | <pre>[if (pred_expr)] jumpr LR</pre> | Branch to subroutine return address contained in link register LR. Can conditionally execute. | # 8.3.1 Jumps Jump instructions change the program flow to a target address that can be specified by either a register or a PC-relative immediate value. Jump instructions can be conditional based on the value of a predicate expression. Table 8-5 lists the jump instructions. Table 8-5 Jump instructions | Syntax | Operation | |---------------------------------|--------------------------------------------------------------------------------------------------| | jump label | Direct jump. | | | Branch to address specified by label.<br>Label is encoded as PC-relative signed immediate value. | | jumpr Rs | Indirect jump. | | | Branch to address contained in general register Rs. | | if ([!]Ps) jump | Conditional jump. | | label<br>if ([!]Ps) jumpr<br>Rs | Perform jump if predicate expression evaluates to true. | NOTE: Conditional jumps can be specified as speculative (Section 8.4). #### 8.3.2 Calls Call instructions jump to subroutines. The instruction performs a jump to the target address and also stores the return address in the link register LR. The forms of call are functionally similar to jump instructions and include both PC-relative and register indirect in both unconditional and conditional forms. Table 8-6 lists the call instructions. Table 8-6 Call instructions | Syntax | Operation | |-----------------|----------------------------------------------------------------------------------------------------------------------------------------| | call label | Direct subroutine call. | | | Branch to address specified by label, and store return address in register LR. Label is encoded as PC-relative signed immediate value. | | callr Rs | Indirect subroutine call. | | | Branch to address contained in general register Rs, and store return address in register LR. | | if ([!]Ps) call | Conditional call. | | label | If predicate expression evaluates to true, perform subroutine call to | | if ([!]Ps) | specified target address. | | callr Rs | | #### 8.3.3 Returns Return instructions return from a subroutine. The instruction performs an indirect jump to the subroutine return address stored in link register LR. Returns are implemented as jump register indirect instructions, and support both unconditional and conditional forms. Table 8-7 lists the return instructions. Table 8-7 Return instructions | Syntax | Operation | | |---------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|--| | jumpr LR | Subroutine return. | | | | Branch to subroutine return address contained in link register LR. | | | if ([!]Ps) jumpr LR | Conditional subroutine return. | | | | If predicate expression evaluates to true, perform subroutine return to specified target address. | | | dealloc_return | Subroutine return with stack frame deallocate. | | | | Perform deallocframe operation (Section 7.5) and then perform subroutine return to the target address loaded by deallocframe from the link register. | | | if ([!]Ps) dealloc_return | Conditional subroutine return with stack frame deallocate. | | | | If predicate expression evaluates to true, perform deallocframe and subroutine return to the target address loaded by deallocframe from the link register. | | NOTE: The link register LR is an alias of general register R31. Therefore subroutine returns can be performed with the instruction jumpr R31. The conditional subroutine returns (including dealloc\_return) can be specified as speculative (Section 8.4). #### 8.3.4 Extended branches When a jump or call instruction specifies a PC-relative offset as the branch target, the offset value is normally encoded in significantly less than 32 bits. This can limit the ability for programs to specify "long" branches, which span a large range of the processor's memory address space. To support long branches, the <code>jump</code> and <code>call</code> instructions have special versions that encode a full 32-bit value as the PC-relative offset. NOTE: Such instructions use an extra word to store the 32-bit offset (Section 10.9). The size of a PC-relative branch offset is expressed in assembly language by optionally prefixing the target label with the symbol "##" or "#": - "##" specifies that the assembler must use a 32-bit offset - "#" specifies that the assembler must not use a 32-bit offset - No "#" specifies that the assembler use a 32-bit offset only if necessary #### For example: ``` jump ##label // 32-bit offset call #label // Non 32-bit offset jump label // Offset size determined by assembler ``` ## 8.3.5 Branches to and from packets Instruction packets are atomic: even if they contain multiple instructions, they are referenced only by the address of the first instruction in the packet. Therefore, branches to a packet can target only the first instruction of the packet. Packets can contain up to two branches (Section 8.7). The branch destination can target the current packet or the beginning of another packet. A branch does not interrupt the execution of the current packet: all the instructions in the packet execute, even if they appear in the assembly source after the branch instruction. If a packet is at the end of a hardware loop, it cannot contain a branch instruction. # 8.4 Speculative jumps Conditional instructions normally depend on predicates that are generated in a previous instruction packet. However, dot-new predicates (Section 6.1.4) enable conditional instructions to use a predicate generated in the same packet that contains the conditional instruction. When dot-new predicates are used with a conditional jump, the resulting instruction is called a speculative jump. For example: ``` P0 = cmp.eq(R9,#16) // Single-packet compare-and-jump IF (P0.new) jumpr:t R11 // ... enabled by use of P0.new ``` Speculative jumps require the programmer to specify a direction hint in the jump instruction, indicating whether the conditional jump is expected. The hint initializes the dynamic branch predictor of the Hexagon processor. When the predictor is wrong, the speculative jump instruction takes two cycles to execute instead of one (due to a pipeline stall). Hints can improve program performance by indicating how speculative jumps are expected to execute over the course of a program: the more often the specified hint indicates how the instruction actually executes, the better the performance. Hints are expressed in assembly language by appending the suffix ":t" or ":nt" to the jump instruction symbol. For example: - jump:t The jump instruction is most often taken - jump:nt The jump instruction is most often not taken In addition to dot-new predicates, speculative jumps also accept conditional arithmetic expressions (=0, !=0, >=0, <=0) involving the general register Rs. Table 8-8 Speculative jump instructions | Syntax | Operation | |-------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------| | <pre>if ([!]Ps.new) jump:t label if ([!]Ps.new) jump:nt label</pre> | Speculative direct jump. If predicate expression evaluates to true, jump to address specified by label. | | <pre>if ([!]Ps.new) jumpr:t Rs if ([!]Ps.new) jumpr:nt Rs</pre> | Speculative indirect jump. If predicate expression evaluates to true, jump to address in register Rs. | | <pre>if (Rs == #0) jump:t label if (Rs == #0) jump:nt label</pre> | Speculative direct jump. If predicate Rs = 0 is true, jump to address specified by label. | | <pre>if (Rs != #0) jump:t label if (Rs != #0) jump:nt label</pre> | Speculative direct jump. If predicate Rs != 0 is true, jump to address specified by label. | | <pre>if (Rs &gt;= #0) jump:t label if (Rs &gt;= #0) jump:nt label</pre> | Speculative direct jump. If predicate Rs >= 0 is true, jump to address specified by label. | | <pre>if (Rs &lt;= #0) jump:t label if (Rs &lt;= #0) jump:nt label</pre> | Speculative direct jump. If predicate Rs <= 0 is true, jump to address specified by label. | NOTE: The hints: t and: nt interact with the predicate value to determine the instruction cycle count. Speculative indirect jumps are not supported with register Rs predicates. # 8.5 Compare jumps To reduce code size, the Hexagon processor supports a compound instruction that combines a compare with a speculative jump in a single 32-bit instruction. #### For example: The register operands in a compare jump are limited to R0 through R7 or R16 through R23 (Table 10-3). The compare and jump instructions in a compare jump are limited to the instructions listed in Table 8-9. The compare can use predicate P0 or P1, while the jump must specify the same predicate that is set in the compare. A compare jump instruction is expressed in assembly source as two independent compare and jump instructions in a packet. The assembler translates the two instructions into a single compound instruction. **Table 8-9 Compare jump instructions** | Compare instruction | Jump instruction | |------------------------|---------------------------------------| | Pd = cmp.eq (Rs, Rt) | IF (Pd.new) jump:t label | | Pd = cmp.gt (Rs, Rt) | <pre>IF (Pd.new) jump:nt label</pre> | | Pd = cmp.gtu (Rs, Rt) | <pre>IF (!Pd.new) jump:t label</pre> | | Pd = cmp.eq (Rs, #U5) | <pre>IF (!Pd.new) jump:nt label</pre> | | Pd = cmp.gt (Rs, #U5) | | | Pd = cmp.gtu (Rs, #U5) | | | Pd = cmp.eq (Rs, #-1) | | | Pd = cmp.gt (Rs, #-1) | | | Pd = tstbit (Rs, #0) | | # 8.5.1 New-value compare jumps A compare jump instruction can access a register that is assigned a new value in the same instruction packet (Section 3.3). This feature is expressed in assembly language by the following changes: - Appending the suffix ".new" to the new-value register in the compare - Rewrite the compare jump so its constituent compare and jump operations appear as a single conditional instruction #### For example: ``` // Load-compare-and-jump packet enabled by new-value compare jump { R0 = memw(R2+#8) if (cmp.eq(R0.new, #0)) jump:nt target } ``` New-value compare jump instructions have the following restrictions: - They are limited to the instruction forms listed in Table 8-10. - They cannot be combined with another jump instruction in the same packet. - If an instruction produces a 64-bit result or performs a floating-point operation (Section 4.1.2), its result registers cannot be used as the new-value register. - If an instruction uses auto-increment or absolute-set addressing mode (Section 5.8), its address register cannot be used as the new-value register. - If the instruction that sets a new-value register is conditional (Section 6.1.2), it must always execute. If the specified jump direction hint is wrong (Section 8.4), a new-value compare jump takes three cycles to execute instead of one. While this penalty is one cycle longer than in a regular speculative jump, the overall performance is still better than using a regular speculative jump (which must execute an extra packet in all cases). NOTE: New-value compare jump instructions are assigned to instruction class NV, which execute only in Slot 0. The instruction that assigns the new value must execute in Slot 1, 2, or 3. Table 8-10 New-value compare jump instructions ``` if ([!]cmp.eq (Rs.new, Rt)) jump:[hint] label if ([!]cmp.gt (Rs.new, Rt)) jump:[hint] if ([!]cmp.gtu (Rs.new, Rt)) jump:[hint] label if ([!]cmp.gt (Rs, Rt.new)) jump:[hint] if ([!]cmp.gtu (Rs, Rt.new)) jump:[hint] label if ([!]cmp.eq (Rs.new, #u5)) jump:[hint] label if ([!]cmp.qt (Rs.new, #u5)) jump:[hint] if ([!]cmp.gtu (Rs.new ,#u5)) jump:[hint] label if ([!]cmp.eq (Rs.new, #-1)) jump:[hint] if ([!]cmp.qt (Rs.new, #-1)) jump:[hint] label if ([!]tstbit (Rs.new, #0)) jump:[hint] label ``` ## 8.6 Register transfer jumps To reduce code size, the Hexagon processor supports a compound instruction that combines a register transfer with an unconditional jump in a single 32-bit instruction. For example: ``` jump target // Jump to label "target" R1 = R2 // Assign contents of reg R2 to R1 } ``` The source and target register operands in the register transfer are limited to R0 through R7 or R16 through R23 (Table 2-4). The target address in the jump is a scaled 9-bit PC-relative address value (as opposed to the 22-bit value in the regular unconditional jump instruction). A register transfer jump instruction is expressed in assembly source as two independent instructions in a packet. The assembler translates the instructions into a single compound instruction. Table 8-11 Register transfer jump instructions | Syntax | Operation | | |-----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--| | jump label;<br>Rd=Rs | Register transfer jump. | | | | Perform register transfer and branch to address specified by label. Label is encoded as PC-relative 9-bit signed immediate value. | | | jump label;<br>Rd=#u6 | Register transfer immediate jump. | | | | Perform register transfer (of 6-bit unsigned immediate value) and branch to address specified by label. Label is encoded as PC-relative 9-bit signed immediate value. | | # 8.7 Dual jumps Two software branch instructions (referred to here as "jumps") can appear in the same instruction packet, under the conditions listed in Table 8-12. The first jump is defined as the jump instruction at the lower address, and the second jump as the jump instruction at the higher address. Unlike most packetized operations, dual jumps do not execute in parallel (Section 3.3.1). Instead, the two jumps are processed in a well-defined order in a packet: - 1. Evaluate the predicate in the first jump. - 2. If the first jump is taken, ignore the second jump. - 3. If the first jump is not taken, perform the second jump. Table 8-12 Dual jump instructions | Instruction | Description | First jump in packet? | Second jump in packet? | |---------------------------------------------------------------------------------------------------------------|----------------------------------------------|-----------------------|------------------------| | jump | Direct jump | No | Yes | | if ([!]Ps[.new]) jump | Conditional jump | Yes | Yes | | call<br>if ([!]Ps) call | Direct calls | No | Yes | | Pd=cmp.xx; if ([!]Pd.new) jump | Compare jump | Yes | Yes | | <pre>if ([!]cmp.xx(Rs.new, Rt)) jump</pre> | New-value compare jump | No | No | | <pre>jumpr if ([!]Ps[.new]) jumpr callr if ([!]Ps) callr dealloc_return if ([!]Ps[.new]) dealloc_return</pre> | Indirect jumps Indirect calls dealloc_return | No | No | | endloopN | Hardware loop end | No | No | NOTE: If a call is ignored in a dual jump, the link register LR does not change. # 8.8 Hint indirect jump target Because it obtains the jump target address from a register, the jumpr instruction (Section 8.3.1) normally causes the processor to stall for one cycle. To avoid the stall penalty caused by a jumpr instruction, the Hexagon processor supports the jump hint instruction hintjr, which can be specified before the jumpr instruction. The hintjr instruction indicates that the program is about to execute a jumpr to the address contained in the specified register. Table 8-13 Jump hint instruction | Syntax | Operation | |--------|---------------------------------------------------------------------------------------------------------------| | | Informs the processor that the $\mathtt{jumpr}\left(\mathtt{Rs}\right)$ instruction is about to be performed. | NOTE: To prevent a stall, the hintjr instruction must execute at least two packets before the corresponding jumpr instruction. The hintjr instruction is not needed for jumpr instructions used as returns (Section 8.3.3), because in this case the Hexagon processor automatically predicts the jump targets based on the most recent nested call instructions. ## 8.9 Pauses Pauses suspend the execution of a program for a period of time, and put it into low-power mode. The program remains suspended for the duration specified in the instruction. The pause instruction accepts an unsigned 8-bit immediate operand that specifies the pause duration in terms of cycles. The maximum possible duration is 263 cycles (255+8). Hexagon processor interrupts cause a program to exit the paused state before its specified duration has elapsed. The pause instruction is useful for implementing user-level low-power synchronization operations (such as spin locks). Table 8-14 Pause instruction | Syntax | Operation | |------------|-----------------------------------------------------------------| | pause(#u8) | Suspend program in low-power mode for specified cycle duration. | # 8.10 Exceptions Exceptions are internally-generated disruptions to the program flow. The Hexagon processor OS handles fatal exceptions by terminating the execution of the application system. The user is responsible for fixing the problem and recompiling their applications. The error messages generated by exceptions include the following information to assist in locating the problem: - Cause code Hexadecimal value indicating the type of exception that occurred - User IP PC value indicating the instruction executed when exception occurred - Bad VA Virtual address indicating the data accessed when exception occurred NOTE: The cause code, user IP, and Bad VA values are stored in the Hexagon processor system control registers SSR[7:0], ELR, and BADVA respectively. If multiple exceptions occur simultaneously, the exception with the lowest error code value has the highest exception priority. If a packet contains multiple loads, or a load and a store, and both operations have an exception of any type, all Slot 1 exceptions process before any Slot 0 exception processes. NOTE: V65 defines an additional event (with cause code 0x17) to indicate an instruction-cache error. Table 8-15 V71 processor exceptions | Cause code | Event type | Event description | Notes | |------------|---------------------------|-----------------------------------------------------------------------------------------|-----------------------------------| | 0x0 | Reset | Software thread reset. | Non-maskable,<br>highest priority | | 0x01 | Precise,<br>unrecoverable | Unrecoverable BIU error (bus error, timeout, L2 parity error, and so on). | Non-maskable | | 0x03 | Precise, unrecoverable | Double exception (exception occurs while SSR[EX]=1). | Non-maskable | | 0x11 | Precise | Privilege violation: User/Guest mode execute to page with no execute permissions (X=0). | Non-maskable | | 0x12 | Precise | Privilege violation: User mode execute to a page with no user permissions (X=1, U=0). | Non-maskable | | 0x15 | Precise | Invalid packet. | Non-maskable | | 0x16 | Precise | Illegal execution of coprocessor instruction. | Non-maskable | | 0x17 | Precise | Instruction cache error. | Non-maskable | | 0x1A | Precise | Privilege violation: Executing a Guest mode instruction in User mode. | Non-maskable | | 0x1B | Precise | Privilege violation: Executing a supervisor instruction in User/Guest mode. | Non-maskable | | 0x1D | Precise,<br>unrecoverable | Packet with multiple writes to the same destination register. | Non-maskable | | 0x1E | Precise,<br>unrecoverable | Program counter values that are not properly aligned. | Non-maskable | | 0x20 | Precise | Load to misaligned address. | Non-maskable | | 0x21 | Precise | Store to misaligned address. | Non-maskable | | 0x22 | Precise | Privilege violation: User/Guest mode read to page with no read permission (R=0). | Non-maskable | | 0x23 | Precise | Privilege violation: User/Guest mode write to page with no write permissions (W=0). | Non-maskable | | 0x24 | Precise | Privilege violation: User mode read to page with no user permission (R=1, U=0). | Non-maskable | | 0x25 | Precise | Privilege violation: User mode write to page with no user permissions (W=1, U=0). | Non-maskable | | 0x26 | Precise | Coprocessor VMEM address error. | Non-maskable | | 0x27 | Precise | Stack overflow: Allocframe instruction exceeded FRAMELIMIT. | Non-maskable, | | 0x42 | Imprecise | Data abort. | Non-maskable | | 0x43 | Imprecise | NMI. | Non-maskable | | 0x44 | Imprecise | Multiple TLB match. | Non-maskable | | 0x45 | Imprecise | Livelock exception. | Non-maskable | | 0x60 | TLB miss-X | Missing fetch address on PC-page. | Non-maskable | | 0x61 | TLB miss-X | Missing fetch on second page from packet that spans pages. | Non-maskable | Table 8-15 V71 processor exceptions (cont.) | Cause code | Event type | Event description | Notes | |------------|----------------|----------------------------------------------------------------|----------------------------------------------| | 0x62 | TLB miss-X | icinva. | Non-maskable | | | Reserved | | | | 0x70 | TLB miss-RW | Memory read. | Non-maskable | | 0x71 | TLB miss-RW | Memory write. | Non-maskable | | | Reserved | | | | #u8 | Trap0 | Software Trap0 instruction. | Non-maskable | | #u8 | Trap1 | Software Trap1 instruction. | Non-maskable | | | Reserved | | | | 0x80 | Debug | Single-step debug exception. | | | | Reserved | | | | 0xBF | Floating point | Execution of floating point instruction resulted in exception. | Non-maskable | | 0xC0 | Interrupt0 | General external interrupt. | Maskable, highest priority general interrupt | | 0xC1 | Interrupt 1 | General external interrupt | Maskable | | 0xC2 | Interrupt 2 | General external interrupt | VIC0 interface | | 0xC3 | Interrupt 3 | General external interrupt | VIC1 interface | | 0xC4 | Interrupt 4 | General external interrupt | VIC2 interface | | 0xC5 | Interrupt 5 | General external interrupt | VIC3 interface | | 0xC6 | Interrupt 6 | General external interrupt | | | 0xC7 | Interrupt 7 | General external interrupt | Lowest-priority interrupt | # 9 PMU events The Hexagon processor can collect execution statistics on the applications it executes. The statistics summarize the various types of Hexagon processor events that occurred while the application was running. Execution statistics can be collected in hardware or software: - Statistics can be collected in hardware with the performance monitor unit (PMU), which is defined as part of the Hexagon processor architecture. - Statistics can be collected in software using the Hexagon simulator. The simulator statistics are presented in the same format used by the PMU. Execution statistics are expressed in terms of processor events. This chapter defines the event symbols, along with their associated numeric codes. NOTE: Because the types of execution events vary across the Hexagon processor versions, different types of statistics are collected for each version. This chapter lists the event symbols defined for version V71. # 9.1 V71 processor event symbols Table 9-1 Symbols that represent V71 Hexagon processor events | Event | Symbol | Definition | |-------|-------------------|-----------------------------------------------------------------------------| | 0x01 | COUNTER0_OVERFLOW | Use as the event detected by counter1 to build an effective 64-bit counter. | | 0x02 | COUNTER2_OVERFLOW | Use as the event detected by counter3 to build an effective 64-bit counter. | | 0x03 | COMMITTED_PKT_ANY | Thread committed a packet; packets executed. | | 0x04 | COMMITTED_PKT_BSB | Packet is committed two cycles after previous packet in same thread. | | 0x05 | COUNTER4_OVERFLOW | Can be the event detected by counter5 to build an effective 64-bit counter. | | 0x06 | COUNTER6_OVERFLOW | Use as the event detected by counter7 to build an effective 64-bit counter. | | 0x07 | COMMITTED_PKT_B2B | Packet committed one cycle after previous packet in same thread. | | 0x08 | COMMITTED_PKT_SMT | Two packets committed in the same cycle. | | 0x0c | COMMITTED_PKT_T0 | Thread 0 committed a packet. Packets are executed. | Table 9-1 Symbols that represent V71 Hexagon processor events | Event | Symbol | Definition | |-------|--------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 0x0d | COMMITTED_PKT_T1 | Thread 1 committed a packet. Packets are executed. | | 0x0e | COMMITTED_PKT_T2 | Thread 2 committed a packet. Packets are executed. | | 0x0f | COMMITTED_PKT_T3 | Thread 3 committed a packet. Packets are executed. | | 0x10 | COMMITTED_PKT_T4 | Thread 4 committed a packet. Packets are executed. | | 0x11 | COMMITTED_PKT_T5 | Thread 5 committed a packet. Packets are executed. | | 0x12 | ICACHE_DEMAND_MISS | I-cache demand miss. Includes secondary miss. | | 0x13 | DCACHE_DEMAND_MISS | D-cache cacheable demand primary or secondary miss. Includes dczero stall. Excludes uncacheables, prefetches, and no-allocate store misses. | | 0x20 | ANY_IU_REPLAY | Any instruction unit stall other than an I-cache miss. Includes jump register stall, fetchcross stall, ITLB miss stall, and so on. Excludes control unit replay. | | 0x21 | ANY_DU_REPLAY | Any data unit replay; bank conflict, store buffer full, and so on. Excludes stalls due to cache misses. | | 0x22 | CU_REDISPATCH | Any case where a packet is redispatched. Most commonly, HVX FIFO becomes full while an HVX packet is in flight. It can also be a replay requested for a non-replayable instruction, or it can be forwarding a bus resource conflict. | | 0x25 | COMMITTED_PKT_1_THREAD_RUNNING | Committed packets with one thread running. The thread is not in Stop or Wait mode. | | 0x26 | COMMITTED_PKT_2_THREAD_RUNNING | Committed packets with two threads running. the threads are not in Stop or Wait mode. | | 0x27 | COMMITTED_PKT_3_THREAD_RUNNING | Committed packets with three threads running; the threads are not in Stop or Wait mode. | | 0x2a | COMMITTED_INSTS | Committed instructions. Increments by up to 8 per cycle. Duplex counts as two instructions. Does not include end loops. | | 0x2b | COMMITTED_TC1_INSTS | Committed TC1 class instructions. Increments by up to 8 per cycle. Duplex of two TC1 instructions counts as two TC1 instructions. Does not include nops. | | 0x2c | COMMITTED_PRIVATE_INSTS | Committed instructions that have per-cluster (private) execution resources. Increments by up to 8 per cycle. Duplex of two private instructions counts as two private instructions. | | 0x2f | COMMITTED_PKT_4_THREAD_RUNNING | Committed packets with four threads running. The threads are not in Stop or Wait mode. | | 0x30 | COMMITTED_LOADS | Committed load instructions. Includes cached and uncached. Increments by 2 for dual loads. Excludes prefetches, memory operations, and coprocessor loads. | | 0x31 | COMMITTED_STORES | Committed store instructions. Includes cached and uncached. Increments by 2 for dual stores. Excludes memory operations and coprocessor stores. | | 0x32 | COMMITTED_MEMOPS | Committed memop instructions. Cached or uncached. | Table 9-1 Symbols that represent V71 Hexagon processor events | Event | Symbol | Definition | |-------|------------------------------|-------------------------------------------------------------------------------------------------------------------------------| | 0x37 | COMMITTED_PROGRAM_FLOW_INSTS | Committed packet contains program flow instructions. Includes CR jumps, endloop, J, JR, dealloc_return, | | | | system/trap, superset of event 56. Dual jumps count as two. | | 0x38 | COMMITTED_PKT_CHANGED_FLOW | Committed packet resulted in change of flow. Any taken jump. Includes endloop and dealloc return. | | 0.00 | COMMITTED DIST ENDI COD | . – | | 0x39 | COMMITTED_PKT_ENDLOOP | Committed packet contains an endloop that was taken | | 0x3b | CYCLES_1_THREAD_RUNNING | Processor cycles that exactly one thread is running. Running means not in Wait or Stop mode. | | 0x3c | CYCLES_2_THREAD_RUNNING | Processor cycles that exactly two threads are running. Running means not in Wait or Stop mode. | | 0x3d | CYCLES_3_THREAD_RUNNING | Processor cycles that exactly three threads are running. | | | | Running means not in Wait or Stop mode. | | 0x3e | CYCLES_4_THREAD_RUNNING | Processor cycles that exactly four threads are running. | | | | Running means not in Wait or Stop mode. | | 0x40 | AXI_READ_REQUEST | All read requests issued by the primary AXI master. Includes full lines, partial lines and all interleaved requests. | | 0x41 | AXI_LINE32_READ_REQUEST | 32-byte line read requests issued by the primary AXI master; includes all interleaved requests. | | 0x42 | AXI_WRITE_REQUEST | All write requests issued by the primary AXI master. | | | | Includes full lines, partial lines, and all interleaved requests | | 0x43 | AXI_LINE32_WRITE_REQUEST | 32-byte line write requests issued by the primary AXI master. | | | | Includes all interleaved requests. All bytes are valid. | | 0x44 | AHB_READ_REQUEST | Read requests issued by the AHB master. | | 0x45 | AHB_WRITE_REQUEST | Write requests issued by the AHB master. | | 0x47 | AXI_SLAVE_MULTI_BEAT_ACCESS | AXI slave multi-beat access. | | 0x48 | AXI_SLAVE_SINGLE_BEAT_ACCESS | AXI slave single-beat access. | | 0x49 | AXI2_READ_REQUEST | All read requests issued by the secondary AXI master. Includes full lines and partial lines. | | 0x4a | AXI2_LINE32_READ_REQUEST | 32-byte line read requests issued by the secondary AXI master. | | 0x4b | AXI2_WRITE_REQUEST | All write requests issued by the secondary AXI master. | | | | Includes full lines and partial lines. | | 0x4c | AXI2_LINE32_WRITE_REQUEST | 32-byte line write requests issued by the secondary AXI master. | | 0x4d | AXI2_CONGESTION | Secondary AXI command or data queue is full, and an operation is stuck at the head of the secondary AXI master command queue. | | 0x50 | COMMITTED_FPS | Committed floating point instructions. Increments by 2 for dual floating point operations. Excludes conversions. | | 0x58 | JTLB_MISS | Instruction or data address translation request missed in the JTLB. | Table 9-1 Symbols that represent V71 Hexagon processor events | Event | Symbol | Definition | |-------|--------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 0x5a | COMMITTED_PKT_RETURN | Committed a return instruction. Includes canceled returns. | | 0x5b | COMMITTED_PKT_INDIRECT_JUMP | Committed an indirect jump or call instruction. Includes canceled instructions. Does not include JUMPR R31 returns. | | 0x5c | COMMITTED_BIMODAL_BRANCH_INSTS | Committed bimodal branch. Includes *.old and *.new. Increments by 2 for dual jumps. | | 0x60 | DU_STORE_LINK | Store is linked to an earlier request to the same location. | | 0x6b | IU_PREFETCHES_SENT_TO_L2 | Instruction unit prefetches sent to L2. Counts cache lines not dropped by L2. Excludes replayed prefetches, and only counts prefetches the L2 accepts. | | 0x6c | ITLB_MISS | ITLB miss that goes to JTLB. | | 0x76 | L2_IU_ACCESS | L2 cacheable access from an instruction unit. Any access to the L2 cache that was the result of an IU command, either demand or L1 prefetch access. Excludes any prefetches generated in the L2. Excludes L2FETCH, TCM accesses, and uncacheables. Address must target primary AXI | | 0x77 | L2_IU_MISS | L2 misses from an instruction unit. Of the events qualified by 0x76, the events that resulted in an L2 miss (demand miss or L1 prefetch miss). An L2 miss is any condition that prevents the immediate return of data to the instruction unit, excluding pipeline conflicts. | | 0x78 | L2_IU_PREFETCH_ACCESS | Prefetch from an instruction unit to the L2 cache; any instruction unit prefetch access sent to the L2 cache. Access must be L2 cacheable and target the primary AXI. Does not include L2fetch-generated accesses. | | 0x79 | L2_IU_PREFETCH_MISS | L2 prefetch from an instruction unit miss. | | OX O | EL_IO_, KEI LION_WIGO | Of the events qualified by 0x78, the events that resulted in an L2 miss. | | 0x7c | L2_DU_READ_ACCESS | L2 cacheable read access from the data unit. Any read access from the data unit that might cause a lookup in the L2 cache. Includes loads, L1 prefetch, DCfetch. Excludes initial L2fetchcommand, uncacheables, TCM accesses, and coprocessor loads. Must target the primary AXI. | | 0x7d | L2_DU_READ_MISS | L2 read miss from the data unit. Of the events qualified by 0x7C, any event that resulted in an L2 miss. (that is, the line was not previously allocated in the L2 cache and is fetched from backing memory.) | | 0x7e | L2FETCH_ACCESS | L2fetch access from the data unit. Any access to the L2 cache from the L2 prefetch engine that was initiated by programing the L2fetch engine. Includes only the cache inquire and fetch if not present in commands. | | 0x7f | L2FETCH_MISS | L2fetch miss from a programed inquiry. Of the events qualified by 0x7E, the events that resulted in an L2 miss (that is, the line was not previously allocated in the L2 cache and is fetched from backing memory). | Table 9-1 Symbols that represent V71 Hexagon processor events | Event | Symbol | Definition | |-------|----------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 0x81 | L2_ACCESS | All requests to the L2. Does not include internally generated accesses like L2fetch, however the programming of the L2fetcj engine is counted. Accesses target odd or even interleaves, and can be L2 cacheable or TCM. | | 0x82 | L2_PIPE_CONFLICT | Request not taken by the L2 due to a pipe conflict. | | | | The conflict can be a tag bank, data bank, or other pipeline conflict. | | 0x83 | L2_TAG_ARRAY_CONFLICT | caused by a conflict with the tag array. | | 0x87 | TCM_DU_ACCESS | TCM access from the data unit. Data unit access to the L2 TCM space. Excludes HVX requests. | | 0x88 | TCM_DU_READ_ACCESS | TCM read access from the data unit. Data unit read access to the L2 TCM space. Excludes HVX requests. | | 0x89 | TCM_IU_ACCESS | TCM access from an instruction unit. Instruction unit access to the L2 TCM space. | | 0x8a | L2_CASTOUT | Triggers when L2 cache evicts a dirty line due to an allocation. Not triggered on cache operations. | | 0x8b | L2_DU_STORE_ACCESS | L2 cacheable store access from the data unit. Any store access from the data unit that might cause a lookup in the L2 cache. Excludes cache operations, uncacheables, TCM, and coprocessor stores. Must target primary AXI | | 0x8c | L2_DU_STORE_MISS | L2 miss from the data unit. Of the events qualified by 0x8B, the events that resulted in a miss. Specifically, the cases where the line is not in cache or a coalesce buffer. | | 0x8d | L2_DU_PREFETCH_ACCESS | L2 prefetch access from the data unit. Of the events qualified by 0x7C, the events that are DCfetch and dhwprefetch. These are L2 cacheable targeting the AXI primary. | | 0x8e | L2_DU_PREFETCH_MISS | L2 prefetch miss from the data unit. Of the events qualified by 0x8D, the events that missed the L2 cache. | | 0x90 | L2_DU_LOAD_SECONDARY_MISS | L2 load secondary miss from the data unit. Hit a busy line in the scoreboard, preventing a return. Busy condition can include pipeline bubbles caused by back-to-back loads, such as on L1UC loads. | | 0x91 | L2FETCH_COMMAND | L2fetch command. Excludes the Stop command. | | 0x92 | L2FETCH_COMMAND_KILLED | L2fetch command is killed because a Stop command was issued. Increments once for each L2fetch command that is killed. If multiple commands queue to the L2fetch engine, the kill of each one is recorded | | 0x93 | L2FETCH_COMMAND_OVERWRITE | L2fetch command is overwritten. Kills an old L2fetch command and replaces it with a new command. | | 0x94 | L2FETCH_ACCESS_CREDIT_FAIL | L2fetch access cannot get a credit. L2fetch is blocked because of a missing L2fetch or L2evict credit. | | 0x97 | L2_ACCESS_EVEN | Of the events in 0x81, the number of accesses made to the even L2 cache. | | 0xa0 | ANY_DU_STALL | Any data unit stall. Increments once when the thread has a data unit stall (D-cache miss or DTLB miss). | Table 9-1 Symbols that represent V71 Hexagon processor events | Event | Symbol | Definition | |-------|-----------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 0xa1 | DU_BANK_CONFLICT_REPLAY | Data unit bank conflict replay.Dual memory access to the same bank but on different lines. | | 0xa3 | L2_FIFO_FULL_REPLAY | Counts L2 even/odd FIFO full replays. | | 0xa4 | DU_STORE_BUFFER_FULL_REPLAY | First packet puts access in the data unit store buffer (memop, store.new, load/store bank conflict, store/store bank conflict). A later packet tries to use the store buffer before the first one evicts, and thus it must replay so the store buffer can drain. | | 0xa8 | DU_FILL_REPLAY | Fill has a index conflict with an instruction from the same thread in a pipeline. Fills and demands might be from different threads if: There is a prefetch from the deferral queue Or if a fill has not be acknowledged for too long and forces itself into the pipeline | | 0xac | DU_READ_TO_L2 | Data unit read to L2. Total of everything that brings data from the L2 array. Includes prefetches (dcetch and hwprefetch). Excludes coprocessor loads. | | 0xad | DU_WRITE_TO_L2 | Data unit write to L2. Total of everything that is written out of the data unit to the L2 array. Includes dczeroa. Excludes dcclean, dccleaninv, tag writes, and coprocessor stores. | | 0xaf | DCZERO_COMMITTED | Committed a dczeroa instruction. | | 0xb3 | DTLB_MISS | DTLB miss that goes to JTLB. When both slots miss to different pages, increments by 2. When both slots miss to the same page, only counts S1, because S1 goes first and fills for S0. | | 0xb6 | STORE_BUFFER_HIT_REPLAY | Store buffer hit is replayed because a packet with two stores is going to the same bank but different cache lines, followed by a load from an address that was pushed into the store buffer. | | 0xb7 | STORE_BUFFER_FORCE_REPLAY | Store buffer must drain, forcing the current packet to replay. | | | | This typically happens on a cache index match between the current packet and store buffer. Can also be a store buffer timeout. | | 0xb9 | SMT_BANK_CONFLICT | Counts inter-thread SMT bank conflicts. | | 0xba | PORT_CONFLICT_REPLAY | Counts all port conflict replays, including the same cluster replays caused by high priority fills and store buffer force drains, and also inter-cluster replays. | | 0xbd | PAGE_CROSS_REPLAY | Page cross from a valid packet that caused a replay. Excludes pdkill packets. Counts twice if both slots cause a page cross. | | 0xbf | DU_DEMAND_SECONDARY_MISS | Data unit demand secondary miss. | | 0xc3 | DCFETCH_COMMITTED | Includes hit and dropped. Does not include convert-to-<br>prefetches. | | 0xc4 | DCFETCH_HIT | Defetch hit in deache. Includes hit valid or reserved line | | 0xc5 | DCFETCH_MISS | Dcfetch missed in the L1 cache. Counts the dcfetches issued to L2 FIFO. | Table 9-1 Symbols that represent V71 Hexagon processor events | Event | Symbol | Definition | |-------|----------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 0xc8 | DU_LOAD_UNCACHEABLE | Load instructions with addresses uncacheable in the L1 cache. | | 0xc9 | DU_DUAL_LOAD_UNCACHEABLE | Packets where both loads have addresses uncacheable in the L1 cache. | | 0xca | DU_STORE_UNCACHEABLE | Store instructions with addresses uncacheable in the L1 cache. | | 0xce | AXI_LINE64_READ_REQUEST | 64-byte line read requests issued by the primary AXI master. | | 0xcf | AXI_LINE64_WRITE_REQUEST | 64-byte line write requests issued by the primary AXI master. All bytes are valid. | | 0xd1 | AHB_8_READ_REQUEST | An 8 byte AHB read was made. | | 0xd3 | L2FETCH_COMMAND_PAGE_TERMINATI<br>ON | L2fetch command terminated because it cannot get a page translation from virtual address to physical address. Includes termination dues to permission errors. For example, an address translation fails because the virtual address to physical address is not in the TLB. Or the properties in the translation are not acceptable and the command terminates. | | 0xd5 | L2_DU_STORE_COALESCE | Events from 0x8b that were coalesced. | | 0xdc | L2_EVICTION_BUFFERS_FULL | Counts every cycle when all eviction buffers in any interleave are occupied. | | 0xdd | AHB_MULTI_BEAT_READ_REQUEST | 32-byte multi-beat AHB read was made. | | 0xdf | L2_DU_LOAD_SECONDARY_MISS_ON_S<br>W_PREFETCH | Of the events in 0x90, the events where the primary miss was a dcfetch or L2fetch. | | 0xe1 | REPLAY_MAXIMUM_FORCE | Maximum number of data unit replays reached. This thread is forced through the DU by stalling the other threads. | | 0xe2 | SCHEDULER_WATCHDOG_FORCE | Number of times the control unit scheduler watchdog forced a threadpick. The livelock warning randomizer can override this threadpick. | | 0xe3 | LIVELOCK_REFETCH | Cycles cluster could not commit because a thread off or wait. | | 0xe4 | CYCLES_LIVELOCK_WARNING | Cycles core is randomizing scheduler due to passing livelock warning watermark. This could be a sign the core was not making forward progress, or that the watermark was set too low for legitimate stalls in the system. | | 0xe6 | ARCH_LOCK_PVIEW_CYCLES | Cycles cluster cannot commit due to a kernel lock or TLB lock. | | 0xe7 | REDIRECT_PVIEW_CYCLES | Cycles cluster cannot commit due to redirects such as a branch mispredict. | | 0xe8 | IU_NO_PKT_PVIEW_CYCLES | Cycles cluster cannot commit because the issue queue is empty. | | 0xe9 | DU_CACHE_MISS_PVIEW_CYCLES | Cycles cluster cannot commit due to a D-cache cacheable miss. | | 0xea | DU_BUSY_OTHER_PVIEW_CYCLES | Cycles cluster cannot commit due to a data unit replay or bubble, or a DTLB miss. | ## Table 9-1 Symbols that represent V71 Hexagon processor events | Event | Symbol | Definition | |-------|------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 0xeb | CU_BUSY_PVIEW_CYCLES | Cycles cluster cannot commit because of a register interlock, register port conflict, bubbles due to a timing class such as tc_3stall, no branch target buffer HVX, or HVX FIFO full. | | 0xec | SMT_DU_CONFLICT_PVIEW_CYCLES | Cycles cluster cannot commit because of a data unit resource conflict. | # 10 Instruction encoding This chapter describes the binary encoding of Hexagon processor instructions and instruction packets. ## 10.1 Instructions Hexagon processor instructions are encoded in a 32-bit instruction word. The instruction word format varies according to the instruction type. The instruction words contain two types of bit fields: - Common fields appear in every processor instruction, and are defined the same in all instructions. - Instruction-specific fields appear only in some instructions, or vary in definition across the instruction set. Table 10-1 Instruction bit fields | Name | Description | Туре | |--------|-------------------|--------| | ICLASS | Instruction class | Common | | Parse | Packet/loop bits | | Table 10-1 Instruction bit fields | Name | Description | Туре | | |--------------|---------------------------------------------|------------------|--| | MajOp or Maj | Major opcode | Instruction- | | | MinOp or Min | Minor opcode specific | | | | RegType | Register type (32-bit, 64-bit) | | | | Туре | Operand type (such as byte or halfword) | | | | Amode | Addressing mode | | | | d <i>n</i> | Destination register operand | | | | sn | Source register operand | | | | tn | Source register operand #2 | | | | xn | Source and destination register operand | | | | un | Predicate or modifier register operand | | | | sH | Source register bit field (Rs.H or Rs.L) | | | | tH | Source register #2 bit field (Rt.H or Rt.L) | d (Rt.H or Rt.L) | | | UN | Unsigned operand | and | | | Rs | No source register read | | | | Р | Predicate expression | | | | PS | Predicate sense (Pu or !Pu) | | | | DN | Dot-new predicate | | | | PT | Predict taken | | | | sm | Supervisor mode only | | | NOTE: In some cases, instruction-specific fields encode instruction attributes other than the attibutes described for the fields in Table 10-1. ### Reserved bits Some instructions contain reserved bits that do not encode instruction attributes. Always set these bits to 0 to ensure compatibility with future changes in the instruction encoding. NOTE: Reserved bits appear as '-' characters in the instruction encoding tables. ## 10.2 Sub-instructions To reduce code size, the Hexagon processor supports the encoding of certain pairs of instructions in a single 32-bit container. Instructions encoded this way are sub-instructions, and the containers are duplexes (Section 10.3). Sub-instructions are limited to certain common instructions: - Arithmetic and logical operations - Register transfer - Loads and stores - Stack frame allocation/deallocation - Subroutine return Table 10-2 lists the sub-instructions along with the group identifiers that encode them in duplexes. Sub-instructions can only access a subset of the general registers (R0 through R7, R16 through R23). Table 10-3 lists the sub-instruction register encodings. NOTE: Certain sub-instructions implicitly access registers such as SP (R29). Table 10-2 Sub-instructions | Group | Instruction | Description | |-------|-----------------------------------------------------------------------|-----------------------------------| | L1 | Rd = memw(Rs+#u4:2) | Word load | | L1 | Rd = memub(Rs+#u4:0) | Unsigned byte load | | Group | Instruction | Instruction | | L2 | Rd = memh/memuh(Rs+#u3:1) | Halfword loads | | L2 | Rd = memb(Rs+#u3:0) | Signed byte load | | L2 | Rd = memw(r29+#u5:2) | Load word from stack | | L2 | Rdd = memd(r29+#u5:3) | Load pair from stack | | L2 | deallocframe | Deallocate stack frame | | L2 | <pre>if ([!]P0) dealloc_return if ([!]P0.new) dealloc_return:nt</pre> | Deallocate stack frame and return | | L2 | <pre>jumpr R31 if ([!]P0) jumpr R31 if ([!]P0.new) jumpr:nt R31</pre> | Return | | Group | Instruction | Instruction | | S1 | memw(Rs+#u4:2) = Rt | Store word | | S1 | memb(Rs+#u4:0) = Rt | Store byte | | Group | Instruction | Instruction | | S2 | memh(Rs+#u3:1) = Rt | Store halfword | | S2 | memw(r29+#u5:2) = Rt | Store word to stack | | S2 | memd(r29+#s6:3) = Rtt | Store pair to stack | Table 10-2 Sub-instructions (cont.) | Group | Instruction | Description | |-------|-------------------------------------|-------------------------------------| | S2 | memw(Rs+#u4:2) = #U1 | Store immediate word #0 or #1 | | S2 | memb(Rs+#u4) = #U1 | Store immediate byte #0 or #1 | | S2 | allocframe(#u5:3) | Allocate stack frame | | Group | Instruction | Instruction | | Α | Rx = add(Rx, #s7) | Add immediate | | Α | Rd = Rs | Transfer | | Α | Rd = #u6 | Set to unsigned immediate | | Α | Rd = #-1 | Set to -1 | | Α | if ([!]P0[.new]) Rd = #0 | Conditional clear | | Α | Rd = add(r29, #u6:2) | Add immediate to stack pointer | | Α | Rx = add(Rx,Rs) | Register add | | Α | P0 = cmp.eq(Rs, #u2) | Compare register equal immediate | | Α | Rdd = combine(#0,Rs) | Combine zero and register into pair | | Α | Rdd = combine(Rs, #0) | Combine register and zero into pair | | Α | Rdd = combine(#u2, #U2) | Combine immediates into pair | | А | Rd = add(Rs,#1)<br>Rd = add(Rs,#-1) | Add and subtract 1 | | Α | Rd = sxth/sxtb/zxtb/zxth(Rs) | Sign- and zero-extends | | Α | Rd = and(Rs, #1) | And with 1 | Table 10-3 Sub-instruction registers | Register | Encoding | |-------------|--------------| | Rs,Rt,Rd,Rx | 0000 = R0 | | | 0001 = R1 | | | 0010 = R2 | | | 0011 = R3 | | | 0100 = R4 | | | 0101 = R5 | | | 0110 = R6 | | | 0111 = R7 | | | 1000 = R16 | | | 1001 = R17 | | | 1010 = R18 | | | 1011 = R19 | | | 1100 = R20 | | | 1101 = R21 | | | 1110 = R22 | | | 1111 = R23 | | Rdd, Rtt | 000 = R1:0 | | | 001 = R3:2 | | | 010 = R5:4 | | | 011 = R7:6 | | | 100 = R17:16 | | | 101 = R19:18 | | | 110 = R21:20 | | | 111 = R23:22 | # 10.3 Duplexes A duplex is encoded as a 32-bit instruction with bits [15:14] set to 00. The sub-instructions (Section 10.2) that comprise a duplex are encoded as 13-bit fields in the duplex. Table 10-4 shows the encoding details for a duplex. An instruction packet can contain one duplex and up to two other (non-duplex) instructions. The duplex must always appear as the last word in a packet. The sub-instructions in a duplex always execute in Slot 0 and Slot 1. Table 10-4 Duplex instruction | Bits | Name | Description | |-----------|---------------|-----------------------------------------------------------------------------------------| | 15:14 | Parse bits | 00 = Duplex type, ends the packet and indicates that word contains two sub-instructions | | 12:0 | Sub-insn low | Encodes slot 0 sub-instruction | | 28:16 | Sub-insn high | Encodes slot 1 sub-instruction | | 31:29, 13 | 4-bit ICLASS | Indicates the group to which he low and high sub-<br>instructions belong. | Table 10-5 lists the duplex ICLASS field values, which specify the group of each sub-instruction in a duplex. Table 10-5 Duplex ICLASS field | ICLASS | Low slot 0 sub-instruction type | High slot 1 sub-<br>instruction type | |--------|---------------------------------|--------------------------------------| | 0x0 | L1-type | L1-type | | 0x1 | L2-type | L1-type | | 0x2 | L2-type | L2-type | | 0x3 | A-type | A-type | | 0x4 | L1-type | A-type | | 0x5 | L2-type | A-type | | 0x6 | S1-type | A-type | | 0x7 | S2-type | A-type | | 0x8 | S1-type | L1-type | | 0x9 | S1-type | L2-type | | 0xA | S1-type | S1-type | | 0xB | S2-type S1-type | | | 0xC | S2-type | L1-type | | 0xD | S2-type | L2-type | | 0xE | S2-type S2-type | | | 0xF | Reserved | Reserved | Duplexes have the following grouping constraints: ■ Constant extenders expand the range of an instruction's immediate operand to 32 bits (Section 10.9). The following sub-instructions can be expanded with a constant extender: - Rx = add(Rx,#s7) – Rd = #u6 A duplex can contain only one constant-extended instruction, and it must appear in the slot 1 position. - If a duplex contains two instructions with the same sub-instruction group, the instructions must be ordered in the duplex as follows: when the sub-instructions are treated as 13-bit unsigned integer values, the instruction corresponding to the numerically smaller value must be encoded in the Slot 1 position of the duplex.¹ - Sub-instructions must conform to any slot assignment grouping rules that apply to the individual instructions, even if a duplex pattern exists that violates those assignments. One exception to this rule is that jumpr R31 must appear in the slot 0 position. <sup>&</sup>lt;sup>1</sup> The sub-instruction register and immediate fields are assumed to be 0 when performing this comparison. ## 10.4 Instruction classes The instruction class (Section 3.2) is encoded in the four most-significant bits of the instruction word (31:28). These bits are referred to as the ICLASS field of the instruction. Table 10-6 lists the encoding values for the instruction classes. The slots column indicates which slots can receive the instruction class. Table 10-6 Instruction class encoding | Encoding | Instruction class | Slots | |----------|----------------------------------------|---------| | 0000 | Constant extender (Section 10.9) | | | 0001 | J 2,3 | | | 0010 | J | 2,3 | | 0011 | LD 0,1 | | | 0100 | LD 0,1 ST (conditional or GP-relative) | | | 0101 | J | 2,3 | | 0110 | CR | 3 | | 0111 | ALU32 | 0,1,2,3 | | 1000 | XTYPE | 2,3 | | 1001 | LD | 0,1 | | 1010 | ST | 0 | | 1011 | ALU32 | 0,1,2,3 | | 1100 | XTYPE | 2,3 | | 1101 | XTYPE | 2,3 | | 1110 | XTYPE | 2,3 | | 1111 | ALU32 | 0,1,2,3 | For details on encoding the individual class types, see Chapter 11. # 10.5 Instruction packets Instruction packets are encoded using two bits of the instruction word (15:14), which are referred to as the Parse field of the instruction word. The field values have the following definitions: - '11' indicates that an instruction is the last instruction in a packet the instruction word at the highest address). - '01' or '10' indicate that an instruction is not the last instruction in a packet. - '00'indicates a duplex (Section 10.3) If any sequence of four consecutive instructions occurs without one of them containing '11' or '00', the processor raises an error exception (illegal opcode). Figure 10-1 shows the location of the Parse field in an instruction word. Packet/loop parse bits: 01, 10 = not end of packet 11 = end of packet 00 = duplex ### Figure 10-1 Instruction packet encoding The following examples show how to use the Parse field to encode instruction packets: # 10.6 Loop packets In addition to encoding the last instruction in a packet, the Parse field of the instruction word (Section 10.5) encodes the last packet in a hardware loop. The Hexagon processor supports two hardware loops, labeled 0 and 1 (Section 8.2). The last packet in these loops is subject to the following restrictions: - The last packet in a hardware loop 0 must contain two or more instruction words. - The last packet in a hardware loop 1 must contain three or more instruction words. If the last packet in a loop is expressed in assembly language with fewer than the required number of words, the assembler automatically adds one or two NOP instructions to the encoded packet so it contains the minimum required number of instruction words. The Parse fields in the first and second instruction words (the words at the lowest addresses) of a packet encode whether the packet is the last packet in a hardware loop. Table 10-7 shows how the Parse fields encode loop packets. Table 10-7 Loop packet encoding | Packet | Parse field in first Instruction | Parse field in second Instruction | |---------------------|----------------------------------|-----------------------------------| | Not last in loop | 01 or 11 | 01 or 11 <sup>1</sup> | | Last in loop 0 | 10 | 01 or 11 | | Last in loop 1 | 01 | 10 | | Last in loops 0 & 1 | 10 | 10 | <sup>&</sup>lt;sup>1</sup> Not applicable for single-instruction packets. The following examples show how to use the Parse field to encode loop packets: ``` { A B}:endloop0 10 11 // Parse fields of instrs A,B { A В C}:endloop0 11 10 01 // Parse fields of instrs A,B,C { A В С D}:endloop0 // Parse fields of instrs A,B,C,D 10 01 01 11 { A В C}:endloop1 // Parse fields of instrs A,B,C 01 10 11 В С D}:endloop1 { A 01 10 01 11 // Parse fields of instrs A,B,C,D C}:endloop0:endloop1 { A В 11 // Parse fields of instrs A,B,C 10 10 { A В С D}:endloop0:endloop1 10 10 01 11 // Parse fields of instrs A,B,C,D ``` ## 10.7 Immediate values To conserve encoding space, the Hexagon processor often stores immediate values in instruction fields that are smaller (in bit size) than the values needed in the instruction operation. When an instruction operates on one of its immediate operands, the processor automatically extends the immediate value to the bit size required by the operation: - Signed immediate values are sign-extended - Unsigned immediate values are zero-extended ### 10.8 Scaled immediate values To minimize the number of bits in instruction words to store certain immediate values, the Hexagon processor stores the values as scaled immediate values. Use scaled immediate values when an immediate value must represent integral multiples of a power of 2 in a specific range. For example, consider an instruction operand whose possible values are the following: Encoding the full range of integers -32 through 28 normally requires six bits. However, if the operand is stored as a scaled immediate, it can first be shifted right by two bits, with only the four remaining bits being stored in the instruction word. When the operand is fetched from the instruction word, the processor automatically shifts the value left by two bits to recreate the original operand value. NOTE: The scaled immediate value in the example above is represented notationally as #s4:2. For more information, see Section 3.1. Scaled immediate values commonly encode address offsets that apply to data types of varying size. For example, Table 10-8 shows how to use the byte offsets in immediate-with-offset addressing mode stored as 11-bit scaled immediate values. This enables the offsets to span the same range of data elements regardless of the data type. Offset size Scale Offset size Offset range Offset range Data type (Stored) bits (effective) (elements) (bytes) byte 11 0 11 -1024 ... 1023 -1024 ... 1023 -1024 ... 1023 halfword 11 1 12 -2048 ... 2046 word 11 2 13 -4096 ... 4092 -1024 ... 1023 doubleword 11 3 14 -8192 ... 8184 -1024 ... 1023 Table 10-8 Scaled immediate encoding (indirect offsets) ### 10.9 Constant extenders To support the use of 32-bit operands in some instructions, the Hexagon processor defines an instruction word that solely exists to extend the bit range of an immediate or address operand that is contained in an adjacent instruction in a packet. These instruction words are called constant extenders. For example, the absolute addressing mode specifies a 32-bit constant value as the effective address. Instructions using this addressing mode are encoded in a single packet containing both the normal instruction word and a second word with a constant extender that increases the range of the normal constant operand of the instruction to a full 32 bits. NOTE: Constant extended operands can encode symbols. A constant extender is encoded as a 32-bit instruction with the 4-bit ICLASS field set to 0 and the 2-bit Parse field set to its usual value (Section 10.5). The remaining 26 bits in the instruction word store the data bits that are prepended to an operand as small as six bits to create a full 32-bit value. Table 10-9 Constant extender encoding | Bits | Name | Description | |-------|---------------|-------------------------------------------| | 31:28 | ICLASS | Instruction class = 0000 | | 27:16 | Extender high | High 12 bits of 26-bit constant extension | | 15:14 | Parse | Parse bits | | 13:0 | Extender low | Low 14 bits of 26-bit constant extension | Within a packet, a constant extender must be positioned immediately before the instruction that it extends: in terms of memory addresses, the extender word must reside at address (<instr address> - 4). The constant extender serves as a prefix for an instruction: it does not execute in a slot, nor does it consume any slot resources. All packets must contain four or fewer words, and the constant extender occupies one word. If the instruction operand to extend is longer than six bits, the overlapping bits in the base instruction must be encoded as zeros. The value in the constant extender always supplies the upper 26 bits. The Regclass field in Table 10-10 lists the values to set bits [27:24] to in the instruction word to identify the instruction as one that might include a constant extender. NOTE: When the base instruction encodes two constant operands, the extended immediate is the one specified in the table. Constant extenders appear in disassembly listings as Hexagon instructions with the name immext. Table 10-10 Constant extender instructions | ICLASS | Regclass | Instructions | |--------|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | LD | 1 | <pre>Rd = mem{b,ub,h,uh,w,d} (##U32) if ([!]Pt[.new]) Rd = mem{b,ub,h,uh,w,d} (Rs + ##U32) // Predicated loads</pre> | | LD | | Rd = mem{b,ub,h,uh,w,d} (Rs + ##U32)<br>Rd = mem{b,ub,h,uh,w,d} (Re=##U32)<br>Rd = mem{b,ub,h,uh,w,d} (Rt<<#u2 + ##U32)<br>if ([!]Pt[.new]) Rd = mem{b,ub,h,uh,w,d} (##U32) | | ST | 0 | <pre>mem{b,h,w,d}(##U32) = Rs[.new]</pre> | | ST | | mem{b,h,w,d} (Rs + ##U32) = Rt[.new] mem{b,h,w,d} (Rd = ##U32) = Rt[.new] mem{b,h,w,d} (Ru << #u2 + ##U32) = Rt[.new] if ([!]Pt[.new]) mem{b,h,w,d} (##U32) = Rt[.new] | Table 10-10 Constant extender instructions (cont.) | ICLASS | Regclass | Instructions | |--------|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | MEMOP | | [if [!]Ps] memw(Rs + #u6) = ##U32 // Constant store<br>memw(Rs + Rt << #u2) = ##U32 // Constant store | | NV | | if (cmp.xx(Rs.new, ##U32)) jump:hint target | | ALU32 | | Rd = ##u32 Rdd = combine(Rs, ##u32) Rdd = combine(##u32, R s) Rdd = combine(##u32, #s8) Rdd = combine(##u32) Rd = mux (Pu, Rs, ##u32) Rd = mux (Pu, ##u32, Rs) Rd = mux (Pu, ##u32, Rs) Rd = mux (Pu, ##u32, #s8) if ([!]Pu[.new]) Rd = add(Rs, ##u32) if ([!]Pu[.new]) Rd = ##u32 Pd = [!]cmp.eq (Rs, ##u32) Pd = [!]cmp.gt (Rs, ##u32) Pd = [!]cmp.gtu (Rs, ##u32) Rd = and(Rs, ##u32) Rd = and(Rs, ##u32) Rd = or(Rs, ##u32) Rd = sub(##u32, Rs) | | ALU32 | | Rd = add(Rs, ##s32) | | XTYPE | 00 | Rd = mpyi(Rs, ##u32)<br>Rd += mpyi(Rs, ##u32)<br>Rd -= mpyi(Rs, ##u32)<br>Rx += add(Rs, ##u32)<br>Rx -= add(Rs, ##u32) | | ALU32 | 1 | Rd = ##u32<br>Rd = add(Rs, ##s32) | | J | 1 | <pre>jump (PC + ##s32) call (PC + ##s32) if ([!]Pu) call (PC + ##s32)</pre> | | CR | | Pd = spNloop0(PC+##s32,Rs/#U10)<br>loop0/1 (PC+##s32,#Rs/#U10) | | XTYPE | 1 | Rd = add(pc,##s32) Rd = add(##u32,mpyi(Rs,#u6)) Rd = add(##u32,mpyi(Rs,Rt)) Rd = add(Rs,add(Rt,##u32)) Rd = add(Rs,sub(##u32,Rt)) Rd = sub(##u32,add(Rs,Rt)) Rd = or(Rs,and(Rt,##u32)) Rx = add/sub/and/or (##u32,asl/asr/lsr(Rx,#U5)) Rx = add/sub/and/or (##u32,asl/asr/lsr(Rs,Rx)) Rx = add/sub/and/or (##u32,asl/asr/lsr(Rx,Rs)) Pd = cmpb/h.{eq,gt,gtu} (Rs,##u32) | <sup>&</sup>lt;sup>1</sup> Constant extension is only for a Slot 1 sub-instruction. NOTE: If a constant extender is encoded in a packet for an instruction that does not accept a constant extender, the execution result is undefined. The assembler normally ensures that only valid constant extenders are generated. ### **Encoding 32-bit address operands in load/stores** Two methods exist for encoding a 32-bit absolute address in a load or store instruction: - For unconditional load/stores, use the GP-relative load/store instruction. The assembler encodes the absolute 32-bit address as follows: - ☐ The upper 26 bits are encoded in a constant extender - ☐ The lower six bits are encoded in the six operand bits contained in the GP-relative instruction - In this case, the 32-bit value encoded must be a plain address, and the value stored in the GP register is ignored. NOTE: When a constant extender is explicitly specified with a GP-relative load/store, the processor ignores the value in GP and creates the effective address directly from the 32-bit constant value. For conditional load/store instructions that have their base address encoded only by a 6-bit immediate operand, a constant extender must be explicitly specified; otherwise, the execution result is undefined. The assembler ensures that these instructions always include a constant extender. This case applies also to instructions that use the absolute-set addressing mode or absolute-plus-register-offset addressing mode. ### **Encoding 32-bit immediate operands** The immediate operands of certain instructions use scaled immediates (Section 10.8) to increase their addressable range. When using constant extenders, scaled immediates are not scaled by the processor. Instead, the assembler must encode the full 32-bit unscaled value as follows: - Encode the upper 26 bits in the constant extender - Encode the lower 6 bits in the base instruction in the least-significant bit positions of the immediate operand field. - Encode any overlapping bits in the base instruction as zeros. ### Encoding 32-bit jump/call target addresses When a jump/call has a constant extender, the resulting target address is forced to a 32-bit alignment (hardware clears bits 1:0 in the address). The resulting jump/call operation never causes an alignment violation. # 10.10 New-value operands Instructions that include a new-value register operand specify in their encodings which instruction in the packet has its destination register accessed as the new-value register. New-value consumers include a 3-bit instruction field, Nt, which specifies this information. - Nt[0] is reserved and must always be encoded as zero. A nonzero value produces undefined results. - Nt[2:1] encodes the distance (in instructions) from the producer to the consumer, as follows: ``` □ Nt[2:1] = 00 // Reserved □ Nt[2:1] = 01 // Producer is +1 instruction ahead of consumer □ Nt[2:1] = 10 // Producer is +2 instructions ahead of consumer □ Nt[2:1] = 11 // Producer is +3 instructions ahead of consumer ``` The use of "ahead" in the comments means the instruction encoded at a lower memory address than the consumer instruction, not counting empty slots or constant extenders. For example, the following producer/consumer relationship is encoded with Nt[2:1] set to 01. ``` consumer instruction word> consumer constant extender word> consumer instruction word> ... ``` NOTE: Instructions with 64-bit register pair destinations cannot produce new-values. The assembler flags this case with an error, as the result is undefined. # 10.11 Instruction mapping The assembler encodes some Hexagon processor instructions as variants of other instructions. This is done for operations that are functionally equivalent to other instructions, but are still defined as separate instructions because of their programming utility as common operations. Table 10-11 Instructions mapped to other instructions | Instruction | Mapping | |--------------|---------------------------------| | Rd = not(Rs) | Rd = sub(#-1,Rs) | | Rd = neg(Rs) | Rd = sub(#0,Rs) | | Rdd = Rss | Rdd = combine(Rss.H32, Rss.L32) | # 11 Instruction set This chapter describes the instruction set for version 7 of the Hexagon processor. The instructions are listed alphabetically within instruction categories. The following information is provided for each instruction: - Instruction name - A brief description of the instruction - A high-level functional description (syntax and behavior) with all possible operand types - Instruction class and slot information for grouping instructions in packets - Notes on miscellaneous issues - Any C intrinsic functions that provide access to the instruction - Instruction encoding ## 11.1 ALU32 The ALU32 instruction class includes instructions that perform arithmetic and logical operations on 32-bit data. ALU32 instructions are executable on any slot. ### 11.1.1 ALU32 ALU The ALU32 ALU instruction subclass includes instructions that perform arithmetic and logical operations on individual 32-bit items. ### Add Add a source register either to another source register or to a signed 16-bit immediate value. Store the result in destination register. Source and destination registers are 32 bits. If the result overflows 32 bits, it wraps around. Optionally saturate result to a signed value between 0x80000000 and 0x7fffffff. For 64-bit versions of this operation, see the XTYPE add instructions. | Syntax | Benavior | |-------------------|-------------------------------------------| | Rd=add(Rs, #s16) | <pre>apply_extension(#s); Rd=Rs+#s;</pre> | | Rd=add(Rs,Rt) | Rd=Rs+Rt; | | Rd=add(Rs,Rt):sat | Rd=sat <sub>32</sub> (Rs+Rt); | Class: ALU32 (slots 0,1,2,3) ### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ### **Intrinsics** | Rd=add(Rs, #s16) | Word32 | Q6_R_add_RI(Word32 Rs, Word32 Is16) | |-------------------|---------------|-------------------------------------| | Rd=add(Rs,Rt) | Word32 | Q6_R_add_RR(Word32 Rs, Word32 Rt) | | Rd=add(Rs,Rt):sat | Word32<br>Rt) | Q6_R_add_RR_sat(Word32 Rs, Word32 | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|------|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------------| | | ICL | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 1 | 0 | 1 | 1 | i | i | i | i | i | i | i | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=add(Rs,#s16) | | | ICL | ASS | 3 | Р | N | 1ajC | )p | N | 1inC | )p | | • | s5 | | | Pa | rse | | | | t5 | | | | | | | | d5 | • | | | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | d | d | d | d | d | Rd=add(Rs,Rt) | | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | d | d | d | d | d | Rd=add(Rs,Rt):sat | Field name Description Major opcode Maj0p Minor opcode MinOp Ρ Predicated ICLASS Instruction class Packet/loop parse bits Parse d5 Field to encode register d Field to encode register s s5 Field to encode register t t5 # Logical operations Perform bitwise logical operations (AND, OR, XOR, NOT) either on two source registers or on a source register and a signed 10-bit immediate value. Store result in the destination register. Source and destination registers are 32 bits. For 64-bit versions of these operations, see the XTYPE logical instructions. | Syntax | Behavior | |------------------|-----------------------------------------------| | Rd=and(Rs, #s10) | <pre>apply_extension(#s); Rd=Rs&amp;#s;</pre> | | Rd=and(Rs,Rt) | Rd=Rs&Rt | | Rd=and(Rt,~Rs) | Rd = (Rt & ~Rs); | | Rd=not(Rs) | Assembler mapped to: "Rd=sub(#-1,Rs)" | | Rd=or(Rs,#s10) | <pre>apply_extension(#s); Rd=Rs #s;</pre> | | Rd=or(Rs,Rt) | Rd=Rs Rt; | | Rd=or(Rt,~Rs) | Rd = (Rt ~Rs); | | Rd=xor(Rs,Rt) | Rd=Rs^Rt; | Class: ALU32 (slots 0,1,2,3) ### **Intrinsics** | Rd=and(Rs, #s10) | Word32 Q6_R_and_RI(Word32 Rs, Word32 Is10) | |------------------|--------------------------------------------| | Rd=and(Rs,Rt) | Word32 Q6_R_and_RR(Word32 Rs, Word32 Rt) | | Rd=and(Rt,~Rs) | Word32 Q6_R_and_RnR(Word32 Rt, Word32 Rs) | | Rd=not(Rs) | Word32 Q6_R_not_R(Word32 Rs) | | Rd=or(Rs,#s10) | Word32 Q6_R_or_RI(Word32 Rs, Word32 Is10) | | Rd=or(Rs,Rt) | Word32 Q6_R_or_RR(Word32 Rs, Word32 Rt) | | Rd=or(Rt,~Rs) | Word32 Q6_R_or_RnR(Word32 Rt, Word32 Rs) | | Rd=xor(Rs,Rt) | Word32 Q6_R_xor_RR(Word32 Rs, Word32 Rt) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|------|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-----------------| | | ICL | ASS | 3 | Rs | N | 1ajC | )p | N | /linC | )p | | | s5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | i | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=and(Rs,#s10) | | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | i | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=or(Rs,#s10) | | | ICL | ASS | 3 | Р | N | /ajC | )p | N | /linC | р | | s5 | | | | Pa | rse | | | | t5 | | | | | | | | d5 | • | | | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|----------------| | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | d | d | d | d | d | Rd=and(Rs,Rt) | | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | d | d | d | d | d | Rd=or(Rs,Rt) | | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | • | d | d | d | d | d | Rd=xor(Rs,Rt) | | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | | - | • | d | d | d | d | d | Rd=and(Rt,~Rs) | | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | • | - | - | d | d | d | d | d | Rd=or(Rt,~Rs) | ### Field name ### Description | MajOp | Major opcode | |--------|----------------------------| | MinOp | Minor opcode | | Rs | No Rs read | | P | Predicated | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | # Negate Perform arithmetic negation on a source register. Store result in the destination register. Source and destination registers are 32 bits. For 64-bit and saturating versions of this instruction, see the XTYPE negate instructions. | Syntax | Behavior | |-----------------------|----------------------------------------| | Rd=neg(Rs) | Assembler mapped to: "Rd = sub(#0,Rs)" | | Class: N/A Intrinsics | | | mumsics | | | Rd = neg(Rs) | Word32 Q6_R_neg_R(Word32 Rs) | # Nop Perform no operation. This instruction is for padding and alignment. Within a packet, it can be positioned in any slot 0 through 3. | Syntax Behav | vior | |--------------|------| |--------------|------| nop Class: ALU32 (slots 0,1,2,3) ## **Encoding** | 3 | 1 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|----|----|-------|----|----|----|----|----|----|----|----|----|-----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-----| | | IC | CLA | SS | ; | Rs | ٨ | /lajC | )p | | | | | | | | | Pai | rse | | | | | | | | | | | | | | | | | ( | ) | 1 | 1 | 1 | 1 | 1 | 1 | 1 | - | - | - | - | | - | | - | Р | Р | - | - | - | - | - | - | - | - | - | - | - | - | - | - | nop | Field name Description Major opcode Maj0p No Rs read ICLASS Instruction class Packet/loop parse bits Parse ### Subtract Subtract a source register from either another source register or from a signed 10-bit immediate value. Store the result in destination register. Source and destination registers are 32 bits. If the result underflows 32 bits, it wraps around. Optionally saturate result to a signed value between 0x8000\_0000 and 0x7fff\_ffff. For 64-bit versions of this operation, see the XTYPE subtract instructions. | Syntax | Benavior | |---------------------|-----------------------------------------------| | Rd = sub(#s10,Rs) | <pre>apply_extension(#s); Rd = #s - Rs;</pre> | | Rd = sub(Rt,Rs) | Rd = Rt - Rs; | | Rd = sub(Rt,Rs):sat | $Rd = sat_{32}(Rt - Rs);$ | Class: ALU32 (slots 0,1,2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ### **Intrinsics** | Rd=sub(#s10,Rs) | Word32 Q6_R_sub_IR(Word32 Is10, Word32 Rs) | |-------------------|----------------------------------------------| | Rd=sub(Rt,Rs) | Word32 Q6_R_sub_RR(Word32 Rt, Word32 Rs) | | Rd=sub(Rt,Rs):sat | Word32 Q6_R_sub_RR_sat(Word32 Rt, Word32 Rs) | ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|----|------|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------------| | | ICL. | ASS | 3 | Rs | N | 1ajC | )p | N | /linC | )p | | | s5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | i | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=sub(#s10,Rs) | | | ICL. | ASS | 3 | Р | N | 1ajC | )p | ٨ | linC | )p | | | s5 | | • | Pa | rse | | | | t5 | | • | | | | | | d5 | | | | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | d | d | d | d | d | Rd=sub(Rt,Rs) | | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | d | d | d | d | d | Rd=sub(Rt,Rs):sat | | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | i | s | s s s s s | | | | | P | ï | i | ï | i | i | i | i | i | i | d | d | d | d | d | Rd=sub(#s10,Rs) | |------------|---|---|---|------|----|---|------|---|---|-----------|---|---|---|---|-------|-----|------|----|----|---|---|---|---|---|---|---|----|---|---|-------------------| | SS | | Р | ٨ | 1ajC | )p | N | linC | р | | s5 | | | | | Parse | | | | t5 | | | | | | | | d5 | | | | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | | - | d | d | d | d | d | Rd=sub(Rt,Rs) | | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | | - | d | d | d | d | d | Rd=sub(Rt,Rs):sat | | Field name | | | | | | | | | | | • | | | D | )es | cri | ptic | on | | | • | | | | | | | | | | Major opcode Maj0p Minor opcode MinOp No Rs read Rs Predicated Instruction class ICLASS Packet/loop parse bits Parse d5 Field to encode register d Field to encode register s s5 t5 Field to encode register t # Sign extend Sign-extend the least-significant byte or halfword from the source register and place the 32-bit result in the destination register. | Syntax | Behavior | |-------------|--------------------------| | Rd=sxtb(Rs) | $Rd = sxt_{8->32}(Rs);$ | | Rd=sxth(Rs) | $Rd = sxt_{16->32}(Rs);$ | Class: ALU32 (slots 0,1,2,3) ### **Intrinsics** | Rd=sxtb(Rs) | Word32 Q6_R_sxtb_R(Word32 Rs) | |-------------|-------------------------------| | Rd=sxth(Rs) | Word32 Q6_R_sxth_R(Word32 Rs) | ## **Encoding** s5 | ; | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|----|------|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------| | Ī | ICI | ASS | 3 | Rs | N | 1ajC | )p | N | /linC | )p | | | s5 | | | Pa | rse | С | | | | | | | | | | | d5 | | | | | Γ | ) 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | - | - | | - | | - | - | | d | d | d | d | d | Rd=sxtb(Rs) | | Ī | ) 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | - | - | - | - | - | - | - | - | d | d | d | d | d | Rd=sxth(Rs) | Description | Field name | | |------------|----------------------------| | MajOp | Major opcode | | MinOp | Minor opcode | | Rs | No Rs read | | C | Conditional | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | Field to encode register s # Transfer immediate Assign an immediate value to a 32-bit destination register. This instruction supports two types of assignment: - Sign-extend a 16-bit signed immediate value to 32 bits. - Assign a 16-bit unsigned immediate value to either the upper or lower 16 bits of the destination register, leaving the other 16 bits unchanged. | Syntax | Behavior | |---------------|------------------------------------------------------------------------------------------------------------------------------------------| | Rd = #s16 | <pre>apply_extension(#s); Rd = #s;</pre> | | Rdd = #s8 | <pre>if ("#s8&lt;0") { Assembler mapped to: "Rdd=combine(#-1,#s8)"; } else { Assembler mapped to: "Rdd=combine(#0,#s8)"; }</pre> | | Rx.[HL]= #u16 | Rx.h[01]=#u; | # Class: ALU32 (slots 0,1,2,3) #### **Intrinsics** | Rd=#s16 | Word32 Q6_R_equals_I(Word32 Is16) | |-----------|-----------------------------------------------| | Rdd=#s8 | Word64 Q6_P_equals_I(Word32 Is8) | | Rx.H=#u16 | Word32 Q6_Rh_equals_I(Word32 Rx, Word32 Iu16) | | Rx.L=#u16 | Word32 Q6_Rl_equals_I(Word32 Rx, Word32 Iu16) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|------|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-----------| | | ICL | ASS | 3 | Rs | N | 1ajC | )p | N | 1inC | )p | | | х5 | | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | i | i | 1 | x | x | x | x | x | P | Р | i | i | i | i | i | i | i | i | i | i | i | i | i | i | Rx.L=#u16 | | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | i | i | 1 | x | x | x | x | x | P | Р | i | i | i | i | i | i | i | i | i | i | i | i | i | i | Rx.H=#u16 | | | ICL | ASS | 3 | Rs | N | 1ajC | )p | N | 1inC | )p | | | | | | Pa | rse | | | | • | | | • | | | | | d5 | | • | | | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | i | i | - | i | i | i | i | i | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=#s16 | Field name Description MajOp Major opcode MinOp Minor opcode Rs No Rs read ICLASS Instruction class Parse Packet/loop parse bits d5 Field to encode register d x5 Field to encode register x # Transfer register Transfer a source register to a destination register. Source and destination registers are either 32 bits or 64 bits. | Rd=Rs | Rd=Rs; | |---------|-----------------------------------------------------| | Rdd=Rss | Assembler mapped to: "Rdd=combine(Rss.H32,Rss.L32)" | Class: ALU32 (slots 0,1,2,3) ### **Intrinsics** | Rd=Rs | Word32 Q6_R_equals_R(Word32 Rs) | |---------|----------------------------------| | Rdd=Rss | Word64 Q6_P_equals_P(Word64 Rss) | # **Encoding** | ; | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|-----|-----|----|----|----|------|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------| | Ī | I | CLA | ASS | | Rs | N | lajC | )p | N | 1inO | р | | | s5 | | | Pa | rse | С | | | | | | | | | | | d5 | | | | | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | P | 0 | | - | - | - | | - | | - | d | d | d | d | d | Rd=Rs | | Field name | | Description | |------------|--------------|-------------| | ajOp | Major opcode | | | MajOp | Major opcode | |--------|----------------------------| | MinOp | Minor opcode | | Rs | No Rs read | | C | Conditional | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | | | ## Vector add halfwords Add the two 16-bit halfwords of Rs to the two 16-bit halfwords of Rt. The results optionally saturate to signed or unsigned 16-bit values. #### Syntax Behavior Class: ALU32 (slots 0,1,2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rd=vaddh(Rs,Rt) Word32 Q6_R_vaddh_RR(Word32 Rs, Word32 Rt) Rd=vaddh(Rs,Rt):sat Word32 Q6_R_vaddh_RR_sat(Word32 Rs, Word32 Rt) Rd=vadduh(Rs,Rt):sat Word32 Q6_R_vadduh_RR_sat(Word32 Rs, Word32 Rt) ``` #### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|------|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|----------------------| | | ICL | ASS | 3 | Р | N | 1ajC | )p | N | 1inO | )p | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | d5 | | | | | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | | t | t | t | t | t | - | - | - | d | d | d | d | d | Rd=vaddh(Rs,Rt) | | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | | | - | d | d | d | d | d | Rd=vaddh(Rs,Rt):sat | | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | | t | t | t | t | t | - | - | - | d | d | d | d | d | Rd=vadduh(Rs,Rt):sat | | Field name | | Description | |------------|--------------|-------------| | MaiOp | Major opcode | | | мајор | Major opcode | |--------|----------------------------| | MinOp | Minor opcode | | P | Predicated | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | # Vector average halfwords The <code>vavgh</code> instruction adds the two 16-bit halfwords of Rs to the two 16-bit halfwords of Rd, and shifts the result right by 1 bit. Optionally, add a rounding constant before shifting. The ${\tt vnavgh}$ instruction subtracts the two 16-bit halfwords of Rt from the two 16-bit halfwords of Rs, and shifts the result right by 1 bit. For vector negative average with rounding, see the XTYPE ${\tt vnavgh}$ instruction. #### Syntax Behavior | Rd=vavgh(Rs,Rt) | <pre>for (i=0;i&lt;2;i++) { Rd.h[i]=((Rs.h[i]+Rt.h[i])&gt;&gt;1); }</pre> | |---------------------|--------------------------------------------------------------------------------| | Rd=vavgh(Rs,Rt):rnd | <pre>for (i=0;i&lt;2;i++) { Rd.h[i]=((Rs.h[i]+Rt.h[i]+1)&gt;&gt;1); }</pre> | | Rd=vnavgh(Rt,Rs) | <pre>for (i=0;i&lt;2;i++) { Rd.h[i]=((Rt.h[i]-Rs.h[i])&gt;&gt;1); }</pre> | Class: ALU32 (slots 0,1,2,3) #### **Intrinsics** | Rd=vavgh(Rs,Rt) | Nord32 Q6_R_vavgh_RR(Word32 Rs, Wor | rd32 Rt) | |---------------------|--------------------------------------------|-----------| | Rd=vavgh(Rs,Rt):rnd | Jord32 Q6_R_vavgh_RR_rnd(Word32 Rs,<br>Rt) | , Word32 | | Rd=vnavgh(Rt,Rs) | Jord32 Q6_R_vnavgh_RR(Word32 Rt, Wo | ord32 Rs) | ## **Encoding** | 3 | 1 3 | 0 2 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|----|----|------|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|---------------------| | | IC | LAS | SS | | Р | N | 1ajC | )p | Ν | /linC | )p | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | d5 | | | | | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | - | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | d | d | d | d | d | Rd=vavgh(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | - | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | • | d | d | d | d | d | Rd=vavgh(Rs,Rt):rnd | | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | - | 1 | 1 | s | s | s | s | s | Р | Р | | t | t | t | t | t | • | • | 1 | d | d | d | d | d | Rd=vnavgh(Rt,Rs) | #### Field name Description | MajOp | Major opcode | |--------|----------------------------| | MinOp | Minor opcode | | P | Predicated | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | ### Vector subtract halfwords Subtract each of the two halfwords in 32-bit vector Rs from the corresponding halfword in vector Rt. Optionally saturate each 16-bit addition to either a signed or unsigned 16-bit value. Applying saturation to the vsubh instruction clamps the result to the signed range 0x8000 to 0x7fff, whereas applying saturation to vsubuh ensures that the unsigned result is in the range 0 to 0xffff. When saturation is not needed, use the vsubh instruction. #### Syntax Behavior Class: ALU32 (slots 0,1,2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rd=vsubh(Rt,Rs) Word32 Q6_R_vsubh_RR(Word32 Rt, Word32 Rs) Rd=vsubh(Rt,Rs):sat Word32 Q6_R_vsubh_RR_sat(Word32 Rt, Word32 Rs) Rd=vsubuh(Rt,Rs):sat Word32 Q6_R_vsubuh_RR_sat(Word32 Rt, Word32 Rs) ``` #### **Encoding** | 31 | 30 | 0 29 | 9 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|------|----|----|----|----|------|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|----------------------| | | ICI | LAS | SS | | Р | N | 1ajC | )p | Λ | /linC | )p | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | d5 | | | | | 1 | 1 | 1 | | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | P | - | t | t | t | t | t | | - | - | d | d | d | d | d | Rd=vsubh(Rt,Rs) | | 1 | 1 | 1 | | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | P | - | t | t | t | t | t | | - | | d | d | d | d | d | Rd=vsubh(Rt,Rs):sat | | 1 | 1 | 1 | | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | d | d | d | d | d | Rd=vsubuh(Rt,Rs):sat | # Field name Description | rajop | Major opocao | |--------|----------------------------| | MinOp | Minor opcode | | P | Predicated | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | # Zero extend Zero-extend the least significant byte or halfword from Rs and place the 32-bit result in Rd. | Syntax | Behavior | |-------------|-------------------------------------------| | Rd=zxtb(Rs) | Assembler mapped to: "Rd = and(Rs, #255)" | | Rd=zxth(Rs) | $Rd = zxt_{16->32}(Rs);$ | Class: ALU32 (slots 0,1,2,3) ## **Intrinsics** | Rd=zxtb(Rs) | Word32 Q6_R_zxtb_R(Word32 Rs) | |-------------|-------------------------------| | Rd=zxth(Rs) | Word32 Q6_R_zxth_R(Word32 Rs) | # **Encoding** s5 | 3 | 1 30 | 29 | 9 2 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|-----|----|----|----|------|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------| | Ī | ICI | LAS | SS | | Rs | N | 1ajC | )p | N | 1inO | )p | | | s5 | | | Pa | rse | С | | | | | | | | | | | d5 | | | | | ( | 1 | 1 | | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Ρ | 0 | - | | | - | - | - | - | | d | d | d | d | d | Rd=zxth(Rs) | | Field name | | Description | |------------|----------------------------|-------------| | MajOp | Major opcode | | | MinOp | Minor opcode | | | Rs | No Rs read | | | C | Conditional | | | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | Field to encode register s ## 11.1.2 ALU32 PERM The ALU32 PERM instruction subclass includes instructions that rearrange or perform format conversion on vector data types. #### Combine words into doubleword Combine halfwords or words into larger values. In a halfword combine, either the high or low halfword of the first source register is transferred to the most-significant halfword of the destination register, while either the high or low halfword of the second source register is transferred to the least-significant halfword of the destination register. Source and destination registers are 32 bits. In a word combine, the first source register is transferred to the most-significant word of the destination register, while the second source register is transferred to the least-significant word of the destination register. Source registers are 32 bits and the destination register is 64 bits. In a variant of word combine, signed 8-bit immediate values (instead of registers) are transferred to the most- and least-significant words of the 64-bit destination register. Optionally, one of the immediate values can be 32 bits. **Syntax** **Behavior** | - Jillax | Donavior | |-----------------------------|-----------------------------------------------------------| | Rd=combine(Rt.[HL],Rs.[HL]) | Rd = (Rt.uh[01] << 16) Rs.uh[01]; | | Rdd=combine(#s8,#S8) | <pre>apply_extension(#s); Rdd.w[0]=#S; Rdd.w[1]=#s;</pre> | | Rdd=combine(#s8,#U6) | <pre>apply_extension(#U); Rdd.w[0]=#U; Rdd.w[1]=#s;</pre> | | Rdd=combine(#s8,Rs) | <pre>apply_extension(#s); Rdd.w[0]=Rs; Rdd.w[1]=#s;</pre> | | Rdd=combine(Rs, #s8) | <pre>apply_extension(#s); Rdd.w[0]=#s; Rdd.w[1]=Rs;</pre> | | Rdd=combine(Rs,Rt) | Rdd.w[0]=Rt;<br>Rdd.w[1]=Rs; | # Class: ALU32 (slots 0,1,2,3) # Intrinsics | Rd=combine(Rt.H,Rs.H) | Word32 Q6_R_combine_RhRh(Word32 Rt, Word32 Rs) | |-----------------------|------------------------------------------------| | Rd=combine(Rt.H,Rs.L) | Word32 Q6_R_combine_RhRl(Word32 Rt, Word32 Rs) | | Rd=combine(Rt.L,Rs.H) | Word32 Q6_R_combine_RlRh(Word32 Rt, Word32 Rs) | | Rd=combine(Rt.L,Rs.L) | Word32 Q6_R_combine_RlR1(Word32 Rt, Word32 Rs) | | Rdd=combine(#s8,#S8) | Word64 Q6_P_combine_II(Word32 Is8, Word32 IS8) | | Rdd=combine(#s8,Rs) | Word64 Q6_P_combine_IR(Word32 Is8, Word32 Rs) | | Rdd=combine(Rs, #s8) | Word64 Q6_P_combine_RI(Word32 Rs, Word32 Is8) | | Rdd=combine(Rs,Rt) | Word64 Q6_P_combine_RR(Word32 Rs, Word32 Rt) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|------|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-----------------------| | | ICL | ASS | ; | Rs | N | 1ajC | )p | N | /linC | )p | | | s5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | - | 0 | 0 | s | s | s | s | s | Р | Р | 1 | i | i | i | i | i | i | i | i | d | d | d | d | d | Rdd=combine(Rs,#s8) | | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | - | 0 | 1 | s | s | s | s | s | Р | Р | 1 | · | i | · | i | i | i | i | i | d | d | d | d | d | Rdd=combine(#s8,Rs) | | | ICL | ASS | ; | Rs | N | 1ajC | )p | Ν | /linC | )p | | | | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | I | ı | I | I | ı | I | I | Р | Р | ı | i | i | i | i | - | i | i | i | d | d | d | d | d | Rdd=combine(#s8,#S8) | | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | - | - | ı | ı | ı | I | ı | Р | Р | ı | i | i | i | i | i | i | i | i | d | d | d | d | d | Rdd=combine(#s8,#U6) | | | ICL | ASS | ; | Р | N | 1ajC | )p | Ν | /linC | )p | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | d5 | | | | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | | | d | d | d | d | d | Rd=combine(Rt.H,Rs.H) | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | | - | d | d | d | d | d | Rd=combine(Rt.H,Rs.L) | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | | | d | d | d | d | d | Rd=combine(Rt.L,Rs.H) | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | | t | t | t | t | t | - | - | - | d | d | d | d | d | Rd=combine(Rt.L,Rs.L) | | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | - | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | d | d | d | d | d | Rdd=combine(Rs,Rt) | ## Field name Description | MajOp | Major opcode | |--------|----------------------------| | MinOp | Minor opcode | | Rs | No Rs read | | P | Predicated | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | ## Mux Select between two source registers based on the least-significant bit of a predicate register. If the bit is 1, transfer the first source register to the destination register; otherwise, transfer the second source register. Source and destination registers are 32 bits. In a variant of $\max$ , signed 8-bit immediate values can be used instead of registers for either or both source operands. For 64-bit versions of this instruction, see the XTYPE vmux instruction. | Syntax | Behavior | |----------------------|------------------------------------------------------------------------------------| | Rd=mux(Pu, #s8, #S8) | <pre>PREDUSE_TIMING;<br/>apply_extension(#s);<br/>(Pu[0]) ? (Rd=#s):(Rd=#S);</pre> | | Rd=mux(Pu, #s8, Rs) | <pre>PREDUSE_TIMING;<br/>apply_extension(#s);<br/>(Pu[0]) ? (Rd=#s):(Rd=Rs);</pre> | | Rd=mux(Pu,Rs,#s8) | <pre>PREDUSE_TIMING;<br/>apply_extension(#s);<br/>(Pu[0]) ? (Rd=Rs):(Rd=#s);</pre> | | Rd=mux(Pu,Rs,Rt) | <pre>PREDUSE_TIMING; (Pu[0]) ? (Rd=Rs):(Rd=Rt);</pre> | Class: ALU32 (slots 0,1,2,3) ## **Intrinsics** | Rd=mux(Pu, #s8, #S8) | Word32 Q6_R_mux_pII(Byte Pu, Word32 Is8, Wor<br>IS8) | :d32 | |----------------------|------------------------------------------------------|----------| | Rd=mux(Pu, #s8, Rs) | Word32 Q6_R_mux_pIR(Byte Pu, Word32 Is8, Wor | d32 Rs) | | Rd=mux(Pu,Rs,#s8) | Word32 Q6_R_mux_pRI(Byte Pu, Word32 Rs, Word | l32 Is8) | | Rd=mux(Pu,Rs,Rt) | Word32 Q6_R_mux_pRR(Byte Pu, Word32 Rs, Word | l32 Rt) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|------|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------------| | | ICL | ASS | ; | Rs | ٨ | ЛаjС | )p | | u | 12 | | | s5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | u | u | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=mux(Pu,Rs,#s8) | | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | u | u | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=mux(Pu,#s8,Rs) | | | ICL | ASS | 3 | Rs | | | u | 11 | | | • | | • | | • | Pa | rse | | • | • | | • | | • | • | | | | d5 | • | | | | 0 | 1 | 1 | 1 | 1 | 0 | 1 | u | u | I | ı | ı | I | I | I | I | Р | Р | I | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=mux(Pu,#s8,#S8) | | | ICL | ASS | 3 | Р | ٨ | ЛајС | )p | | | | | • | s5 | | | Pa | rse | | | | t5 | | | | u | 2 | | | d5 | | | | | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | - | - | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | u | u | d | d | d | d | d | Rd=mux(Pu,Rs,Rt) | Major opcode | Field name | | Description | |------------|----------------------------|-------------| | MajOp | Major opcode | - | | MinOp | Minor opcode | | | Rs | No Rs read | | | P | Predicated | | | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | u1 | Field to encode register u | | | u2 | Field to encode register u | | # Shift word by 16 The aslh instruction performs an arithmetic left shift of the 32-bit source register by 16 bits (one halfword). The lower 16 bits of the destination are zero-filled. The asrh instruction performs an arithmetic right shift of the 32-bit source register by 16 bits (one halfword). The upper 16 bits of the destination are sign-extended. | Syntax | Behavior | |---------------|----------------| | Rd = aslh(Rs) | Rd = Rs << 16; | | Rd = asrh(Rs) | Rd = Rs >> 16; | Class: ALU32 (slots 0,1,2,3) ## **Intrinsics** | Rd=aslh(Rs) | Word32 Q6_R_aslh_R(Word32 Rs) | |-------------|-------------------------------| | Rd=asrh(Rs) | Word32 Q6 R asrh R(Word32 Rs) | ### **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|----|------|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------| | | ICI | ASS | 3 | Rs | N | 1ajC | )p | N | 1inC | )p | | | s5 | | | Pa | rse | С | | | | | | | | | | | d5 | | | | | 1 | ) 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | - | - | | - | - | - | - | - | d | d | d | d | d | Rd=aslh(Rs) | | ( | ) 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | - | - | - | - | - | - | - | - | d | d | d | d | d | Rd=asrh(Rs) | | ASS | | Rs | ٨ | /lajC | )p | N | /linO | )p | | | s5 | | | Pa | rse | С | | | | | | | | | | d5 | | | | |-----|---|-----|-----|-------|----|----------|-------|----|---|---|----|---|---|----|-----|---|---|----|-----|------|----|---|---|---|---|----|---|---|-------------| | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | Ø | s | s | s | Р | Р | 0 | - | - | - | - | - | - | - | d | d | d | d | d | Rd=aslh(Rs) | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | s | Ø | s | s | s | Р | Р | 0 | - | - | - | - | - | - | • | d | d | d | d | d | Rd=asrh(Rs) | | | | Fie | eld | na | me | <b>)</b> | | | | | | | | | | | D | es | cri | ptic | on | | | | | | | | | Major opcode Maj0p MinOp Minor opcode No Rs read Rs Conditional Instruction class ICLASS Packet/loop parse bits Parse d5 Field to encode register d s5 Field to encode register s # Pack high and low halfwords Pack together the most-significant halfwords from Rs and Rt into the most-significant word of register pair Rdd, and the least-significant halfwords from Rs and Rt into the least-significant halfword of Rdd. Syntax Behavior | Rdd=packhl(Rs,Rt) | Rdd.h[0]=Rt.h[0];<br>Rdd.h[1]=Rs.h[0]; | |-------------------|----------------------------------------| | | Rdd.h[2]=Rt.h[1];<br>Rdd.h[3]=Rs.h[1]; | Class: ALU32 (slots 0,1,2,3) ### **Intrinsics** Rdd=packhl(Rs,Rt) Word64 Q6 P packhl RR(Word32 Rs, Word32 Rt) ## **Encoding** d5 s5 t5 | 3 | 1 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|---|-----|-----|----|----|----|------|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------------| | | I | CLA | ASS | | Р | N | 1ajC | )p | N | /linO | р | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | d5 | | | | | • | ı | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | - | - | s | ø | s | s | s | Р | P | - | t | t | t | t | t | - | | - | d | d | d | d | d | Rdd=packhl(Rs,Rt) | | Field name | | Description | |------------|------------------------|-------------| | MajOp | Major opcode | | | MinOp | Minor opcode | | | P | Predicated | | | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | Field to encode register d Field to encode register s Field to encode register t ## 11.1.3 ALU32 PRED The ALU32 PRED instruction subclass includes instructions that perform conditional arithmetic and logical operations based on the values stored in a predicate register, and that produce predicate results. They are executable on any slot. ## Conditional add If the least-significant bit of predicate Pu is set, add a 32-bit source register to either another register or an immediate value. The result is placed in 32-bit destination register. If the predicate is false, the instruction does nothing. #### Class: ALU32 (slots 0,1,2,3) #### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|-------|----|----|------|----|----|----|----|----|----|----|-----|--------|----|----|----|---|---|----|---|---|---|---|----|---|---|--------------------------------| | | ICL | ASS | 3 | Rs | ٨ | /lajC | )p | PS | u | 12 | | | s5 | | | Pa | rse | D<br>N | | | | | | | | | | | d5 | | | | | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | u | u | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | i | i | d | d | d | d | d | if (Pu) Rd=add(Rs,#s8) | | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | u | u | s | s | s | s | s | Р | Р | 1 | i | i | i | i | i | i | i | i | d | d | d | d | d | if (Pu.new) Rd=add(Rs,#s8) | | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | u | u | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | i | i | d | d | d | d | d | if (!Pu) Rd=add(Rs,#s8) | | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | u | u | s | s | s | s | s | Р | Р | 1 | i | i | i | i | i | i | i | i | d | d | d | d | d | if (!Pu.new)<br>Rd=add(Rs,#s8) | | | ICL | ASS | 3 | Р | ٨ | ЛајС | )p | N | linC | )p | | | s5 | | | Pa | rse | D<br>N | | | t5 | | • | PS | u | 2 | | | d5 | | | | | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | - | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | u | u | d | d | d | d | d | if (Pu) Rd=add(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | - | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | u | u | d | d | d | d | d | if (!Pu) Rd=add(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | - | 0 | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 0 | u | u | d | d | d | d | d | if (Pu.new) Rd=add(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | - | 0 | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 1 | u | u | d | d | d | d | d | if (!Pu.new) Rd=add(Rs,Rt) | ### Field name Description Majop Major opcode MinOp Minor opcode Rs No Rs read DN Dot-new PS Predicate sense P Predicated | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | u2 | Field to encode register u | | | | | | # Conditional shift halfword Conditionally shift a halfword. The aslh instruction performs an arithmetic left shift of the 32-bit source register by 16 bits (one halfword), and zero-fills the lower 16 bits of the destination. The asrh instruction performs an arithmetic right shift of the 32-bit source register by 16 bits (one halfword); and sign-extends the upper 16 bits of the destination. #### Syntax Behavior ``` if ([!]Pu[.new]) Rd=aslh(Rs) if([!]Pu[.new][0]) { Rd=Rs<<16; } else { NOP; } if ([!]Pu[.new]) Rd=asrh(Rs) if([!]Pu[.new][0]) { Rd=Rs>>16; } else { NOP; } ``` Class: ALU32 (slots 0,1,2,3) ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|------|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------------------| | | ICL | ASS | 3 | Rs | N | ЛаjС | )p | N | /linC | )p | | | s5 | | | Pa | rse | С | | S | dn | u | 2 | | | | | | d5 | | | | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | P | Р | 1 | - | 0 | 0 | u | u | - | - | - | d | d | d | d | d | if (Pu) Rd=aslh(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 1 | - | 0 | 1 | u | u | - | - | - | d | d | d | d | d | if (Pu.new) Rd=aslh(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 1 | - | 1 | 0 | u | u | - | - | - | d | d | d | d | d | if (!Pu) Rd=aslh(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 1 | - | 1 | 1 | u | u | - | - | - | d | d | d | d | d | if (!Pu.new) Rd=aslh(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | s | s | s | s | s | P | Р | 1 | - | 0 | 0 | u | u | - | - | - | d | d | d | d | d | if (Pu) Rd=asrh(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 1 | - | 0 | 1 | u | u | - | - | - | d | d | d | d | d | if (Pu.new) Rd=asrh(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 1 | - | 1 | 0 | u | u | - | - | - | d | d | d | d | d | if (!Pu) Rd=asrh(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | s | s | s | s | s | P | Р | 1 | - | 1 | 1 | u | u | - | - | - | d | d | d | d | d | if (!Pu.new) Rd=asrh(Rs) | | Field name | Description | |------------|-------------| | | | MajOpMajor opcodeMinOpMinor opcodeRsNo Rs readCConditionalSPredicate sense dn **Dot-new** ICLASS Instruction class Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s u2 Field to encode register u # Conditional combine When the least-significant bit of predicate Pu is set, the most-significant word of destination Rdd is taken from the first source register Rs, while the least-significant word is taken from the second source register Rt. If the predicate is false, this instruction does nothing. ### Syntax Behavior ``` if ([!]Pu[.new]) Rdd=combine(Rs,Rt) if ([!]Pu[.new][0]) { Rdd.w[0]=Rt; Rdd.w[1]=Rs; } else { NOP; } ``` Class: ALU32 (slots 0,1,2,3) ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|----|------|----|----|-------|----|----|----|----|----|----|----|-----|--------|----|----|----|---|---|----|---|---|---|---|----|---|---|------------------------------------| | | ICL. | ASS | 3 | Р | N | ЛајС | )p | N | /linC | )p | | | s5 | | | Pa | rse | D<br>N | | | t5 | | | PS | u | 2 | | | d5 | | | | | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | u | u | d | d | d | d | d | if (Pu) Rdd=combine(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | u | u | d | d | d | d | d | if (!Pu)<br>Rdd=combine(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 0 | u | u | d | d | d | d | d | if (Pu.new)<br>Rdd=combine(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 1 | u | u | d | d | d | d | d | if (!Pu.new)<br>Rdd=combine(Rs,Rt) | ### Field name Description | DN | Dot-new | |--------|----------------------------| | MajOp | Major opcode | | MinOp | Minor opcode | | P | Predicated | | PS | Predicate sense | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | u2 | Field to encode register u | | | | # Conditional logical operations When the least-significant bit of predicate Pu is set, perform a logical operation on the source values, and place the result in a 32-bit destination register. When the predicate is false, the instruction does nothing. #### **Syntax Behavior** if ([!]Pu[.new]) if([!]Pu[.new][0]){ Rd=and(Rs,Rt) Rd=Rs&Rt; } else { NOP; if ([!]Pu[.new]) if([!]Pu[.new][0]){ Rd=or(Rs,Rt) Rd=Rs|Rt; } else { NOP; if ([!]Pu[.new]) if([!]Pu[.new][0]){ Rd=xor(Rs,Rt) Rd=Rs^Rt; } else { NOP; Class: ALU32 (slots 0,1,2,3) ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|----|------|----|----|------|----|----|----|----|----|----|----|-----|--------|----|----|----|---|---|----|---|---|---|---|----|---|---|----------------------------| | | ICL. | ASS | 3 | Р | N | ЛајС | )p | N | linC | )p | | | s5 | | | Pa | rse | D<br>N | | | t5 | | | PS | u | 2 | | | d5 | | | | | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | - | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | u | u | d | d | d | d | d | if (Pu) Rd=and(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | - | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | u | u | d | d | d | d | d | if (!Pu) Rd=and(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | - | 0 | 0 | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 0 | u | u | d | d | d | d | d | if (Pu.new) Rd=and(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | - | 0 | 0 | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 1 | u | u | d | d | d | d | d | if (!Pu.new) Rd=and(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | - | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | u | u | d | d | d | d | d | if (Pu) Rd=or(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | - | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | u | u | d | d | d | d | d | if (!Pu) Rd=or(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | - | 0 | 1 | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 0 | u | u | d | d | d | d | d | if (Pu.new) Rd=or(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | - | 0 | 1 | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 1 | u | u | d | d | d | d | d | if (!Pu.new) Rd=or(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | - | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | u | u | d | d | d | d | d | if (Pu) Rd=xor(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | - | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | u | u | d | d | d | d | d | if (!Pu) Rd=xor(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | - | 1 | 1 | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 0 | u | u | d | d | d | d | d | if (Pu.new) Rd=xor(Rs,Rt) | | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | - | 1 | 1 | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 1 | u | u | d | d | d | d | d | if (!Pu.new) Rd=xor(Rs,Rt) | #### Field name Description DN Dot-new MajOp Major opcode MinOp Minor opcode P Predicated PS Predicate sense ICLASS Instruction class | Field name | | Description | |------------|----------------------------|-------------| | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | u2 | Field to encode register u | | | | | | # Conditional subtract When the least-significant bit of predicate Pu is set, subtract a 32-bit source register Rt from register Rs, and place the result in a 32-bit destination register. When the predicate is false, the instruction does nothing. ## Syntax Behavior ``` if ([!]Pu[.new]) Rd=sub(Rt,Rs) if([!]Pu[.new][0]){ Rd=Rt-Rs; } else { NOP; } ``` Class: ALU32 (slots 0,1,2,3) ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|------|----|----|------|----|----|----|----|----|----|----|-----|--------|----|----|----|---|---|----|---|---|---|---|----|---|---|----------------------------| | | ICL | ASS | 3 | Р | N | 1ajC | )p | N | linC | )p | | | s5 | | | Pa | rse | D<br>N | | | t5 | | | PS | u | 2 | | | d5 | | | | | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | - | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | u | u | d | d | d | d | d | if (Pu) Rd=sub(Rt,Rs) | | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | - | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | u | u | d | d | d | d | d | if (!Pu) Rd=sub(Rt,Rs) | | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | - | 1 | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 0 | u | u | d | d | d | d | d | if (Pu.new) Rd=sub(Rt,Rs) | | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | - | 1 | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 1 | u | u | d | d | d | d | d | if (!Pu.new) Rd=sub(Rt,Rs) | ### Field name Description | DN | Dot-new | |--------|----------------------------| | MajOp | Major opcode | | MinOp | Minor opcode | | P | Predicated | | PS | Predicate sense | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | u2 | Field to encode register u | # Conditional sign extend Conditionally sign-extend the LSB or halfword from Rs and put the 32-bit result in Rd. NOP; Class: ALU32 (slots 0,1,2,3) ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|------|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------------------| | | ICL | ASS | 3 | Rs | N | ЛајС | )p | N | /linC | )p | | | s5 | | | Pa | rse | С | | S | dn | u | 2 | | | | | | d5 | | | | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | 1 | - | 0 | 0 | u | u | - | - | - | d | d | d | d | d | if (Pu) Rd=sxtb(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | P | Р | 1 | - | 0 | 1 | u | u | - | - | - | d | d | d | d | d | if (Pu.new) Rd=sxtb(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | 1 | - | 1 | 0 | u | u | - | - | - | d | d | d | d | d | if (!Pu) Rd=sxtb(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | P | Р | 1 | - | 1 | 1 | u | u | - | - | - | d | d | d | d | d | if (!Pu.new) Rd=sxtb(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 1 | - | 0 | 0 | u | u | - | - | - | d | d | d | d | d | if (Pu) Rd=sxth(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 1 | - | 0 | 1 | u | u | - | - | - | d | d | d | d | d | if (Pu.new) Rd=sxth(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 1 | - | 1 | 0 | u | u | - | - | - | d | d | d | d | d | if (!Pu) Rd=sxth(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 1 | - | 1 | 1 | u | u | - | - | - | d | d | d | d | d | if (!Pu.new) Rd=sxth(Rs) | | Field name | | Description | |------------|--------------|-------------| | MajOp | Major opcode | | | MinOp | Minor opcode | |--------|----------------------------| | Rs | No Rs read | | C | Conditional | | S | Predicate sense | | dn | Dot-new | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | u2 | Field to encode register u | | | | # Conditional transfer When the LSB of predicate Pu is set, transfer register Rs or a signed immediate into destination Rd. When the predicate is false, this instruction does nothing. | Syntax | Behavior | |--------|----------| |--------|----------| | if ([!]Pu[.new]) Rd=#s12 | <pre>apply_extension(#s); if ([!]Pu[.new][0]) Rd=#s; else NOP;</pre> | |--------------------------|----------------------------------------------------------------------| | if ([!]Pu[.new]) Rd=Rs | Assembler mapped to: "if ([!]Pu[.new]) Rd=add(Rs,#0)" | | if ([!]Pu[.new]) Rdd=Rss | Assembler mapped to: "if ([!]Pu[.new]) Rdd=combine(Rss.H32,Rss.L32)" | Class: ALU32 (slots 0,1,2,3) # **Encoding** Maj0p | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|------|----|----|----|----|----|----|----|----|----|----|-----|--------|----|----|----|---|---|---|---|---|---|---|----|---|---|----------------------| | | ICL | ASS | ; | Rs | N | 1ajC | )p | PS | u | 2 | | | | | | Pa | rse | D<br>N | | | | | | | | | | | d5 | | | | | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | u | u | 0 | i | _ | i | i | Р | Р | 0 | i | i | i | i | i | i | i | - | d | d | d | d | d | if (Pu) Rd=#s12 | | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | u | u | 0 | i | | i | i | Ρ | P | 1 | i | i | i | i | i | i | i | _ | d | d | d | d | d | if (Pu.new) Rd=#s12 | | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | u | u | 0 | i | _ | i | i | Р | Р | 0 | i | i | i | i | i | i | i | - | d | d | d | d | d | if (!Pu) Rd=#s12 | | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | u | u | 0 | i | i | i | i | Р | Р | 1 | i | i | i | i | i | i | i | i | d | d | d | d | d | if (!Pu.new) Rd=#s12 | | ription | |---------| | | | MinOp | Minor opcode | |--------|----------------------------| | Rs | No Rs read | | DN | Dot-new | | PS | Predicate sense | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | u2 | Field to encode register u | Major opcode # Conditional zero extend Conditionally zero-extend the LSB or halfword from Rs and put the 32-bit result in Rd. ## Class: ALU32 (slots 0,1,2,3) ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|------|----|----|-------|------------|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------------------| | | ICL | ASS | ; | Rs | N | ЛаjC | р | N | /linC | <b>)</b> p | | | s5 | | | Pa | rse | С | | S | dn | u | 2 | | | | | | d5 | | | | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 1 | - | 0 | 0 | u | u | - | - | - | d | d | d | d | d | if (Pu) Rd=zxtb(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | P | 1 | - | 0 | 1 | u | u | - | - | - | d | d | d | d | d | if (Pu.new) Rd=zxtb(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | P | 1 | - | 1 | 0 | u | u | - | - | - | d | d | d | d | d | if (!Pu) Rd=zxtb(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 1 | - | 1 | 1 | u | u | - | - | - | d | d | d | d | d | if (!Pu.new) Rd=zxtb(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 1 | - | 0 | 0 | u | u | • | - | - | d | d | d | d | d | if (Pu) Rd=zxth(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 1 | - | 0 | 1 | u | u | - | - | - | d | d | d | d | d | if (Pu.new) Rd=zxth(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | P | 1 | - | 1 | 0 | u | u | • | - | - | d | d | d | d | d | if (!Pu) Rd=zxth(Rs) | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 1 | - | 1 | 1 | u | u | • | - | - | d | d | d | d | d | if (!Pu.new) Rd=zxth(Rs) | | Field name | | Description | |------------|--------------|-------------| | MajOp | Major opcode | | | MinOp | Minor opcode | |--------|----------------------------| | Rs | No Rs read | | C | Conditional | | S | Predicate sense | | dn | Dot-new | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | u2 | Field to encode register u | # Compare The register form compares two 32-bit registers for unsigned greater than, greater than, or equal. The immediate form compares a register against a signed or unsigned immediate value. The 8-bit predicate register Pd is set to all 1s or all 0s depending on the result. For 64-bit versions of this instruction, see the XTYPE compare instructions. | Syntax | Behavior | |------------------------|-----------------------------------------------------------------------------------------------------------------------------------------| | Pd=[!]cmp.eq(Rs, #s10) | <pre>apply_extension(#s); Pd = Rs[!]=#s ? 0xff : 0x00;</pre> | | Pd=[!]cmp.eq(Rs,Rt) | Pd = Rs[!]=Rt ? 0xff : 0x00; | | Pd=[!]cmp.gt(Rs,#s10) | <pre>apply_extension(#s); Pd = Rs&lt;=#s ? 0xff : 0x00;</pre> | | Pd=[!]cmp.gt(Rs,Rt) | Pd = Rs<=Rt ? 0xff : 0x00; | | Pd=[!]cmp.gtu(Rs,#u9) | <pre>apply_extension(#u); Pd = Rs.uw[0]&lt;=#u.uw[0] ? 0xff : 0x00;</pre> | | Pd=[!]cmp.gtu(Rs,Rt) | Pd = Rs.uw[0] <= Rt.uw[0] ? 0xff : 0x00; | | Pd=cmp.ge(Rs, #s8) | Assembler mapped to: "Pd = cmp.gt(Rs, #s8-1)" | | Pd=cmp.geu(Rs,#u8) | <pre>if ("#u8 == 0") { Assembler mapped to: "Pd = cmp.eq(Rs,Rs)"; } else { Assembler mapped to: "Pd=cmp.gtu(Rs,#u8-1)"; }</pre> | | Pd=cmp.lt(Rs,Rt) | Assembler mapped to: "Pd=cmp.gt(Rt,Rs)" | | Pd=cmp.ltu(Rs,Rt) | Assembler mapped to: "Pd=cmp.gtu(Rt,Rs)" | ## Class: ALU32 (slots 0,1,2,3) ## **Intrinsics** | Pd=!cmp.eq(Rs, #s10) | Byte Q6_p_not_cmp_eq_RI(Word32 Rs, Word32 Is10) | |----------------------|-------------------------------------------------| | Pd=!cmp.eq(Rs,Rt) | Byte Q6_p_not_cmp_eq_RR(Word32 Rs, Word32 Rt) | | Pd=!cmp.gt(Rs, #s10) | Byte Q6_p_not_cmp_gt_RI(Word32 Rs, Word32 Is10) | | Pd=!cmp.gt(Rs,Rt) | Byte Q6_p_not_cmp_gt_RR(Word32 Rs, Word32 Rt) | | Pd=!cmp.gtu(Rs,#u9) | Byte Q6_p_not_cmp_gtu_RI(Word32 Rs, Word32 Iu9) | | Pd=!cmp.gtu(Rs,Rt) | Byte Q6_p_not_cmp_gtu_RR(Word32 Rs, Word32 Rt) | | Pd=cmp.eq(Rs, #s10) | Byte Q6_p_cmp_eq_RI(Word32 Rs, Word32 Is10) | | Pd=cmp.eq(Rs,Rt) | Byte Q6_p_cmp_eq_RR(Word32 Rs, Word32 Rt) | | Pd=cmp.ge(Rs, #s8) | Byte Q6_p_cmp_ge_RI(Word32 Rs, Word32 Is8) | | Pd=cmp.geu(Rs, #u8) | Byte Q6_p_cmp_geu_RI(Word32 Rs, Word32 Iu8) | | Pd=cmp.gt(Rs, #s10) | Byte Q6_p_cmp_gt_RI(Word32 Rs, Word32 Is10) | | Pd=cmp.gt(Rs,Rt) | Byte Q6_p_cmp_gt_RR(Word32 Rs, Word32 Rt) | |--------------------|---------------------------------------------| | Pd=cmp.gtu(Rs,#u9) | Byte Q6_p_cmp_gtu_RI(Word32 Rs, Word32 Iu9) | | Pd=cmp.gtu(Rs,Rt) | Byte Q6_p_cmp_gtu_RR(Word32 Rs, Word32 Rt) | | Pd=cmp.lt(Rs,Rt) | Byte Q6_p_cmp_lt_RR(Word32 Rs, Word32 Rt) | | Pd=cmp.ltu(Rs,Rt) | Byte Q6_p_cmp_ltu_RR(Word32 Rs, Word32 Rt) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|----|-------|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|---------------------| | | ICL/ | ASS | 3 | Rs | N | /lajC | )p | N | 1inO | )p | | | s5 | | | Pa | rse | | | | | | | | | | | | | d | 2 | | | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | i | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | 0 | 0 | 0 | d | d | Pd=cmp.eq(Rs,#s10) | | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | i | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | 1 | 0 | 0 | d | d | Pd=!cmp.eq(Rs,#s10) | | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | i | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | 0 | 0 | 0 | d | d | Pd=cmp.gt(Rs,#s10) | | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | i | s | ß | s | s | s | P | Р | i | i | i | i | i | i | i | i | i | 1 | 0 | 0 | d | d | Pd=!cmp.gt(Rs,#s10) | | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | 0 | 0 | 0 | d | d | Pd=cmp.gtu(Rs,#u9) | | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | 1 | 0 | 0 | d | d | Pd=!cmp.gtu(Rs,#u9) | | | ICL/ | ASS | 3 | Р | Ν | /lajC | )p | N | 1inO | )p | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | | d | 2 | | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | | 0 | 0 | s | S | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | 0 | 0 | 0 | d | d | Pd=cmp.eq(Rs,Rt) | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | - | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | 1 | 0 | 0 | d | d | Pd=!cmp.eq(Rs,Rt) | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | | 1 | 0 | s | ß | s | s | s | P | Р | - | t | t | t | t | t | - | - | - | 0 | 0 | 0 | d | d | Pd=cmp.gt(Rs,Rt) | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | - | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | 1 | 0 | 0 | d | d | Pd=!cmp.gt(Rs,Rt) | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | - | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | 0 | 0 | 0 | d | d | Pd=cmp.gtu(Rs,Rt) | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | - | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | 1 | 0 | 0 | d | d | Pd=!cmp.gtu(Rs,Rt) | # Description | MajOp | Major opcode | |--------|----------------------------| | MinOp | Minor opcode | | Rs | No Rs read | | P | Predicated | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d2 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | # Compare to general register The register form compares two 32-bit registers for unsigned greater than, greater than, or equal. The immediate form compares a register against a signed or unsigned immediate value. The resulting zero or one is placed in a general register. ## Syntax Behavior | Rd = [!]cmp.eq(Rs, #s8) | <pre>apply_extension(#s); Rd = (Rs[!] = #s);</pre> | |-----------------------------------|----------------------------------------------------| | <pre>Rd = [!]cmp.eq(Rs, Rt)</pre> | Rd = (Rs[!] = Rt); | Class: ALU32 (slots 0,1,2,3) #### **Intrinsics** | Rd=!cmp.eq(Rs, #s8) | Word32 Q6_R_not_cmp_eq_RI(Word32 Rs, Word32 Is8) | | |---------------------|--------------------------------------------------|--| | Rd=!cmp.eq(Rs,Rt) | Word32 Q6_R_not_cmp_eq_RR(Word32 Rs, Word32 Rt) | | | Rd=cmp.eq(Rs, #s8) | Word32 Q6_R_cmp_eq_RI(Word32 Rs, Word32 Is8) | | | Rd=cmp.eq(Rs,Rt) | Word32 Q6_R_cmp_eq_RR(Word32 Rs, Word32 Rt) | | # **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|----|------|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------------| | | ICL | ASS | 3 | Rs | N | 1ajC | )p | N | /linC | )p | | | s5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | C | 1 | 1 | 1 | 0 | 0 | 1 | 1 | | 1 | 0 | s | s | s | s | s | Р | Р | 1 | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=cmp.eq(Rs,#s8) | | C | 1 | 1 | 1 | 0 | 0 | 1 | 1 | - | 1 | 1 | s | s | s | s | s | Р | Р | 1 | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=!cmp.eq(Rs,#s8) | | | ICL | ASS | 3 | Р | N | 1ajC | )p | N | linC | )p | | | s5 | | • | Pa | rse | | | • | t5 | | • | | • | • | | • | d5 | | • | | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | d | d | d | d | d | Rd=cmp.eq(Rs,Rt) | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | d | d | d | d | d | Rd=!cmp.eq(Rs,Rt) | | Field name | Description | |------------|-------------| | | | | MajOp | Major opcode | |--------|----------------------------| | MinOp | Minor opcode | | Rs | No Rs read | | P | Predicated | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | # 11.2 CR The CR instruction class includes instructions that manage control registers, including hardware looping, modulo addressing, and status flags. CR instructions are executable on slot 3. ## End loop instructions The endloop instructions mark the end of a hardware loop. If the loop count (LC) register indicates that a loop should continue to iterate, decrement the LC register and change the program flow to the address in the start address (SA) register. The endloopN instruction is a pseudo-instruction encoded in bits 14 through 15 of each instruction. Therefore, no distinct 32-bit encoding exists for this instruction. | Syntax | Behavior | |-----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | endloop0 | <pre>if (USR.LPCFG) { if (USR.LPCFG==1) { P3=0xff; } USR.LPCFG=USR.LPCFG-1; } if (LCO&gt;1) { PC=SA0; LCO=LCO-1;}</pre> | | endloop01 | <pre>if (USR.LPCFG) { if (USR.LPCFG==1) { P3=0xff; } USR.LPCFG=USR.LPCFG-1; } if (LCO&gt;1) { PC=SA0; LCO=LCO-1; } else { if (LC1&gt;1) { PC=SA1; LC1=LC1-1; } }</pre> | | endloop1 | <pre>if (LC1&gt;1) { PC=SA1; LC1=LC1-1;}</pre> | ## Class: CR #### **Notes** - This instruction cannot be grouped in a packet with any program flow instructions. - The Next PC value is the address immediately following the last instruction in the packet containing this instruction. - The PC value is the address of the start of the packet ## Corner detection acceleration The fastcorner9 instruction takes the Ps and Pt values and treats them as a circular bit string. If any contiguous nine bits are set around the circle, the result is true, false otherwise. The sense can be optionally inverted. This instruction accelerates FAST corner detection. #### Syntax Behavior Class: CR (slot 2,3) #### **Notes** This instruction can execute on either slot2 or slot3, even though it is a CR-type #### **Intrinsics** ``` Pd=!fastcorner9(Ps,Pt) Byte Q6_p_not_fastcorner9_pp(Byte Ps, Byte Pt) Pd=fastcorner9(Ps,Pt) Byte Q6_p_fastcorner9_pp(Byte Ps, Byte Pt) ``` ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|------------------------| | | ICL | ASS | 3 | | sm | | | | | | | | | S | 2 | Pa | rse | | | | | ť | 2 | | | | | | | d | 2 | | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | | - | s | s | Р | Р | 1 | - | - | - | t | t | 1 | - | - | 1 | | - | d | d | Pd=fastcorner9(Ps,Pt) | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | | - | s | s | Р | Р | 1 | - | - | - | t | t | 1 | - | - | 1 | - | | d | d | Pd=!fastcorner9(Ps,Pt) | | Field name | | Description | |------------|----------|-------------| | | C m .: n | | | sm | Supervisor mode only | |--------|----------------------------| | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d2 | Field to encode register d | | s2 | Field to encode register s | | t2 | Field to encode register t | # Logical reductions on predicates The any8instruction sets a destination predicate register to 0xff if any of the low 8 bits in source predicate register Ps are set. Otherwise, the predicate is set to 0x00. The all8 instruction sets a destination predicate register to 0xff if all of the low 8 bits in the source predicate register Ps are set. Otherwise, the predicate is set to 0x00. #### Syntax Behavior | Pd = all8(Ps) | <pre>PREDUSE_TIMING; (Ps==0xff) ? (Pd=0xff) : (Pd=0x00);</pre> | |---------------|----------------------------------------------------------------| | Pd = any8(Ps) | PREDUSE_TIMING; Ps ? (Pd=0xff) : (Pd=0x00); | Class: CR (slot 2,3) #### **Notes** ■ This instruction can execute on either slot2 or slot3, even though it is a CR-type #### **Intrinsics** | Pd=all8(Ps) | Byte Q6_p_all8_p(Byte Ps) | |--------------|---------------------------| | Pd=any8 (Ps) | Byte Q6_p_any8_p(Byte Ps) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-------------| | | ICL | ASS | 3 | | sm | | | | | | | | | s | 2 | Pa | rse | | | | | | | | | | | | | d | 2 | | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | - | - | s | s | Р | Р | 0 | - | | - | - | | - | - | - | - | - | - | d | d | Pd=any8(Ps) | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | - | - | s | s | Р | Р | 0 | - | - | - | - | | - | - | - | - | - | - | d | d | Pd=all8(Ps) | #### Field name Description | sm | Supervisor mode only | |--------|----------------------------| | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d2 | Field to encode register d | | s2 | Field to encode register s | Cuntar # Looping instructions loopN is a single instruction that sets up a hardware loop. The N in the instruction name indicates the set of loop registers to use. <code>loop0</code> is the innermost loop, while <code>loop1</code> is the outer loop. The loopN instruction first sets the start address (SA) register based on a PC-relative immediate add. The relative immediate is added to the PC and stored in SA. The loop count (LC) register is set to either an unsigned immediate or to a register value. Dahaviar | Syntax | Behavior | |-------------------|--------------------------------------------------------------------------------------------| | loop0(#r7:2,#U10) | <pre>apply_extension(#r); #r=#r &amp; ~PCALIGN_MASK; SA0=PC+#r; LC0=#U; USR.LPCFG=0;</pre> | | loop0(#r7:2,Rs) | <pre>apply_extension(#r); #r=#r &amp; ~PCALIGN_MASK; SA0=PC+#r; LC0=Rs; USR.LPCFG=0;</pre> | | loop1(#r7:2,#U10) | <pre>apply_extension(#r); #r=#r &amp; ~PCALIGN_MASK; SA1=PC+#r; LC1=#U;</pre> | | loop1(#r7:2,Rs) | <pre>apply_extension(#r); #r=#r &amp; ~PCALIGN_MASK; SA1=PC+#r; LC1=Rs;</pre> | Class: CR (slot 3) #### **Notes** - This instruction cannot execute in the last address of a hardware loop. - The Next PC value is the address immediately following the last instruction in the packet containing this instruction. - The PC value is the address of the start of the packet - A PC-relative address is formed by taking the decoded immediate value and adding it to the current PC value. #### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-------------------| | | ICL | ASS | ; | | sm | | | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | i | i | i | i | i | - | - | - | i | i | - | - | - | loop0(#r7:2,Rs) | | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | i | i | i | i | i | - | - | - | i | i | - | | - | loop1(#r7:2,Rs) | | | ICL/ | ASS | ; | | sm | | | | | | • | • | | • | • | Pa | rse | | • | | • | | • | • | | • | | • | | | • | | | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | ı | I | ı | I | I | Р | Р | - | i | i | i | i | i | I | I | ı | i | i | - | ı | I | loop0(#r7:2,#U10) | | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | I | ı | ı | ı | I | Р | Р | - | i | i | i | i | i | I | I | I | i | i | - | ı | ı | loop1(#r7:2,#U10) | Field name Description smSupervisor mode onlyICLASSInstruction class Parse Packet/loop parse bits s5 Field to encode register s # Add to PC Add an immediate value to the program counter (PC) and place the result in a destination register. Use this instruction with a constant extender to add a 32-bit immediate value to PC. | Syntax | Behavior | |-------------------|-----------------------------------------| | Rd = add(pc, #u6) | <pre>Rd = PC+apply extension(#u);</pre> | Class: CR (slot 3) # **Encoding** | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|------|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|----------------| | | ICL | .ASS | 3 | | sm | | | | | | | | | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | Р | Р | - | i | i | i | i | i | i | | - | d | d | d | d | d | Rd=add(pc,#u6) | | Field name | | Description | |------------|----------------------|-------------| | m | Supervisor mode only | | ICLASSInstruction classParsePacket/loop parse bitsd5Field to encode register d # Pipelined loop instructions The spNloop0 instruction is a single instruction that sets up a hardware loop with automatic predicate control. This saves code size by enabling generation of software pipelined loops without prologue code. Upon executing this instruction, the P3 register automatically clears. After the loop executes N times (where N is selectable from 1 to 3), the P3 register is set. Store instructions in the loop are predicated with P3 and thus not enabled during the pipeline warm-up. The spNloop0 instruction uses the loop 0 (inner-loop) registers. This instruction sets the start address (SAO) register based on a PC-relative immediate add. The relative immediate is added to the PC and stored in SAO. The loop count (LCO) is set to either an unsigned immediate or to a register value. The predicate P3 is cleared. The USR.LPCFG bits are set based on the N value. Syntax Behavior | p3=sp1loop0(#r7:2,#U10) | <pre>apply_extension(#r); #r=#r &amp; ~PCALIGN_MASK; SA0=PC+#r; LC0=#U; USR.LPCFG=1; P3=0;</pre> | |-------------------------|--------------------------------------------------------------------------------------------------| | p3=sp1loop0(#r7:2,Rs) | <pre>apply_extension(#r); #r=#r &amp; ~PCALIGN_MASK; SA0=PC+#r; LC0=Rs; USR.LPCFG=1; P3=0;</pre> | | p3=sp2loop0(#r7:2,#U10) | <pre>apply_extension(#r); #r=#r &amp; ~PCALIGN_MASK; SA0=PC+#r; LC0=#U; USR.LPCFG=2; P3=0;</pre> | | p3=sp2loop0(#r7:2,Rs) | <pre>apply_extension(#r); #r=#r &amp; ~PCALIGN_MASK; SA0=PC+#r; LC0=Rs; USR.LPCFG=2; P3=0;</pre> | | p3=sp3loop0(#r7:2,#U10) | <pre>apply_extension(#r); #r=#r &amp; ~PCALIGN_MASK; SA0=PC+#r; LC0=#U; USR.LPCFG=3; P3=0;</pre> | | p3=sp3loop0(#r7:2,Rs) | <pre>apply_extension(#r); #r=#r &amp; ~PCALIGN_MASK; SA0=PC+#r; LC0=Rs; USR.LPCFG=3; P3=0;</pre> | ## Class: CR (slot 3) #### **Notes** - The predicate that this instruction generates cannot be used as a .new predicate, nor can it be automatically AND'd with another predicate. - This instruction cannot execute in the last address of a hardware loop. - The Next PC value is the address immediately following the last instruction in the packet containing this instruction. - The PC value is the address of the start of the packet - A PC-relative address is formed by taking the decoded immediate value and adding it to the current PC value. ## **Encoding** s5 | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-------------------------| | | ICL | ASS | ; | | sm | | | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | i | i | i | i | i | - | - | - | i | i | - | - | - | p3=sp1loop0(#r7:2,Rs) | | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | i | i | i | i | i | - | - | - | i | i | - | - | - | p3=sp2loop0(#r7:2,Rs) | | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | - | i | i | i | i | i | - | - | - | i | i | • | - | - | p3=sp3loop0(#r7:2,Rs) | | | ICL | ASS | ; | | sm | | | | | | • | • | • | | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | ı | I | I | I | I | Р | Р | - | i | i | i | i | i | I | I | I | i | i | - | I | I | p3=sp1loop0(#r7:2,#U10) | | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | I | I | I | I | I | Р | Р | - | i | i | i | i | i | I | I | I | i | i | • | I | I | p3=sp2loop0(#r7:2,#U10) | | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | ı | ı | I | I | I | Р | Р | - | i | i | i | i | i | I | I | I | i | i | - | I | ı | p3=sp3loop0(#r7:2,#U10) | #### Field name Description smSupervisor mode onlyICLASSInstruction classParsePacket/loop parse bits Field to encode register s # Logical operations on predicates Perform bitwise logical operations on predicate registers. | Syntax | Behavior | |--------------------------|--------------------------------------------------------| | Pd=Ps | Assembler mapped to: "Pd=or(Ps,Ps)" | | Pd=and(Ps,and(Pt,[!]Pu)) | PREDUSE_TIMING;<br>Pd = Ps & Pt & (~Pu); | | Pd=and(Ps,or(Pt,[!]Pu)) | <pre>PREDUSE_TIMING; Pd = Ps &amp; (Pt (~Pu));</pre> | | Pd=and(Pt,[!]Ps) | PREDUSE_TIMING;<br>Pd=Pt & (~Ps); | | Pd=not(Ps) | PREDUSE_TIMING;<br>Pd=~Ps; | | Pd=or(Ps, and(Pt,[!]Pu)) | PREDUSE_TIMING;<br>Pd = Ps (Pt & (~Pu)); | | Pd=or(Ps,or(Pt,[!]Pu)) | PREDUSE_TIMING; Pd = Ps Pt (~Pu); | | Pd=or(Pt,[!]Ps) | PREDUSE_TIMING;<br>Pd=Pt (~Ps); | | Pd=xor(Ps,Pt) | PREDUSE_TIMING; Pd=Ps ^ Pt; | # Class: CR (slot 2,3) #### **Notes** ■ This instruction can execute on either slot2 or slot3, even though it is a CR-type ## **Intrinsics** | Pd=Ps | Byte Q6_p_equals_p(Byte Ps) | |------------------------|---------------------------------------------------| | Pd=and(Ps,and(Pt,!Pu)) | Byte Q6_p_and_and_ppnp(Byte Ps, Byte Pt, Byte Pu) | | Pd=and(Ps,and(Pt,Pu)) | Byte Q6_p_and_and_ppp(Byte Ps, Byte Pt, Byte Pu) | | Pd=and(Ps,or(Pt,!Pu)) | Byte Q6_p_and_or_ppnp(Byte Ps, Byte Pt, Byte Pu) | | Pd=and(Ps,or(Pt,Pu)) | Byte Q6_p_and_or_ppp(Byte Ps, Byte Pt, Byte Pu) | | Pd=and(Pt,!Ps) | Byte Q6_p_and_pnp(Byte Pt, Byte Ps) | | Pd=and(Pt,Ps) | Byte Q6_p_and_pp(Byte Pt, Byte Ps) | | Pd=not(Ps) | Byte Q6_p_not_p(Byte Ps) | | Pd=or(Ps, and(Pt,!Pu)) | Byte Q6_p_or_and_ppnp(Byte Ps, Byte Pt, Byte Pu) | | Pd=or(Ps,and(Pt,Pu)) | Byte Q6_p_or_and_ppp(Byte Ps, Byte Pt, Byte Pu) | | Pd=or(Ps,or(Pt,!Pu)) | Byte Q6_p_or_or_ppnp(Byte Ps, Byte Pt, Byte Pu) | |----------------------|-------------------------------------------------| | Pd=or(Ps,or(Pt,Pu)) | Byte Q6_p_or_or_ppp(Byte Ps, Byte Pt, Byte Pu) | | Pd=or(Pt,!Ps) | Byte Q6_p_or_pnp(Byte Pt, Byte Ps) | | Pd=or(Pt,Ps) | Byte Q6_p_or_pp(Byte Pt, Byte Ps) | | Pd=xor(Ps,Pt) | Byte Q6_p_xor_pp(Byte Ps, Byte Pt) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|-----|----|----|----|---|---|---|---|---|---|---|---|---|------------------------| | | ICL | ASS | | | sm | | | | | | | | | S | 2 | Pa | rse | | | | | t2 | 2 | | | | | | | d | 2 | | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | - | - | s | s | Р | Р | 0 | - | - | - | t | t | - | - | - | - | - | - | d | d | Pd=and(Pt,Ps) | | | ICL | ASS | | | sm | | | | • | | | | | S | 2 | Pa | rse | | | | | t2 | 2 | u | 2 | | | | | d | 2 | | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | - | - | s | s | P | Р | 0 | - | - | - | t | t | u | u | - | - | - | - | d | d | Pd=and(Ps,and(Pt,Pu)) | | | ICL | ASS | | | sm | | | | • | • | | • | | s | 2 | Pa | rse | | | | • | t2 | 2 | | | | | | | d | 2 | | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | - | - | s | s | Р | Р | 0 | | - | - | t | t | - | - | | - | - | - | d | d | Pd=or(Pt,Ps) | | | ICL | ASS | | | sm | | | | | • | | | | S | 2 | Pa | rse | | | | | t2 | 2 | u | 2 | | | • | | d | 2 | | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | | - | Ø | s | Р | Р | 0 | | - | | t | t | u | u | | - | - | - | d | d | Pd=and(Ps,or(Pt,Pu)) | | | ICL | ASS | | | sm | | | | | | | | | S | 2 | Pa | rse | | | | | t2 | 2 | | | | | | | d | 2 | | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | • | - | s | s | P | P | 0 | • | • | • | t | t | • | - | • | • | - | • | d | d | Pd=xor(Ps,Pt) | | | ICL | ASS | | | sm | | | | | | | • | | S | 2 | Pa | rse | | | | • | t2 | 2 | u | 2 | | | | | d | 2 | | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | | - | Ø | s | Р | Р | 0 | | - | | t | t | u | u | | - | - | - | d | d | Pd=or(Ps,and(Pt,Pu)) | | | ICL | ASS | | | sm | | | | | | | | | S | 2 | Pa | rse | | | | | t2 | 2 | | | | | | | d | 2 | | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | - | - | s | s | P | P | 0 | - | • | - | t | t | - | - | - | • | - | • | d | d | Pd=and(Pt,!Ps) | | | ICL | ASS | | | sm | | | | | | | | | s | 2 | Pa | rse | | | | | t2 | 2 | u | 2 | | | | | d | 2 | | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | - | - | s | s | Р | Р | 0 | - | - | - | t | t | u | u | - | - | - | - | d | d | Pd=or(Ps,or(Pt,Pu)) | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | - | - | s | s | Р | Р | 0 | - | - | - | t | t | u | u | - | - | - | - | d | d | Pd=and(Ps,and(Pt,!Pu)) | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | - | - | s | s | P | P | 0 | - | • | - | t | t | u | u | - | • | - | • | d | d | Pd=and(Ps,or(Pt,!Pu)) | | | ICL | ASS | | | sm | | | | | | | | | s | 2 | Pa | rse | | | | | | | | | | | | | d | 2 | | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | - | • | s | s | Р | Р | 0 | - | - | - | - | - | - | - | - | - | - | - | d | d | Pd=not(Ps) | | | ICL | ASS | | | sm | | | | | | | | | S | 2 | Pa | rse | | | | | t2 | 2 | u | 2 | | | | | d | 2 | | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | - | - | s | s | Р | Р | 0 | - | - | - | t | t | u | u | - | - | - | - | d | d | Pd=or(Ps,and(Pt,!Pu)) | | | ICL | ASS | | | sm | | | | | | | | | S | 2 | Pa | rse | | | | | tź | 2 | | | | | | | d | 2 | | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | - | - | s | s | Р | Р | 0 | - | - | - | t | t | - | - | - | - | - | - | d | d | Pd=or(Pt,!Ps) | | | ICL | ASS | | | sm | | ı | ı | | ı | , | | | S | 2 | Pa | rse | | , . | | | t2 | 2 | u | 2 | | | | | d | 2 | | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | - | - | s | s | P | Р | 0 | - | - | - | t | t | u | u | - | - | - | - | d | d | Pd=or(Ps,or(Pt,!Pu)) | ## Description | sm | Supervisor mode only | |------------|----------------------------| | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d2 | Field to encode register d | | s2 | Field to encode register s | | t2 | Field to encode register t | | <b>u</b> 2 | Field to encode register u | | | | # User control register transfer Move 32- or 64-bit values between a user control register and a general register. The user control registers include SA, LC, Predicates, M, USR, PC, UGP, GP, and CS, and UPCYCLE. The figure shows the user control registers and their register field encodings. Registers can be moved as singles or as aligned 64-bit pairs. The PC register is not writable. A program flow instruction must be used to change the PC value. | Syntax | Behavior | |--------|----------| | | | | Cd=Rs | Cd=Rs; | |---------|----------| | Cdd=Rss | Cdd=Rss; | | Rd=Cs | Rd=Cs; | | Rdd=Css | Rdd=Css; | Class: CR (slot 3) ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|---------| | | ICL | ASS | ; | | sm | | | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | - | - | - | d | d | d | d | d | Cd=Rs | | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | - | | - | d | d | d | d | d | Cdd=Rss | | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | | - | - | - | | d | d | d | d | d | Rdd=Css | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | - | - | - | d | d | d | d | d | Rd=Cs | | Field name Des | |----------------| |----------------| smSupervisor mode onlyICLASSInstruction classParsePacket/loop parse bitsd5Field to encode register ds5Field to encode register s ## 11.3 JR The JR instruction class includes instructions to change the program flow to a new location contained in a register. JR instructions are executable on slot 2. ## Call subroutine from register Change the program flow to a subroutine. This instruction first transfers the next program counter (NPC) value into the link register, and then jumps to a target address contained in a register. This instruction can only appear in slot 2. | Syntax | Behavior | |---------------------|-----------------------------------------------------| | callr Rs | <pre>LR=NPC; PC=Rs;</pre> | | if ([!]Pu) callr Rs | <pre>if ([!]Pu[0]) { LR=NPC; PC=Rs; }</pre> | Class: JR (slot 2) #### **Notes** ■ This instruction can conditionally execute based on the value of a predicate register. When the instruction is preceded by 'if Pn', the instruction only executes if the least-significant bit of the predicate register is 1. Similarly, when the instruction is preceded by 'if !Pn', the instruction executes only if the least-significant bit of Pn is 0. #### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-------------------| | | ICL | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | - | - | - | - | - | - | - | - | callr Rs | | | ICL | ASS | 3 | | | | | | | | | • | s5 | | | Pa | rse | | • | | | u | 2 | | • | | | | • | | | | | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | u | u | - | - | - | - | - | - | - | - | if (Pu) callr Rs | | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | - | - | - | u | u | - | - | - | - | - | - | - | - | if (!Pu) callr Rs | | | | <b>5</b> | |------------|------------------------|-------------| | Field name | | Description | | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | s5 Field to encode register s u2 Field to encode register u # Hint an indirect jump address Provide a hint indicating that there will soon be an indirect JUMPR to the address specified in Rs. | Syntax | Behavior | |------------|----------| | hintjr(Rs) | ; | Class: JR (slot 2) # **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|------------| | | ICL | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | - | - | - | - | - | - | - | - | hintjr(Rs) | | Field name | | Description | |-------------|-------------------|-------------| | T G T 7 G G | Instruction class | | ICLASS Instruction class Parse Packet/loop parse bits 5 Field to encode register s # Jump to address from register Change the program flow to a target address. This instruction changes the program counter to a target address contained in a register. This instruction can appear only in slot 2. #### Class: JR (slot 2) #### **Notes** ■ This instruction can conditionally execute based on the value of a predicate register. When the instruction is preceded by 'if Pn', the instruction only executes if the least-significant bit of the predicate register is 1. Similarly, when the instruction is preceded by 'if !Pn', the instruction executes only if the least-significant bit of Pn is 0. ## **Encoding** u2 | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|----|----|----------|----|----------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|--------------------------| | | ICL | ASS | ; | | | | <u> </u> | - | <u> </u> | | | | s5 | | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | - | - | - | - | - | - | - | - | jumpr Rs | | | ICL. | ASS | ; | | • | • | | | | | | | s5 | | | Pa | rse | | • | • | • | u | 2 | | | | | | • | | • | | | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | 0 | 0 | - | u | u | - | - | - | - | - | - | - | - | if (Pu) jumpr:nt Rs | | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | P | Р | - | 0 | 1 | - | u | u | - | | | - | - | - | | - | if (Pu.new) jumpr:nt Rs | | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | 1 | 0 | - | u | u | - | - | - | - | - | - | - | - | if (Pu) jumpr:t Rs | | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | 1 | 1 | - | u | u | - | - | - | - | - | - | - | - | if (Pu.new) jumpr:t Rs | | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | 0 | 0 | - | u | u | - | - | - | - | - | - | - | - | if (!Pu) jumpr:nt Rs | | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | 0 | 1 | - | u | u | - | - | - | - | - | - | - | - | if (!Pu.new) jumpr:nt Rs | | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | 1 | 0 | - | u | u | - | - | - | - | - | - | - | - | if (!Pu) jumpr:t Rs | | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | 1 | 1 | - | u | u | - | - | - | - | - | - | - | - | if (!Pu.new) jumpr:t Rs | Description | Field name | | |------------|----------------------------| | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | s5 | Field to encode register s | Field to encode register s Field to encode register u ## 11.4 J The J instruction class includes branch instructions (jumps and calls) that obtain the target address from a (PC-relative) immediate address value. J instructions are executable on slot 2 and slot 3. #### Call subroutine Change the program flow to a subroutine. This instruction first transfers the next program counter (NPC) value into the Link Register, and then jumps to the target address. This instruction can appear in slots 2 or 3. **Syntax** call #r22:2 apply\_extension(#r); #r=#r & ~PCALIGN MASK; LR=NPC; PC=PC+#r; apply\_extension(#r); if ([!]Pu) call #r15:2 #r=#r & ~PCALIGN MASK; if ([!]Pu[0]) { LR=NPC; PC=PC+#r; **Behavior** Class: J (slots 2,3) #### **Notes** - This instruction can conditionally execute based on the value of a predicate register. When the instruction is preceded by 'if Pn', the instruction only executes if the least-significant bit of the predicate register is 1. Similarly, when the instruction is preceded by 'if !Pn', the instruction executes only if the least-significant bit of Pn is 0. - The Next PC value is the address immediately following the last instruction in the packet containing this instruction. - The PC value is the address of the start of the packet - A PC-relative address is formed by taking the decoded immediate value and adding it to the current PC value. #### **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|------|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|----------------------| | Ī | ICL | .ASS | 3 | | | | | | | | | | | | | Pa | rse | | | | | | | | | | | | | | | | | C | 1 | 0 | 1 | 1 | 0 | 1 | i | i | i | i | i | i | | i | i | Р | Р | i | i | i | i | i | i | i | i | i | i | i | i | i | 0 | call #r22:2 | | | ICL | .ASS | 3 | | | | | | | | | | | | | Pa | rse | | | DN | | u | 2 | | | | | | | | | | | C | 1 | 0 | 1 | 1 | 1 | 0 | 1 | i | i | 0 | i | i | | i | i | Р | Р | i | - | 0 | | u | u | i | i | i | i | i | i | i | - | if (Pu) call #r15:2 | | C | 1 | 0 | 1 | 1 | 1 | 0 | 1 | i | i | 1 | i | i | _ | i | i | Р | Р | i | - | 0 | | u | u | i | i | i | i | i | i | ï | - | if (!Pu) call #r15:2 | Field name Description ICLASS Instruction class DN Dot-new Parse Packet/loop parse bits u2 Field to encode register u ## Compare and jump Compare two registers, or a register and immediate value, and write a predicate with the result. Then use the predicate result to conditionally jump to a PC-relative target address. The registers available as operands are restricted to R0-R7 and R16-R23. The predicate destination is restricted to P0 and P1. In assembly syntax, this instruction appears as two instructions in the packet: a compare and a separate conditional jump. The assembler can convert adjacent compare and jump instructions into compound compare-jump form. Syntax Behavior ``` P[01] = (Rs = -1) ? 0xff : 0x00 if p[01] = cmp.eq(Rs, #-1); if ([!]p[01].new) jump:<hint> ([!]P[01].new[0]) { #r9:2 apply_extension(#r); #r=#r & ~PCALIGN_MASK; PC=PC+#r; p[01] = cmp.eq(Rs, #U5); if P[01] = (Rs == \#U) ? 0xff : 0x00 if ([!]p[01].new) jump:<hint> ([!]P[01].new[0]) { #r9:2 apply extension(#r); #r=#r & ~PCALIGN MASK; PC=PC+#r; p[01] = cmp.eq(Rs,Rt); if P[01] = (Rs == Rt) ? 0xff : 0x00 if ([!]p[01].new) jump:<hint> ([!]P[01].new[0]) { #r9:2 apply extension(#r); #r=#r & ~PCALIGN MASK; PC=PC+#r; P[01] = (Rs > -1) ? 0xff : 0x00 if p[01] = cmp.gt(Rs, #-1); if ([!]p[01].new) jump:<hint> ([!]P[01].new[0]) { apply_extension(#r); #r9:2 #r=#r & ~PCALIGN MASK; PC=PC+#r; P[01] = (Rs > \#U) ? 0xff : 0x00 if p[01] = cmp.qt(Rs, #U5); if ([!]p[01].new) jump:<hint> ([!]P[01].new[0]) { apply_extension(#r); #r9:2 #r=#r & ~PCALIGN MASK; PC=PC+#r; P[01] = (Rs > Rt) ? 0xff : 0x00 if p[01] = cmp.qt(Rs,Rt); if ([!]p[01].new) jump:<hint> ([!]P[01].new[0]) { apply_extension(#r); #r9:2 #r=#r & ~PCALIGN MASK; PC=PC+#r; ``` #### **Syntax Behavior** p[01]=cmp.gtu(Rs, #U5); if P[01] = (Rs.uw[0] > #U) ? 0xff : 0x00 if([!]p[01].new) jump:<hint> ([!]P[01].new[0]) { #r9:2 apply\_extension(#r); #r=#r & ~PCALIGN MASK; PC=PC+#r; P[01] = (Rs.uw[0] > Rt) ? 0xff : 0x00 ifp[01] = cmp.gtu(Rs,Rt); if ([!]p[01].new) jump:<hint> ([!]P[01].new[0]) { #r9:2 apply extension(#r); #r=#r & ~PCALIGN MASK; PC=PC+#r; p[01] = tstbit(Rs, #0); if P[01] = (Rs & 1) ? 0xff : 0x00 if([!]p[01].new) jump:<hint> ([!]P[01].new[0]) { #r9:2 apply extension(#r); #r=#r & ~PCALIGN\_MASK; PC=PC+#r; Class: J (slots 0,1,2,3) ## **Encoding** | • • | | | | | | | | | | | | 40 | 40 | 4- | 40 | 4- | | 40 | 4.0 | | 4.0 | • | _ | _ | _ | _ | _ | _ | _ | | | | |-----|-----|-----|---|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|-----|----|-----|---|---|---|---|---|---|---|---|---|---|--------------------------------------------------| | | | | | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | | 16 | | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | | | ICL | ASS | 3 | | | | | | | | | | S | 4 | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | i | i | s | s | s | s | Р | Р | 0 | - | - | - | 0 | 0 | i | i | i | i | i | i | i | - | p0=cmp.eq(Rs,#-1); if<br>(p0.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | i | i | s | s | s | s | Р | Р | 0 | - | - | - | 0 | 1 | i | i | i | i | i | i | i | - | p0=cmp.gt(Rs,#-1); if<br>(p0.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | i | i | s | s | s | s | Р | Р | 0 | - | • | - | 1 | 1 | i | i | i | - | i | i | i | 1 | p0=tstbit(Rs,#0); if (p0.new)<br>jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | i | i | s | s | s | s | Р | Р | 1 | - | • | - | 0 | 0 | i | i | i | ï | i | i | i | • | p0=cmp.eq(Rs,#-1); if (p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | i | i | s | s | s | s | Р | Р | 1 | - | • | - | 0 | 1 | i | i | i | - | i | i | i | 1 | p0=cmp.gt(Rs,#-1); if (p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | i | i | s | s | s | s | Р | Р | 1 | - | - | - | 1 | 1 | i | i | i | i | i | i | i | 1 | p0=tstbit(Rs,#0); if (p0.new)<br>jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | i | i | s | s | s | s | Р | Р | 0 | - | | - | 0 | 0 | i | i | i | i | i | i | i | - | p0=cmp.eq(Rs,#-1); if<br>(!p0.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | i | i | s | s | s | s | Р | Р | 0 | - | - | - | 0 | 1 | i | i | i | i | i | i | i | - | p0=cmp.gt(Rs,#-1); if<br>(!p0.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | i | i | s | s | s | s | Р | Р | 0 | - | | - | 1 | 1 | i | i | i | - | i | i | i | 1 | p0=tstbit(Rs,#0); if<br>(!p0.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | i | i | s | s | s | s | Р | Р | 1 | - | • | - | 0 | 0 | i | i | i | ï | i | i | i | • | p0=cmp.eq(Rs,#-1); if<br>(!p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | i | i | s | s | s | s | Р | Р | 1 | - | • | - | 0 | 1 | i | i | i | · | i | i | i | • | p0=cmp.gt(Rs,#-1); if<br>(!p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | i | i | s | s | s | s | Р | Р | 1 | - | - | - | 1 | 1 | i | i | i | i | i | i | i | - | p0=tstbit(Rs,#0); if<br>(!p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | i | i | s | s | s | s | Р | Р | 0 | ı | I | ı | I | ı | i | i | i | i | i | i | i | - | p0=cmp.eq(Rs,#U5); if (p0.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | i | i | s | s | s | s | Р | Р | 1 | I | ı | I | ı | I | i | i | i | i | i | i | i | - | p0=cmp.eq(Rs,#U5); if (p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | i | i | s | s | s | s | Р | Р | 0 | ı | ı | ı | ı | ı | i | i | i | i | i | i | i | - | p0=cmp.eq(Rs,#U5); if<br>(!p0.new) jump:nt #r9:2 | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|---------------------------------------------------| | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | i | i | s | s | s | s | Р | Р | 1 | ı | ı | ı | ı | ı | i | i | i | i | i | i | i | - | p0=cmp.eq(Rs,#U5); if<br>(!p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | i | i | s | s | s | s | Р | Р | 0 | ı | ı | ı | ı | ı | i | i | i | i | i | i | i | - | p0=cmp.gt(Rs,#U5); if<br>(p0.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | i | i | s | s | s | s | Р | Р | 1 | ı | ı | ı | ı | ı | i | i | i | i | i | i | i | - | p0=cmp.gt(Rs,#U5); if<br>(p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | i | i | s | s | s | s | Р | Р | 0 | ı | ı | ı | ı | ı | i | i | i | i | i | i | i | - | p0=cmp.gt(Rs,#U5); if<br>(!p0.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | i | i | s | s | s | s | Р | Р | 1 | ı | ı | ı | ı | ı | i | i | i | i | i | i | i | - | p0=cmp.gt(Rs,#U5); if<br>(!p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | i | i | s | s | s | s | Р | Р | 0 | ı | ı | ı | ı | ı | i | ï | i | i | i | i | i | - | p0=cmp.gtu(Rs,#U5); if<br>(p0.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | i | i | s | s | s | s | Р | Р | 1 | ı | ı | ı | ı | ı | i | i | i | i | i | i | i | - | p0=cmp.gtu(Rs,#U5); if<br>(p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | i | i | s | s | s | s | Р | Р | 0 | ı | ı | I | - | ı | i | - | i | - | i | ï | ï | - | p0=cmp.gtu(Rs,#U5); if<br>(!p0.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | i | i | s | s | s | s | Р | Р | 1 | ı | ı | ı | ı | ı | i | i | i | i | i | i | i | - | p0=cmp.gtu(Rs,#U5); if<br>(!p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | i | i | s | s | s | s | Р | Р | 0 | - | - | - | 0 | 0 | i | i | i | i | i | i | i | - | p1=cmp.eq(Rs,#-1); if<br>(p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | i | i | s | s | s | s | Р | Р | 0 | - | - | - | 0 | 1 | i | - | i | - | · | - | - | - | p1=cmp.gt(Rs,#-1); if (p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | i | i | s | s | s | s | Р | Р | 0 | - | - | | 1 | 1 | i | - | i | - | ï | · | ï | - | p1=tstbit(Rs,#0); if (p1.new)<br>jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | i | i | s | s | s | s | Р | Р | 1 | - | - | - | 0 | 0 | i | - | i | - | ï | - | · | - | p1=cmp.eq(Rs,#-1); if (p1.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | i | i | s | s | s | s | Р | Р | 1 | - | - | - | 0 | 1 | i | i | i | i | i | i | i | - | p1=cmp.gt(Rs,#-1); if (p1.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | i | i | s | s | s | s | Р | Р | 1 | - | - | - | 1 | 1 | i | i | i | i | i | i | i | - | p1=tstbit(Rs,#0); if (p1.new)<br>jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | i | i | s | s | s | s | Р | Р | 0 | - | - | - | 0 | 0 | i | i | i | i | i | i | i | - | p1=cmp.eq(Rs,#-1); if<br>(!p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | i | i | s | s | s | s | Р | Р | 0 | - | - | - | 0 | 1 | i | i | i | i | i | i | i | - | p1=cmp.gt(Rs,#-1); if<br>(!p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | i | i | s | s | s | s | Р | Р | 0 | - | - | - | 1 | 1 | i | i | i | i | i | i | i | - | p1=tstbit(Rs,#0); if<br>(!p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | i | i | s | s | s | s | Р | Р | 1 | - | - | - | 0 | 0 | i | i | i | i | i | i | i | - | p1=cmp.eq(Rs,#-1); if<br>(!p1.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | i | i | s | s | s | s | Р | Р | 1 | - | - | - | 0 | 1 | i | i | i | i | i | i | i | - | p1=cmp.gt(Rs,#-1); if<br>(!p1.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | i | i | s | s | s | s | Р | Р | 1 | - | - | - | 1 | 1 | i | i | i | i | i | i | i | - | p1=tstbit(Rs,#0); if<br>(!p1.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | i | i | s | s | s | s | Р | Р | 0 | I | I | I | ı | I | i | i | i | i | i | i | i | - | p1=cmp.eq(Rs,#U5); if (p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | i | i | s | s | s | s | Р | Р | 1 | ı | ı | I | - | ı | i | - | i | - | ï | - | · | - | p1=cmp.eq(Rs,#U5); if (p1.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | i | i | s | s | s | s | Р | Р | 0 | I | ı | I | ı | ı | i | i | i | i | i | i | i | - | p1=cmp.eq(Rs,#U5); if<br>(!p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | i | i | s | s | s | s | Р | Р | 1 | ı | ı | ı | ı | ı | i | i | i | i | i | i | i | - | p1=cmp.eq(Rs,#U5); if<br>(!p1.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | i | i | s | s | s | s | Р | Р | 0 | I | ı | I | ı | ı | i | i | i | i | i | i | i | - | p1=cmp.gt(Rs,#U5); if (p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | i | i | s | s | s | s | Р | Р | 1 | I | ı | ı | ı | I | i | i | i | i | i | i | i | - | p1=cmp.gt(Rs,#U5); if (p1.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | i | i | s | s | s | s | Р | Р | 0 | I | I | I | ı | I | i | i | i | i | i | i | i | - | p1=cmp.gt(Rs,#U5); if<br>(!p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | i | i | s | s | s | s | Р | Р | 1 | ı | ı | I | ı | ı | i | i | i | i | i | i | i | - | p1=cmp.gt(Rs,#U5); if<br>(!p1.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | i | i | s | s | s | s | Р | Р | 0 | ı | ı | I | ı | ı | i | i | i | i | i | i | i | - | p1=cmp.gtu(Rs,#U5); if (p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | i | i | s | s | s | s | Р | Р | 1 | ı | I | ı | ı | I | i | i | i | i | i | i | i | - | p1=cmp.gtu(Rs,#U5); if (p1.new) jump:t #r9:2 | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|---------------------------------------------------| | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | i | i | s | s | s | s | Р | Р | 0 | ı | ı | ı | I | I | i | i | i | i | i | i | i | - | p1=cmp.gtu(Rs,#U5); if<br>(!p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | i | i | s | s | s | s | Р | Р | 1 | ı | ı | ı | ı | ı | i | i | i | i | i | i | i | - | p1=cmp.gtu(Rs,#U5); if<br>(!p1.new) jump:t #r9:2 | | | ICL. | ASS | ; | | | | | | | | | | s | 4 | | Pa | rse | | | | t- | 4 | | | | | | | | | | | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | i | i | ø | s | s | s | Р | Р | 0 | 0 | t | t | t | t | i | i | i | i | i | i | i | - | p0=cmp.eq(Rs,Rt); if<br>(p0.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | i | i | s | s | s | s | Р | Р | 0 | 1 | t | t | t | t | i | i | i | i | i | i | i | • | p1=cmp.eq(Rs,Rt); if<br>(p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | i | i | s | s | s | s | Р | Р | 1 | 0 | t | t | t | t | i | i | i | i | i | i | i | - | p0=cmp.eq(Rs,Rt); if (p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | i | i | s | s | s | s | Р | Р | 1 | 1 | t | t | t | t | i | i | i | i | i | i | i | - | p1=cmp.eq(Rs,Rt); if<br>(p1.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | i | i | s | s | s | s | Р | Р | 0 | 0 | t | t | t | t | i | i | i | i | i | i | i | - | p0=cmp.eq(Rs,Rt); if<br>(!p0.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | i | i | s | s | s | s | Р | Р | 0 | 1 | t | t | t | t | i | i | i | i | i | i | i | - | p1=cmp.eq(Rs,Rt); if<br>(!p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | i | i | s | s | s | s | Р | Р | 1 | 0 | t | t | t | t | i | i | i | i | i | i | i | - | p0=cmp.eq(Rs,Rt); if<br>(!p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | i | i | s | s | s | s | Р | Р | 1 | 1 | t | t | t | t | i | i | i | i | i | i | i | - | p1=cmp.eq(Rs,Rt); if<br>(!p1.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | i | i | s | s | s | s | Р | Р | 0 | 0 | t | t | t | t | i | i | i | i | i | i | i | - | p0=cmp.gt(Rs,Rt); if<br>(p0.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | i | i | s | s | s | s | Р | Р | 0 | 1 | t | t | t | t | i | i | i | i | i | i | i | • | p1=cmp.gt(Rs,Rt); if<br>(p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | i | i | s | s | s | s | Р | Р | 1 | 0 | t | t | t | t | i | i | i | i | i | i | i | - | p0=cmp.gt(Rs,Rt); if<br>(p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | i | i | s | s | s | s | Р | Р | 1 | 1 | t | t | t | t | i | i | i | i | i | i | i | - | p1=cmp.gt(Rs,Rt); if<br>(p1.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | i | i | s | s | s | s | Р | Р | 0 | 0 | t | t | t | t | i | i | i | i | i | i | i | - | p0=cmp.gt(Rs,Rt); if<br>(!p0.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | i | i | s | s | s | s | Р | Р | 0 | 1 | t | t | t | t | i | i | i | i | i | i | i | - | p1=cmp.gt(Rs,Rt); if<br>(!p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | i | i | s | s | s | s | Р | Р | 1 | 0 | t | t | t | t | i | i | i | i | i | i | i | - | p0=cmp.gt(Rs,Rt); if<br>(!p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | i | i | s | s | s | s | Р | Р | 1 | 1 | t | t | t | t | i | i | i | i | i | i | i | - | p1=cmp.gt(Rs,Rt); if<br>(!p1.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | i | i | s | s | s | s | Р | Р | 0 | 0 | t | t | t | t | i | i | i | i | i | i | i | - | p0=cmp.gtu(Rs,Rt); if<br>(p0.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | i | i | s | s | s | s | Р | Р | 0 | 1 | t | t | t | t | i | i | i | i | i | i | i | - | p1=cmp.gtu(Rs,Rt); if<br>(p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | i | i | s | s | s | s | Р | Р | 1 | 0 | t | t | t | t | i | i | i | i | i | i | i | - | p0=cmp.gtu(Rs,Rt); if<br>(p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | i | i | s | s | s | s | Р | Р | 1 | 1 | t | t | t | t | i | i | i | i | i | i | i | - | p1=cmp.gtu(Rs,Rt); if<br>(p1.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | i | i | s | s | s | s | Р | Р | 0 | 0 | t | t | t | t | i | i | i | i | i | i | i | - | p0=cmp.gtu(Rs,Rt); if<br>(!p0.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | i | i | s | s | s | s | Р | Р | 0 | 1 | t | t | t | t | i | i | i | i | i | i | i | - | p1=cmp.gtu(Rs,Rt); if<br>(!p1.new) jump:nt #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | i | i | s | s | s | s | Р | Р | 1 | 0 | t | t | t | t | i | i | i | i | i | i | i | - | p0=cmp.gtu(Rs,Rt); if<br>(!p0.new) jump:t #r9:2 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | i | i | ø | s | s | s | Р | Р | 1 | 1 | t | t | t | t | i | i | i | i | i | i | i | • | p1=cmp.gtu(Rs,Rt); if<br>(!p1.new) jump:t #r9:2 | ## Field name ## Description ICLASSInstruction classParsePacket/loop parse bitss4Field to encode register st4Field to encode register t ## Jump to address Change the program flow to a target address. This instruction changes the program counter (PC) to a target address that is relative to the PC address. The offset from the current PC address is contained in the instruction encoding. A speculated jump instruction includes a hint ("taken" or "not taken") that specifies the expected value of the conditional expression. If the actual generated value of the predicate differs from this expected value, the jump instruction incurs a performance penalty. This instruction can appear in slots 2 or 3. # #r15:2 Syntax Assembler mapped to: "if ([!]Pu) ""jump"":nt ""#r15:2" if ([!]Pu) jump:<hint> if ([!]Pu[0]) { apply\_extension(#r); #r=#r & ~PCALIGN\_MASK; PC=PC+#r; } jump #r22:2 apply\_extension(#r); #r=#r & ~PCALIGN\_MASK; PC=PC+#r; Class: J (slots 0,1,2,3) #### Notes ■ This instruction can conditionally execute based on the value of a predicate register. When the instruction is preceded by 'if Pn', the instruction only executes if the least-significant bit of the predicate register is 1. Similarly, when the instruction is preceded by 'if !Pn', the instruction executes only if the least-significant bit of Pn is 0. ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|--------|----|---|---|---|---|---|---|---|---|---|---|-------------------------| | | ICL | ASS | 6 | | | | | | | | | | | | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 1 | 0 | 1 | 1 | 0 | 0 | i | i | i | i | i | i | i | i | i | Р | Р | i | į | i | i | i | i | i | i | i | i | i | i | i | - | jump #r22:2 | | | ICL | ASS | 3 | | | | | | | | | | | | | Pa | rse | | PT | D<br>N | | u | 2 | | | | | | | | | | | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | i | i | 0 | i | i | i | i | i | Р | Р | i | 0 | 0 | | u | u | i | i | i | i | i | i | i | - | if (Pu) jump:nt #r15:2 | | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | i | i | 0 | i | i | i | i | i | Р | Р | i | 1 | 0 | | u | u | i | i | i | i | i | i | i | | if (Pu) jump:t #r15:2 | | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | i | i | 1 | i | i | i | i | i | Р | Р | i | 0 | 0 | | u | u | i | i | i | i | i | i | i | | if (!Pu) jump:nt #r15:2 | | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | i | i | 1 | i | i | i | i | i | Р | Р | i | 1 | 0 | - | u | u | i | i | i | i | i | i | i | - | if (!Pu) jump:t #r15:2 | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | DN | Dot-new | | | PT | Predict-taken | | | Parse | Packet/loop parse bits | | | u2 | Field to encode register u | | ## Jump to address conditioned on new predicate Perform speculated jump. Jump if the LSB of the newly-generated predicate is true. The predicate must be generated in the same packet as the speculated jump instruction. A speculated jump instruction includes a hint ("taken" or "not taken") specifying the expected value of the conditional expression. If the actual generated value of the predicate differs from this expected value, the jump instruction incurs a performance penalty. This instruction can appear in slots 2 or 3. Syntax if ([!]Pu.new) jump:<hint> if([!]Pu.new[0]) { apply\_extension(#r); #r=#r & ~PCALIGN\_MASK; PC=PC+#r; Class: J (slots 0,1,2,3) #### **Notes** This instruction can conditionally execute based on the value of a predicate register. If the instruction is preceded by 'if Pn', the instruction only executes when the least-significant bit of the predicate register is 1. Similarly, if the instruction is preceded by 'if !Pn', the instruction executes only when the least-significant bit of Pn is 0. #### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|--------|----|---|---|---|---|---|---|---|---|---|---|-----------------------------| | | ICL | ASS | 3 | | | | | | | | | | | | | Pa | rse | | PT | D<br>N | | u | 2 | | | | | | | | | | | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | i | i | 0 | i | i | i | i | i | Р | Р | i | 0 | 1 | - | u | u | i | i | i | i | i | i | i | - | if (Pu.new) jump:nt #r15:2 | | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | i | i | 0 | i | i | i | i | i | Р | Р | i | 1 | 1 | - | u | u | i | i | i | i | i | i | i | | if (Pu.new) jump:t #r15:2 | | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | i | i | 1 | i | i | i | i | i | Р | Р | i | 0 | 1 | - | u | u | i | i | i | i | i | i | i | - | if (!Pu.new) jump:nt #r15:2 | | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | i | i | 1 | i | i | i | i | i | Р | Р | i | 1 | 1 | - | u | u | i | i | i | i | i | i | i | - | if (!Pu.new) jump:t #r15:2 | ICLASSInstruction classDNDot-newPTPredict-taken Parse Packet/loop parse bits u2 Field to encode register u # Jump to address condition on register value Perform register-conditional jump. Jump if the specified register expression is true. A register-conditional jump includes a hint ("taken" or "not taken") that specifies the expected value of the register expression. If the actual generated value of the expression differs from this expected value, the jump instruction incurs a performance penalty. This instruction can appear only in slot 3. | | Syntax | Behavior | |-------------|----------------|--------------------------------------------| | if (Rs!=#0) | jump:nt #r13:2 | <pre>if (Rs != 0) { PC=PC+#r; }</pre> | | if (Rs!=#0) | jump:t #r13:2 | <pre>if (Rs != 0) { PC=PC+#r; }</pre> | | if (Rs<=#0) | jump:nt #r13:2 | <pre>if (Rs&lt;=0) { PC=PC+#r; }</pre> | | if (Rs<=#0) | jump:t #r13:2 | <pre>if (Rs&lt;=0) { PC=PC+#r; }</pre> | | if (Rs==#0) | jump:nt #r13:2 | <pre>if (Rs == 0) { PC=PC+#r; }</pre> | | if (Rs==#0) | jump:t #r13:2 | <pre>if (Rs == 0) { PC=PC+#r; }</pre> | | if (Rs>=#0) | jump:nt #r13:2 | <pre>if (Rs&gt;=0) { PC=PC+#r; }</pre> | | if (Rs>=#0) | jump:t #r13:2 | <pre>if (Rs&gt;=0) { PC=PC+#r; }</pre> | ## Class: J (slot 3) #### **Notes** This instruction will be deprecated in a future version. # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|----------------------------| | | ICL. | ASS | ; | | sm | | | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | i | s | s | s | s | s | Р | Р | i | 0 | i | i | i | i | i | i | i | i | i | i | i | - | if (Rs!=#0) jump:nt #r13:2 | | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | i | s | s | s | s | s | P | P | i | 1 | i | i | i | i | i | i | i | i | i | i | i | - | if (Rs!=#0) jump:t #r13:2 | | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | i | s | s | s | s | s | Р | Р | i | 0 | i | i | i | i | i | i | i | i | i | i | i | - | if (Rs>=#0) jump:nt #r13:2 | | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | i | s | s | s | s | s | Р | Р | i | 1 | i | i | i | i | i | i | i | i | i | i | i | | if (Rs>=#0) jump:t #r13:2 | | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | i | s | s | s | s | s | Р | Р | i | 0 | i | i | i | i | i | i | i | i | i | i | i | - | if (Rs==#0) jump:nt #r13:2 | | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | i | s | s | s | s | s | Р | Р | i | 1 | i | i | i | i | i | i | i | i | i | i | i | - | if (Rs==#0) jump:t #r13:2 | | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | i | s | s | s | s | s | P | Р | i | 0 | i | i | i | i | i | i | i | i | i | i | i | • | if (Rs<=#0) jump:nt #r13:2 | | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | i | s | s | s | s | s | Р | Р | i | 1 | i | i | i | i | i | i | i | i | i | i | i | - | if (Rs<=#0) jump:t #r13:2 | #### Field name ## Description smSupervisor mode onlyICLASSInstruction class Parse Packet/loop parse bits s5 Field to encode register s # Transfer and jump Move an unsigned immediate or register value into a destination register and unconditionally jump. In assembly syntax, this instruction appears as two instructions in the packet, a transfer and a separate jump. The assembler can convert adjacent transfer and jump instructions into compound transfer-jump form. | Rd=#U6 ; jump #r9:2 | <pre>apply_extension(#r); #r=#r &amp; ~PCALIGN_MASK; Rd=#U; PC=PC+#r;</pre> | |---------------------|-----------------------------------------------------------------------------| | Rd=Rs ; jump #r9:2 | <pre>apply_extension(#r); #r=#r &amp; ~PCALIGN_MASK; Rd=Rs; PC=PC+#r;</pre> | Class: J (slots 2,3) ## **Encoding** | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----------------|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|---------------------| | | ICLASS d4 Parse | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | - | | i | i | d | d | d | d | Р | Р | I | I | I | I | ı | I | i | i | - | i | i | i | i | - | Rd=#U6 ; jump #r9:2 | | | ICL | ASS | 3 | | | | • | | | • | | | S | 4 | | Pai | rse | | • | | d | 4 | • | | | | | | | | | | | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | - | - | i | i | s | s | s | s | Р | Р | - | - | d | d | d | d | i | i | i | i | i | i | i | - | Rd=Rs ; jump #r9:2 | #### Field name Description | ICLASS | Instruction class | |--------|----------------------------| | Parse | Packet/loop parse bits | | d4 | Field to encode register d | | s4 | Field to encode register s | # 11.5 LD The LD instruction class includes load instructions, which load values into registers. LD instructions execute on slot 0 and slot 1. ## Load doubleword Load a 64-bit doubleword from memory and place in a destination register pair. | Syntax | Behavior | |-----------------------------------------|----------------------------------------------------------------------------------| | Rdd=memd(Re=#U6) | <pre>apply_extension(#U); EA=#U; Rdd = *EA; Re=#U;</pre> | | Rdd=memd(Rs+#s11:3) | <pre>apply_extension(#s); EA=Rs+#s; Rdd = *EA;</pre> | | Rdd=memd(Rs+Rt<<#u2) | EA=Rs+(Rt<<#u);<br>Rdd = *EA; | | Rdd=memd(Rt<<#u2+#U6) | <pre>apply_extension(#U); EA=#U+(Rt&lt;&lt;#u); Rdd = *EA;</pre> | | Rdd=memd(Rx++#s4:3) | EA=Rx;<br>Rx=Rx+#s;<br>Rdd = *EA; | | <pre>Rdd=memd(Rx++#s4:3:circ(Mu))</pre> | <pre>EA=Rx; Rx=Rx=circ_add(Rx, #s, MuV); Rdd = *EA;</pre> | | Rdd=memd(Rx++I:circ(Mu)) | <pre>EA=Rx; Rx=Rx=circ_add(Rx,I&lt;&lt;3,MuV); Rdd = *EA;</pre> | | Rdd=memd(Rx++Mu) | EA=Rx;<br>Rx=Rx+MuV;<br>Rdd = *EA; | | Rdd=memd(Rx++Mu:brev) | <pre>EA=Rx.h[1] brev(Rx.h[0]); Rx=Rx+MuV; Rdd = *EA;</pre> | | Rdd=memd(gp+#u16:3) | <pre>apply_extension(#u); EA=(Constant_extended ? (0) : GP)+#u; Rdd = *EA;</pre> | ## Class: LD (slots 0,1) #### **Intrinsics** ``` Rdd=memd(Rx++#s4:3:circ(Mu)) Word32 Q6_R_memd_IM_circ(void** StartAddress, Word32 Is4_3, Word32 Mu, void* BaseAddress) ``` Word32 Q6\_R\_memd\_M\_circ(void\*\* StartAddress, Word32 Mu, void\* BaseAddress) Rdd=memd(Rx++I:circ(Mu)) # **Encoding** ICLASS | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|-------|----|-----|----|----|------|----|--------|----|----|----|----|----|-----|----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------------------------| | | ICL/ | ASS | ; | | | | | | | | | | s5 | | | Pai | se | | | | t5 | | | | | | | | d5 | | | | | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | - | - | d | d | d | d | d | Rdd=memd(Rs+Rt<<#u2) | | | ICL | ASS | ; | | | | | Ту | ре | U<br>N | | | | | | Par | se | | | | | | | | | | | | d5 | | | | | 0 | 1 | 0 | 0 | 1 | i | i | 1 | 1 | 1 | 0 | i | i | i | i | i | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rdd=memd(gp+#u16:3) | | | ICL | ASS | ; | Α | mod | de | | Туре | Э | U | | | s5 | | | Par | se | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 0 | i | i | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rdd=memd(Rs+#s11:3) | | | ICL | ASS | ; | Α | mod | de | , | Туре | е | UN | | | x5 | | | Par | se | u1 | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | 0 | i | i | i | i | d | d | d | d | d | Rdd=memd(Rx++#s4:3:circ (Mu)) | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | 1 | - | 0 | - | - | d | d | d | d | d | Rdd=memd(Rx++I:circ(Mu) | | | ICL | ASS | ; | Α | mod | de | , | Туре | Э | U | | | e5 | | | Par | se | | | | | | • | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | е | е | е | е | е | Р | Р | 0 | 1 | ı | I | ı | ı | | ı | ı | d | d | d | d | d | Rdd=memd(Re=#U6) | | | ICL | ASS | ; | Α | mod | de | | Тур | Э | U<br>N | | | x5 | | | Par | se | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | Р | 0 | 0 | - | · | • | i | i | i | i | d | d | d | d | d | Rdd=memd(Rx++#s4:3) | | | ICL | ASS | ; | Α | mo | de | | Тур | Э | U<br>N | | | t5 | | | Pai | se | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | t | t | t | t | t | Р | Р | i | 1 | I | ı | ı | ı | i | ı | ı | d | d | d | d | d | Rdd=memd(Rt<<#u2+#U6) | | | ICL | ASS | <br>: | Α | mo | de | | Тур | e | UN | | | x5 | | | Pai | se | u1 | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | - | 0 | - | - | d | d | d | d | d | Rdd=memd(Rx++Mu) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | · | • | - | 0 | - | - | d | d | d | d | d | Rdd=memd(Rx++Mu:brev) | | Field name | | Description | |------------|-------------------|-------------| | CLASS | Instruction class | • | | Amode | Amode | |-------|----------------------------| | Туре | Туре | | UN | Unsigned | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | e5 | Field to encode register e | | s5 | Field to encode register s | | t5 | Field to encode register t | | u1 | Field to encode register u | | x5 | Field to encode register x | | | | Instruction class # Load-acquire doubleword Load a 64-bit doubleword from memory and place in a destination register pair. The load-acquire memory operation is observed before any following memory operations (in program order) are observed at the local point of serialization. A different order can be observed at the global point of serialization, see Memory ordering. | Syntax | Behavior | |-----------------|---------------------| | Rdd=memd_aq(Rs) | EA=Rs;<br>Rdd = *EA | Class: LD (slots 0) ### **Encoding** Parse d5 s5 | 3 | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|-----|-----|----|----|-----|----|----|------|----|--------|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-----------------| | Ī | ı | ICL | ASS | 3 | А | mod | de | | Туре | Э | U<br>N | | | s5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | Г | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | 1 | 1 | - | - | - | 0 | 0 | 0 | d | d | d | d | d | Rdd=memd_aq(Rs) | | | ' | U | U | ١. | U | U | U | U | 3 | 3 | 3 | 3 | 3 | - | г | U | ' | ' | - | - | - | U | U | U | u | u | u | u | ٠ | |---|----|-----|----|----|----|---|-----|-----|------|------|------|----|---|---|---|---|---|---|----|-----|------|----|---|---|---|---|---|---|---| | • | | | | | | | | | | | | | | | | | | | | | | | | | | | | | • | | | | Fie | ld | na | me | • | | | | | | | | | | | | D | es | cri | ptic | on | | | | | | | | | | ΙC | LA: | SS | | | | Ins | tru | ctic | on o | clas | ss | | | | | | | | | | | | | | | | | | | Ž | Αm | od | е | | | | An | nod | le | | | | | | | | | | | | | | | | | | | | | | - | Гу | ре | | | | | Ту | ре | | | | | | | | | | | | | | | | | | | | | | | Ţ | JN | | | | | | Un | sig | ne | d | | | | | | | | | | | | | | | | | | | | Packet/loop parse bits Field to encode register d Field to encode register s ## Load doubleword conditionally Load a 64-bit doubleword from memory and place in a destination register pair. This instruction is conditional based on a predicate value. When the predicate is true, perform the instruction, otherwise it is treated as a NOP. Syntax Behavior ``` if ([!]Pt[.new]) Rdd = apply extension(#u); memd(#u6) EA=#u; if ([!]Pt[.new][0]) { Rdd = *EA; } else { NOP; if ([!]Pt[.new]) Rdd = apply extension(#u); memd(Rs + #u6:3) EA=Rs+#u; if ([!]Pt[.new][0]) { Rdd = *EA; } else { NOP; if ([!]Pt[.new]) Rdd = EA=Rx; memd(Rx ++ #s4:3) if([!]Pt[.new][0]){ Rx=Rx+#s; Rdd = *EA; } else { NOP; if ([!]Pv[.new]) Rdd = EA=Rs+(Rt<<\#u); memd(Rs + Rt << #u2) if ([!]Pv[.new][0]) { Rdd = *EA; } else { NOP; ``` Class: LD (slots 0,1) #### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|----------|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------------------------------| | | ICL | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | d5 | | | | | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | ٧ | d | d | d | d | d | if (Pv)<br>Rdd=memd(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | ٧ | d | d | d | d | d | if (!Pv)<br>Rdd=memd(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (Pv.new)<br>Rdd=memd(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (!Pv.new)<br>Rdd=memd(Rs+Rt<<#u2) | | | ICL | ASS | 3 | | | Pr<br>ed | | Ту | ре | U | | | s5 | | | Pa | rse | | t | 2 | | | | | | | | • | d5 | | | | | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|-----|----|----|------|----|--------|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------------------------------| | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (Pt)<br>Rdd=memd(Rs+#u6:3) | | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (Pt.new)<br>Rdd=memd(Rs+#u6:3) | | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (!Pt)<br>Rdd=memd(Rs+#u6:3) | | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | P | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (!Pt.new)<br>Rdd=memd(Rs+#u6:3) | | | ICL | ASS | 3 | Α | mod | de | | Туре | Э | U | | | х5 | | | Pa | rse | | | | ť | 2 | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | Р | 1 | 0 | 0 | t | t | i | i | i | i | d | d | d | d | d | if (Pt)<br>Rdd=memd(Rx++#s4:3) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | P | 1 | 0 | 1 | t | t | i | i | i | i | d | d | d | d | d | if (!Pt)<br>Rdd=memd(Rx++#s4:3) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | P | 1 | 1 | 0 | t | t | i | i | i | i | d | d | d | d | d | if (Pt.new)<br>Rdd=memd(Rx++#s4:3) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | Р | 1 | 1 | 1 | t | t | i | i | i | i | d | d | d | d | d | if (!Pt.new)<br>Rdd=memd(Rx++#s4:3) | | | ICL | ASS | 3 | Α | mod | de | | Туре | 9 | U<br>N | | | | | | Pa | rse | | | | ť | 2 | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | i | i | i | i | i | Р | Р | 1 | 0 | 0 | t | t | i | 1 | - | - | d | d | d | d | d | if (Pt) Rdd=memd(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | i | i | i | - | i | Р | Ρ | 1 | 0 | 1 | t | t | i | 1 | - | - | d | d | d | d | d | if (!Pt) Rdd=memd(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | i | i | i | i | i | Р | P | 1 | 1 | 0 | t | t | i | 1 | • | - | d | d | d | d | d | if (Pt.new) Rdd=memd(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | i | i | i | i | i | Р | P | 1 | 1 | 1 | t | t | i | 1 | - | - | d | d | d | d | d | if (!Pt.new)<br>Rdd=memd(#u6) | ## Field name ## Description ICLASS Instruction class Amode Amode Type Type UN Unsigned PredNew PredNew Sense Sense Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t2 Field to encode register t t5 Field to encode register t v2 Field to encode register v x5 Field to encode register x # Load byte Load a signed byte from memory. The byte at the effective address in memory is placed in the least-significant eight bits of the destination register. The destination register is then sign-extended from eight bits to 32. | Syntax | Behavior | |-----------------------------|---------------------------------------------------------------------------------| | Rd=memb (Re=#U6) | <pre>apply_extension(#U); EA=#U; Rd = *EA; Re=#U;</pre> | | Rd=memb(Rs+#s11:0) | <pre>apply_extension(#s); EA=Rs+#s; Rd = *EA;</pre> | | Rd=memb(Rs+Rt<<#u2) | EA=Rs+(Rt<<#u);<br>Rd = *EA; | | Rd=memb(Rt<<#u2+#U6) | <pre>apply_extension(#U);<br/>EA=#U+(Rt&lt;&lt;#u);<br/>Rd = *EA;</pre> | | Rd=memb(Rx++#s4:0) | EA=Rx;<br>Rx=Rx+#s;<br>Rd = *EA; | | Rd=memb(Rx++#s4:0:circ(Mu)) | EA=Rx;<br>Rx=Rx=circ_add(Rx, #s, MuV);<br>Rd = *EA; | | Rd=memb(Rx++I:circ(Mu)) | <pre>EA=Rx; Rx=Rx=circ_add(Rx,I&lt;&lt;0,MuV); Rd = *EA;</pre> | | Rd=memb (Rx++Mu) | EA=Rx;<br>Rx=Rx+MuV;<br>Rd = *EA; | | Rd=memb(Rx++Mu:brev) | EA=Rx.h[1] brev(Rx.h[0]);<br>Rx=Rx+MuV;<br>Rd = *EA; | | Rd=memb(gp+#u16:0) | <pre>apply_extension(#u); EA=(Constant_extended ? (0) : GP)+#u; Rd = *EA;</pre> | ## Class: LD (slots 0,1) #### **Intrinsics** ``` Rd=memb(Rx++#s4:0:circ(Mu)) Word32 Q6_R_memb_IM_circ(void** StartAddress, Word32 Is4_0, Word32 Mu, void* BaseAddress) Rd=memb(Rx++I:circ(Mu)) Word32 Q6_R_memb_M_circ(void** StartAddress, Word32 Mu, void* BaseAddress) ``` # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|----|----|-----|-----|--------|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|-------------|---|----|---|---|-----------------------------| | | ICL | ASS | ; | | | | | | | | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | d5 | | | | | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | - | - | d | d | d | d | d | Rd=memb(Rs+Rt<<#u2) | | | ICL | ASS | ; | | | | | Ту | /ре | U<br>N | | | | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 0 | 1 | 0 | 0 | 1 | i | i | 1 | 0 | 0 | 0 | i | i | i | i | i | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=memb(gp+#u16:0) | | | ICL | ASS | ; | А | mo | de | | Тур | е | U | | • | s5 | | | Pa | rse | | | | | • | • | • | • | • | | | d5 | • | • | | | 1 | 0 | 0 | 1 | 0 | i | i | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=memb(Rs+#s11:0) | | | ICL | ASS | ; | А | mo | de | | Тур | е | U | | • | х5 | | | Pa | rse | u1 | | | | • | • | | | | | | d5 | | • | | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | 0 | i | i | i | i | d | d | d | d | d | Rd=memb(Rx++#s4:0:circ(Mu)) | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | х | х | х | x | х | Р | Р | u | 0 | - | - | 1 | - | 0 | - | - | d | d | d | d | d | Rd=memb(Rx++I:circ(Mu)) | | | ICL | ASS | ; | Α | mo | de | | Тур | е | U<br>N | | | е5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | е | е | е | е | е | Р | Р | 0 | 1 | I | I | I | I | - | I | I | d | d | d | d | d | Rd=memb(Re=#U6) | | | ICL | ASS | ; | Α | mo | de | | Тур | е | U | | | х5 | | | Pa | rse | | | | | | • | | • | • | d d d d d d | | | • | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | x | x | x | X | x | Р | Р | 0 | 0 | - | - | | i | i | i | i | d | d | d | d | d | Rd=memb(Rx++#s4:0) | | | ICL | ASS | ; | Α | mo | de | , | Тур | е | U<br>N | | | t5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | t | t | t | t | t | Р | Р | i | 1 | ı | I | ı | I | i | I | ı | d | d | d | d | d | Rd=memb(Rt<<#u2+#U6) | | | ICL | ASS | ; | Α | mo | de | | Тур | е | UN | | | х5 | | | Pa | rse | u1 | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | - | 0 | - | - | d | d | d | d | d | Rd=memb(Rx++Mu) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | - | 0 | - | - | d | d | d | d | d | Rd=memb(Rx++Mu:brev) | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | | Amode | Amode | |-------|----------------------------| | Type | Туре | | UN | Unsigned | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | e5 | Field to encode register e | | s5 | Field to encode register s | | t5 | Field to encode register t | | u1 | Field to encode register u | | x5 | Field to encode register x | # Load byte conditionally Load a signed byte from memory. The byte at the effective address in memory is placed in the least-significant eight bits of the destination register. The destination register is then sign-extended from eight bits to 32. This instruction is conditional based on a predicate value. When the predicate is true, perform the instruction, otherwise it is treated as a NOP. #### **Syntax Behavior** if ([!]Pt[.new]) apply extension(#u); Rd=memb(#u6) EA=#u; if ([!]Pt[.new][0]) { Rd = \*EA;} else { NOP; if ([!]Pt[.new]) apply\_extension(#u); Rd=memb(Rs+#u6:0) EA=Rs+#u; if ([!]Pt[.new][0]) { Rd = \*EA;} else { NOP; if ([!]Pt[.new]) EA=Rx; Rd=memb(Rx++#s4:0)if([!]Pt[.new][0]){ Rx=Rx+#s;Rd = \*EA;} else { NOP; if ([!]Pv[.new]) EA=Rs+(Rt<<#u);Rd=memb (Rs+Rt<<#u2) if ([!]Pv[.new][0]) { Rd = \*EA;} else { NOP; } Class: LD (slots 0,1) #### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------------------------------| | | ICL/ | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | d5 | | | | | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (Pv)<br>Rd=memb(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (!Pv)<br>Rd=memb(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (Pv.new)<br>Rd=memb(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (!Pv.new)<br>Rd=memb(Rs+Rt<<#u2) | | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|---------------|---------------------|----|-----|-----|--------|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|------------------------------------| | | ICL | ASS | 3 | | Se<br>ns<br>e | Pr<br>ed<br>Ne<br>w | | Ту | /pe | U | | | s5 | | | Ра | rse | | t | 2 | | 1 | | | | | | | d5 | | | | | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (Pt) Rd=memb(Rs+#u6:0) | | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | P | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (Pt.new)<br>Rd=memb(Rs+#u6:0) | | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | P | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (!Pt)<br>Rd=memb(Rs+#u6:0) | | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | s | s | s | ø | ø | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (!Pt.new)<br>Rd=memb(Rs+#u6:0) | | | ICL | ASS | 3 | Α | mod | de | | Тур | е | UN | | | х5 | | | Pa | rse | | | | t | 2 | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | x | x | x | x | x | Р | Р | 1 | 0 | 0 | t | t | i | i | i | i | d | d | d | d | d | if (Pt)<br>Rd=memb(Rx++#s4:0) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | x | x | x | x | x | Р | Р | 1 | 0 | 1 | t | t | i | i | i | i | d | d | d | d | d | if (!Pt)<br>Rd=memb(Rx++#s4:0) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | x | x | x | x | x | Р | Р | 1 | 1 | 0 | t | t | i | i | i | i | d | d | d | d | d | if (Pt.new)<br>Rd=memb(Rx++#s4:0) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | x | x | x | x | x | Р | Р | 1 | 1 | 1 | t | t | i | i | i | i | d | d | d | d | d | if (!Pt.new)<br>Rd=memb(Rx++#s4:0) | | | ICL | ASS | 3 | Α | mod | de | | Тур | е | U<br>N | | | | | | Pa | rse | | | | t | 2 | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | i | i | i | - | - | Р | Р | 1 | 0 | 0 | t | t | i | 1 | | - | d | d | d | d | d | if (Pt) Rd=memb(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | i | i | i | - | - | Р | Р | 1 | 0 | 1 | t | t | i | 1 | - | - | d | d | d | d | d | if (!Pt) Rd=memb(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | i | i | i | - | - | Р | Р | 1 | 1 | 0 | t | t | i | 1 | - | - | d | d | d | d | d | if (Pt.new) Rd=memb(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | i | i | i | - | - | Р | Р | 1 | 1 | 1 | t | t | i | 1 | | - | d | d | d | d | d | if (!Pt.new) Rd=memb(#u6) | #### Field name #### Description ICLASS Instruction class Amode Amode Amode Type UN Unsigned PredNew Sense Type Sense Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t2 Field to encode register t t5 Field to encode register t v2 Field to encode register v x5 Field to encode register x # Load byte into shifted vector Shift a 64-bit vector right by one byte. Insert a byte from memory into the vacated upper byte of the vector. Syntax Behavior ``` Ryy=memb fifo(Re=#U6) apply extension(#U); EA=#U; tmpV = *EA; Ryy = (((size8u t)Ryy)>>8) | (tmpV<<56); Re=#U; Ryy=memb_fifo(Rs) Assembler mapped to: "Ryy=memb_fifo""(Rs+#0)" Ryy=memb_fifo(Rs+#s11:0) apply_extension(#s); EA=Rs+#s; tmpV = *EA; Ryy = (((size8u t)Ryy)>>8) | (tmpV<<56); Ryy=memb fifo(Rt<<#u2+#U6) apply extension(#U); EA = \#U + (Rt << \#u); tmpV = *EA; Ryy = (((size8u t)Ryy)>>8) | (tmpV<<56); Ryy=memb fifo(Rx++#s4:0) EA=Rx; Rx=Rx+#s; tmpV = *EA; Ryy = (((size8u t)Ryy)>>8) | (tmpV<<56); Ryy = memb fifo(Rx ++ EA=Rx; #s4:0:circ(Mu)) Rx=Rx=circ add(Rx, #s, MuV); tmpV = *EA; Ryy = (((size8u_t)Ryy)>>8) | (tmpV<<56); ``` #### Syntax Behavior ``` Ryy = memb fifo(Rx ++ EA=Rx; I:circ(Mu)) Rx=Rx=circ_add(Rx,I<<0,MuV);</pre> tmpV = *EA; Ryy = (((size8u_t)Ryy) >> 8) | (tmpV << 56); Ryy = memb fifo(Rx++Mu) EA=Rx; Rx=Rx+MuV; tmpV = *EA; Ryy = (((size8u_t)Ryy)>>8) | (tmpV<<56); Ryy = memb_fifo(Rx++Mu:brev) EA=Rx.h[1] \mid brev(Rx.h[0]); Rx=Rx+MuV; { tmpV = *EA; Ryy = (((size8u_t)Ryy)>>8) | (tmpV<<56); ``` Class: LD (slots 0,1) ## **Encoding** | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|-----|----|----|------|----|--------|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|-----------------------------|---|----|---|---|------------------------------------| | | ICL | ASS | 3 | Α | mod | le | | Тур | 9 | U<br>N | | | s5 | | | Pa | rse | | | | | | | | | | | | у5 | | | | | 1 | 0 | 0 | 1 | 0 | i | i | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | у | у | у | у | у | Ryy=memb_fifo(Rs+#s11:0 | | | ICL | ASS | 3 | Α | moc | le | | Тур | Э | U<br>N | | | х5 | | | Pa | rse | u1 | | | | | | | | | | | у5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | 0 | i | i | i | i | у | у | у | у | у | Ryy=memb_fifo(Rx++#s4:0 :circ(Mu)) | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | 1 | - | 0 | - | - | у | у | у | у | у | Ryy=memb_fifo(Rx++I:circ(Mu)) | | | ICL | ASS | 3 | А | mod | le | | Тур | 9 | U | | | e5 | | | Pa | rse | | | | | | • | • | | | y5 y y y y y y | | | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | е | е | е | е | е | Р | Р | 0 | 1 | ı | I | I | I | - | I | ı | у | у | у | у | у | Ryy=memb_fifo(Re=#U6) | | | ICL | ASS | 3 | А | mod | le | | Тур | 9 | U | | • | х5 | | | Pa | rse | | | | | | • | • | • | • | y5 | | | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | 0 | 0 | - | - | - | i | i | i | i | у | у | у | у | у | Ryy=memb_fifo(Rx++#s4:0 ) | | | ICL | ASS | 3 | Α | moc | le | | Тур | Э | U<br>N | | | t5 | | | Pa | rse | | | | | | | | | | y5 y5 | | | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | t | t | t | t | t | Р | Р | i | 1 | ı | ı | ı | ı | i | ı | ı | у | у | у | у | у | Ryy=memb_fifo(Rt<<#u2+#<br>U6) | | | ICL | ASS | 3 | А | moc | le | | Туре | Э | U<br>N | | | х5 | * | | Pa | rse | u1 | | | * | * | • | | | | y5 | | | | * | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | - | 0 | - | - | у | у | у | у | у | Ryy=memb_fifo(Rx++Mu) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | - | 0 | - | - | у | у | у | у | у | Ryy=memb_fifo(Rx++Mu:br ev) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Amode | Amode | | | Type | Туре | | | UN | Unsigned | | | Parse | Packet/loop parse bits | | | e5 | Field to encode register e | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | u1 | Field to encode register u | | | x5 | Field to encode register x | | Field to encode register y у5 ## Load half into shifted vector Shift a 64-bit vector right by one halfword. Insert a halfword from memory into the vacated upper halfword of the vector. #### **Syntax** #### Behavior ``` Ryy = memh fifo(Re=#U6) apply extension(#U); EA=#U; tmpV = *EA; Ryy = (((size8u t)Ryy)>>16) | (tmpV<<48); Re=#U; Ryy = memh fifo(Rs) Assembler mapped to: "Ryy=memh fifo""(Rs+#0)" Ryy = memh fifo(Rs+#s11:1) apply extension(#s); EA=Rs+#s; tmpV = *EA; Ryy = (((size8u t)Ryy)>>16) | (tmpV<<48); apply_extension(#U); Ryy = memh_fifo(Rt << #u2+#U6) EA = #U + (Rt << #u); tmpV = *EA; Ryy = (((size8u t)Ryy)>>16) | (tmpV<<48); Ryy = memh_fifo(Rx ++ #s4:1) EA=Rx; Rx=Rx+#s; tmpV = *EA; Ryy = (((size8u_t)Ryy)>>16) | (tmpV<<48); Ryy = memh fifo(Rx ++ EA=Rx; #s4:1:circ(Mu)) Rx=Rx=circ_add(Rx, #s, MuV); tmpV = *EA; Ryy = (((size8u_t)Ryy)>>16) | (tmpV<<48); ``` #### Syntax ``` Ryy = memh_fifo(Rx ++ EA=Rx; I:circ(Mu)) Rx=Rx=circ add(Rx,I<<1,MuV);</pre> tmpV = *EA; Ryy = (((size8u_t)Ryy)>>16) | (tmpV<<48); Ryy = memh fifo(Rx ++ Mu) EA=Rx; Rx=Rx+MuV; tmpV = *EA; Ryy = (((size8u_t)Ryy)>>16) | (tmpV<<48); Ryy = memh_fifo(Rx ++ EA=Rx.h[1] | brev(Rx.h[0]); Mu:brev) Rx=Rx+MuV; tmpV = *EA; Ryy = (((size8u_t)Ryy)>>16) | (tmpV<<48); ``` **Behavior** Class: LD (slots 0,1) ## **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|-----|----|----|------|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|-------------|---|----|---|---|------------------------------------| | | ICL | ASS | 3 | Α | mod | de | | Туре | 9 | ZC | | | s5 | | | Pa | rse | | | | | | | | | | | | у5 | | | | | 1 | 0 | 0 | 1 | 0 | i | i | 0 | 0 | 1 | 0 | s | s | s | s | s | P | Р | i | i | i | i | i | i | i | i | i | у | у | у | у | у | Ryy=memh_fifo(Rs+#s11:1 | | | ICL | ASS | 3 | Α | mod | de | | Туре | 9 | U | | | x5 | | | Pa | rse | u1 | | ı | | ı | | | | ı | | | у5 | | ı | | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | 0 | i | i | i | i | у | у | у | у | у | Ryy=memh_fifo(Rx++#s4:1 :circ(Mu)) | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | 1 | - | 0 | - | - | у | у | у | у | у | Ryy=memh_fifo(Rx++I:circ(Mu)) | | | ICL | ASS | 3 | Α | mod | de | | Туре | 9 | ZC | | | e5 | | | Pa | rse | | | | • | | | | • | | | • | у5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | е | е | е | е | е | P | Р | 0 | 1 | I | I | I | I | - | I | I | у | у | у | у | у | Ryy=memh_fifo(Re=#U6) | | | ICL | ASS | 6 | Α | mod | de | | Туре | 9 | U | | | x5 | | | Pa | rse | | • | • | | • | | • | | • | y5 | | | • | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | x | x | x | x | x | Р | P | 0 | 0 | - | - | - | i | i | i | i | у | у | у | у | у | Ryy=memh_fifo(Rx++#s4:1 | | | ICL | ASS | 3 | А | mod | de | | Туре | 9 | U | | | t5 | | | Pa | rse | | | | | | | | | | y y y y y y | | | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | t | t | t | t | t | P | Р | i | 1 | I | I | ı | I | i | I | ı | у | у | у | у | у | Ryy=memh_fifo(Rt<<#u2+#<br>U6) | | | ICL | ASS | 3 | Α | mod | de | | Туре | 9 | ZC | | | х5 | | | Pa | rse | u1 | | | | | | | | | | | у5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | - | 0 | - | - | у | у | у | у | у | Ryy=memh_fifo(Rx++Mu) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | x | x | x | x | x | P | Р | u | 0 | - | - | - | - | 0 | - | - | у | у | у | у | у | Ryy=memh_fifo(Rx++Mu:br ev) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Amode | Amode | | | Type | Туре | | | UN | Unsigned | | | Parse | Packet/loop parse bits | | | e5 | Field to encode register e | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | u1 | Field to encode register u | | | x5 | Field to encode register x | | | у5 | Field to encode register y | | ## Load halfword Load a signed halfword from memory. The 16-bit halfword at the effective address in memory is placed in the least-significant 16 bits of the destination register. The destination register is then sign-extended from 16 bits to 32. | Syntax | Behavior | |------------------------------------|-----------------------------------------------------------------------------------------| | Rd=memh (Re=#U6) | <pre>apply_extension(#U); EA=#U; Rd = *EA; Re=#U;</pre> | | Rd=memh(Rs+#s11:1) | <pre>apply_extension(#s); EA=Rs+#s; Rd = *EA;</pre> | | Rd=memh(Rs+Rt<<#u2) | EA=Rs+(Rt<<#u);<br>Rd = *EA; | | Rd=memh (Rt<<#u2+#U6) | <pre>apply_extension(#U); EA=#U+(Rt&lt;&lt;#u); Rd = *EA;</pre> | | Rd=memh(Rx++#s4:1) | EA=Rx;<br>Rx=Rx+#s;<br>Rd = *EA; | | Rd=memh(Rx++#s4:1:circ(Mu)) | EA=Rx;<br>Rx=Rx=circ_add(Rx, #s, MuV);<br>Rd = *EA; | | <pre>Rd=memh(Rx++I:circ(Mu))</pre> | <pre>EA=Rx; Rx=Rx=circ_add(Rx,I&lt;&lt;1,MuV); Rd = *EA;</pre> | | Rd=memh(Rx++Mu) | EA=Rx;<br>Rx=Rx+MuV;<br>Rd = *EA; | | Rd=memh(Rx++Mu:brev) | EA=Rx.h[1] brev(Rx.h[0]);<br>Rx=Rx+MuV;<br>Rd = *EA; | | Rd=memh(gp+#u16:1) | <pre>apply_extension(#u);<br/>EA=(Constant_extended ? (0) : GP)+#u;<br/>Rd = *EA;</pre> | ## Class: LD (slots 0,1) #### **Intrinsics** ``` Rd=memh(Rx++#s4:1:circ(Mu)) Word32 Q6_R_memh_IM_circ(void** StartAddress, Word32 Is4_1, Word32 Mu, void* BaseAddress) Rd=memh(Rx++I:circ(Mu)) Word32 Q6_R_memh_M_circ(void** StartAddress, Word32 Mu, void* BaseAddress) ``` # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|----|----|------|----|--------|----|----|----|----|----|-----|----|----|----|----|----|---|---|---|---|---|---|----|----|---|---|---------------------------------| | | ICL/ | ASS | ; | | | | | | | | | | s5 | | | Pai | se | | | t5 | | | | | | | | d5 | | | | | | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | - | - | d | d | d | d | d | Rd=memh(Rs+Rt<<#u2) | | | ICL | ASS | ; | | | | | Ту | ре | U<br>N | | | | | | Pai | se | | | | | | | | | | | | d5 | | | | | 0 | 1 | 0 | 0 | 1 | i | i | 1 | 0 | 1 | 0 | i | i | i | i | i | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=memh(gp+#u16:1) | | | ICL | ASS | ; | Α | mod | de | | Тур | е | U | | | s5 | | | Par | se | | | | | • | | | | • | | • | d5 | | | | | 1 | 0 | 0 | 1 | 0 | i | i | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=memh(Rs+#s11:1) | | | ICL | ASS | ; | Α | mod | de | | Тур | е | U | | | x5 | | | Par | se | u1 | | | | | | | | • | | • | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | 0 | i | i | i | i | d | d | d | d | d | Rd=memh(Rx++#s4:1:circ(<br>Mu)) | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | x | х | x | x | x | Р | Р | u | 0 | - | | 1 | - | 0 | - | - | d | d | d | d | d | Rd=memh(Rx++I:circ(Mu)) | | | ICL | ASS | ; | Α | mod | de | | Туре | е | UN | | | e5 | | | Par | se | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | е | е | е | е | е | Р | Р | 0 | 1 | ı | I | I | I | - | I | ı | d | d | d | d | d | Rd=memh(Re=#U6) | | | ICL | ASS | ; | Α | mod | de | | Туре | е | U<br>N | | | x5 | | | Par | se | | | | | | • | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | x | х | x | х | x | Р | Р | 0 | 0 | - | - | - | i | i | i | i | d | d | d | d | d | Rd=memh(Rx++#s4:1) | | | ICL | ASS | ; | Α | mod | de | | Туре | е | UN | | | t5 | | | Par | se | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | t | t | t | t | t | Р | Р | i | 1 | ı | I | I | I | i | I | ı | d | d | d | d | d | Rd=memh(Rt<<#u2+#U6) | | | ICL | ASS | ; | Α | mod | de | | Тур | е | U | | | x5 | | | Par | se | u1 | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | x | x | x | x | x | Р | Р | u | 0 | • | 1 | - | - | 0 | - | - | d | d | d | d | d | Rd=memh(Rx++Mu) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | - | 0 | - | - | d | d | d | d | d | Rd=memh(Rx++Mu:brev) | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | | тошноо | monaction diago | |--------|----------------------------| | Amode | Amode | | Туре | Туре | | UN | Unsigned | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | e5 | Field to encode register e | | s5 | Field to encode register s | | t5 | Field to encode register t | | u1 | Field to encode register u | | x5 | Field to encode register x | # Load halfword conditionally Load a signed halfword from memory. The 16-bit halfword at the effective address in memory is placed in the least-significant 16 bits of the destination register. The destination register is then sign-extended from 16 bits to 32. This instruction is conditional based on a predicate value. When the predicate is true, perform the instruction, otherwise it is treated as a NOP. #### **Syntax Behavior** if ([!]Pt[.new]) apply extension(#u); Rd=memh(#u6) EA=#u; if ([!]Pt[.new][0]) { Rd = \*EA;} else { NOP; if ([!]Pt[.new]) apply\_extension(#u); Rd=memh (Rs+#u6:1) EA=Rs+#u; if ([!]Pt[.new][0]) { Rd = \*EA;} else { NOP; if ([!]Pt[.new]) EA=Rx; Rd=memh(Rx++#s4:1)if([!]Pt[.new][0]){ Rx=Rx+#s;Rd = \*EA;} else { NOP; if ([!]Pv[.new]) EA=Rs+(Rt<<#u);Rd=memh (Rs+Rt<<#u2) if ([!]Pv[.new][0]) { Rd = \*EA;} else { NOP; } Class: LD (slots 0,1) #### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------------------------------| | | ICL | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | d5 | | | | | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (Pv)<br>Rd=memh(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (!Pv)<br>Rd=memh(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (Pv.new)<br>Rd=memh(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (!Pv.new)<br>Rd=memh(Rs+Rt<<#u2) | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|---------------|---------------------|----|-----|-----|--------|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|------------------------------------| | | ICL | ASS | 3 | | Se<br>ns<br>e | Pr<br>ed<br>Ne<br>w | | Ту | /pe | U | | | s5 | | | Ра | rse | | t | 2 | | | | | | | | | d5 | | | | | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (Pt) Rd=memh(Rs+#u6:1) | | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (Pt.new)<br>Rd=memh(Rs+#u6:1) | | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (!Pt)<br>Rd=memh(Rs+#u6:1) | | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (!Pt.new)<br>Rd=memh(Rs+#u6:1) | | | ICL | ASS | 3 | Α | mod | de | | Тур | е | U<br>N | | | х5 | | | Pa | rse | | | | t | 2 | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | x | x | x | x | x | Р | Р | 1 | 0 | 0 | t | t | i | i | i | i | d | d | d | d | d | if (Pt)<br>Rd=memh(Rx++#s4:1) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | x | x | x | x | x | Р | Р | 1 | 0 | 1 | t | t | i | i | i | i | d | d | d | d | d | if (!Pt)<br>Rd=memh(Rx++#s4:1) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | x | x | x | x | x | Р | Р | 1 | 1 | 0 | t | t | i | i | i | i | d | d | d | d | d | if (Pt.new)<br>Rd=memh(Rx++#s4:1) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | x | x | x | x | x | Р | Р | 1 | 1 | 1 | t | t | i | i | i | i | d | d | d | d | d | if (!Pt.new)<br>Rd=memh(Rx++#s4:1) | | | ICL | ASS | 3 | Α | mod | de | | Тур | е | UN | | | | | | Pa | rse | | • | | t2 | | | | | d | | | • | • | | | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | i | i | i | i | i | Р | Р | 1 | 0 | 0 | t | t | i | 1 | - | - | d | d | d | d | d | if (Pt) Rd=memh(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | i | i | i | i | i | Р | Р | 1 | 0 | 1 | t | t | i | 1 | - | - | d | d | d | d | d | if (!Pt) Rd=memh(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | i | i | i | i | i | Р | P | 1 | 1 | 0 | t | t | i | 1 | - | - | d | d | d | d | d | if (Pt.new) Rd=memh(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | i | i | i | i | i | Р | Р | 1 | 1 | 1 | t | t | i | 1 | - | | d | d | d | d | d | if (!Pt.new) Rd=memh(#u6) | #### Field name #### Description ICLASSInstruction classAmodeAmodeTypeTypeUNUnsigned UN Unsigned PredNew Sense Sense Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t2 Field to encode register t t5 Field to encode register t v2 Field to encode register v x5 Field to encode register x ## Memory copy Copy Mu + 1 (length) bytes from the address in Rt (source base) to the address in Rs (destination base). The source base, destination base, and length values must align to the L2 cache-line size. Behavior is undefined for nonaligned values and for source or destination buffers partially in illegal space. The accesses by the memcpy instruction are noncoherent with the cache-hierarchy of the Q6. In addition to normal translation exceptions, a coprocessor memory exception occurs when any of the following are true: - Source or destination base address in illegal space - Source or destination buffer crosses a page boundary - Source base address is not in AXI space - Destination base address is not in VTCM This instruction is only available on cores with VTCM. | Syntax | Behavior | |--------|----------| | | | Class: N/A # Load unsigned byte Load an unsigned byte from memory. The byte at the effective address in memory is placed in the least-significant 8 bits of the destination register. The destination register is then zero-extended from 8 bits to 32. | Syntax | Behavior | |-----------------------------------------|---------------------------------------------------------------------------------| | Rd=memub (Re=#U6) | <pre>apply_extension(#U); EA=#U; Rd = *EA; Re=#U;</pre> | | Rd=memub(Rs+#s11:0) | <pre>apply_extension(#s); EA=Rs+#s; Rd = *EA;</pre> | | Rd=memub(Rs+Rt<<#u2) | EA=Rs+(Rt<<#u);<br>Rd = *EA; | | Rd=memub(Rt<<#u2+#U6) | <pre>apply_extension(#U);<br/>EA=#U+(Rt&lt;&lt;#u);<br/>Rd = *EA;</pre> | | Rd=memub(Rx++#s4:0) | EA=Rx;<br>Rx=Rx+#s;<br>Rd = *EA; | | <pre>Rd=memub(Rx++#s4:0:circ(Mu))</pre> | <pre>EA=Rx; Rx=Rx=circ_add(Rx, #s, MuV); Rd = *EA;</pre> | | Rd=memub(Rx++I:circ(Mu)) | <pre>EA=Rx; Rx=Rx=circ_add(Rx,I&lt;&lt;0,MuV); Rd = *EA;</pre> | | Rd=memub(Rx++Mu) | EA=Rx;<br>Rx=Rx+MuV;<br>Rd = *EA; | | Rd=memub(Rx++Mu:brev) | EA=Rx.h[1] brev(Rx.h[0]);<br>Rx=Rx+MuV;<br>Rd = *EA; | | Rd=memub(gp+#u16:0) | <pre>apply_extension(#u); EA=(Constant_extended ? (0) : GP)+#u; Rd = *EA;</pre> | ## Class: LD (slots 0,1) ### **Intrinsics** ``` Rd=memub(Rx++#s4:0:circ(Mu)) Word32 Q6_R_memub_IM_circ(void** StartAddress, Word32 Is4_0, Word32 Mu, void* BaseAddress) Rd=memub(Rx++I:circ(Mu)) Word32 Q6_R_memub_M_circ(void** StartAddress, Word32 Mu, void* BaseAddress) ``` # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|----|----|------|----|--------|----|----|----|----|----|-----|----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------------------------| | | ICL | ASS | ; | | | | | | | | | | s5 | | | Pai | se | | | | t5 | | | | | | | | d5 | | | | | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | ø | s | s | s | s | Р | Р | - | t | t | t | t | t | i | - | - | d | d | d | d | d | Rd=memub(Rs+Rt<<#u2) | | | ICL | ASS | ; | | | | | Ту | ре | U<br>N | | | | | | Pai | se | | | | | | | | | | | | d5 | | | | | 0 | 1 | 0 | 0 | 1 | i | i | 1 | 0 | 0 | 1 | i | i | i | i | i | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=memub(gp+#u16:0) | | | ICL | ASS | ; | Α | mod | de | | Тур | е | U | | | s5 | | | Par | se | | | | | • | | | | • | | | d5 | • | | | | 1 | 0 | 0 | 1 | 0 | i | i | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=memub(Rs+#s11:0) | | | ICL/ | ASS | ; | Α | mod | de | | Тур | е | UN | | | х5 | | | Pai | se | u1 | | • | | | | • | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | x | x | x | x | x | Р | P | u | 0 | - | - | 0 | i | i | i | i | d | d | d | d | d | Rd=memub(Rx++#s4:0:circ (Mu)) | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | - | - | 1 | - | 0 | - | - | d | d | d | d | d | Rd=memub(Rx++I:circ(Mu) | | | ICL | ASS | ; | Α | mod | de | | Туре | е | U | | | e5 | | | Par | se | | | • | | • | | | | , | | | d5 | • | • | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | е | е | е | е | е | Р | Р | 0 | 1 | I | I | I | ı | - | ı | ı | d | d | d | d | d | Rd=memub(Re=#U6) | | | ICL | ASS | ; | Α | mod | de | | Тур | е | U | | | x5 | | | Par | se | | | | | • | | | | • | | | d5 | • | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | x | x | x | x | x | Р | Р | 0 | 0 | - | - | - | i | i | i | i | d | d | d | d | d | Rd=memub(Rx++#s4:0) | | | ICL/ | ASS | ; | Α | mod | de | | Тур | е | UN | | | t5 | | | Pai | se | | | • | | • | • | • | • | • | | • | d5 | • | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | t | t | t | t | t | Р | Р | i | 1 | I | I | I | ı | i | ı | ı | d | d | d | d | d | Rd=memub(Rt<<#u2+#U6) | | | ICL | ASS | ; | Α | mod | de | | Тур | е | UN | | | х5 | | | Pai | se | u1 | | | | • | • | • | • | • | | • | d5 | • | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | | 0 | | - | d | d | d | d | d | Rd=memub(Rx++Mu) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | - | 0 | - | - | d | d | d | d | d | Rd=memub(Rx++Mu:brev) | | Field name | | | |------------|-------------------|--| | ICLASS | Instruction class | | Amode Amode Туре Type UN Unsigned Packet/loop parse bits Parse Field to encode register d Field to encode register e e5 s5 Field to encode register s Field to encode register t t5 Field to encode register u u1 x5 Field to encode register x Description # Load unsigned byte conditionally Load an unsigned byte from memory. The byte at the effective address in memory is placed in the least-significant eight bits of the destination register. The destination register is then zero-extended from eight bits to 32. This instruction is conditional based on a predicate value. When the predicate is true, perform the instruction, otherwise it is treated as a NOP. #### **Syntax Behavior** if ([!]Pt[.new]) apply extension(#u); Rd=memub(#u6) EA=#u; if ([!]Pt[.new][0]) { Rd = \*EA;} else { NOP; if ([!]Pt[.new]) apply\_extension(#u); Rd=memub(Rs+#u6:0) EA=Rs+#u; if ([!]Pt[.new][0]) { Rd = \*EA;} else { NOP; if ([!]Pt[.new]) EA=Rx; Rd=memub (Rx++#s4:0) if([!]Pt[.new][0]){ Rx=Rx+#s;Rd = \*EA;} else { NOP; if ([!]Pv[.new]) EA=Rs+(Rt<<#u);Rd=memub (Rs+Rt<<#u2) if ([!]Pv[.new][0]) { Rd = \*EA;} else { NOP; } Class: LD (slots 0,1) ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------------------------------| | | ICL/ | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | d5 | | | | | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | ٧ | d | d | d | d | d | if (Pv)<br>Rd=memub(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (!Pv)<br>Rd=memub(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | ٧ | d | d | d | d | d | if (Pv.new)<br>Rd=memub(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (!Pv.new)<br>Rd=memub(Rs+Rt<<#u2) | | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|---------------|---------------------|----|------|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------------------------------| | | ICL | ASS | 3 | | Se<br>ns<br>e | Pr<br>ed<br>Ne<br>w | | Ту | /pe | U | | | s5 | | | Ра | rse | | t | 2 | | | | | | | | | d5 | | | | | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | P | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (Pt)<br>Rd=memub(Rs+#u6:0) | | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | P | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (Pt.new)<br>Rd=memub(Rs+#u6:0) | | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (!Pt)<br>Rd=memub(Rs+#u6:0) | | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | P | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (!Pt.new)<br>Rd=memub(Rs+#u6:0) | | | ICL | ASS | 3 | Α | moc | le | | Туре | е | U | | | х5 | | | Pa | rse | | | | ť | 2 | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | x | x | x | x | x | Р | P | 1 | 0 | 0 | t | t | i | i | i | i | d | d | d | d | d | if (Pt)<br>Rd=memub(Rx++#s4:0) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | x | x | x | x | x | Р | P | 1 | 0 | 1 | t | t | i | i | i | i | d | d | d | d | d | if (!Pt)<br>Rd=memub(Rx++#s4:0) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | x | x | x | x | x | Р | P | 1 | 1 | 0 | t | t | i | i | i | i | d | d | d | d | d | if (Pt.new)<br>Rd=memub(Rx++#s4:0) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | x | x | x | x | x | Р | P | 1 | 1 | 1 | t | t | i | i | i | i | d | d | d | d | d | if (!Pt.new)<br>Rd=memub(Rx++#s4:0) | | | ICL | ASS | 3 | Α | mod | le | | Туре | е | U | | | | | | Pa | rse | | | | ť | 2 | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | i | i | i | i | i | Р | Р | 1 | 0 | 0 | t | t | i | 1 | - | - | d | d | d | d | d | if (Pt) Rd=memub(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | i | i | i | - | ï | Р | Ρ | 1 | 0 | 1 | t | t | i | 1 | - | - | d | d | d | d | d | if (!Pt) Rd=memub(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | i | i | i | i | i | Р | Р | 1 | 1 | 0 | t | t | i | 1 | - | - | d | d | d | d | d | if (Pt.new) Rd=memub(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | i | i | i | i | i | Р | Р | 1 | 1 | 1 | t | t | i | 1 | - | - | d | d | d | d | d | if (!Pt.new)<br>Rd=memub(#u6) | ### Field name ## Description ICLASS Instruction class Amode Amode Type Type UN Unsigned PredNew PredNew Sense Sense Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t2 Field to encode register t t5 Field to encode register t v2 Field to encode register v x5 Field to encode register x # Load unsigned halfword Syntay Load an unsigned halfword from memory. The 16-bit halfword at the effective address in memory is placed in the least-significant 16 bits of the destination register. The destination register is zero-extended from 16 bits to 32. Rehavior | Syntax | Behavior | |------------------------------|-----------------------------------------------------------------| | Rd=memuh (Re=#U6) | <pre>apply_extension(#U); EA=#U; Rd = *EA; Re=#U;</pre> | | Rd=memuh(Rs+#s11:1) | <pre>apply_extension(#s); EA=Rs+#s; Rd = *EA;</pre> | | Rd=memuh(Rs+Rt<<#u2) | EA=Rs+(Rt<<#u);<br>Rd = *EA; | | Rd=memuh (Rt<<#u2+#U6) | <pre>apply_extension(#U); EA=#U+(Rt&lt;&lt;#u); Rd = *EA;</pre> | | Rd=memuh(Rx++#s4:1) | EA=Rx;<br>Rx=Rx+#s;<br>Rd = *EA; | | Rd=memuh(Rx++#s4:1:circ(Mu)) | <pre>EA=Rx; Rx=Rx=circ_add(Rx, #s, MuV); Rd = *EA;</pre> | | Rd=memuh(Rx++I:circ(Mu)) | <pre>EA=Rx; Rx=Rx=circ_add(Rx,I&lt;&lt;1,MuV); Rd = *EA;</pre> | | Rd=memuh(Rx++Mu) | EA=Rx;<br>Rx=Rx+MuV;<br>Rd = *EA; | | Rd=memuh(Rx++Mu:brev) | <pre>EA=Rx.h[1] brev(Rx.h[0]); Rx=Rx+MuV; Rd = *EA;</pre> | | | | ## Class: LD (slots 0,1) Rd=memuh (gp+#u16:1) ### **Intrinsics** ``` Rd=memuh(Rx++#s4:1:circ(Mu)) Word32 Q6_R_memuh_IM_circ(void** StartAddress, Word32 Is4_1, Word32 Mu, void* BaseAddress) Rd=memuh(Rx++I:circ(Mu)) Word32 Q6_R_memuh_M_circ(void** StartAddress, Word32 Mu, void* BaseAddress) ``` apply extension(#u); Rd = \*EA; EA=(Constant extended ? (0) : GP)+#u; # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|----|----|------|----|--------|----|----|----|----|----|-----|----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------------------------| | | ICL | ASS | ; | | | | | | | | | | s5 | | | Pai | se | | | | t5 | | | | | | | | d5 | | | | | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | i | - | - | d | d | d | d | d | Rd=memuh(Rs+Rt<<#u2) | | | ICL | ASS | ; | | | | | Ту | ре | U<br>N | | | | | | Pai | se | | | | | | | | | | | | d5 | | | | | 0 | 1 | 0 | 0 | 1 | i | i | 1 | 0 | 1 | 1 | i | i | i | i | i | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=memuh(gp+#u16:1) | | | ICL/ | ASS | ; | Α | mod | de | | Тур | Э | UN | | • | s5 | | | Par | se | | | • | | • | • | • | • | • | | • | d5 | • | • | | | 1 | 0 | 0 | 1 | 0 | i | i | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=memuh(Rs+#s11:1) | | | ICL/ | ASS | ; | Α | mod | de | | Тур | Э | UN | | | х5 | | | Par | se | u1 | | • | | | | | | | | | d5 | • | • | | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | x | x | x | x | x | Р | P | u | 0 | - | - | 0 | i | i | i | i | d | d | d | d | d | Rd=memuh(Rx++#s4:1:circ (Mu)) | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | u | 0 | - | - | 1 | - | 0 | - | - | d | d | d | d | d | Rd=memuh(Rx++I:circ(Mu) | | | ICL | ASS | ; | Α | mod | de | | Туре | Э | U | | • | e5 | | | Par | se | | | • | | • | • | | | | | | d5 | • | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | е | е | е | е | е | Р | Р | 0 | 1 | ı | I | I | ı | - | ı | ı | d | d | d | d | d | Rd=memuh(Re=#U6) | | | ICL/ | ASS | ; | Α | mod | de | | Тур | Э | UN | | | х5 | | | Par | se | | | • | | | | | | | | | d5 | • | • | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | 0 | 0 | - | - | - | i | i | i | i | d | d | d | d | d | Rd=memuh(Rx++#s4:1) | | | ICL/ | ASS | ; | Α | mod | de | | Тур | Э | UN | | | t5 | | | Par | se | | | • | | | | | | | | | d5 | • | • | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | t | t | t | t | t | Р | Р | i | 1 | I | I | I | I | i | ı | ı | d | d | d | d | d | Rd=memuh(Rt<<#u2+#U6) | | | ICL | ASS | ; | Α | mod | de | | Тур | Э | U | | | x5 | - | | Par | se | u1 | | | | | • | | • | • | | • | d5 | • | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | - | 0 | | - | d | d | d | d | d | Rd=memuh(Rx++Mu) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | - | 0 | - | - | d | d | d | d | d | Rd=memuh(Rx++Mu:brev) | Description | Field name | | |------------|-------------------| | ICLASS | Instruction class | | Amode | Amode | |-------|----------------------------| | Type | Туре | | UN | Unsigned | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | e5 | Field to encode register e | | s5 | Field to encode register s | | t5 | Field to encode register t | | u1 | Field to encode register u | | x5 | Field to encode register x | # Load unsigned halfword conditionally Load an unsigned halfword from memory. The 16-bit halfword at the effective address in memory is placed in the least-significant 16 bits of the destination register. The destination register is zero-extended from 16 bits to 32. This instruction is conditional based on a predicate value. When the predicate is true, perform the instruction, otherwise it is treated as a NOP. #### **Syntax Behavior** if ([!]Pt[.new]) apply extension(#u); Rd=memuh (#u6) EA=#u; if ([!]Pt[.new][0]) { Rd = \*EA;} else { NOP; if ([!]Pt[.new]) apply\_extension(#u); Rd=memuh (Rs+#u6:1) EA=Rs+#u; if ([!]Pt[.new][0]) { Rd = \*EA;} else { NOP; if ([!]Pt[.new]) EA=Rx; Rd=memuh (Rx++#s4:1) if([!]Pt[.new][0]){ Rx=Rx+#s;Rd = \*EA;} else { NOP; if ([!]Pv[.new]) EA=Rs+(Rt<<#u);Rd=memuh (Rs+Rt<<#u2) if ([!]Pv[.new][0]) { Rd = \*EA;} else { NOP; } Class: LD (slots 0,1) ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------------------------------| | | ICL/ | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | d5 | | | | | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (Pv)<br>Rd=memuh(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (!Pv)<br>Rd=memuh(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (Pv.new)<br>Rd=memuh(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (!Pv.new)<br>Rd=memuh(Rs+Rt<<#u2) | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|---------------|---------------------|----|-----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------------------------------| | | ICL | ASS | 3 | | Se<br>ns<br>e | Pr<br>ed<br>Ne<br>w | | Ту | /pe | U | | | s5 | | | Ра | rse | | t | 2 | | | | | | | | | d5 | | | | | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (Pt)<br>Rd=memuh(Rs+#u6:1) | | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (Pt.new)<br>Rd=memuh(Rs+#u6:1) | | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (!Pt)<br>Rd=memuh(Rs+#u6:1) | | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (!Pt.new)<br>Rd=memuh(Rs+#u6:1) | | | ICL | ASS | 3 | Α | moc | de | • | Тур | е | U | | | х5 | | | Pa | rse | | | | t | 2 | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | 1 | 0 | 0 | t | t | i | i | i | i | d | d | d | d | d | if (Pt)<br>Rd=memuh(Rx++#s4:1) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | 1 | 0 | 1 | t | t | i | i | i | i | d | d | d | d | d | if (!Pt)<br>Rd=memuh(Rx++#s4:1) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | 1 | 1 | 0 | t | t | i | i | i | i | d | d | d | d | d | if (Pt.new)<br>Rd=memuh(Rx++#s4:1) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | 1 | 1 | 1 | t | t | i | i | i | i | d | d | d | d | d | if (!Pt.new)<br>Rd=memuh(Rx++#s4:1) | | | ICL | ASS | 3 | Α | moc | de | • | Тур | е | U | | | | | | Pa | rse | | | | t | 2 | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | i | i | i | i | i | Р | Р | 1 | 0 | 0 | t | t | i | 1 | - | - | d | d | d | d | d | if (Pt) Rd=memuh(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | i | i | i | i | i | Р | Р | 1 | 0 | 1 | t | t | i | 1 | - | • | d | d | d | d | d | if (!Pt) Rd=memuh(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | i | i | i | i | i | Р | Р | 1 | 1 | 0 | t | t | i | 1 | - | - | d | d | d | d | d | if (Pt.new) Rd=memuh(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | i | i | i | i | i | Р | Р | 1 | 1 | 1 | t | t | i | 1 | - | - | d | d | d | d | d | if (!Pt.new)<br>Rd=memuh(#u6) | #### Field name ## Description ICLASS Instruction class Amode Amode Type Type UN Unsigned PredNew PredNew Sense Sense Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t2 Field to encode register t t5 Field to encode register t v2 Field to encode register v x5 Field to encode register x # Load word Load a 32-bit word from memory and place in a destination register. | Syntax | Behavior | |-----------------------------|---------------------------------------------------------------------------------| | Rd=memw(Re=#U6) | <pre>apply_extension(#U); EA=#U; Rd = *EA; Re=#U;</pre> | | Rd=memw(Rs+#s11:2) | <pre>apply_extension(#s); EA=Rs+#s; Rd = *EA;</pre> | | Rd=memw(Rs+Rt<<#u2) | EA=Rs+(Rt<<#u);<br>Rd = *EA; | | Rd=memw(Rt<<#u2+#U6) | <pre>apply_extension(#U); EA=#U+(Rt&lt;&lt;#u); Rd = *EA;</pre> | | Rd=memw(Rx++#s4:2) | EA=Rx;<br>Rx=Rx+#s;<br>Rd = *EA; | | Rd=memw(Rx++#s4:2:circ(Mu)) | <pre>EA=Rx; Rx=Rx=circ_add(Rx, #s, MuV); Rd = *EA;</pre> | | Rd=memw(Rx++I:circ(Mu)) | <pre>EA=Rx; Rx=Rx=circ_add(Rx,I&lt;&lt;2,MuV); Rd = *EA;</pre> | | Rd=memw(Rx++Mu) | EA=Rx;<br>Rx=Rx+MuV;<br>Rd = *EA; | | Rd=memw(Rx++Mu:brev) | <pre>EA=Rx.h[1] brev(Rx.h[0]); Rx=Rx+MuV; Rd = *EA;</pre> | | Rd=memw(gp+#u16:2) | <pre>apply_extension(#u); EA=(Constant_extended ? (0) : GP)+#u; Rd = *EA;</pre> | # Class: LD (slots 0,1) ## **Intrinsics** | Rd=memw(Rx++#s4:2:circ(Mu)) | <pre>Word32 Q6_R_memw_IM_circ(void** StartAddress, Word32 Is4_2, Word32 Mu, void* BaseAddress)</pre> | |-----------------------------|------------------------------------------------------------------------------------------------------| | Rd=memw(Rx++I:circ(Mu)) | <pre>Word32 Q6_R_memw_M_circ(void** StartAddress, Word32 Mu, void* BaseAddress)</pre> | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|----|----|-----|-----|--------|----|----|----|----|----|-----|----|----|----|----|----|---|---|---|---|---|-------------|---|----|---|---|-----------------------------| | | ICL | ASS | ; | | | | | | | | | | s5 | | | Par | se | | | | t5 | | | | | | | | d5 | | | | | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | - | - | d | d | d | d | d | Rd=memw(Rs+Rt<<#u2) | | | ICL | ASS | ; | | | | | Ту | /ре | U<br>N | | | | | | Pai | se | | | | | | | | | | | | d5 | | | | | 0 | 1 | 0 | 0 | 1 | i | i | 1 | 1 | 0 | 0 | i | i | i | i | i | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=memw(gp+#u16:2) | | | ICL | ASS | ; | Α | mod | de | | Тур | е | U<br>N | | | s5 | | | Par | se | | | | | | • | • | | | | | d5 | • | | | | 1 | 0 | 0 | 1 | 0 | i | i | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=memw(Rs+#s11:2) | | | ICL | ASS | ; | Α | mod | de | | Тур | е | U<br>N | | | x5 | | | Par | se | u1 | | • | | | • | • | | | | | d5 | • | | | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | 0 | i | i | i | i | d | d | d | d | d | Rd=memw(Rx++#s4:2:circ(Mu)) | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | x | x | x | X | x | Р | Р | u | 0 | - | - | 1 | | 0 | | | d | d | d | d | d | Rd=memw(Rx++I:circ(Mu)) | | | ICL | ASS | ; | Α | mod | de | | Тур | е | U<br>N | | | е5 | | | Par | se | | | | | | | | | | - d d d d c | | | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | е | е | е | е | е | Р | Р | 0 | 1 | I | I | I | ı | - | I | I | d | d | d | d | d | Rd=memw(Re=#U6) | | | ICL | ASS | ; | Α | mod | de | | Тур | е | U<br>N | | | x5 | | | Par | se | | | | • | | | | | | | | d5 | | • | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | х | х | x | x | x | Р | Р | 0 | 0 | - | - | - | i | i | i | i | d | d | d | d | d | Rd=memw(Rx++#s4:2) | | | ICL | ASS | ; | А | mod | de | | Тур | е | U<br>N | | | t5 | | | Pai | se | | | | | | | | | | d d d d c | | | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | t | t | t | t | t | Р | Р | i | 1 | I | I | I | ı | i | I | I | d | d | d | d | d | Rd=memw(Rt<<#u2+#U6) | | | ICL | ASS | ; | Α | mod | de | | Тур | е | U | | | х5 | | | Par | se | u1 | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | - | 0 | - | - | d | d | d | d | d | Rd=memw(Rx++Mu) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | - | 0 | - | - | d | d | d | d | d | Rd=memw(Rx++Mu:brev) | | Field name | | | |------------|-------------------|--| | CLASS | Instruction class | | ICLASS Instruction Amode Amode Type Type UN Unsigned Parse Packet/loop parse bits d5 Field to encode register d e5 Field to encode register e s5 Field to encode register s t5 Field to encode register t u1 Field to encode register u x5 Field to encode register x ### Description # Load-acquire word Load a 32-bit word from memory and place in a destination register. The load-acquire memory operation is observed before any following memory operations (in program order) are observed at the local point of serialization. A different order can be observed at the global point of serialization, see Memory ordering. | Syntax | Behavior | |----------------|--------------------| | Rd=memw_aq(Rs) | EA=Rs;<br>Rd = *EA | Class: LD (slots 0) ## **Encoding** Parse d5 s5 | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|-----|-----|----|----|----|----|----|-----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|----------------| | | | ICL | ASS | 3 | Α | mo | de | | Тур | е | UN | | | s5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | Ī | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | 0 | 1 | - | - | - | 0 | 0 | 0 | d | d | d | d | d | Rd=memw_aq(Rs) | | Field name | | | | Des | cri | ptic | n | | | | | |------------|-------------------|--|--|-----|-----|------|---|--|--|--|--| | ICLASS | Instruction class | | | | | | | | | | | | Amode | Amode | | | | | | | | | | | | Туре | Туре | | | | | | | | | | | | IIN | Unsigned | | | | | | | | | | | Packet/loop parse bits Field to encode register d Field to encode register s # Load word conditionally Load a 32-bit word from memory and place in a destination register. This instruction is conditional based on a predicate value. When the predicate is true, perform the instruction, otherwise it is treated as a NOP. **Syntax Behavior** if ([!]Pt[.new]) apply extension(#u); Rd=memw(#u6) EA=#u; if ([!]Pt[.new][0]) { Rd = \*EA;} else { NOP; if ([!]Pt[.new]) apply extension(#u); Rd=memw(Rs+#u6:2) EA=Rs+#u; if ([!]Pt[.new][0]) { Rd = \*EA;} else { NOP; if ([!]Pt[.new]) EA=Rx;Rd=memw(Rx++#s4:2)if([!]Pt[.new][0]){ Rx=Rx+#s;Rd = \*EA;} else { NOP; if ([!]Pv[.new]) EA=Rs+(Rt<<#u);Rd=memw (Rs+Rt<<#u2) if ([!]Pv[.new][0]) { Rd = \*EA;} else { NOP; Class: LD (slots 0,1) ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|---------------|---------------------|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------------------------------| | | ICL | ASS | ; | | | | | | | | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | d5 | | | | | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (Pv)<br>Rd=memw(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (!Pv)<br>Rd=memw(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (Pv.new)<br>Rd=memw(Rs+Rt<<#u2) | | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | v | v | d | d | d | d | d | if (!Pv.new)<br>Rd=memw(Rs+Rt<<#u2) | | | ICL | ASS | ; | | Se<br>ns<br>e | Pr<br>ed<br>Ne<br>w | | Ту | /pe | U | | | s5 | | | Pa | rse | | t | 2 | | | | | | | | | d5 | | | | | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|-----|----|----|------|----|--------|----|----|----|----|----|-----|-----|----|----|----|----|---|---|---|---|---|----------------------|---|----|---|---|------------------------------------| | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (Pt)<br>Rd=memw(Rs+#u6:2) | | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (Pt.new)<br>Rd=memw(Rs+#u6:2) | | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (!Pt)<br>Rd=memw(Rs+#u6:2) | | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | P | 0 | t | t | i | i | i | i | i | i | d | d | d | d | d | if (!Pt.new)<br>Rd=memw(Rs+#u6:2) | | | ICL | ASS | 3 | Α | mod | le | | Туре | Э | U | | | х5 | | | Pai | rse | | | | ť | 2 | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | 1 | 0 | 0 | t | t | i | i | i | i | d | d | d | d | d | if (Pt)<br>Rd=memw(Rx++#s4:2) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | 1 | 0 | 1 | t | t | i | i | i | i | d | d | d | d | d | if (!Pt)<br>Rd=memw(Rx++#s4:2) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | 1 | 1 | 0 | t | t | i | i | i | i | d | d | d | d | d | if (Pt.new)<br>Rd=memw(Rx++#s4:2) | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | x | x | x | x | x | Р | P | 1 | 1 | 1 | t | t | i | i | i | i | d | d | d | d | d | if (!Pt.new)<br>Rd=memw(Rx++#s4:2) | | | ICL | ASS | 3 | Α | mod | de | | Тур | Э | U<br>N | | | | | | Pai | rse | | | | t | 2 | | | | | <b>i d d d d</b> d d | | | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | i | i | i | i | i | Р | Р | 1 | 0 | 0 | t | t | i | 1 | - | - | d | d | d | d | d | if (Pt) Rd=memw(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | i | i | i | - | i | Р | Р | 1 | 0 | 1 | t | t | i | 1 | - | • | d | d | d | d | d | if (!Pt) Rd=memw(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | i | i | i | i | i | Р | P | 1 | 1 | 0 | t | t | i | 1 | - | • | d | d | d | d | d | if (Pt.new) Rd=memw(#u6) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | i | i | i | i | i | Р | Ρ | 1 | 1 | 1 | t | t | i | 1 | - | - | d | d | d | d | d | if (!Pt.new) Rd=memw(#u6) | ### Field name ### Description ICLASS Instruction class Amode Amode Type Type UN Unsigned PredNew PredNew Sense Sense Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t2 Field to encode register t t5 Field to encode register t v2 Field to encode register v x5 Field to encode register x # Deallocate stack frame Deallocate a stack frame from the call stack. The instruction first loads the saved FP and saved LR values from the address at FP. It then points SP back to the previous frame. The stack layout is seen in the following figure. ### Stack in memory Syntax Behavior | Rdd=deallocframe(Rs):raw | <pre>EA=Rs; tmp = *EA; Rdd = frame_unscramble(tmp); SP=EA+8;</pre> | |--------------------------|--------------------------------------------------------------------| | deallocframe | Assembler mapped to: "r31:30=deallocframe(r30):raw" | # Class: LD (slots 0,1) # **Encoding** | 3 | 31 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-------|------|----|----|-----|----|----|------|----|----|----|----|----|----|----|-----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------------------| | | IC | LASS | 3 | А | mod | de | | Туре | Э | U | | | s5 | | | Pai | rse | | | | | | | | | | | | d5 | | | | | Ť | 1 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | - | - | - | - | - | - | - | - | d | d | d | d | d | Rdd=deallocframe(Rs):raw | Field name Description ICLASS Instruction class Amode Amode Type Type UN Unsigned Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s ## Deallocate frame and return Return from a function with a stack frame. This instruction is equivalent to deallocframe followed by jumpr R31. #### Syntax #### **Behavior** ``` Rdd = dealloc return(Rs):raw EA = Rs; tmp = *EA; Rdd = frame unscramble(tmp); SP = EA + 8; PC = Rdd.w[1]; dealloc return Assembler mapped to: "r31:30=dealloc return(r30):raw" if ([!]Pv) Rdd = dealloc return(Rs):raw if ([!]Pv[0]) { tmp = *EA; Rdd = frame_unscramble(tmp); SP=EA+8; PC=Rdd.w[1]; } else { NOP; Assembler mapped to: "if ([!]Pv"") if ([!]Pv) dealloc return r31:30=dealloc return(r30)"":raw" if ([!]Pv.new) Rdd = EA=Rs; dealloc return(Rs):nt:raw if ([!]Pv.new[0]) { tmp = *EA; Rdd = frame unscramble(tmp); SP=EA+8; PC=Rdd.w[1]; } else { NOP; if ([!]Pv.new) Rdd = EA=Rs; dealloc return(Rs):t:raw if ([!]Pv.new[0]) { tmp = *EA; Rdd = frame_unscramble(tmp); SP=EA+8; PC=Rdd.w[1]; } else { NOP; Assembler mapped to: "if ([!]Pv"".new"") if ([!]Pv.new) dealloc return:nt r31:30=dealloc return(r30)"":nt"":raw" if ([!]Pv.new) Assembler mapped to: "if ([!]Pv"".new"") dealloc_return:t r31:30=dealloc return(r30)"":t"":raw" ``` # Class: LD (slots 0) # **Encoding** | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|------|----|----|-----|----|----|------|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|---------------------------------------------------| | | ICL | .ASS | 3 | Α | mod | de | - | Туре | Э | U | | | s5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | 0 | 0 | 0 | - | - | - | - | - | d | d | d | d | d | Rdd=dealloc_return(Rs):ra | | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | P | 0 | 0 | 1 | 0 | v | v | - | | - | d | d | d | d | d | if (Pv.new)<br>Rdd=dealloc_return(Rs):nt:<br>raw | | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | 1 | 0 | 0 | v | v | - | - | - | d | d | d | d | d | if (Pv)<br>Rdd=dealloc_return(Rs):ra<br>w | | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | 1 | 1 | 0 | v | v | - | - | - | d | d | d | d | d | if (Pv.new)<br>Rdd=dealloc_return(Rs):t:r<br>aw | | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 1 | 0 | 1 | 0 | v | v | - | - | - | d | d | d | d | d | if (!Pv.new)<br>Rdd=dealloc_return(Rs):nt:<br>raw | | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 1 | 1 | 0 | 0 | v | v | - | - | - | d | d | d | d | d | if (!Pv)<br>Rdd=dealloc_return(Rs):ra<br>w | | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 1 | 1 | 1 | 0 | v | v | - | - | - | d | d | d | d | d | if (!Pv.new)<br>Rdd=dealloc_return(Rs):t:r<br>aw | ### Field name ## Description ICLASS Instruction class Amode Amode Type Type UN Unsigned Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s v2 Field to encode register v # Load and unpack bytes to halfwords Load contiguous bytes from memory and vector unpack them into halfwords. 80-N2040-51 Rev. AB 234 Re=#U; Assembler mapped to: "Rd=membh""(Rs+#0)" Rd=membh(Rs) **Behavior** ### Syntax ``` Rd=membh (Rs+#s11:1) apply extension(#s); EA=Rs+#s; tmpV = *EA; for (i=0;i<2;i++) { Rd.h[i]=tmpV.b[i]; Rd=membh (Rt<<#u2+#U6) apply extension(#U); EA = \#U + (Rt << \#u); tmpV = *EA; for (i=0;i<2;i++) { Rd.h[i]=tmpV.b[i]; Rd=membh(Rx++#s4:1) EA=Rx; Rx=Rx+#s; tmpV = *EA; for (i=0;i<2;i++) { Rd.h[i]=tmpV.b[i]; Rd=membh(Rx++#s4:1:circ(Mu)) EA=Rx; Rx=Rx=circ add(Rx, #s, MuV); tmpV = *EA; for (i=0;i<2;i++) { Rd.h[i]=tmpV.b[i]; Rd=membh(Rx++I:circ(Mu)) EA=Rx; Rx=Rx=circ_add(Rx,I<<1,MuV);</pre> tmpV = *EA; for (i=0;i<2;i++) { Rd.h[i]=tmpV.b[i]; Rd=membh (Rx++Mu) EA=Rx; Rx=Rx+MuV; tmpV = *EA; for (i=0;i<2;i++) { Rd.h[i]=tmpV.b[i]; ``` #### **Syntax** #### **Behavior** ``` Rd=membh(Rx++Mu:brev) EA=Rx.h[1] | brev(Rx.h[0]); Rx=Rx+MuV; tmpV = *EA; for (i=0;i<2;i++) { Rd.h[i]=tmpV.b[i]; Rd=memubh (Re=#U6) apply extension(#U); EA=#U; tmpV = *EA; for (i=0;i<2;i++) { Rd.h[i]=tmpV.ub[i]; } Re=#U; Rd=memubh (Rs+#s11:1) apply extension(#s); EA=Rs+#s; tmpV = *EA; for (i=0;i<2;i++) { Rd.h[i]=tmpV.ub[i]; Rd=memubh (Rt<<#u2+#U6) apply extension(#U); EA = \#U + (Rt << \#u); tmpV = *EA; for (i=0;i<2;i++) { Rd.h[i]=tmpV.ub[i]; Rd=memubh(Rx++#s4:1) EA=Rx; Rx=Rx+#s; tmpV = *EA; for (i=0; i<2; i++) { Rd.h[i]=tmpV.ub[i]; Rd = memubh(Rx ++ EA=Rx; #s4:1:circ(Mu)) Rx=Rx=circ_add(Rx, #s, MuV); tmpV = *EA; for (i=0;i<2;i++) { Rd.h[i]=tmpV.ub[i]; ``` #### Syntax Behavior ``` Rd=memubh(Rx++I:circ(Mu)) EA=Rx; Rx=Rx=circ add(Rx,I<<1,MuV);</pre> tmpV = *EA; for (i=0;i<2;i++) { Rd.h[i]=tmpV.ub[i]; Rd=memubh (Rx++Mu) EA=Rx; Rx=Rx+MuV; tmpV = *EA; for (i=0;i<2;i++) { Rd.h[i]=tmpV.ub[i]; Rd=memubh(Rx++Mu:brev) EA=Rx.h[1] | brev(Rx.h[0]); Rx=Rx+MuV; tmpV = *EA; for (i=0;i<2;i++) { Rd.h[i] = tmpV.ub[i]; Rdd=membh (Re=#U6) apply extension(#U); EA=#U; tmpV = *EA; for (i=0; i<4; i++) { Rdd.h[i]=tmpV.b[i]; Re=\#U; Rdd=membh(Rs) Assembler mapped to: "Rdd=membh""(Rs+#0)" Rdd=membh(Rs+#s11:2) apply extension(#s); EA=Rs+#s; tmpV = *EA; for (i=0; i<4; i++) { Rdd.h[i]=tmpV.b[i]; Rdd=membh (Rt<<#u2+#U6) apply extension(#U); EA = \#U + (Rt << \#u); tmpV = *EA; for (i=0;i<4;i++) { Rdd.h[i]=tmpV.b[i]; ``` #### **Syntax** #### **Behavior** ``` Rdd=membh(Rx++#s4:2) EA=Rx; Rx=Rx+#s; tmpV = *EA; for (i=0;i<4;i++) { Rdd.h[i]=tmpV.b[i]; Rdd = membh(Rx ++ EA=Rx; #s4:2:circ(Mu)) Rx=Rx=circ add(Rx, #s, MuV); tmpV = *EA; for (i=0; i<4; i++) { Rdd.h[i]=tmpV.b[i]; Rdd=membh(Rx++I:circ(Mu)) Rx=Rx=circ_add(Rx,I<<2,MuV);</pre> tmpV = *EA; for (i=0; i<4; i++) { Rdd.h[i]=tmpV.b[i]; EA=Rx; Rdd=membh (Rx++Mu) Rx=Rx+MuV; tmpV = *EA; for (i=0; i<4; i++) { Rdd.h[i]=tmpV.b[i]; Rdd=membh(Rx++Mu:brev) EA=Rx.h[1] \mid brev(Rx.h[0]); Rx=Rx+MuV; tmpV = *EA; for (i=0; i<4; i++) { Rdd.h[i]=tmpV.b[i]; Rdd=memubh (Re=#U6) apply extension(#U); EA=#U; tmpV = *EA; for (i=0; i<4; i++) { Rdd.h[i]=tmpV.ub[i]; Re=#U; ``` #### Syntax #### **Behavior** ``` Rdd=memubh(Rs+#s11:2) apply extension(#s); EA=Rs+#s; tmpV = *EA; for (i=0;i<4;i++) { Rdd.h[i]=tmpV.ub[i]; Rdd=memubh (Rt<<#u2+#U6) apply extension(#U); EA = \#U + (Rt << \#u); tmpV = *EA; for (i=0; i<4; i++) { Rdd.h[i]=tmpV.ub[i]; Rdd=memubh(Rx++#s4:2) EA=Rx; Rx=Rx+#s; tmpV = *EA; for (i=0;i<4;i++) { Rdd.h[i]=tmpV.ub[i]; EA=Rx; Rdd = memubh(Rx ++ #s4:2:circ(Mu)) Rx=Rx=circ add(Rx, #s, MuV); tmpV = *EA; for (i=0; i<4; i++) { Rdd.h[i]=tmpV.ub[i]; Rdd=memubh(Rx++I:circ(Mu)) EA=Rx; Rx=Rx=circ_add(Rx,I<<2,MuV);</pre> tmpV = *EA; for (i=0; i<4; i++) { Rdd.h[i]=tmpV.ub[i]; Rdd=memubh (Rx++Mu) EA=Rx; Rx=Rx+MuV; tmpV = *EA; for (i=0; i<4; i++) { Rdd.h[i]=tmpV.ub[i]; ``` ### Syntax Behavior Class: LD (slots 0,1) # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|----|-----|------|----------|--------|----|----|----|----|----|--------|--------|----|----|----|----|---|---|---|---|---|---|---|----|---|---|---------------------------------| | | ICL | | | | mod | | | Туре | <u> </u> | U | | | s5 | | | | rse | | | | | | | | | | | | d5 | _ | | | | 1 | 0 | 0 | 1 | 0 | i | i | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=membh(Rs+#s11:1) | | 1 | 0 | 0 | 1 | 0 | i | i | 0 | 0 | 1 | 1 | s | s | s | s | s | ·<br>P | ·<br>P | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=memubh(Rs+#s11:1) | | 1 | 0 | 0 | 1 | 0 | i | i | 0 | 1 | 0 | 1 | s | s | s | s | s | P | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rdd=memubh(Rs+#s11:2) | | 1 | 0 | 0 | 1 | 0 | i | i | 0 | 1 | 1 | 1 | s | s | s | s | s | P | P | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rdd=membh(Rs+#s11:2) | | | ICL. | | | | mod | | | Туре | l | U | | | x5 | | _ | | rse | u1 | - | | | - | | - | | | | - | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | - | - | 0 | i | i | i | i | d | d | d | d | d | Rd=membh(Rx++#s4:1:circ (Mu)) | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | - | - | 1 | - | 0 | - | - | d | d | d | d | d | Rd=membh(Rx++I:circ(Mu) | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | u | 0 | • | - | 0 | - | i | i | i | d | d | d | d | d | Rd=memubh(Rx++#s4:1:cir c(Mu)) | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | u | 0 | - | - | 1 | • | 0 | - | - | d | d | d | d | d | Rd=memubh(Rx++I:circ(Mu )) | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | • | - | 0 | i | i | i | i | d | d | d | d | d | Rdd=memubh(Rx++#s4:2:c irc(Mu)) | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | - | - | 1 | - | 0 | - | - | d | d | d | d | d | Rdd=memubh(Rx++I:circ(M u)) | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | x | x | x | x | x | Р | Р | u | 0 | • | - | 0 | i | i | i | i | d | d | d | d | d | Rdd=membh(Rx++#s4:2:cir c(Mu)) | | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | x | x | x | x | x | Р | Р | u | 0 | - | - | 1 | - | 0 | - | - | d | d | d | d | d | Rdd=membh(Rx++I:circ(Mu )) | | | ICL | ASS | 3 | A | mod | le | | Тур | е | U<br>N | | | е5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | е | е | е | е | е | Р | Р | 0 | 1 | ı | I | I | ı | - | I | I | d | d | d | d | d | Rd=membh(Re=#U6) | | | ICL. | ASS | 3 | Α | mod | de | | Тур | е | U<br>N | | | х5 | | | Pa | | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | x | x | x | x | x | Р | Р | 0 | 0 | - | - | - | i | i | i | i | d | d | d | d | d | Rd=membh(Rx++#s4:1) | | | ICL | ASS | 6 | Α | mod | de | | Тур | е | U<br>N | | | e5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | е | е | е | е | е | Р | Р | 0 | 1 | ı | I | I | ı | - | ı | I | d | d | d | d | d | Rd=memubh(Re=#U6) | | | ICL | ASS | 3 | Α | mod | le | | Туре | е | U | | | х5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | 0 | 0 | - | - | - | i | i | i | i | d | d | d | d | d | Rd=memubh(Rx++#s4:1) | | | ICL. | ASS | ; | Α | mod | de | | Тур | е | U<br>N | | | е5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | е | е | е | е | е | Р | Р | 0 | 1 | ı | I | I | ı | - | ı | I | d | d | d | d | d | Rdd=memubh(Re=#U6) | | | ICL. | ASS | 3 | A | mod | de | L · | Туре | е | U<br>N | | | х5 | | | Pa | rse | | _ | | | | | | | | | | d5 | _ | _ | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | x | x | x | x | х | Р | Р | 0 | 0 | - | - | - | i | i | i | i | d | d | d | d | d | Rdd=memubh(Rx++#s4:2) | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|----|----|------|----|--------|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-----------------------------| | | ICL | ASS | 3 | Α | mod | le | | Туре | Э | U<br>N | | | e5 | | | Pa | rse | | | | 1 | | | 1 | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | е | е | е | е | е | Р | Р | 0 | 1 | ı | ı | ı | ı | - | ı | ı | d | d | d | d | d | Rdd=membh(Re=#U6) | | | ICL. | ASS | 6 | А | mod | le | • | Туре | Э | U<br>N | | | x5 | | • | Pa | rse | | • | | | • | • | | • | • | | | d5 | | • | | | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | x | x | x | x | x | Р | Р | 0 | 0 | - | - | - | i | i | i | i | d | d | d | d | d | Rdd=membh(Rx++#s4:2) | | | ICL | ASS | 6 | Α | mod | le | | Туре | е | U<br>N | | | t5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | t | t | t | t | t | Р | Р | i | 1 | I | I | I | I | i | I | I | d | d | d | d | d | Rd=membh(Rt<<#u2+#U6) | | | ICL | ASS | 3 | А | mod | de | • | Туре | Э | U<br>N | | | х5 | | | Pa | rse | u1 | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | - | 0 | - | - | d | d | d | d | d | Rd=membh(Rx++Mu) | | | ICL | ASS | 3 | Α | moc | le | | Туре | е | U<br>N | | | t5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | t | t | t | t | t | Р | Р | i | 1 | 1 | ı | ı | ı | i | ı | ı | d | d | d | d | d | Rd=memubh(Rt<<#u2+#U<br>6) | | | ICL. | ASS | 3 | А | mod | le | • | Туре | Э | U | | | x5 | | • | Pa | rse | u1 | | | | • | | | | | | | d5 | | • | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | - | 0 | - | - | d | d | d | d | d | Rd=memubh(Rx++Mu) | | | ICL | ASS | 3 | А | mod | de | • | Туре | Э | U<br>N | | | t5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | t | t | t | t | t | Р | Р | i | 1 | ı | I | ı | ı | i | ı | ı | d | d | d | d | d | Rdd=memubh(Rt<<#u2+#<br>U6) | | | ICL | ASS | 3 | А | mod | le | | Туре | е | U<br>N | | | x5 | | • | Pa | rse | u1 | | | | • | | | | | | | d5 | | • | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | - | 0 | - | - | d | d | d | d | d | Rdd=memubh(Rx++Mu) | | | ICL | ASS | 3 | Α | mod | de | • | Туре | æ | U | | | t5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | t | t | t | t | t | Р | Р | i | 1 | ı | ı | ı | ı | i | ı | ı | d | d | d | d | d | Rdd=membh(Rt<<#u2+#U<br>6) | | | ICL. | ASS | 3 | А | mod | le | • | Туре | Э | U<br>N | | | x5 | | • | Pa | rse | u1 | | | | • | | | | | | | d5 | | • | | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | x | x | x | x | x | Р | Р | u | 0 | - | - | - | - | 0 | - | - | d | d | d | d | d | Rdd=membh(Rx++Mu) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | • | - | - | - | 0 | - | - | d | d | d | d | d | Rd=membh(Rx++Mu:brev) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | x | x | x | x | x | Р | P | u | 0 | • | - | - | - | 0 | _ | - | d | d | d | d | d | Rd=memubh(Rx++Mu:brev ) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | x | x | x | x | x | P | P | u | 0 | • | - | - | - | 0 | | - | d | d | d | d | d | Rdd=memubh(Rx++Mu:bre v) | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | x | x | x | x | x | Р | Р | u | 0 | • | - | - | - | 0 | - | - | d | d | d | d | d | Rdd=membh(Rx++Mu:brev ) | ### Field name Instruction class ICLASS ## Description | Amode | Amode | |-------|----------------------------| | Type | Туре | | UN | Unsigned | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | e5 | Field to encode register e | | s5 | Field to encode register s | | t5 | Field to encode register t | | u1 | Field to encode register u | | x5 | Field to encode register x | # **11.6 MEMOP** The MEMOP instruction class includes simple operations on values in memory. MEMOP instructions execute on slot 0. # Operation on memory byte Perform ALU or bit operation on the memory byte at the effective address. Syntax Behavior ``` memb(Rs+#u6:0)=clrbit(#U5) apply extension(#u); EA=Rs+#u; tmp = *EA; tmp &= (\sim(1<<\#U)); *EA = tmp; memb(Rs+#u6:0)=setbit(#U5) apply extension(#u); EA=Rs+#u; tmp = *EA; tmp |= (1 << \#U); *EA = tmp; memb (Rs+\#u6:0) [+-]=\#U5 apply extension(#u); EA=Rs[+-]#u; tmp = *EA; tmp [+-] = #U; *EA = tmp; memb (Rs+\#u6:0) [+-|&]=Rt apply_extension(#u); EA=Rs+#u; tmp = *EA; tmp [+-|&] = Rt; *EA = tmp; ``` Class: MEMOP (slots 0) ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------------------------| | | ICL/ | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | t5 | | | | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | - | 0 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 0 | 0 | t | t | t | t | t | memb(Rs+#u6:0)+=Rt | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | - | 0 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 0 | 1 | t | t | t | t | t | memb(Rs+#u6:0)-=Rt | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | - | 0 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 1 | 0 | t | t | t | t | t | memb(Rs+#u6:0)&=Rt | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | - | 0 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 1 | 1 | t | t | t | t | t | memb(Rs+#u6:0) =Rt | | | ICL/ | ASS | 3 | | | • | | | | | | • | s5 | | | Pa | rse | | | | • | | • | • | | • | • | | | | | | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | - | 0 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 0 | 0 | I | I | I | I | ı | memb(Rs+#u6:0)+=#U5 | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | - | 0 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 0 | 1 | I | I | I | I | ı | memb(Rs+#u6:0)-=#U5 | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | - | 0 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 1 | 0 | ı | ı | ı | ı | ı | memb(Rs+#u6:0)=clrbit(#U<br>5) | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | - | 0 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 1 | 1 | ı | ı | ı | ı | ı | memb(Rs+#u6:0)=setbit(#<br>U5) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | # Operation on memory halfword Perform ALU or bit operation on the memory halfword at the effective address. | memh(Rs+#u6:1)=clrbit(#U5) | <pre>apply_extension(#u); EA=Rs+#u; tmp = *EA; tmp &amp;= (~(1&lt;&lt;#U)); *EA = tmp;</pre> | |----------------------------|----------------------------------------------------------------------------------------------| | memh(Rs+#u6:1)=setbit(#U5) | <pre>apply_extension(#u); EA=Rs+#u; tmp = *EA; tmp = (1&lt;&lt;#U); *EA = tmp;</pre> | | memh (Rs+#u6:1) [+-]=#U5 | <pre>apply_extension(#u); EA=Rs[+-]#u; tmp = *EA; tmp [+-]= #U; *EA = tmp;</pre> | | memh(Rs+#u6:1)[+- &]=Rt | <pre>apply_extension(#u); EA=Rs+#u; tmp = *EA; tmp [+- &amp;]= Rt; *EA = tmp;</pre> | Class: MEMOP (slots 0) ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------------------------| | | ICL. | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | t5 | | | | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | - | 0 | 1 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 0 | 0 | t | t | t | t | t | memh(Rs+#u6:1)+=Rt | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | - | 0 | 1 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 0 | 1 | t | t | t | t | t | memh(Rs+#u6:1)-=Rt | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | - | 0 | 1 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 1 | 0 | t | t | t | t | t | memh(Rs+#u6:1)&=Rt | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | - | 0 | 1 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 1 | 1 | t | t | t | t | t | memh(Rs+#u6:1) =Rt | | | ICL | ASS | 3 | | | • | | | | | | • | s5 | | • | Pa | rse | | | • | | | • | • | | | | | • | | | | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | - | 0 | 1 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 0 | 0 | ı | I | I | I | ı | memh(Rs+#u6:1)+=#U5 | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | - | 0 | 1 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 0 | 1 | ı | I | I | I | I | memh(Rs+#u6:1)-=#U5 | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | - | 0 | 1 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 1 | 0 | ı | ı | ı | ı | - | memh(Rs+#u6:1)=clrbit(#U<br>5) | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | - | 0 | 1 | s | s | s | s | s | Р | P | 0 | i | i | i | i | i | i | 1 | 1 | ı | ı | ı | ı | ı | memh(Rs+#u6:1)=setbit(#<br>U5) | Field name Description | Instruction class | |----------------------------| | Packet/loop parse bits | | Field to encode register s | | Field to encode register t | | | # Operation on memory word Perform ALU or bit operation on the memory word at the effective address. ## Syntax Behavior | memw(Rs+#u6:2)=clrbit(#U5) | <pre>apply_extension(#u); EA=Rs+#u; tmp = *EA; tmp &amp;= (~(1&lt;&lt;#U)); *EA = tmp;</pre> | |----------------------------|----------------------------------------------------------------------------------------------| | memw(Rs+#u6:2)=setbit(#U5) | <pre>apply_extension(#u); EA=Rs+#u; tmp = *EA; tmp = (1&lt;&lt;#U); *EA = tmp;</pre> | | memw(Rs+#u6:2)[+-]=#U5 | <pre>apply_extension(#u); EA=Rs[+-]#u; tmp = *EA; tmp [+-]= #U; *EA = tmp;</pre> | | memw(Rs+#u6:2)[+- &]=Rt | <pre>apply_extension(#u); EA=Rs+#u; tmp = *EA; tmp [+- &amp;]= Rt; *EA = tmp;</pre> | Class: MEMOP (slots 0) ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------------------------| | | ICL. | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | t5 | | | | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | - | 1 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 0 | 0 | t | t | t | t | t | memw(Rs+#u6:2)+=Rt | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | - | 1 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 0 | 1 | t | t | t | t | t | memw(Rs+#u6:2)-=Rt | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | - | 1 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 1 | 0 | t | t | t | t | t | memw(Rs+#u6:2)&=Rt | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | - | 1 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 1 | 1 | t | t | t | t | t | memw(Rs+#u6:2) =Rt | | | ICL | ASS | 3 | | | • | | | • | | | , | s5 | • | | Pa | rse | | | • | | | • | • | | | | | | | | | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | - | 1 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 0 | 0 | ı | ı | ı | I | ı | memw(Rs+#u6:2)+=#U5 | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | - | 1 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 0 | 1 | ı | I | I | I | I | memw(Rs+#u6:2)-=#U5 | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | - | 1 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 1 | 0 | ı | ı | ı | ı | ı | memw(Rs+#u6:2)=clrbit(#U 5) | | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | - | 1 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | 1 | 1 | ı | ı | ı | ı | ı | memw(Rs+#u6:2)=setbit(#<br>U5) | ### Field name Description | ICLASS | Instruction class | |--------|----------------------------| | Parse | Packet/loop parse bits | | s5 | Field to encode register s | | t5 | Field to encode register t | ## 11.7 NV The NV instruction class includes instructions that take the register source operand from another instruction in the same packet. NV instructions execute on slot 0. ### 11.7.1 NV J The NV J instruction subclass includes jump instructions that take the register source operand from another instruction in the same packet. # Jump to address condition on new register value Compare a register or constant against the value produced by a slot 1 instruction. If the comparison is true, the program counter is changed to a target address, relative to the current PC. This instruction executes only on slot 0. Syntax Behavior ``` if ([!]cmp.eq(Ns.new,#-1)) if ((Ns.new[!]=(-1))) { jump:<hint> #r9:2 apply extension(#r); #r=#r & ~PCALIGN MASK; PC=PC+#r; } if ((Ns.new[!]=(#U))) { if ([!]cmp.eq(Ns.new, #U5)) jump:<hint> #r9:2 apply extension(#r); #r=#r & ~PCALIGN MASK; PC=PC+#r; if ([!]cmp.eq(Ns.new,Rt)) if ((Ns.new[!]=Rt)) { jump:<hint> #r9:2 apply extension(#r); #r=#r & ~PCALIGN MASK; PC=PC+#r; if ([!]cmp.qt(Ns.new,#-1)) if ([!](Ns.new>(-1))) { jump:<hint> #r9:2 apply extension(#r); #r=#r & ~PCALIGN MASK; PC=PC+#r; if ([!](Ns.new>(#U))) { if ([!]cmp.gt(Ns.new, #U5)) apply_extension(#r); jump:<hint> #r9:2 #r=#r & ~PCALIGN MASK; PC=PC+#r; if ([!]cmp.gt(Ns.new,Rt)) if ([!](Ns.new>Rt)) { jump:<hint> #r9:2 apply extension(#r); #r=#r & ~PCALIGN MASK; PC=PC+#r; ``` #### **Syntax** #### **Behavior** ``` if ([!]cmp.gt(Rt,Ns.new)) if ([!](Rt>Ns.new)) { jump:<hint> #r9:2 apply extension(#r); #r=#r & ~PCALIGN MASK; PC=PC+#r; if ([!](Ns.new.uw[0]>(#U))) { if ([!]cmp.gtu(Ns.new, #U5)) jump:<hint> #r9:2 apply extension(#r); #r=#r & ~PCALIGN MASK; PC=PC+#r; } if ([!](Ns.new.uw[0]>Rt.uw[0])) { if ([!]cmp.gtu(Ns.new,Rt)) jump:<hint> #r9:2 apply extension(#r); #r=#r & ~PCALIGN MASK; PC=PC+#r; } if ([!]cmp.gtu(Rt,Ns.new)) if ([!](Rt.uw[0]>Ns.new.uw[0])) { apply_extension(#r); jump:<hint> #r9:2 #r=#r & ~PCALIGN MASK; PC=PC+#r; if ([!]tstbit(Ns.new,#0)) if ([!]((Ns.new) & 1)) { jump:<hint> #r9:2 apply_extension(#r); #r=#r & ~PCALIGN_MASK; PC=PC+#r; ``` Class: NV (slots 0) ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-------------------------------------------| | | ICL | ASS | 3 | | | | | | | | | | | s3 | | Pa | rse | | | | t5 | | | | | | | | | | | | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | i | i | - | s | s | s | Р | Р | 0 | t | t | t | t | t | i | i | i | i | i | i | i | - | if (cmp.eq(Ns.new,Rt))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | i | i | - | s | s | s | Р | Р | 1 | t | t | t | t | t | i | i | i | i | i | i | i | - | if (cmp.eq(Ns.new,Rt))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | i | i | - | s | s | s | Р | Р | 0 | t | t | t | t | t | i | i | i | i | i | i | i | - | if (!cmp.eq(Ns.new,Rt))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | i | i | - | s | s | s | Р | Р | 1 | t | t | t | t | t | i | i | i | i | i | i | i | - | if (!cmp.eq(Ns.new,Rt))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | i | i | - | s | s | s | Р | Р | 0 | t | t | t | t | t | i | i | i | i | i | i | i | - | if (cmp.gt(Ns.new,Rt))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | i | i | - | s | s | s | Р | Р | 1 | t | t | t | t | t | i | i | i | i | i | i | i | - | if (cmp.gt(Ns.new,Rt))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | i | i | - | s | s | s | Р | Р | 0 | t | t | t | t | t | i | i | i | i | i | i | i | - | if (!cmp.gt(Ns.new,Rt))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | i | i | - | s | s | s | Р | Р | 1 | t | t | t | t | t | i | i | i | i | i | i | i | - | if (!cmp.gt(Ns.new,Rt))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | i | i | - | s | s | s | Р | Р | 0 | t | t | t | t | t | i | i | i | i | i | i | i | - | if (cmp.gtu(Ns.new,Rt))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | i | i | - | s | s | s | Р | Р | 1 | t | t | t | t | t | i | i | i | i | i | i | i | - | if (cmp.gtu(Ns.new,Rt))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | i | i | - | s | s | s | Р | Р | 0 | t | t | t | t | t | i | i | i | i | i | i | i | - | if (!cmp.gtu(Ns.new,Rt))<br>jump:nt #r9:2 | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|--------------------------------------------| | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | i | i | - | s | s | s | Р | Р | 1 | t | t | t | t | t | i | i | i | i | i | i | i | - | if (!cmp.gtu(Ns.new,Rt))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | i | i | - | s | s | s | Р | Р | 0 | t | t | t | t | t | i | i | i | i | i | i | i | - | if (cmp.gt(Rt,Ns.new))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | i | i | - | s | s | s | Р | Р | 1 | t | t | t | t | t | i | i | i | i | i | i | i | - | if (cmp.gt(Rt,Ns.new))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | i | i | - | s | s | s | Р | Р | 0 | t | t | t | t | t | i | i | i | i | i | i | i | • | if (!cmp.gt(Rt,Ns.new))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | i | i | - | ø | s | s | Р | Р | 1 | t | t | t | t | t | i | ï | i | i | i | i | i | 1 | if (!cmp.gt(Rt,Ns.new))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | i | i | - | s | s | s | Р | Р | 0 | t | t | t | t | t | - | i | i | i | i | i | i | - | if (cmp.gtu(Rt,Ns.new))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | i | i | - | ø | s | s | Р | Р | 1 | t | t | t | t | t | i | - | · | - | i | · | i | • | if (cmp.gtu(Rt,Ns.new))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | i | i | - | s | s | s | Р | Р | 0 | t | t | t | t | t | i | i | i | i | i | i | i | - | if (!cmp.gtu(Rt,Ns.new))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | i | i | - | s | s | s | Р | Р | 1 | t | t | t | t | t | i | i | i | i | i | i | i | - | if (!cmp.gtu(Rt,Ns.new))<br>jump:t #r9:2 | | | ICL | ASS | 3 | | | | | | | | | | | s3 | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | i | i | - | s | s | s | Р | Р | 0 | ı | ı | ı | I | I | i | i | i | i | i | i | i | - | if (cmp.eq(Ns.new,#U5))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | i | i | - | s | s | s | Р | Р | 1 | ı | I | ı | I | I | i | i | i | i | i | i | i | - | if (cmp.eq(Ns.new,#U5))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | i | i | - | ø | s | s | Р | Р | 0 | ı | - | I | I | _ | i | ï | ï | - | i | - | i | • | if (!cmp.eq(Ns.new,#U5))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | i | i | - | s | s | s | Р | Р | 1 | ı | ı | ı | I | ı | i | i | i | i | i | i | i | - | if (!cmp.eq(Ns.new,#U5))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | i | i | - | s | s | s | Р | Р | 0 | ı | ı | ı | I | ı | i | i | i | i | i | i | i | - | if (cmp.gt(Ns.new,#U5))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | i | i | - | s | s | s | Р | Р | 1 | ı | ı | ı | ı | ı | i | i | i | i | i | i | i | • | if (cmp.gt(Ns.new,#U5))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | i | i | - | s | s | s | Р | Р | 0 | I | I | ı | I | I | i | i | i | i | i | i | i | - | if (!cmp.gt(Ns.new,#U5))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | i | i | - | s | s | s | Р | Р | 1 | I | I | ı | I | I | i | i | i | i | i | i | i | - | if (!cmp.gt(Ns.new,#U5))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | i | i | - | s | s | s | Р | Р | 0 | I | I | ı | I | I | i | i | i | i | i | i | i | - | if (cmp.gtu(Ns.new,#U5))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | i | i | - | s | s | s | Р | Р | 1 | ı | ı | ı | I | ı | i | i | i | i | i | i | i | - | if (cmp.gtu(Ns.new,#U5))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | i | i | - | s | s | s | Р | Р | 0 | ı | ı | ı | I | ı | i | i | i | i | i | i | i | - | if (!cmp.gtu(Ns.new,#U5))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | i | i | - | s | s | s | Р | Р | 1 | ı | ı | ı | I | ı | i | i | i | i | i | i | i | - | if (!cmp.gtu(Ns.new,#U5))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | i | i | - | s | s | s | Р | Р | 0 | - | - | - | - | - | i | i | i | i | i | i | i | - | if (tstbit(Ns.new,#0))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | i | i | - | s | s | s | Р | Р | 1 | - | - | - | - | - | i | i | i | i | i | i | i | - | if (tstbit(Ns.new,#0)) jump:t<br>#r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | i | i | - | s | s | s | Р | Р | 0 | - | - | - | - | - | i | i | i | i | i | i | i | - | if (!tstbit(Ns.new,#0))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | i | i | - | s | s | s | Р | Р | 1 | - | - | - | - | - | i | i | i | i | i | i | i | - | if (!tstbit(Ns.new,#0)) jump:t<br>#r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | i | i | - | s | s | s | Р | Р | 0 | - | - | - | - | - | i | i | i | i | i | i | i | - | if (cmp.eq(Ns.new,#-1))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | i | i | - | s | s | s | Р | Р | 1 | - | - | - | - | - | i | i | i | i | i | i | i | - | if (cmp.eq(Ns.new,#-1))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | i | i | - | s | s | s | Р | Р | 0 | - | • | - | - | • | i | · | · | - | i | - | i | - | if (!cmp.eq(Ns.new,#-1))<br>jump:nt #r9:2 | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-------------------------------------------| | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | i | i | - | s | s | s | Р | Р | 1 | - | - | - | - | - | i | i | i | i | i | i | i | - | if (!cmp.eq(Ns.new,#-1))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | i | i | - | s | s | s | Р | Р | 0 | - | - | - | - | - | i | i | i | i | i | i | i | - | if (cmp.gt(Ns.new,#-1))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | i | i | - | s | s | s | Р | Р | 1 | - | - | - | - | - | i | i | i | i | i | i | i | - | if (cmp.gt(Ns.new,#-1))<br>jump:t #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | i | i | - | s | s | s | Р | Р | 0 | - | - | - | - | - | i | i | i | i | i | i | i | - | if (!cmp.gt(Ns.new,#-1))<br>jump:nt #r9:2 | | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | i | i | - | s | s | s | Р | Р | 1 | - | - | - | 1 | - | ï | i | i | i | i | i | i | - | if (!cmp.gt(Ns.new,#-1))<br>jump:t #r9:2 | ## Field name Description ICLASS Instruction class Parse Packet/loop parse bits s3 Field to encode register s t5 Field to encode register t # 11.7.2 NV ST The NV ST instruction subclass includes store instructions that take the register source operand from another instruction in the same packet. # Store new-value byte Store the least-significant byte in a source register in memory at the effective address. | Syntax | Behavior | |---------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------| | memb(Re=#U6)=Nt.new | <pre>apply_extension(#U); EA=#U; *EA = Nt.new.b[0]; Re=#U;</pre> | | memb(Rs+#s11:0)=Nt.new | <pre>apply_extension(#s); EA=Rs+#s; *EA = Nt.new.b[0];</pre> | | memb(Rs+Ru<<#u2)=Nt.new | EA=Rs+(Ru<<#u);<br>*EA = Nt.new.b[0]; | | memb(Ru<<#u2+#U6)=Nt.new | <pre>apply_extension(#U);<br/>EA=#U+(Ru&lt;&lt;#u);<br/>*EA = Nt.new.b[0];</pre> | | memb(Rx++#s4:0)=Nt.new | EA=Rx;<br>Rx=Rx+#s;<br>*EA = Nt.new.b[0]; | | $\label{eq:memb} \begin{array}{l} \text{memb} \left( \text{Rx++} \# \text{s4:0:circ} \left( \text{Mu} \right) \right) = \text{Nt.} \\ \text{new} \end{array}$ | <pre>EA=Rx; Rx=Rx=circ_add(Rx, #s, MuV); *EA = Nt.new.b[0];</pre> | | <pre>memb(Rx++I:circ(Mu))=Nt.new</pre> | <pre>EA=Rx; Rx=Rx=circ_add(Rx,I&lt;&lt;0,MuV); *EA = Nt.new.b[0];</pre> | | memb(Rx++Mu)=Nt.new | EA=Rx;<br>Rx=Rx+MuV;<br>*EA = Nt.new.b[0]; | | memb(Rx++Mu:brev)=Nt.new | <pre>EA=Rx.h[1] brev(Rx.h[0]); Rx=Rx+MuV; *EA = Nt.new.b[0];</pre> | | memb(gp+#u16:0)=Nt.new | <pre>apply_extension(#u);<br/>EA=(Constant_extended ? (0) : GP)+#u;<br/>*EA = Nt.new.b[0];</pre> | # Class: NV (slots 0) ### **Notes** - Forms of this instruction that use a new-value operand produced in the packet must execute on slot 0. - This instruction can execute only in slot 0, even though it is an ST instruction. # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|--------|-----|----|-----|-------|----|------|------|------|--------|----|----|----|----|-------|-------|-----|----|----|----|----|----|---|---|---|---|---|---|---|----|---|-------------------------------------| | T | ICL | ASS | 3 | | II | 1 | 1 | ll | | I | | I | s5 | | | Pai | rse | | | | u5 | | | | | ı | | 1 | | t3 | | | | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | - | - | 0 | 0 | t | t | t | memb(Rs+Ru<<#u2)=Nt.n | | | ICL | ASS | 3 | | | | • | | Туре | Э | | | | | | | rse | | | • | t3 | | • | | | | • | • | | • | • | | | 0 | 1 | 0 | 0 | 1 | i | i | 0 | 1 | 0 | 1 | i | i | i | i | i | Р | Р | i | 0 | 0 | t | t | t | i | i | i | i | i | i | i | i | memb(gp+#u16:0)=Nt.new | | | ICL | ASS | 3 | Α | mod | de | | Туре | | U<br>N | | s5 | | | | Parse | | | | t3 | | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 0 | i | i | 1 | 1 | 0 | 1 | ø | s | s | s | s | Р | Р | i | 0 | 0 | t | t | t | i | i | i | i | i | i | i | i | memb(Rs+#s11:0)=Nt.new | | | ICL | ASS | 3 | А | mod | de | | Тур | е | U<br>N | | | х5 | | | Pai | rse | u1 | | | | t3 | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | 0 | t | t | t | 0 | - | - | - | - | - | 1 | - | memb(Rx++I:circ(Mu))=Nt. new | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | 0 | t | t | t | 0 | i | i | i | i | - | 0 | - | memb(Rx++#s4:0:circ(Mu))<br>=Nt.new | | | ICL. | ASS | 3 | А | Amode | | | Туре | | U | | e5 | | | | Pai | rse | | | | | t3 | | | | • | | | • | • | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | Ф | е | е | е | е | Р | Р | 0 | 0 | 0 | t | t | t | 1 | - | I | I | I | I | I | I | memb(Re=#U6)=Nt.new | | | ICL. | ASS | 3 | Α | mod | de | | Тур | е | U<br>N | | | х5 | | | Pai | rse | | | | | t3 | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | X | x | x | x | x | Р | Р | 0 | 0 | 0 | t | t | t | 0 | i | i | i | i | - | 0 | - | memb(Rx++#s4:0)=Nt.new | | | ICL | ASS | 3 | А | mod | de | | Тур | е | U<br>N | | | u5 | | | Pai | rse | | | | | t3 | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | u | u | u | u | u | Р | Р | i | 0 | 0 | t | t | t | 1 | i | ı | ı | ı | ı | ı | ı | memb(Ru<<#u2+#U6)=Nt.<br>new | | | ICLASS | | Α | mod | de | | Туре | | U | x5 | | х5 | i | | Parse | | u1 | | | | t3 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | 0 | t | t | t | 0 | - | - | - | - | - | - | - | memb(Rx++Mu)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | 0 | t | t | t | 0 | - | - | - | - | - | - | - | memb(Rx++Mu:brev)=Nt.n ew | | Field name | Description | |------------|-------------| |------------|-------------| | i icia mame | | |-------------|----------------------------| | ICLASS | Instruction class | | Туре | Туре | | Parse | Packet/loop parse bits | | e5 | Field to encode register e | | s5 | Field to encode register s | | t3 | Field to encode register t | | u1 | Field to encode register u | | u5 | Field to encode register u | | x5 | Field to encode register x | | Amode | Amode | | UN | Unsigned | | | | # Store new-value byte conditionally Store the least-significant byte in a source register in memory at the effective address. This instruction is conditional based on a predicate value. When the predicate is true, perform the instruction, otherwise it is treated as a NOP. #### **Syntax Behavior** if ([!]Pv[.new]) apply extension(#u); memb(#u6)=Nt.new EA=#u; if ([!]Pv[.new][0]) { \*EA = Nt[.new].b[0];} else { NOP; if ([!]Pv[.new]) apply extension(#u); memb(Rs+#u6:0)=Nt.new EA=Rs+#u; if ([!]Pv[.new][0]) { \*EA = Nt[.new].b[0];} else { NOP; if ([!]Pv[.new]) EA=Rs+(Ru<<#u);memb(Rs+Ru<<#u2)=Nt.new if ([!]Pv[.new][0]) { \*EA = Nt[.new].b[0];} else { NOP; if ([!]Pv[.new]) EA=Rx: memb(Rx++#s4:0)=Nt.newif ([!]Pv[.new][0]){ Rx=Rx+#s;\*EA = Nt[.new].b[0];} else { NOP; ### Class: NV (slots 0) #### **Notes** - Forms of this instruction that use a new-value operand produced in the packet must execute on slot 0. - This instruction can execute only in slot 0, even though it is an ST instruction. # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|---------------|---------------------|----|-----|------|--------|----|----|----|----|----|----|-----|----|----|----|----|----|---|---|---|---|---|---|---|----|---|---------------------------------------------| | Г | ICL | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | u5 | | | | | | | | | t3 | | | | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | P | Р | i | u | u | u | u | u | i | v | v | 0 | 0 | t | t | t | if (Pv)<br>memb(Rs+Ru<<#u2)=Nt.n<br>ew | | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | 0 | 0 | t | t | t | if (!Pv)<br>memb(Rs+Ru<<#u2)=Nt.n<br>ew | | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | 0 | 0 | t | t | t | if (Pv.new)<br>memb(Rs+Ru<<#u2)=Nt.n<br>ew | | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | P | Р | i | u | u | u | u | u | · | ٧ | v | 0 | 0 | t | t | t | if (!Pv.new)<br>memb(Rs+Ru<<#u2)=Nt.n<br>ew | | | ICL | ASS | 3 | | Se<br>ns<br>e | Pr<br>ed<br>Ne<br>w | | | Туре | Э | | | s5 | | | Ра | rse | | | | | t3 | | | | | | | | | | | | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | 0 | 0 | t | t | t | i | i | i | i | i | 0 | v | v | if (Pv)<br>memb(Rs+#u6:0)=Nt.new | | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | 0 | 0 | t | t | t | i | i | i | i | i | 0 | v | v | if (Pv.new)<br>memb(Rs+#u6:0)=Nt.new | | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | 0 | 0 | t | t | t | i | i | i | i | i | 0 | v | v | if (!Pv)<br>memb(Rs+#u6:0)=Nt.new | | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | 0 | 0 | t | t | t | i | i | i | i | i | 0 | v | v | if (!Pv.new)<br>memb(Rs+#u6:0)=Nt.new | | | ICL. | ASS | 3 | Α | mod | de | - | Тур | Э | U | | | х5 | | | Pa | rse | | | | | t3 | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | 1 | 0 | 0 | t | t | t | 0 | i | i | i | i | 0 | v | v | if (Pv)<br>memb(Rx++#s4:0)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | 1 | 0 | 0 | t | t | t | 0 | i | i | i | i | 1 | v | ٧ | if (!Pv)<br>memb(Rx++#s4:0)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | 1 | 0 | 0 | t | t | t | 1 | - | i | i | i | 0 | ٧ | ٧ | if (Pv.new)<br>memb(Rx++#s4:0)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | 1 | 0 | 0 | t | t | t | 1 | i | i | i | i | 1 | v | v | if (!Pv.new)<br>memb(Rx++#s4:0)=Nt.new | | | ICL | ASS | 3 | А | mod | le | - | Тур | Э | U<br>N | | | | | | Pa | rse | | | | | t3 | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | - | - | - | i | i | Р | Р | 0 | 0 | 0 | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv) memb(#u6)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | - | - | - | i | i | Р | Р | 0 | 0 | 0 | t | t | t | 1 | i | i | i | i | 1 | v | v | if (!Pv) memb(#u6)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | - | - | - | i | i | Р | Р | 1 | 0 | 0 | t | t | t | 1 | - | i | i | i | 0 | ٧ | v | if (Pv.new)<br>memb(#u6)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | - | - | - | i | i | Р | Р | 1 | 0 | 0 | t | t | t | 1 | i | i | i | i | 1 | v | ٧ | if (!Pv.new)<br>memb(#u6)=Nt.new | ### Field name ### Description ICLASS Instruction class Type Type PredNew PredNew Sense Sense Parse Packet/loop parse bits s5 Field to encode register s t3 Field to encode register t u5 Field to encode register u v2 Field to encode register v x5 Field to encode register x Amode Amode UN Unsigned # Store new-value halfword Store the upper or lower 16 bits of a source register in memory at the effective address. | Syntax | Behavior | |-------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------| | memh(Re=#U6)=Nt.new | <pre>apply_extension(#U); EA=#U; *EA = Nt.new.h[0]; Re=#U;</pre> | | memh (Rs+#s11:1) =Nt.new | <pre>apply_extension(#s); EA=Rs+#s; *EA = Nt.new.h[0];</pre> | | memh(Rs+Ru<<#u2)=Nt.new | <pre>EA=Rs+(Ru&lt;&lt;#u); *EA = Nt.new.h[0];</pre> | | memh(Ru<<#u2+#U6)=Nt.new | <pre>apply_extension(#U);<br/>EA=#U+(Ru&lt;&lt;#u);<br/>*EA = Nt.new.h[0];</pre> | | memh (Rx++#s4:1) =Nt.new | <pre>EA=Rx; Rx=Rx+#s; *EA = Nt.new.h[0];</pre> | | $\label{eq:memhass} \begin{array}{l} \text{memh} \; (\text{Rx++} \# \text{s4:1:circ} \; (\text{Mu}) \;) = \text{Nt.} \\ \text{new} \end{array}$ | <pre>EA=Rx; Rx=Rx=circ_add(Rx, #s, MuV); *EA = Nt.new.h[0];</pre> | | <pre>memh(Rx++I:circ(Mu))=Nt.new</pre> | <pre>EA=Rx; Rx=Rx=circ_add(Rx,I&lt;&lt;1,MuV); *EA = Nt.new.h[0];</pre> | | memh(Rx++Mu)=Nt.new | <pre>EA=Rx; Rx=Rx+MuV; *EA = Nt.new.h[0];</pre> | | <pre>memh (Rx++Mu:brev) =Nt.new</pre> | <pre>EA=Rx.h[1] brev(Rx.h[0]); Rx=Rx+MuV; *EA = Nt.new.h[0];</pre> | | memh(gp+#u16:1)=Nt.new | <pre>apply_extension(#u); EA=(Constant_extended ? (0) : GP)+#u; *EA = Nt.new.h[0];</pre> | ## Class: NV (slots 0) #### **Notes** - Forms of this instruction that use a new-value operand produced in the packet must execute on slot 0. - This instruction can execute only in slot 0, even though it is an ST instruction. # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|----|----|------|------|--------|----|----|----|----|----|----|-----|----|----|----|----|----|---|---|----|---|---|-----|---|----|---|-------------------------------------| | Г | ICL | ASS | 3 | | | | | I | | I | | | s5 | | | Pa | rse | | | | u5 | | | | II | | | II. | | t3 | | | | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | - | - | 0 | 1 | t | t | t | memh(Rs+Ru<<#u2)=Nt.n<br>ew | | | ICL | ASS | 3 | | • | | • | - | Туре | Э | | • | • | | | Pa | rse | | • | | | t3 | | | | | | | | | | | | 0 | 1 | 0 | 0 | 1 | i | i | 0 | 1 | 0 | 1 | i | i | i | i | i | Р | Р | i | 0 | 1 | t | t | t | i | i | i | i | i | i | i | i | memh(gp+#u16:1)=Nt.new | | | ICL | ASS | 3 | А | mod | de | | Туре | Э | U<br>N | | | s5 | | | Pa | rse | | | | | t3 | | | | | | | | | | | | 1 | 0 | 1 | 0 | 0 | i | i | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | 0 | 1 | t | t | t | i | i | i | i | i | i | i | i | memh(Rs+#s11:1)=Nt.new | | | ICL | ASS | 3 | А | mod | de | | Туре | Э | U<br>N | | | х5 | | | Pa | rse | u1 | | | | t3 | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | 1 | t | t | t | 0 | - | - | - | - | - | 1 | - | memh(Rx++I:circ(Mu))=Nt.<br>new | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | 1 | t | t | t | 0 | i | i | i | i | - | 0 | - | memh(Rx++#s4:1:circ(Mu))<br>=Nt.new | | | ICL | ASS | 3 | Α | mod | de | | Туре | Э | U | | | е5 | | | Pa | rse | | | | | t3 | • | | | • | | | • | | • | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | е | е | е | е | е | Р | Р | 0 | 0 | 1 | t | t | t | 1 | - | I | I | I | ı | I | I | memh(Re=#U6)=Nt.new | | | ICL | ASS | 3 | А | mod | de | | Туре | Э | U<br>N | | | х5 | | | Pa | rse | | | | | t3 | | | | | | · | | · | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | 0 | 0 | 1 | t | t | t | 0 | i | i | i | i | - | 0 | - | memh(Rx++#s4:1)=Nt.new | | | ICL | ASS | 3 | A | mod | de | | Туре | Э | U<br>N | | | u5 | | | Pa | rse | | | | | t3 | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | u | u | u | u | u | Р | Р | i | 0 | 1 | t | t | t | 1 | i | ı | I | ı | ı | ı | ı | memh(Ru<<#u2+#U6)=Nt.<br>new | | | ICL. | ASS | 3 | А | mod | de | | Туре | Э | UN | | | x5 | | | Pa | rse | u1 | | | | t3 | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | 1 | t | t | t | 0 | - | - | - | - | - | - | - | memh(Rx++Mu)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | u | 0 | 1 | t | t | t | 0 | - | - | - | - | - | - | - | memh(Rx++Mu:brev)=Nt.n ew | | Field name | | |------------|-------------------| | ICLASS | Instruction class | Type Type Packet/loop parse bits Parse e5 Field to encode register e Field to encode register s s5 Field to encode register t t3 u1 Field to encode register u u5 Field to encode register u Field to encode register x x5 Amode Amode UN Unsigned Description ## Store new-value halfword conditionally Store the upper or lower 16 bits of a source register in memory at the effective address. This instruction is conditional based on a predicate value. When the predicate is true, perform the instruction, otherwise it is treated as a NOP. #### **Syntax Behavior** if ([!]Pv[.new]) apply extension(#u); memh(#u6)=Nt.new EA=#u; if ([!]Pv[.new][0]) { \*EA = Nt[.new].h[0];} else { NOP; if ([!]Pv[.new]) apply extension(#u); memh (Rs+#u6:1) =Nt.new EA=Rs+#u; if ([!]Pv[.new][0]) { \*EA = Nt[.new].h[0];} else { NOP; if ([!]Pv[.new]) EA=Rs+(Ru<<#u);memh (Rs+Ru<<#u2) =Nt.new if ([!]Pv[.new][0]) { \*EA = Nt[.new].h[0];} else { NOP; if ([!]Pv[.new]) EA=Rx: memh(Rx++#s4:1)=Nt.newif ([!]Pv[.new][0]){ Rx=Rx+#s;\*EA = Nt[.new].h[0];} else { NOP; ### Class: NV (slots 0) #### **Notes** - Forms of this instruction that use a new-value operand produced in the packet must execute on slot 0. - This instruction can execute only in slot 0, even though it is an ST instruction. # **Encoding** | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|---------------|---------------------|----|------|------|--------|----|----|----|----|----|----|-----|----|----|----|----|----|---|---|---|---|---|---|---|----|---|---------------------------------------------| | Г | ICL | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | u5 | | | | | | | | | t3 | | | | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | 0 | 1 | t | t | t | if (Pv)<br>memh(Rs+Ru<<#u2)=Nt.n<br>ew | | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | 0 | 1 | t | t | t | if (!Pv)<br>memh(Rs+Ru<<#u2)=Nt.n<br>ew | | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | 0 | 1 | t | t | t | if (Pv.new)<br>memh(Rs+Ru<<#u2)=Nt.n<br>ew | | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | 0 | 1 | t | t | t | if (!Pv.new)<br>memh(Rs+Ru<<#u2)=Nt.n<br>ew | | | ICL | ASS | 8 | | Se<br>ns<br>e | Pr<br>ed<br>Ne<br>w | | - | Туре | • | | | s5 | | | Ра | rse | | | | | t3 | | | | | | | | | | | | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | 0 | 1 | t | t | t | i | i | i | i | i | 0 | ٧ | ٧ | if (Pv)<br>memh(Rs+#u6:1)=Nt.new | | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | s | s | s | ø | s | Р | P | i | 0 | 1 | t | t | t | i | i | i | i | i | 0 | v | ٧ | if (Pv.new)<br>memh(Rs+#u6:1)=Nt.new | | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | P | Р | i | 0 | 1 | t | t | t | i | i | i | i | i | 0 | v | v | if (!Pv)<br>memh(Rs+#u6:1)=Nt.new | | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | P | i | 0 | 1 | t | t | t | i | i | i | i | i | 0 | v | v | if (!Pv.new)<br>memh(Rs+#u6:1)=Nt.new | | | ICL | ASS | 3 | Α | mod | le | - | Гуре | Э | U<br>N | | | х5 | | | Pa | rse | | | | | t3 | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | 1 | 0 | 1 | t | t | t | 0 | i | i | i | i | 0 | v | v | if (Pv)<br>memh(Rx++#s4:1)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | P | 1 | 0 | 1 | t | t | t | 0 | i | i | i | i | 1 | v | v | if (!Pv)<br>memh(Rx++#s4:1)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | P | 1 | 0 | 1 | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv.new)<br>memh(Rx++#s4:1)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | P | 1 | 0 | 1 | t | t | t | 1 | i | i | i | i | 1 | v | v | if (!Pv.new)<br>memh(Rx++#s4:1)=Nt.new | | | ICL | ASS | 3 | Α | mod | le | - | Туре | 9 | U<br>N | | | | | | Pa | rse | | | | | t3 | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | - | - | - | i | i | Р | Р | 0 | 0 | 1 | t | t | t | 1 | i | i | i | i | 0 | ٧ | ٧ | if (Pv) memh(#u6)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | - | - | - | i | i | Р | Р | 0 | 0 | 1 | t | t | t | 1 | i | i | i | i | 1 | v | v | if (!Pv) memh(#u6)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | - | - | - | i | i | Р | Р | 1 | 0 | 1 | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv.new)<br>memh(#u6)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | - | - | - | - | i | Р | P | 1 | 0 | 1 | t | t | t | 1 | i | i | i | i | 1 | v | v | if (!Pv.new)<br>memh(#u6)=Nt.new | ### Field name ### Description ICLASS Instruction class Type Type PredNew PredNew Sense Sense Parse Packet/loop parse bits s5 Field to encode register s t3 Field to encode register t u5 Field to encode register u v2 Field to encode register v x5 Field to encode register x Amode Amode UN Unsigned ## Store new-value word Store a 32-bit register in memory at the effective address. | Syntax | Behavior | |----------------------------------------------|-------------------------------------------------------------------------------------| | memw(Re=#U6)=Nt.new | <pre>apply_extension(#U); EA=#U; *EA = Nt.new; Re=#U;</pre> | | memw(Rs+#s11:2)=Nt.new | <pre>apply_extension(#s); EA=Rs+#s; *EA = Nt.new;</pre> | | memw(Rs+Ru<<#u2)=Nt.new | EA=Rs+(Ru<<#u);<br>*EA = Nt.new; | | memw(Ru<<#u2+#U6)=Nt.new | <pre>apply_extension(#U); EA=#U+(Ru&lt;&lt;#u); *EA = Nt.new;</pre> | | memw(Rx++#s4:2)=Nt.new | EA=Rx;<br>Rx=Rx+#s;<br>*EA = Nt.new; | | <pre>memw(Rx++#s4:2:circ(Mu)) = Nt.new</pre> | <pre>EA=Rx; Rx=Rx=circ_add(Rx, #s, MuV); *EA = Nt.new;</pre> | | <pre>memw(Rx++I:circ(Mu))=Nt.new</pre> | <pre>EA=Rx; Rx=Rx=circ_add(Rx,I&lt;&lt;2,MuV); *EA = Nt.new;</pre> | | memw(Rx++Mu)=Nt.new | EA=Rx;<br>Rx=Rx+MuV;<br>*EA = Nt.new; | | <pre>memw(Rx++Mu:brev)=Nt.new</pre> | <pre>EA=Rx.h[1] brev(Rx.h[0]); Rx=Rx+MuV; *EA = Nt.new;</pre> | | memw(gp+#u16:2)=Nt.new | <pre>apply_extension(#u); EA=(Constant_extended ? (0) : GP)+#u; *EA = Nt.new;</pre> | ## Class: NV (slots 0) #### **Notes** - Forms of this instruction that use a new-value operand produced in the packet must execute on slot 0. - This instruction can execute only in slot 0, even though it is an ST instruction. # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|----|----|-----|------|--------|----|----|----|----|----|-----|-----|----|----|----|----|----|---|---|---|---|---|---|---|----|---|-------------------------------------| | | ICL | ASS | 3 | | | | | | | | | | s5 | | | Pai | rse | | | | u5 | | | | | | | | | t3 | | | | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | - | - | 1 | 0 | t | t | t | memw(Rs+Ru<<#u2)=Nt.n | | | ICL | ASS | 3 | | | | | | Туре | 9 | | | | | | Pai | rse | | | | | t3 | | | | | | | | | | | | 0 | 1 | 0 | 0 | 1 | i | i | 0 | 1 | 0 | 1 | i | i | i | i | i | Р | Р | i | 1 | 0 | t | t | t | i | i | i | i | i | i | i | i | memw(gp+#u16:2)=Nt.new | | | ICL | ASS | 3 | Α | mod | de | | Тур | е | U<br>N | | | s5 | | | Pai | rse | | | • | | t3 | | | | | • | • | | • | | | | 1 | 0 | 1 | 0 | 0 | i | i | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | 1 | 0 | t | t | t | i | i | i | i | i | i | i | i | memw(Rs+#s11:2)=Nt.new | | | ICL | ASS | 3 | Α | mod | de | | Тур | е | U<br>N | | | x5 | | | Pai | rse | u1 | | | | t3 | | | • | | | • | • | • | • | | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | u | 1 | 0 | t | t | t | 0 | - | - | - | - | - | 1 | - | memw(Rx++I:circ(Mu))=Nt.<br>new | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | u | 1 | 0 | t | t | t | 0 | i | i | i | i | - | 0 | - | memw(Rx++#s4:2:circ(Mu)<br>)=Nt.new | | | ICL | ASS | 3 | А | mod | de | | Тур | е | U | | | e5 | | | Pai | rse | | | • | | t3 | | | | | • | • | | • | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | е | е | е | е | е | P | Р | 0 | 1 | 0 | t | t | t | 1 | - | I | I | I | I | I | I | memw(Re=#U6)=Nt.new | | | ICL | ASS | 3 | А | mod | de | | Тур | е | U<br>N | | | х5 | | | Pai | rse | | | | | t3 | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | x | x | x | X | x | Р | Р | 0 | 1 | 0 | t | t | t | 0 | i | i | i | i | - | 0 | - | memw(Rx++#s4:2)=Nt.new | | | ICL | ASS | 3 | А | mod | de | | Тур | е | U<br>N | | | u5 | | | Pai | rse | | | | | t3 | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | u | u | u | u | u | Р | Р | i | 1 | 0 | t | t | t | 1 | i | ı | ı | ı | ı | ı | ı | memw(Ru<<#u2+#U6)=Nt.<br>new | | | ICL | ASS | 3 | А | mod | de | | Тур | е | U | | | x5 | | | Pai | rse | u1 | | | | t3 | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | u | 1 | 0 | t | t | t | 0 | - | - | - | - | - | - | - | memw(Rx++Mu)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | Р | u | 1 | 0 | t | t | t | 0 | - | - | - | - | - | - | - | memw(Rx++Mu:brev)=Nt.n | Description | Field name | | | |------------|-------------------|--| | ICLASS | Instruction class | | | Type | Туре | |-------|-------------------------------------------| | Parse | Packet/loop parse bits | | e5 | Field to encode register e | | s5 | Field to encode register s | | t3 | Field to encode register t | | u1 | Field to encode register u | | u5 | Field to encode register u | | x5 | Field to encode register $\boldsymbol{x}$ | Amode Amode Unsigned ## Store new-value word conditionally Store a 32-bit register in memory at the effective address. This instruction is conditional based on a predicate value. When the predicate is true, perform the instruction, otherwise it is treated as a NOP. #### **Syntax Behavior** if ([!]Pv[.new]) apply extension(#u); memw(#u6)=Nt.new EA=#u; if ([!]Pv[.new][0]) { \*EA = Nt[.new];} else { NOP; if ([!]Pv[.new]) apply extension(#u); memw (Rs+#u6:2) =Nt.new EA=Rs+#u; if ([!]Pv[.new][0]) { \*EA = Nt[.new];} else { NOP; if ([!]Pv[.new]) EA=Rs+(Ru<<#u);memw (Rs+Ru<<#u2) =Nt.new if ([!]Pv[.new][0]) { \*EA = Nt[.new];} else { NOP; if ([!]Pv[.new]) EA=Rx: memw(Rx++#s4:2)=Nt.newif ([!]Pv[.new][0]){ Rx=Rx+#s;\*EA = Nt[.new];} else { NOP; ### Class: NV (slots 0) #### **Notes** - Forms of this instruction that use a new-value operand produced in the packet must execute on slot 0. - This instruction can execute only in slot 0, even though it is an ST instruction. # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|---------------|---------------------|----|------|------|--------|----|----|----|----|----|----|-----|----|----|----|----|----|---|---|---|---|---|---|---|----|---|---------------------------------------------| | | ICL | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | u5 | | | | | | | | | t3 | | | | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | P | i | u | u | u | u | u | i | v | v | 1 | 0 | t | t | t | if (Pv)<br>memw(Rs+Ru<<#u2)=Nt.n<br>ew | | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | s | s | s | s | s | P | Р | i | u | u | u | u | u | i | v | v | 1 | 0 | t | t | t | if (!Pv)<br>memw(Rs+Ru<<#u2)=Nt.n<br>ew | | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | 1 | 0 | t | t | t | if (Pv.new)<br>memw(Rs+Ru<<#u2)=Nt.n<br>ew | | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | P | Р | i | u | u | u | u | u | i | v | v | 1 | 0 | t | t | t | if (!Pv.new)<br>memw(Rs+Ru<<#u2)=Nt.n<br>ew | | | ICL | ASS | 8 | | Se<br>ns<br>e | Pr<br>ed<br>Ne<br>w | | - | Туре | • | | | s5 | | | Pa | rse | | | | | t3 | | | | | | | | | | | | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | 1 | 0 | t | t | t | i | i | i | i | i | 0 | v | v | if (Pv)<br>memw(Rs+#u6:2)=Nt.new | | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | s | s | s | s | ø | P | P | - | 1 | 0 | t | t | t | i | i | i | i | i | 0 | v | ٧ | if (Pv.new)<br>memw(Rs+#u6:2)=Nt.new | | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | P | i | 1 | 0 | t | t | t | i | i | i | i | i | 0 | v | v | if (!Pv)<br>memw(Rs+#u6:2)=Nt.new | | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | P | i | 1 | 0 | t | t | t | i | i | i | i | i | 0 | v | v | if (!Pv.new)<br>memw(Rs+#u6:2)=Nt.new | | | ICL | ASS | 3 | Α | mod | de | | Туре | Э | U<br>N | | | x5 | | | Pa | rse | | | | | t3 | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | P | 1 | 1 | 0 | t | t | t | 0 | i | i | i | i | 0 | v | v | if (Pv)<br>memw(Rx++#s4:2)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | x | x | x | x | x | Р | P | 1 | 1 | 0 | t | t | t | 0 | i | i | i | i | 1 | v | v | if (!Pv)<br>memw(Rx++#s4:2)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | x | x | x | x | x | P | P | 1 | 1 | 0 | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv.new)<br>memw(Rx++#s4:2)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | x | x | x | x | x | P | P | 1 | 1 | 0 | t | t | t | 1 | i | i | i | i | 1 | v | v | if (!Pv.new)<br>memw(Rx++#s4:2)=Nt.new | | | ICL | ASS | 3 | Α | mod | de | | Туре | Э | U<br>N | | | | | | Pa | rse | | | | | t3 | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | - | - | - | i | i | Р | Р | 0 | 1 | 0 | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv) memw(#u6)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | - | - | - | i | i | Р | Р | 0 | 1 | 0 | t | t | t | 1 | i | i | i | i | 1 | v | v | if (!Pv) memw(#u6)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | - | | - | i | i | Р | Р | 1 | 1 | 0 | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv.new)<br>memw(#u6)=Nt.new | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | - | - | - | i | i | Р | Р | 1 | 1 | 0 | t | t | t | 1 | i | i | i | i | 1 | v | v | if (!Pv.new)<br>memw(#u6)=Nt.new | ### Field name ### Description ICLASS Instruction class Type Type PredNew PredNew Sense Sense Parse Packet/loop parse bits s5 Field to encode register s t3 Field to encode register t u5 Field to encode register u v2 Field to encode register v x5 Field to encode register x Amode Amode UN Unsigned # 11.8 ST The ST instruction class includes store instructions, used to store values in memory. ST instructions execute on slot 0 and slot 1. ### Store doubleword Store a 64-bit register pair in memory at the effective address. | Syntax | Behavior | |-------------------------------------|----------------------------------------------------------------------------------| | memd(Re=#U6)=Rtt | <pre>apply_extension(#U); EA=#U; *EA = Rtt; Re=#U;</pre> | | memd(Rs+#s11:3)=Rtt | <pre>apply_extension(#s); EA=Rs+#s; *EA = Rtt;</pre> | | memd(Rs+Ru<<#u2)=Rtt | EA=Rs+(Ru<<#u);<br>*EA = Rtt; | | memd(Ru<<#u2+#U6)=Rtt | <pre>apply_extension(#U); EA=#U+(Ru&lt;&lt;#u); *EA = Rtt;</pre> | | memd(Rx++#s4:3)=Rtt | EA=Rx;<br>Rx=Rx+#s;<br>*EA = Rtt; | | memd(Rx++#s4:3:circ(Mu)) =Rtt | <pre>EA=Rx; Rx=Rx=circ_add(Rx, #s, MuV); *EA = Rtt;</pre> | | <pre>memd(Rx++I:circ(Mu))=Rtt</pre> | <pre>EA=Rx; Rx=Rx=circ_add(Rx,I&lt;&lt;3,MuV); *EA = Rtt;</pre> | | memd(Rx++Mu)=Rtt | EA=Rx;<br>Rx=Rx+MuV;<br>*EA = Rtt; | | memd(Rx++Mu:brev)=Rtt | <pre>EA=Rx.h[1] brev(Rx.h[0]); Rx=Rx+MuV; *EA = Rtt;</pre> | | memd(gp+#u16:3)=Rtt | <pre>apply_extension(#u); EA=(Constant_extended ? (0) : GP)+#u; *EA = Rtt;</pre> | # Class: ST (slots 0,1) #### **Intrinsics** ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|----|----|------|------|--------|----|----|----|----|----|-----|----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|----------------------------------| | | ICL | ASS | 3 | | | | | • | | | | | s5 | | | Par | se | | | | u5 | | | | | | | | t5 | | | | | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | - | - | t | t | t | t | t | memd(Rs+Ru<<#u2)=Rtt | | | ICL | ASS | 3 | | | | | | Туре | æ | | | | | | Par | se | | | | t5 | | | | | | | | | | | | | 0 | 1 | 0 | 0 | 1 | i | i | 0 | 1 | 1 | 0 | i | i | i | i | i | Р | Р | · | t | t | t | t | t | i | i | i | i | i | i | i | i | memd(gp+#u16:3)=Rtt | | | ICL | ASS | 3 | Α | mod | le | | Туре | Э | U | | | s5 | | | Par | se | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 0 | i | i | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | i | i | i | memd(Rs+#s11:3)=Rtt | | | ICL | ASS | 3 | Α | mod | de | | Туре | е | ZC | | | х5 | | | Pai | se | u1 | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | Р | u | t | t | t | t | t | 0 | | | - | | - | 1 | - | memd(Rx++I:circ(Mu))=Rtt | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | Р | u | t | t | t | t | t | 0 | i | i | i | i | - | 0 | - | memd(Rx++#s4:3:circ(Mu))<br>=Rtt | | | ICL | ASS | 3 | Α | mod | de | | Туре | е | ZC | | | e5 | | | Pai | se | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | е | е | е | е | е | Р | Р | 0 | t | t | t | t | t | 1 | | I | I | I | ı | I | I | memd(Re=#U6)=Rtt | | | ICL | ASS | 3 | А | mod | le | | Туре | Э | ZC | | | x5 | | | Par | se | | | | t5 | | | | • | | | • | • | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | X | x | X | X | x | Р | Р | 0 | t | t | t | t | t | 0 | i | i | i | i | - | 0 | - | memd(Rx++#s4:3)=Rtt | | | ICL | ASS | 3 | Α | mod | le | | Туре | Э | U<br>N | | | u5 | • | | Par | se | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | u | u | u | u | u | Р | Р | - | t | t | t | t | t | 1 | i | I | I | I | I | I | I | memd(Ru<<#u2+#U6)=Rtt | | | ICL | ASS | 3 | Α | mod | le | | Туре | Э | U<br>N | | | x5 | | | Par | se | u1 | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | Р | u | t | t | t | t | t | 0 | - | - | - | - | - | - | - | memd(Rx++Mu)=Rtt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | Р | u | t | t | t | t | t | 0 | - | - | - | - | - | - | - | memd(Rx++Mu:brev)=Rtt | Description | ICLASS | Instruction class | |--------|-------------------| | Туре | Type | | TAPC | Турс | |-------|----------------------------| | Parse | Packet/loop parse bits | | e5 | Field to encode register e | | s5 | Field to encode register s | | t5 | Field to encode register t | | u1 | Field to encode register u | | u5 | Field to encode register u | | x5 | Field to encode register x | | Amode | Amode | | | | Unsigned Field name UN ### Store-release doubleword Store a 64-bit register pair in memory at the effective address. The store-release memory operation is observed after all preceding memory operations have been observed at the local point of serialization. A different order might be observed at the global point of serialization, see Memory ordering. When the :st (same thread) option is specified, the preceding memory operations are those that precede this instruction in program order. When the :at (all threads) option is specified, the preceding memory operations are those that were committed on any thread before this instruction was committed. The store release address is limited to certain memory regions. The following memory regions are excluded: - AHB memory space - AXI M2 memory space - Hexagon memory cut-out (with the exception of addressable TCM and VTCM memory) - Memory with the CCCC types 2, 3, or 4 | Syntax | Behavior | |--------------------|---------------------| | memd_rl(Rs):at=Rtt | EA=Rs;<br>*EA = Rtt | | memd_rl(Rs):st=Rtt | EA=Rs;<br>*EA = Rtt | Class: ST (slots 0) ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|----|----|------|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|--------------------| | | ICL | ASS | 3 | А | mod | de | - | Туре | 9 | U | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | | d | 2 | | | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | - | - | 0 | 0 | 1 | 0 | d | d | memd_rl(Rs):at=Rtt | | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | - | - | 1 | 0 | 1 | 0 | d | d | memd_rl(Rs):st=Rtt | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d2 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | Amode | Amode | | | Type | Туре | | | UN | Unsigned | | # Store doubleword conditionally Store a 64-bit register pair in memory at the effective address. This instruction is conditional based on a predicate value. When the predicate is true, perform the instruction, otherwise it is treated as a NOP. Syntax Behavior if ([!]Pv[.new]) apply extension(#u); ``` if ([!]Pv[.new]) apply extension(#u); memd(#u6)=Rtt EA=#u; if ([!]Pv[.new][0]) { *EA = Rtt; } else { NOP; if ([!]Pv[.new]) apply extension(#u); memd(Rs+#u6:3)=Rtt EA=Rs+#u; if ([!]Pv[.new][0]) { *EA = Rtt; } else { NOP; if ([!]Pv[.new]) EA=Rs+(Ru<<\#u); memd(Rs+Ru<<\#u2)=Rtt if ([!]Pv[.new][0]) { *EA = Rtt; } else { NOP; if ([!]Pv[.new]) EA=Rx; memd(Rx++#s4:3)=Rtt if ([!]Pv[.new][0]){ Rx=Rx+#s; *EA = Rtt; } else { NOP; ``` Class: ST (slots 0,1) ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|---------------|---------------------|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------------------------------| | | ICL | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | u5 | | | | | | | | t5 | | | | | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | t | t | t | t | t | if (Pv)<br>memd(Rs+Ru<<#u2)=Rtt | | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | t | t | t | t | t | if (!Pv)<br>memd(Rs+Ru<<#u2)=Rtt | | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | t | t | t | t | t | if (Pv.new)<br>memd(Rs+Ru<<#u2)=Rtt | | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | t | t | t | t | t | if (!Pv.new)<br>memd(Rs+Ru<<#u2)=Rtt | | | ICL | ASS | 3 | | Se<br>ns<br>e | Pr<br>ed<br>Ne<br>w | | | Тур | е | | s5 | | | | Pa | rse | | | | t5 | | | | | | | | | | | | | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|-----|----|----|------|----|--------|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-------------------------------------| | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | 0 | v | v | if (Pv)<br>memd(Rs+#u6:3)=Rtt | | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | 0 | v | v | if (Pv.new)<br>memd(Rs+#u6:3)=Rtt | | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | 0 | v | v | if (!Pv)<br>memd(Rs+#u6:3)=Rtt | | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | s | s | ø | s | s | Р | P | i | t | t | t | t | t | i | i | i | i | i | 0 | v | v | if (!Pv.new)<br>memd(Rs+#u6:3)=Rtt | | | ICL | ASS | 3 | Α | mod | de | | Туре | Э | U<br>N | | | х5 | | | Pa | rse | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 0 | i | i | i | i | 0 | v | v | if (Pv)<br>memd(Rx++#s4:3)=Rtt | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 0 | i | i | i | i | 1 | v | v | if (!Pv)<br>memd(Rx++#s4:3)=Rtt | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv.new)<br>memd(Rx++#s4:3)=Rtt | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 1 | v | v | if (!Pv.new)<br>memd(Rx++#s4:3)=Rtt | | | ICL | ASS | 3 | Α | mod | de | - | Туре | Э | U | | • | | | • | Pa | rse | | | | t5 | | | | • | • | | | • | • | • | | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | - | - | - | i | i | Р | Р | 0 | t | t | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv) memd(#u6)=Rtt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | - | - | • | i | i | Р | Ρ | 0 | t | t | t | t | t | 1 | i | i | i | i | 1 | v | v | if (!Pv) memd(#u6)=Rtt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | - | - | - | i | i | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv.new) memd(#u6)=Rtt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | - | - | - | i | i | Р | Ρ | 1 | t | t | t | t | t | 1 | i | i | i | i | 1 | v | v | if (!Pv.new) memd(#u6)=Rtt | #### Field name Description ICLASS Instruction class Type Type PredNew PredNew Sense Sense Parse Packet/loop parse bits s5 Field to encode register s t5 Field to encode register t u5 Field to encode register u v2 Field to encode register v x5 Field to encode register x Amode Amode Unsigned # Store byte Store the least-significant byte in a source register at the effective address. | Behavior | |----------| | | | memb(Re=#U6)=Rt | <pre>apply_extension(#U);<br/>EA=#U;<br/>*EA = Rt.b[0];<br/>Re=#U;</pre> | |------------------------------------|---------------------------------------------------------------------------------------| | memb(Rs+#s11:0)=Rt | <pre>apply_extension(#s); EA=Rs+#s; *EA = Rt.b[0];</pre> | | memb(Rs+#u6:0)=#S8 | <pre>EA=Rs+#u; apply_extension(#S); *EA = #S;</pre> | | memb(Rs+Ru<<#u2)=Rt | EA=Rs+(Ru<<#u);<br>*EA = Rt.b[0]; | | memb(Ru<<#u2+#U6)=Rt | <pre>apply_extension(#U); EA=#U+(Ru&lt;&lt;#u); *EA = Rt.b[0];</pre> | | memb(Rx++#s4:0)=Rt | EA=Rx;<br>Rx=Rx+#s;<br>*EA = Rt.b[0]; | | memb(Rx++#s4:0:circ(Mu))=Rt | <pre>EA=Rx; Rx=Rx=circ_add(Rx, #s, MuV); *EA = Rt.b[0];</pre> | | <pre>memb(Rx++I:circ(Mu))=Rt</pre> | <pre>EA=Rx; Rx=Rx=circ_add(Rx,I&lt;&lt;0,MuV); *EA = Rt.b[0];</pre> | | memb(Rx++Mu)=Rt | EA=Rx;<br>Rx=Rx+MuV;<br>*EA = Rt.b[0]; | | memb(Rx++Mu:brev)=Rt | <pre>EA=Rx.h[1] brev(Rx.h[0]); Rx=Rx+MuV; *EA = Rt.b[0];</pre> | | memb(gp+#u16:0)=Rt | <pre>apply_extension(#u); EA=(Constant_extended ? (0) : GP)+ #u; *EA = Rt.b[0];</pre> | # Class: ST (slots 0,1) ## Intrinsics | memb(Rx++#s4:0:circ(Mu))=Rt | <pre>void Q6_memb_IMR_circ(void** StartAddress, Word32 Is4_0, Word32 Mu, Word32 Rt, void* BaseAddress)</pre> | |--------------------------------------|--------------------------------------------------------------------------------------------------------------| | <pre>memb (Rx++I:circ(Mu)) =Rt</pre> | <pre>void Q6_memb_MR_circ(void** StartAddress, Word32 Mu, Word32 Rt, void* BaseAddress)</pre> | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|----|----|------|------|--------|----|----|----|-----|----|---------|----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|---------------------------------| | | ICL | ASS | 3 | | | | | | | | | | s5 | | | Par | se | | | | u5 | | | | | | | | t5 | | | | | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Ρ | i | u | u | u | u | u | i | - | - | t | t | t | t | t | memb(Rs+Ru<<#u2)=Rt | | | ICL | ASS | 3 | | • | • | • | • | | | | • | s5 | Par | se | | | | • | | | | | | | | | | • | | | | | 0 | 0 | 1 | 1 | 1 | 1 | 0 | - | • | 0 | 0 | s | s | ø | s | s | Р | Р | I | ï | ï | i | i | i | i | I | I | I | I | I | I | I | memb(Rs+#u6:0)=#S8 | | | ICLA | ASS | 3 | | | | | | Туре | æ | | | | | | Par | se | | | | t5 | | | | | | | | | | | | | 0 | 1 | 0 | 0 | 1 | i | i | 0 | 0 | 0 | 0 | i | i | i | i | i | Р | Р | i | t | t | t | t | t | i | i | i | i | i | i | i | i | memb(gp+#u16:0)=Rt | | | ICLA | ASS | 3 | Α | mod | de | | Тур | е | U<br>N | | | s5 | | | Par | se | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 0 | i | i | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Ρ | i | t | t | t | t | t | i | i | i | i | i | i | i | i | memb(Rs+#s11:0)=Rt | | | ICL/ | ASS | 3 | Α | mod | de | , | Туре | е | UN | | | х5 | | | Par | se | u1 | | | t5 | | | | | | | | | | • | | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | x | х | x | х | х | Р | Р | u | t | t | t | t | t | 0 | - | - | - | - | - | 1 | - | memb(Rx++I:circ(Mu))=Rt | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | x | x | x | x | x | Р | Р | u | t | t | t | t | t | 0 | i | i | i | i | - | 0 | - | memb(Rx++#s4:0:circ(Mu))<br>=Rt | | | ICL/ | ASS | 3 | Α | mod | de | | Тур | е | UN | | | e5 | | | Par | se | | | | t5 | | | | | | | | | | • | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | е | е | е | е | е | Р | Ρ | 0 | t | t | t | t | t | 1 | - | I | I | ı | ı | ı | I | memb(Re=#U6)=Rt | | | ICL/ | ASS | 3 | Α | mod | de | , | Туре | е | UN | | | x5 | | | Par | se | | | | t5 | • | | | | | | | • | • | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | x | x | x | x | x | Р | Ρ | 0 | t | t | t | t | t | 0 | i | i | i | i | - | 0 | - | memb(Rx++#s4:0)=Rt | | | ICL | ASS | 3 | Α | mod | de | | Тур | е | ZC | | | u5 | • | | Par | se | | | | t5 | • | | | | • | | • | • | • | | | | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | u | u | u | u | u | Р | Р | i | t | t | t | t | t | 1 | i | I | ı | ı | ı | I | I | memb(Ru<<#u2+#U6)=Rt | | | ICL | ASS | 3 | Α | mod | de | | Тур | е | UN | | | x5 | | | Parse u | | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | x | x | x | x | x | Р | Р | u | t | t | t | t | t | 0 | - | | - | - | - | - | - | memb(Rx++Mu)=Rt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | x | x | x | x | x | P | Р | u | t | t | t | t | t | 0 | - | - | - | - | - | - | - | memb(Rx++Mu:brev)=Rt | | Field name | Description | |------------|-------------| | ICLASS | Instruction class | |--------|----------------------------| | Туре | Type | | Parse | Packet/loop parse bits | | e5 | Field to encode register e | | s5 | Field to encode register s | | t5 | Field to encode register t | | u1 | Field to encode register u | | u5 | Field to encode register u | | x5 | Field to encode register x | | Amode | Amode | | UN | Unsigned | | | | # Store byte conditionally Store the least-significant byte in a source register at the effective address. This instruction is conditional based on a predicate value. When the predicate is true, perform the instruction, otherwise it is treated as a NOP. Syntax Behavior ``` if ([!]Pv[.new]) apply extension(#u); memb(#u6)=Rt EA=#u; if ([!]Pv[.new][0]) { *EA = Rt.b[0]; } else { NOP; EA=Rs+#u; if ([!]Pv[.new]) memb (Rs+#u6:0) = #S6 if ([!]Pv[.new][0]){ apply extension(#S); *EA = #S; } else { NOP; if ([!]Pv[.new]) apply extension(#u); memb(Rs+#u6:0)=Rt EA=Rs+#u; if ([!]Pv[.new][0]) { *EA = Rt.b[0]; } else { NOP; if ([!]Pv[.new]) EA=Rs+(Ru<<\#u); memb(Rs+Ru<<\#u2)=Rt if ([!]Pv[.new][0]) { *EA = Rt.b[0]; } else { NOP; EA=Rx; if ([!]Pv[.new]) memb(Rx++#s4:0)=Rt if ([!]Pv[.new][0]){ Rx=Rx+#s; *EA = Rt.b[0]; } else { NOP; ``` Class: ST (slots 0,1) ### **Encoding** | 31 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |------|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------------------------| | IC | CLA | SS | ; | | | | | | | | | | s5 | | | Pa | rse | | | | u5 | | | | | | | | t5 | | | | | 0 ( | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | t | t | t | t | t | if (Pv)<br>memb(Rs+Ru<<#u2)=Rt | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|----------|----------|----|------|------|--------|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-------------------------------------| | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | t | t | t | t | t | if (!Pv)<br>memb(Rs+Ru<<#u2)=Rt | | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | t | t | t | t | t | if (Pv.new)<br>memb(Rs+Ru<<#u2)=Rt | | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | t | t | t | t | t | if (!Pv.new)<br>memb(Rs+Ru<<#u2)=Rt | | | ICL/ | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | I | i | i | i | i | i | i | v | v | I | I | ı | ı | I | if (Pv)<br>memb(Rs+#u6:0)=#S6 | | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | ø | s | s | Р | Р | - | · | i | i | - | i | - | v | v | ı | ı | ı | I | - | if (!Pv)<br>memb(Rs+#u6:0)=#S6 | | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | s | s | ø | s | s | Р | Р | - | ï | i | i | ï | i | i | v | v | ı | ı | ı | I | - | if (Pv.new)<br>memb(Rs+#u6:0)=#S6 | | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | ı | i | i | i | i | i | i | v | ٧ | ı | ı | ı | ı | ı | if (!Pv.new)<br>memb(Rs+#u6:0)=#S6 | | | ICLA | ASS | 3 | | Se<br>ns | Pr<br>ed | | | Туре | 9 | | | s5 | | • | Pa | rse | | | | t5 | | | | | | | • | • | • | | | | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | 0 | v | v | if (Pv) memb(Rs+#u6:0)=Rt | | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | 0 | v | v | if (Pv.new)<br>memb(Rs+#u6:0)=Rt | | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | 0 | v | v | if (!Pv)<br>memb(Rs+#u6:0)=Rt | | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | 0 | v | v | if (!Pv.new)<br>memb(Rs+#u6:0)=Rt | | | ICL | ASS | 3 | Α | mod | de | | Туре | е | U | | | х5 | | | Pa | rse | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 0 | i | i | i | i | 0 | v | > | if (Pv)<br>memb(Rx++#s4:0)=Rt | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 0 | i | i | i | i | 1 | v | > | if (!Pv)<br>memb(Rx++#s4:0)=Rt | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv.new)<br>memb(Rx++#s4:0)=Rt | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 1 | v | v | if (!Pv.new)<br>memb(Rx++#s4:0)=Rt | | | ICL/ | ASS | 3 | Α | mod | de | | Туре | Э | U<br>N | | | | | | Pa | rse | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | - | - | - | i | i | P | Р | 0 | t | t | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv) memb(#u6)=Rt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | - | - | - | i | i | Р | Р | 0 | t | t | t | t | t | 1 | i | i | i | i | 1 | v | v | if (!Pv) memb(#u6)=Rt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | - | - | - | i | i | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv.new) memb(#u6)=Rt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | - | - | • | i | i | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 1 | v | ٧ | if (!Pv.new) memb(#u6)=Rt | #### Field name #### Description ICLASS Instruction class Type Type PredNew PredNew Sense Sense Parse Packet/loop parse bits s5 Field to encode register s t5 Field to encode register t u5 Field to encode register u v2 Field to encode register v x5 Field to encode register x Amode Amode UN Unsigned # Store halfword Store the upper or lower 16 bits of a source register at the effective address. | Syntax | Behavior | |--------|----------| |--------|----------| | memh (Re=#U6) =Rt.H | <pre>apply_extension(#U); EA=#U; *EA = Rt.h[1]; Re=#U;</pre> | |---------------------------------------------|------------------------------------------------------------------------------| | memh (Re=#U6)=Rt | <pre>apply_extension(#U);<br/>EA=#U;<br/>*EA = Rt.h[0];<br/>Re=#U;</pre> | | memh (Rs+#s11:1) =Rt.H | <pre>apply_extension(#s); EA=Rs+#s; *EA = Rt.h[1];</pre> | | memh(Rs+#s11:1)=Rt | <pre>apply_extension(#s); EA=Rs+#s; *EA = Rt.h[0];</pre> | | memh(Rs+#u6:1)=#S8 | <pre>EA=Rs+#u; apply_extension(#S); *EA = #S;</pre> | | memh(Rs+Ru<<#u2)=Rt.H | EA=Rs+(Ru<<#u);<br>*EA = Rt.h[1]; | | memh (Rs+Ru<<#u2)=Rt | EA=Rs+(Ru<<#u);<br>*EA = Rt.h[0]; | | memh(Ru<<#u2+#U6)=Rt.H | <pre>apply_extension(#U);<br/>EA=#U+(Ru&lt;&lt;#u);<br/>*EA = Rt.h[1];</pre> | | memh (Ru<<#u2+#U6)=Rt | <pre>apply_extension(#U);<br/>EA=#U+(Ru&lt;&lt;#u);<br/>*EA = Rt.h[0];</pre> | | memh (Rx++#s4:1) =Rt.H | EA=Rx;<br>Rx=Rx+#s;<br>*EA = Rt.h[1]; | | memh (Rx++#s4:1) =Rt | EA=Rx;<br>Rx=Rx+#s;<br>*EA = Rt.h[0]; | | <pre>memh (Rx++#s4:1:circ(Mu)) =Rt. H</pre> | <pre>EA=Rx; Rx=Rx=circ_add(Rx, #s, MuV); *EA = Rt.h[1];</pre> | | memh(Rx++#s4:1:circ(Mu))=Rt | <pre>EA=Rx; Rx=Rx=circ_add(Rx, #s, MuV); *EA = Rt.h[0];</pre> | | <pre>memh(Rx++I:circ(Mu))=Rt.H</pre> | <pre>EA=Rx; Rx=Rx=circ_add(Rx,I&lt;&lt;1,MuV); *EA = Rt.h[1];</pre> | Syntay | Syntax | Behavior | |--------------------------------------|----------------------------------------------------------------------------------------------| | <pre>memh (Rx++I:circ(Mu)) =Rt</pre> | <pre>EA=Rx; Rx=Rx=circ_add(Rx,I&lt;&lt;1,MuV); *EA = Rt.h[0];</pre> | | memh (Rx++Mu) =Rt.H | EA=Rx;<br>Rx=Rx+MuV;<br>*EA = Rt.h[1]; | | memh (Rx++Mu) =Rt | EA=Rx;<br>Rx=Rx+MuV;<br>*EA = Rt.h[0]; | | memh(Rx++Mu:brev)=Rt.H | <pre>EA=Rx.h[1] brev(Rx.h[0]); Rx=Rx+MuV; *EA = Rt.h[1];</pre> | | memh(Rx++Mu:brev)=Rt | <pre>EA=Rx.h[1] brev(Rx.h[0]); Rx=Rx+MuV; *EA = Rt.h[0];</pre> | | memh(gp+#u16:1)=Rt.H | <pre>apply_extension(#u);<br/>EA=(Constant_extended ? (0) : GP)+#u;<br/>*EA = Rt.h[1];</pre> | | memh(gp+#u16:1)=Rt | <pre>apply_extension(#u); EA=(Constant_extended ? (0) : GP)+#u; *EA = Rt.h[0];</pre> | Robavior ### Class: ST (slots 0,1) #### **Intrinsics** ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|----|----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-----------------------| | | ICL | ASS | ; | | | | | | | | | | s5 | | | Pa | rse | | | | u5 | | | | | | | | t5 | | | | | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | P | i | u | u | u | u | u | i | | - | t | t | t | t | t | memh(Rs+Ru<<#u2)=Rt | | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | - | - | t | t | t | t | t | memh(Rs+Ru<<#u2)=Rt.H | | | ICL | ASS | | | | | | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 0 | 1 | 1 | 1 | 1 | 0 | - | - | 0 | 1 | s | s | s | s | s | Р | Р | I | i | i | i | i | i | i | I | I | ı | I | ı | I | ı | memh(Rs+#u6:1)=#S8 | | | ICL | ASS | ; | | | | • | - | Туре | е | | • | | | | Pa | rse | | | | t5 | | | | | | | | | | | | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|----|----|------|----|--------|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-----------------------------------| | 0 | 1 | 0 | 0 | 1 | i | i | 0 | 0 | 1 | 0 | i | i | i | i | i | Р | Р | i | t | t | t | t | t | i | i | i | i | i | i | i | i | memh(gp+#u16:1)=Rt | | 0 | 1 | 0 | 0 | 1 | i | i | 0 | 0 | 1 | 1 | i | i | i | i | i | Р | Р | i | t | t | t | t | t | i | i | i | i | i | i | i | i | memh(gp+#u16:1)=Rt.H | | | ICL | ASS | ; | Α | mod | le | | Туре | е | UN | | | s5 | | | Pa | rse | | | | t5 | | | | | | | • | | • | • | | | 1 | 0 | 1 | 0 | 0 | i | i | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | i | i | i | memh(Rs+#s11:1)=Rt | | 1 | 0 | 1 | 0 | 0 | i | i | 1 | 0 | 1 | 1 | s | s | s | s | s | P | Р | i | t | t | t | t | t | i | i | i | i | i | i | i | i | memh(Rs+#s11:1)=Rt.H | | | ICL | ASS | ; | Α | mod | le | | Тур | е | U<br>N | | | x5 | | | Pa | rse | u1 | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | X | X | x | x | x | P | Р | u | t | t | t | t | t | 0 | • | | - | - | - | 1 | - | memh(Rx++I:circ(Mu))=Rt | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | x | x | x | x | x | P | P | u | t | t | t | t | t | 0 | i | i | i | i | - | 0 | - | memh(Rx++#s4:1:circ(Mu))<br>=Rt | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | x | x | x | x | x | P | P | u | t | t | t | t | t | 0 | - | - | - | - | - | 1 | - | memh(Rx++I:circ(Mu))=Rt.<br>H | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | x | x | x | x | x | P | P | u | t | t | t | t | t | 0 | i | i | i | i | - | 0 | - | memh(Rx++#s4:1:circ(Mu))<br>=Rt.H | | | ICL | ASS | ; | Α | mod | le | | Тур | е | U<br>N | | | e5 | | | Pa | rse | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | е | е | е | е | е | Р | Р | 0 | t | t | t | t | t | 1 | - | I | ı | I | I | ı | I | memh(Re=#U6)=Rt | | | ICL | ASS | ; | Α | mod | le | • | Тур | е | UN | | | х5 | | | Pa | rse | | | | t5 | | | | | | | • | | • | • | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | x | x | x | x | x | Р | Р | 0 | t | t | t | t | t | 0 | i | i | i | i | - | 0 | - | memh(Rx++#s4:1)=Rt | | | ICL | ASS | ; | Α | mod | le | | Тур | е | U<br>N | | | e5 | | | Pa | rse | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | е | е | е | е | е | Р | Р | 0 | t | t | t | t | t | 1 | - | I | I | ı | I | I | I | memh(Re=#U6)=Rt.H | | | ICL | ASS | ; | Α | mod | le | | Тур | е | U<br>N | | | х5 | | | Pa | rse | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | x | x | x | x | x | P | Р | 0 | t | t | t | t | t | 0 | i | i | i | i | - | 0 | - | memh(Rx++#s4:1)=Rt.H | | | ICL | ASS | ; | Α | mod | le | • | Тур | е | ZC | • | | u5 | | | Pa | rse | | | | t5 | | | | • | • | • | • | | • | • | | | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | u | u | u | u | u | Р | Р | i | t | t | t | t | t | 1 | i | I | I | ı | I | I | I | memh(Ru<<#u2+#U6)=Rt | | | ICL | ASS | ; | Α | mod | le | | Тур | е | U<br>N | | | х5 | | | Pa | rse | u1 | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | x | x | x | x | x | Р | Р | u | t | t | t | t | t | 0 | - | - | - | - | - | - | - | memh(Rx++Mu)=Rt | | | ICL | ASS | ; | Α | mod | le | - | Туре | е | U<br>N | | | u5 | | | Pa | rse | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | u | u | u | u | u | P | P | - | t | t | t | t | t | 1 | i | ı | ı | ı | ı | ı | ı | memh(Ru<<#u2+#U6)=Rt.<br>H | | | ICL | ASS | ; | Α | mod | le | | Тур | е | ZC | | | х5 | | | Pa | rse | u1 | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | u | t | t | t | t | t | 0 | - | - | - | - | - | | | memh(Rx++Mu)=Rt.H | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | x | x | x | x | x | P | P | u | t | t | t | t | t | 0 | - | - | - | - | - | - | - | memh(Rx++Mu:brev)=Rt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | X | X | x | x | x | P | Р | u | t | t | t | t | t | 0 | | | | - | - | - | - | memh(Rx++Mu:brev)=Rt.H | ### Field name ## Description | ICLASS | Instruction class | |--------|----------------------------| | Type | Туре | | Parse | Packet/loop parse bits | | e5 | Field to encode register e | | s5 | Field to encode register s | | t5 | Field to encode register t | | u1 | Field to encode register u | | u5 | Field to encode register u | | x5 | Field to encode register x | | Amode | Amode | UN Unsigned ## Store halfword conditionally Store the upper or lower 16 bits of a source register in memory at the effective address. This instruction is conditional based on a predicate value. When the predicate is true, perform the instruction, otherwise it is treated as a NOP. Syntax Behavior ``` if ([!]Pv[.new]) memh(#u6) = apply extension(#u); Rt.H EA=#u; if ([!]Pv[.new][0]) { *EA = Rt.h[1]; } else { NOP; if ([!]Pv[.new]) memh(#u6) = apply extension(#u); if ([!]Pv[.new][0]) { *EA = Rt.h[0]; } else { NOP; if ([!]Pv[.new]) memh(Rs + EA=Rs+#u; \#u6:1) = \#S6 if ([!]Pv[.new][0]){ apply_extension(#S); *EA = #S; } else { NOP; apply extension(#u); if ([!]Pv[.new]) memh (Rs + \#u6:1) = Rt.H EA=Rs+#u; if ([!]Pv[.new][0]) { *EA = Rt.h[1]; } else { NOP; if ([!]Pv[.new]) apply extension(#u); memh(Rs+#u6:1)=Rt EA=Rs+#u; if ([!]Pv[.new][0]) { *EA = Rt.h[0]; } else { NOP; if ([!]Pv[.new]) EA=Rs+(Ru<<\#u); memh(Rs+Ru<<\#u2)=Rt.H if ([!]Pv[.new][0]) { *EA = Rt.h[1]; } else { NOP; ``` #### **Syntax Behavior** if ([!]Pv[.new]) EA=Rs+(Ru<<#u);memh(Rs+Ru<<#u2)=Rtif ([!]Pv[.new][0]) { \*EA = Rt.h[0];} else { NOP; if ([!]Pv[.new]) EA=Rx; memh(Rx++#s4:1)=Rt.Hif ([!]Pv[.new][0]){ Rx=Rx+#s;\*EA = Rt.h[1];} else { NOP; EA=Rx; if ([!]Pv[.new]) memh(Rx++#s4:1)=Rtif ([!]Pv[.new][0]){ Rx=Rx+#s;\*EA = Rt.h[0];} else { NOP; Class: ST (slots 0,1) ## **Encoding** | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|---------------|---------------------|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|---------------------------------------| | | ICL | ASS | 3 | | | | | | | | | | s5 | | | Pa | rse | | | | u5 | | | | | | | | t5 | | | | | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | t | t | t | t | t | if (Pv)<br>memh(Rs+Ru<<#u2)=Rt | | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | ٧ | t | t | t | t | t | if (Pv)<br>memh(Rs+Ru<<#u2)=Rt.H | | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | t | t | t | t | t | if (!Pv)<br>memh(Rs+Ru<<#u2)=Rt | | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | u | u | u | u | u | · | v | v | t | t | t | t | t | if (!Pv)<br>memh(Rs+Ru<<#u2)=Rt.H | | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | t | t | t | t | t | if (Pv.new)<br>memh(Rs+Ru<<#u2)=Rt | | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | ٧ | t | t | t | t | t | if (Pv.new)<br>memh(Rs+Ru<<#u2)=Rt.H | | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | ٧ | t | t | t | t | t | if (!Pv.new)<br>memh(Rs+Ru<<#u2)=Rt | | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | ٧ | t | t | t | t | t | if (!Pv.new)<br>memh(Rs+Ru<<#u2)=Rt.H | | | ICL | ASS | 3 | | | | | | | | | | s5 | | • | Pa | rse | | • | • | | | • | | • | | | • | • | | | | | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 1 | i | i | i | i | i | i | v | ٧ | ı | ı | ı | ı | ı | if (Pv)<br>memh(Rs+#u6:1)=#S6 | | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | i | i | i | · | i | · | v | v | - | ı | ı | I | I | if (!Pv)<br>memh(Rs+#u6:1)=#S6 | | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | i | i | i | ï | i | i | v | v | - | ı | ı | ı | ı | if (Pv.new)<br>memh(Rs+#u6:1)=#S6 | | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | ı | i | i | i | i | i | i | v | v | ı | ı | ı | ı | ı | if (!Pv.new)<br>memh(Rs+#u6:1)=#S6 | | | ICL | ASS | 6 | | Se<br>ns<br>e | Pr<br>ed<br>Ne<br>w | | | Туре | Э | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | | | | | | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | P | ï | t | t | t | t | t | i | i | i | i | i | 0 | v | v | if (Pv) memh(Rs+#u6:1)=Rt | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|----|----|------|----|--------|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|--------------------------------------| | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | 0 | v | v | if (Pv)<br>memh(Rs+#u6:1)=Rt.H | | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | 0 | v | ٧ | if (Pv.new)<br>memh(Rs+#u6:1)=Rt | | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | 0 | v | v | if (Pv.new)<br>memh(Rs+#u6:1)=Rt.H | | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | 0 | v | v | if (!Pv)<br>memh(Rs+#u6:1)=Rt | | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | s | s | ø | s | ø | Р | Р | i | t | t | t | t | t | ï | i | i | i | i | 0 | v | > | if (!Pv)<br>memh(Rs+#u6:1)=Rt.H | | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | 0 | v | v | if (!Pv.new)<br>memh(Rs+#u6:1)=Rt | | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | 0 | v | v | if (!Pv.new)<br>memh(Rs+#u6:1)=Rt.H | | | ICL. | ASS | 3 | Α | mod | de | - | Туре | 9 | U<br>N | | | х5 | | | Pa | rse | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 0 | i | i | i | i | 0 | v | v | if (Pv)<br>memh(Rx++#s4:1)=Rt | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 0 | i | i | i | i | 1 | v | v | if (!Pv)<br>memh(Rx++#s4:1)=Rt | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv.new)<br>memh(Rx++#s4:1)=Rt | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 1 | v | ٧ | if (!Pv.new)<br>memh(Rx++#s4:1)=Rt | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 0 | i | i | i | i | 0 | v | ٧ | if (Pv)<br>memh(Rx++#s4:1)=Rt.H | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 0 | i | i | i | i | 1 | v | > | if (!Pv)<br>memh(Rx++#s4:1)=Rt.H | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 0 | v | > | if (Pv.new)<br>memh(Rx++#s4:1)=Rt.H | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 1 | v | > | if (!Pv.new)<br>memh(Rx++#s4:1)=Rt.H | | | ICL. | ASS | 3 | Α | mod | le | - | Туре | Э | U<br>N | | | | | | Ра | rse | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | - | - | - | i | i | Р | Р | 0 | t | t | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv) memh(#u6)=Rt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | - | - | - | i | i | Р | Р | 0 | t | t | t | t | t | 1 | i | i | i | i | 1 | v | v | if (!Pv) memh(#u6)=Rt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | - | - | - | i | i | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv.new) memh(#u6)=Rt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | - | - | - | i | i | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 1 | v | v | if (!Pv.new) memh(#u6)=Rt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | - | - | - | i | i | Р | Р | 0 | t | t | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv) memh(#u6)=Rt.H | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | - | - | • | i | i | Р | Р | 0 | t | t | t | t | t | 1 | i | i | i | i | 1 | v | ٧ | if (!Pv) memh(#u6)=Rt.H | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | - | - | • | i | i | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv.new)<br>memh(#u6)=Rt.H | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | - | - | - | i | i | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 1 | v | v | if (!Pv.new)<br>memh(#u6)=Rt.H | #### Field name ### Description ICLASS Instruction class Type Type PredNew PredNew Sense Sense Parse Packet/loop parse bits 55 Field to encode register s t5 Field to encode register t u5 Field to encode register u v2 Field to encode register v x5 Field to encode register x Amode Amode UN Unsigned ### Release The release memory operation is observed after all preceding memory operations have been observed at the local point of serialization. A different order might be observed at the global point of serialization. (see Ordering and Synchronization). No data is modified by this instruction. When the :st (same thread) option is specified, the preceding memory operations are those that precede this instruction in program order. When the :at (all threads) option is specified, the preceding memory operations are those that were committed on any thread before this instruction was committed. The Store release address is limited to certain memory regions. The following are excluded memory regions: - AHB memory space - AXI M2 memory space - Hexagon memory cut-out is excluded (with the exception of addressable TCM and VTCM memory), - Memory with the CCCC types 2, 3, or 4 | Syntax | Behavior | |----------------|--------------------| | release(Rs):at | EA=Rs;<br>*EA = Rs | | release(Rs):st | EA=Rs;<br>*EA = Rs | Class: ST (slots 0) #### **Encoding** | ; | 31 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-------|------|----|----|-----|----|----|------|----|----|----|----|----|----|----|-----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|----------------| | | ICI | LAS: | S | Α | mod | de | | Туре | Э | UN | | | s5 | | | Pai | rse | | | | t5 | | | | | | | | | d | 2 | | | Ī | 1 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | - | - | 0 | 0 | 1 | 1 | d | d | release(Rs):at | | Γ | 1 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | ø | s | s | s | Р | Р | 0 | t | t | t | t | t | - | - | 1 | 0 | 1 | 1 | d | d | release(Rs):st | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d2 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | Amode | Amode | | | Type | Туре | | | UN | Unsigned | | # Store word Store a 32-bit register in memory at the effective address. | Syntax | Behavior | |--------|----------| |--------|----------| | memw(Re=#U6)=Rt | <pre>apply_extension(#U); EA=#U; *EA = Rt; Re=#U;</pre> | |------------------------------------|---------------------------------------------------------------------------------| | memw(Rs+#s11:2)=Rt | <pre>apply_extension(#s); EA=Rs+#s; *EA = Rt;</pre> | | memw (Rs+#u6:2) = #S8 | <pre>EA=Rs+#u; apply_extension(#S); *EA = #S;</pre> | | memw (Rs+Ru<<#u2)=Rt | EA=Rs+(Ru<<#u);<br>*EA = Rt; | | memw(Ru<<#u2+#U6)=Rt | <pre>apply_extension(#U);<br/>EA=#U+(Ru&lt;&lt;#u);<br/>*EA = Rt;</pre> | | memw(Rx++#s4:2)=Rt | EA=Rx;<br>Rx=Rx+#s;<br>*EA = Rt; | | memw(Rx++#s4:2:circ(Mu))=Rt | <pre>EA=Rx; Rx=Rx=circ_add(Rx, #s, MuV); *EA = Rt;</pre> | | <pre>memw(Rx++I:circ(Mu))=Rt</pre> | <pre>EA=Rx; Rx=Rx=circ_add(Rx,I&lt;&lt;2,MuV); *EA = Rt;</pre> | | memw(Rx++Mu)=Rt | EA=Rx;<br>Rx=Rx+MuV;<br>*EA = Rt; | | memw(Rx++Mu:brev)=Rt | <pre>EA=Rx.h[1] brev(Rx.h[0]); Rx=Rx+MuV; *EA = Rt;</pre> | | memw(gp+#u16:2)=Rt | <pre>apply_extension(#u); EA=(Constant_extended ? (0) : GP)+#u; *EA = Rt;</pre> | # Class: ST (slots 0,1) ### **Intrinsics** | memw(Rx++#s4:2:circ(Mu))=Rt | <pre>void Q6_memw_IMR_circ(void** StartAddress, Word32 Is4_2, Word32 Mu, Word32 Rt, void* BaseAddress)</pre> | |--------------------------------------|--------------------------------------------------------------------------------------------------------------| | <pre>memw (Rx++I:circ(Mu)) =Rt</pre> | <pre>void Q6_memw_MR_circ(void** StartAddress, Word32 Mu, Word32 Rt, void* BaseAddress)</pre> | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|----|----|------|------|--------|----|----|----|----|----|-----|----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|---------------------------------| | | ICL/ | ASS | 3 | | | | | | | | | | s5 | | | Par | se | | | | u5 | | | | | | | | t5 | | | | | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Ρ | i | u | u | u | u | u | i | - | - | t | t | t | t | t | memw(Rs+Ru<<#u2)=Rt | | | ICL/ | ٩SS | 3 | | • | | • | • | | | | • | s5 | | | Par | se | | | | | | | | | | | | | • | | | | 0 | 0 | 1 | 1 | 1 | 1 | 0 | - | • | 1 | 0 | s | s | ø | s | s | Р | Р | I | ï | ï | i | i | i | i | I | I | I | I | I | ı | I | memw(Rs+#u6:2)=#S8 | | | ICLA | ASS | ; | | | | | | Туре | œ. | | | | | | Par | se | | | | t5 | | | | | | | | | | | | | 0 | 1 | 0 | 0 | 1 | i | i | 0 | 1 | 0 | 0 | i | i | i | i | i | Р | Р | i | t | t | t | t | t | i | i | i | i | i | i | i | i | memw(gp+#u16:2)=Rt | | | ICLA | ASS | 3 | Α | mod | de | | Тур | е | U<br>N | | | s5 | | | Par | se | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 0 | i | i | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | į | i | i | i | i | i | memw(Rs+#s11:2)=Rt | | | ICLA | ASS | 3 | Α | mod | de | | Туре | е | ZC | | | х5 | | | Par | se | u1 | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | x | х | x | x | х | Р | Р | u | t | t | t | t | t | 0 | - | - | - | - | - | 1 | - | memw(Rx++I:circ(Mu))=Rt | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | u | t | t | t | t | t | 0 | i | i | i | i | - | 0 | | memw(Rx++#s4:2:circ(Mu)<br>)=Rt | | | ICL/ | ASS | 3 | Α | mod | de | | Тур | е | UN | | | e5 | | | Par | se | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | е | е | е | е | е | Р | Ρ | 0 | t | t | t | t | t | 1 | - | I | I | ı | ı | ı | I | memw(Re=#U6)=Rt | | | ICL/ | ASS | 3 | Α | mod | de | , | Тур | е | UN | | | x5 | | | Par | se | | | | t5 | | | | | | | | • | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | x | x | x | x | x | Р | Ρ | 0 | t | t | t | t | t | 0 | i | i | i | i | - | 0 | | memw(Rx++#s4:2)=Rt | | | ICL/ | ASS | 3 | Α | mod | de | | Тур | е | ZC | | • | u5 | | | Par | se | | | | t5 | • | • | | • | • | | • | • | | • | | | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | u | u | u | u | u | Р | Р | i | t | t | t | t | t | 1 | i | I | ı | ı | ı | ı | I | memw(Ru<<#u2+#U6)=Rt | | | ICL/ | ASS | 3 | Α | mod | de | | Тур | e | UN | | | x5 | | | Par | se | u1 | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | u | t | t | t | t | t | 0 | | | - | | - | - | | memw(Rx++Mu)=Rt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | x | x | x | x | x | P | Р | u | t | t | t | t | t | 0 | - | - | - | - | - | - | - | memw(Rx++Mu:brev)=Rt | | Field name | Description | |------------|-------------| | ICLASS | Instruction class | |--------|----------------------------| | Туре | Type | | Parse | Packet/loop parse bits | | e5 | Field to encode register e | | s5 | Field to encode register s | | t5 | Field to encode register t | | u1 | Field to encode register u | | u5 | Field to encode register u | | x5 | Field to encode register x | | Amode | Amode | | UN | Unsigned | | | | ### Store-release word Store a 32-bit register in memory at the effective address. The store-release memory operation is observed after all preceding memory operations have been observed at the local point of serialization. A different order might be observed at the global point of serialization. (see Ordering and Synchronization). When the :st (same thread) option is specified, the preceding memory operations are those that precede this instruction in program order. When the :at (all threads) option is specified, the preceding memory operations are those that were committed on any thread before this instruction was committed. The Store release address is limited to certain memory regions. The following are excluded memory regions: - AHB memory space - AXI M2 memory space - Hexagon memory cut-out is excluded (with the exception of addressable TCM and VTCM memory) - Memory with the CCCC types 2, 3, or 4 | Syntax | Behavior | |--------|----------| | | | | memw_rl(Rs):at=Rt | EA=Rs;<br>*EA = Rt | |-------------------|--------------------| | memw_rl(Rs):st=Rt | EA=Rs;<br>*EA = Rt | Class: ST (slots 0) ### **Encoding** d2 s5 t5 | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|-----|----|----|------|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|----|-------------------| | | ICL | ASS | 3 | Α | mod | de | - | Туре | 9 | U | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | | d | 12 | | | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | | 0 | 0 | 1 | 0 | d | d | memw_rl(Rs):at=Rt | | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | P | Р | - | t | t | t | t | t | - | - | 1 | 0 | 1 | 0 | d | d | memw_rl(Rs):st=Rt | | Field name | | Description | |------------|------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | Field to encode register d Field to encode register s Field to encode register t Amode Amode Type Type UN Unsigned ## Store word conditionally Store a 32-bit register in memory at the effective address. This instruction is conditional based on a predicate value. When the predicate is true, perform the instruction, otherwise it is treated as a NOP. Syntax Behavior ``` if ([!]Pv[.new]) apply extension(#u); memw(#u6)=Rt EA=#u; if ([!]Pv[.new][0]) { *EA = Rt; } else { NOP; EA=Rs+#u; if ([!]Pv[.new]) memw(Rs+#u6:2)=#S6 if ([!]Pv[.new][0]){ apply extension(#S); *EA = #S; } else { NOP; if ([!]Pv[.new]) apply extension(#u); memw(Rs+#u6:2)=Rt EA=Rs+#u; if ([!]Pv[.new][0]) { *EA = Rt; } else { NOP; if ([!]Pv[.new]) EA=Rs+(Ru<<\#u); memw(Rs+Ru<<\#u2)=Rt if ([!]Pv[.new][0]) { *EA = Rt; } else { NOP; EA=Rx; if ([!]Pv[.new]) memw(Rx++#s4:2)=Rt if ([!]Pv[.new][0]){ Rx=Rx+#s; *EA = Rt; } else { NOP; ``` Class: ST (slots 0,1) ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |-----|-----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------------------------| | - 1 | CLA | ASS | ; | | | | | | | | | | s5 | | | Pai | rse | | | | u5 | | | | | | | | t5 | | | | | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | t | t | t | t | t | if (Pv)<br>memw(Rs+Ru<<#u2)=Rt | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|---------------|---------------------|----|------|------|--------|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-------------------------------------| | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | t | t | t | t | t | if (!Pv)<br>memw(Rs+Ru<<#u2)=Rt | | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | t | t | t | t | t | if (Pv.new)<br>memw(Rs+Ru<<#u2)=Rt | | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | u | u | u | u | u | i | v | v | t | t | t | t | t | if (!Pv.new)<br>memw(Rs+Ru<<#u2)=Rt | | | ICL | ASS | ; | | | | | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | ı | i | i | i | i | i | i | v | v | ı | ı | ı | ı | ı | if (Pv)<br>memw(Rs+#u6:2)=#S6 | | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | ø | s | Р | Р | ı | · | - | i | - | i | - | v | v | ı | ı | I | I | - | if (!Pv)<br>memw(Rs+#u6:2)=#S6 | | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | s | s | s | ø | s | Р | Р | ı | ï | ï | i | ï | i | i | v | v | ı | ı | ı | I | - | if (Pv.new)<br>memw(Rs+#u6:2)=#S6 | | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | s | s | s | ø | s | Р | Р | ı | ï | · | i | · | i | · | v | ٧ | ı | ı | I | I | - | if (!Pv.new)<br>memw(Rs+#u6:2)=#S6 | | | ICLA | ASS | ; | | Se<br>ns<br>e | Pr<br>ed<br>Ne<br>w | | - | Туре | € | | | s5 | | | Ра | rse | | | | t5 | | | | | | | | | | | | | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | 0 | v | v | if (Pv)<br>memw(Rs+#u6:2)=Rt | | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | 0 | v | ٧ | if (Pv.new)<br>memw(Rs+#u6:2)=Rt | | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | s | s | s | ø | s | Р | Р | i | t | t | t | t | t | ï | i | i | i | i | 0 | v | > | if (!Pv)<br>memw(Rs+#u6:2)=Rt | | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | i | i | 0 | v | v | if (!Pv.new)<br>memw(Rs+#u6:2)=Rt | | | ICL | ASS | ; | Α | mod | de | | Туре | Э | U<br>N | | | x5 | | | Pa | rse | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 0 | i | i | i | i | 0 | v | v | if (Pv)<br>memw(Rx++#s4:2)=Rt | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 0 | i | i | i | i | 1 | v | v | if (!Pv)<br>memw(Rx++#s4:2)=Rt | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 0 | v | ٧ | if (Pv.new)<br>memw(Rx++#s4:2)=Rt | | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | x | x | x | x | x | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 1 | v | > | if (!Pv.new)<br>memw(Rx++#s4:2)=Rt | | | ICL | ASS | ; | Α | mod | le | | Туре | 9 | U | | | | | | Pa | rse | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | - | - | - | - | i | P | P | 0 | t | t | t | t | t | 1 | i | i | i | i | 0 | v | ٧ | if (Pv) memw(#u6)=Rt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | - | - | - | - | i | Р | Р | 0 | t | t | t | t | t | 1 | i | i | i | i | 1 | v | > | if (!Pv) memw(#u6)=Rt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | - | - | - | i | i | Р | Р | 1 | t | t | t | t | t | 1 | i | i | i | i | 0 | v | v | if (Pv.new) memw(#u6)=Rt | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | - | - | - | i | i | Р | P | 1 | t | t | t | t | t | 1 | i | i | i | i | 1 | v | ٧ | if (!Pv.new) memw(#u6)=Rt | #### Field name ### Description ICLASS Instruction class Type Type PredNew PredNew Sense Sense Parse Packet/loop parse bits s5 Field to encode register s t5 Field to encode register t u5 Field to encode register u v2 Field to encode register v x5 Field to encode register x Amode Amode UN Unsigned ### Allocate stack frame Allocate a stack frame on the call stack. This instruction first pushes LR and FP to the top of stack. It then subtracts an unsigned immediate from SP to allocate room for local variables. FP is set to the address of the old frame pointer on the stack. The following figure shows the stack layout. #### Stack in memory #### Syntax Behavior # Class: ST (slots 0) # **Encoding** | | 31 3 | 0 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|------|----|----|-----|----|----|-----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|---------------------------| | | IC | LAS | S | , | Amc | de | | Тур | е | U | | | х5 | | | Pa | rse | | | | | | | | | | | | | | | | | Ī | 1 ( | ) 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | х | x | х | х | х | Р | Р | 0 | 0 | 0 | i | i | i | i | i | i | i | i | i | i | i | allocframe(Rx,#u11:3):raw | Field name Description ICLASS Instruction class Parse Packet/loop parse bits x5 Field to encode register x Amode Amode Type Type UN Unsigned # **11.9 SYSTEM** The SYSTEM instruction class includes instructions to manage system resources. ## 11.9.1 SYSTEM USER The SYSTEM USER instruction subclass includes instructions that allow user access to system resources. ### Load locked This memory lock instruction performs a word or double-word locked load. This instruction returns the contents of the memory at address Rs and also reserves a lock reservation at that address. See Atomic operations. | Syntax | Behavior | |---------------------|----------------------| | Rd=memw_locked(Rs) | EA=Rs;<br>Rd = *EA; | | Rdd=memd_locked(Rs) | EA=Rs;<br>Rdd = *EA; | Class: SYSTEM (slots 0) #### **Notes** Group this instruction only with ALU32 or non-floating-point XTYPE instructions. ### **Encoding** s5 | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|----|----|------|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|---------------------| | | ICL. | ASS | ; | Α | mod | de | - | Туре | Э | U | | | s5 | | | Pa | rse | | | | | | | | | | | | d5 | | | | | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | 0 | 0 | | - | - | 0 | 0 | 0 | d | d | d | d | d | Rd=memw_locked(Rs) | | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | 1 | 0 | | - | - | 0 | 0 | 0 | d | d | d | d | d | Rdd=memd_locked(Rs) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Amode | Amode | | | Type | Туре | | | UN | Unsigned | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | Field to encode register s ### Store conditional This memory lock instruction performs a word or double-word conditional store operation. If this thread holds the address reservation and there are no intervening accesses to the memory location, the store is performed and the predicate is set to true. Otherwise, the store is not performed and the predicate returns false. See Atomic operations. #### **Syntax Behavior** memd locked(Rs,Pd)=Rtt EA=Rs; if (lock\_valid) { \*EA = Rtt;Pd = 0xff;lock valid = 0; } else { Pd = 0;memw locked(Rs,Pd)=Rt EA=Rs; if (lock valid) { \*EA = Rt;Pd = 0xff;lock\_valid = 0; } else { Pd = 0; Class: SYSTEM (slots 0) #### **Notes** - Group this instruction only with ALU32 or non-floating-point XTYPE instructions. - The predicate generated by this instruction cannot be used as a .new predicate, nor can it be automatically AND'd with another predicate. ### **Encoding** Field name | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|-----|----|----|------|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|----|------------------------| | Ī | ICI | ASS | 3 | Α | mod | de | - | Туре | 9 | U | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | | d | 12 | | | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | ø | s | Р | Р | - | t | t | t | t | t | - | | - | - | 0 | 0 | d | d | memw_locked(Rs,Pd)=Rt | | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | - | - | - | - | 0 | 0 | d | d | memd_locked(Rs,Pd)=Rtt | | | ICL | ASS | 6 | Δ | mo | de | | Тур | е | U<br>N | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | | d | 2 | | |---|-----|-----|---|---|----|----|---|-----|---|--------|---|---|----|---|---|----|-----|---|---|---|----|---|---|---|---|---|---|---|---|---|---|------------------------| | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | P | | t | t | t | t | t | - | - | - | - | 0 | 0 | d | d | memw_locked(Rs,Pd)=Rt | | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | P | 0 | t | t | t | t | t | - | - | - | - | 0 | 0 | d | d | memd_locked(Rs,Pd)=Rtt | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Description | ICLASS | Instruction class | |--------|----------------------------| | Parse | Packet/loop parse bits | | d2 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | Amode | Amode | | Туре | Туре | | UN | Unsigned | ### Zero a cache line The dczeroa instruction clears 32 bytes of memory. If the memory is marked write-back cacheable, a cache line is allocated in the data cache and 32 bytes are cleared. If the memory is write-through or write-back, 32 bytes of zeros are sent to memory. This instruction is useful to efficiently handle write-only data by preallocating lines in the cache. The address must be 32-byte aligned. If not, an unaligned error exception is raised. If this instruction appears in a packet, slot 1 must be A-type or empty. | Syntax | Behavior | |-------------|----------------------------------| | dczeroa(Rs) | EA=Rs; | | | <pre>dcache_zero_addr(EA);</pre> | ### Class: SYSTEM (slots 0) #### **Notes** A packet containing this instruction must have slot 1 either empty or executing an ALU32 instruction. #### **Intrinsics** dczeroa(Rs) void Q6\_dczeroa\_A(Address a) ### **Encoding** | | 31 3 | 0 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|------|----|----|----|----|----|------|----|----|----|----|----|----|----|-----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-------------| | | IC | LASS | 3 | Δ | mo | de | | Туре | Э | UN | | | s5 | | | Pai | rse | | | | | | | | | | | | | | | | | Ī | 1 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | - | - | - | - | - | - | - | - | - | - | - | - | - | dczeroa(Rs) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | s5 | Field to encode register s | | | Amode | Amode | | | Type | Type | | | UN | Unsigned | | # Memory barrier The barrier instruction establishes a memory barrier to ensure proper ordering between accesses before the barrier instruction and accesses after the barrier instruction. Accesses before the barrier are globally observable before any access after the barrier can be observed. The use of this instruction is system-dependent. | Syntax | Behavior | |---------|-----------------| | barrier | memory_barrier; | Class: SYSTEM (slots 0) ### **Notes** ■ This is a solo instruction. It must not be grouped with other instructions in a packet. # **Encoding** UN | 3 | 1 3 | 0 2 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|----|-----|----|----|------|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|---------| | | IC | LA | SS | | А | mod | le | - | Туре | е | UN | | | | | | Pa | rse | | | | | | | | | | | | | | | | | 1 | C | ) | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | - | - | - | - | - | Р | Р | - | - | - | - | - | - | 0 | 0 | 0 | - | - | - | - | - | barrier | | Field name | | Description | |------------|------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | Amode | Amode | | | Type | Type | | Unsigned # Breakpoint The brkpt operation causes the program to enter Debug mode if enabled by ISDB. Execution control hands to ISDB and the program does not proceed until directed by the debugger. If ISDB is disabled, this instruction is treated as a NOP. | Syntax | Behavior | |--------|-------------------| | brkpt | Enter Debug mode; | Class: SYSTEM (slot 3) #### **Notes** ■ This is a solo instruction. It must not be grouped with other instructions in a packet. ## **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-------| | | ICI | ASS | S | | sm | | | | | | | | | | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | - | - | | - | - | Р | Р | - | - | - | - | - | - | 0 | 0 | 0 | - | - | - | - | - | brkpt | | Field name | | Description | |------------|------------------------|-------------| | sm | Supervisor mode only | | | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | # Data cache prefetch The defetch instruction prefetches the data at address Rs + unsigned immediate. This instruction is a hint to the memory system, and is handled in an implementation-dependent manner. | Syn | tax | Behavior | |-----|-----|----------| | | | | | dcfetch(Rs) | Assembler mapped to: "dcfetch(Rs+#0)" | |-------------------------------|---------------------------------------| | <pre>dcfetch(Rs+#u11:3)</pre> | EA=Rs+#u;<br>dcache_fetch(EA); | ## Class: SYSTEM (slots 0) ## **Intrinsics** dcfetch(Rs) void Q6\_dcfetch\_A(Address a) # **Encoding** | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|-----|-----|----|----|-----|----|----|------|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|--------------------| | | | ICL | ASS | 3 | Α | mod | de | | Туре | е | UN | | | s5 | | | Pa | rse | | | | | | | | | | | | | | | | | Ī | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | - | - | i | i | i | i | i | i | i | i | i | i | i | dcfetch(Rs+#u11:3) | | Field name | Description | |------------|-------------| |------------|-------------| ICLASS Instruction class Amode Amode Amode Amode Type Type UN Unsigned Parse Packet/loop parse bits s5 Field to encode register s # Data cache maintenance user operations Perform maintenance operations on the data cache. The dccleaninva instruction looks up the data cache at address Rs. If this address is in the cache and has dirty data, the data is flushed out to memory and the line is invalidated. The dccleana instruction looks up the data cache at address Rs. If this address is in the cache and has dirty data, the data is flushed out to memory. The dcinva instruction looks up the data cache at address Rs. If this address is in the cache, the line containing the data is invalidated. If an instruction appears in a packet, slot 1 must be A-type or empty. In implementations that support L2 cache, these instructions operate on both L1 data and L2 caches. | Syntax | Behavior | |-----------------|---------------------------------------------| | dccleana (Rs) | <pre>EA=Rs; dcache_clean_addr(EA);</pre> | | dccleaninva(Rs) | <pre>EA=Rs; dcache_cleaninv_addr(EA);</pre> | | dcinva(Rs) | <pre>EA=Rs; dcache_cleaninv_addr(EA);</pre> | # Class: SYSTEM (slots 0) #### **Notes** A packet that contains this instruction must have slot 1 either empty or executing an ALU32 instruction. #### **Intrinsics** | dccleana(Rs) | <pre>void Q6_dccleana_A(Address a)</pre> | |-----------------|---------------------------------------------| | dccleaninva(Rs) | <pre>void Q6_dccleaninva_A(Address a)</pre> | | dcinva(Rs) | <pre>void Q6_dcinva_A(Address a)</pre> | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|----|----|------|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-----------------| | | ICL | ASS | 3 | Α | mod | de | | Туре | Э | U | | | s5 | | | Pa | rse | | | | | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | - | - | - | - | - | - | - | - | dccleana(Rs) | | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | - | - | - | - | - | - | - | - | dcinva(Rs) | | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | - | - | - | - | - | - | - | - | dccleaninva(Rs) | Field name Description ICLASSInstruction classParsePacket/loop parse bitss5Field to encode register s Amode Amode Type Type UN Unsigned # Send value to DIAG trace These instructions send the sources to the external DIAG trace. | Syntax | Behavior | |--------|----------| |--------|----------| | diag(Rs) | | |----------------|--| | diag0(Rss,Rtt) | | | diag1(Rss,Rtt) | | Class: SYSTEM (slot 3) # **Encoding** sm | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|----------------| | | ICL | ASS | ć | | sm | | | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | P | | | - | | | - | 0 | 0 | 1 | - | - | - | - | - | diag(Rs) | | | ICL | ASS | 3 | | sm | | | • | • | | | | s5 | • | • | Pa | rse | | | • | t5 | | | | | • | | • | • | | | | | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | - | - | - | - | - | diag0(Rss,Rtt) | | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | | - | - | - | - | diag1(Rss,Rtt) | ## Field name Description Supervisor mode only ICLASS Instruction class Parse Packet/loop parse bits 5 Field to encode register s Field to encode register t # Instruction cache maintenance user operations The icinva instruction looks up the address specified in Rs in the instruction cache. If a translation for Rs cannot be found, a TLB-miss-X exception with cause code 0x62 is indicated. If a translation is found but the user does not have proper permissions to the page to invalidate, the instruction converts to a NOP. If a translation is found and the user has proper permissions, all ways of all sets matching Rs[11:5] in any instruction cache are invalidated. | Syntax | Behavior | |------------|----------------------------------------| | icinva(Rs) | <pre>EA=Rs; icache_inv_addr(EA);</pre> | Class: SYSTEM (slot 2) #### **Notes** This is a solo instruction. It must not be grouped with other instructions in a packet. ## **Encoding** | 3 | 1 3 | 80 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|------------| | Ī | IC | CLA | SS | ; | | | | | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | | | | | | ( | ) 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | 0 | 0 | - | - | - | - | - | - | - | - | - | - | - | icinva(Rs) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | s5 | Field to encode register s | | # Instruction synchronization The isync instruction ensures that previous instructions have committed before continuing to the next instruction. Execute this instruction after the following events (when subsequent instructions must observe the results of the event): - After modifying the TLB with a TLBW instruction - After modifying the SSR register - After modifying the SYSCFG register - After any instruction cache maintenance operation - After modifying the TID register | Syntax | Behavior | |--------|------------------------------| | isync | <pre>instruction_sync;</pre> | Class: SYSTEM (slot 2) #### **Notes** ■ This is a solo instruction. It must not be grouped with other instructions in a packet. # **Encoding** | ; | 31 3 | 0 2 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-------| | | IC | LA | SS | | | | | | | | | | | | | | Pa | rse | | | | | | | | | | | | | | | | | Ī | 0 . | 1 ( | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Р | Р | 0 | - | - | - | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | isync | | Field name | | Description | |------------|------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | # L2 cache prefetch The L2fetch instruction initiates background prefetching into the L2 cache. Rs specifies the 32-bit virtual start address. There are two forms of this instruction. In the first form, the dimensions of the area to prefetch are encoded in source register Rt as follows: - Rt[15:8] = width of a fetch block in bytes. - Rt[7:0] = height: the number of width-sized blocks to fetch. - Rt[31:16] = stride: an unsigned byte offset that increments the pointer after each width-sized block is fetched. In the second form, the operands are encoded in register pair Rtt as follows: - Rtt[31:16] = width of a fetch block in bytes. - Rtt[15:0] = height: number of width-sized blocks to fetch. - Rtt[47:32] = stride: an unsigned byte offset, increments the pointer after each width-sized block is fetched. - Rtt[48] = direction. If clear, perform the prefetches in row major form meaning fetch all cache lines in a row before proceeding to the next row. If the bit is set, perform prefetch in column major form meaning all cache lines in a column are fetched before proceeding to the next column. The following figure shows two examples of the L2fetch instruction. In the box prefetch, a 2D range of memory is defined within a larger frame. The second example shows prefetch for a large linear area of memory, which has size Lines × 128. L2fetch is nonblocking. After the instruction initiates, the program continues on to the next instruction while the prefetching is performed in the background. L2fetch can bring in either code or data to the L2 cache. If the lines of interest are already in the L2, no action is performed. If the lines are missing from the L2\$, the hardware attempts to fetch them from the system memory. The hardware prefetch engine continues to request lines in the programmed memory range. The prefetching hardware makes a best-effort to prefetch the requested data, and attempts to perform prefetching at a lower priority than demand fetches. This prevents prefetch from adding traffic while the system is under heavy load. If a program initiates a new L2fetch while an older L2fetch operation still pends, the new request is queued, up to three deep. If three L2fetches already pend, the oldest request is dropped. During the time a L2 prefetch is active for a thread, the USR:PFA status bit is set to indicate that prefetches are in progress. The programmer can use this bit to decide whether to start a new L2fetch before the previous one completes. Executing an L2fetch with any subfield programmed as zero cancels pending prefetches by the calling thread. The implementation is free to drop prefetches when needed. | Syntax | Behavior | |-----------------|-------------------| | 12fetch(Rs,Rt) | 12fetch(Rs,INFO); | | 12fetch(Rs,Rtt) | 12fetch(Rs,INFO); | Class: SYSTEM (slots 0) #### **Notes** Group this instruction only with ALU32 or non-floating-point XTYPE instructions. #### **Intrinsics** | 12fetch(Rs,Rt) | <pre>void Q6_12fetch_AR(Address a, Word32 Rt)</pre> | |-----------------|------------------------------------------------------| | 12fetch(Rs,Rtt) | <pre>void Q6_12fetch_AP(Address a, Word64 Rtt)</pre> | #### **Encoding** UN | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|----|----|------|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-----------------| | | ICL | ASS | 3 | Α | mod | de | - | Туре | e | U | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | P | P | - | t | t | t | t | t | 0 | 0 | 0 | - | - | - | | | I2fetch(Rs,Rt) | | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | - | - | - | - | - | I2fetch(Rs,Rtt) | | 0 | • | А | mod | je | | туре | <b>e</b> | Ν | | | SO | | | Ра | rse | | | | เอ | | | | | | | | | | |---|---|-----|-----|----|----|------|----------|---|---|---|----|---|---|----|-----|---|---|---|-----|-----|------|----|---|---|--|---|---|-----------------| | | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | P | - | t | t | t | t | t | 0 | 0 | 0 | | - | - | I2fetch(Rs,Rt) | | | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | P | - | t | t | t | t | t | | - | | | - | - | I2fetch(Rs,Rtt) | | | | Fie | eld | na | me | ) | | | | | | | | | | | | D | )es | cri | ptic | on | | | | | | | Instruction class ICLASS Packet/loop parse bits Parse Field to encode register s s5 t5 Field to encode register t Amode Amode Type Type Unsigned ## Pause The pause instruction pauses execution for a specified period of time. During the pause duration, the program enters a low-power state and does not fetch and execute instructions. The instruction provides a short immediate, which indicates the pause duration. The program pauses for at most the number of cycles specified in the immediate plus 8. The minimum pause is 0 cycles, and the maximum pause is implementation defined. An interrupt to the program exits the paused state. System events such as hardware or DMA completion can trigger exits from Pause mode. An implementation is free to pause for durations shorter than (immediate + 8), but not longer. This instruction is useful for implementing user-level low-power synchronization operations, such as spin locks or wait-for-event signaling. | Syntax | Behavior | |------------|----------------------| | pause(#u8) | Pause for #u cycles; | Class: SYSTEM (slot 2) #### **Notes** ■ This is a solo instruction. It must not be grouped with other instructions in a packet. ## **Encoding** | 3 | 1 3 | 0 2 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|------------| | | IC | LA | SS | | | | | | | | | | | | | | Pa | rse | | | | | | | | | | | | | | | | | ( | ) 1 | ı | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | - | - | - | - | - | - | Р | Р | - | i | i | i | i | i | - | - | - | i | i | i | - | - | pause(#u8) | | Field name | | Description | |------------|------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | # Memory thread synchronization The syncht instruction synchronizes memory. All outstanding memory operations, including cached and uncached loads and stores, complete before the processor continues to the next instruction. This ensures that certain memory operations are performed in the desired order (for example, when accessing I/O devices). After performing a syncht operation, the processor ceases fetching and executing instructions from the program until all outstanding memory operations of that program complete. In multi-threaded or multi-core environments, syncht is not concerned with other execution contexts. The use of this instruction is system-dependent. | Syntax | Behavior | |---------------|---------------| | Rd = dmsyncht | Rd = DM0; | | syncht | memory_synch; | ## Class: SYSTEM (slots 0) #### **Notes** - This is a solo instruction. It must not be grouped with other instructions in a packet. - This is a monitor-level feature. If performed in User or Guest mode, a privilege error exception occurs. #### **Intrinsics** Rd=dmsyncht () Word32 Q6\_R\_dmsyncht() ## **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|-----|----|----|------|----|----|----|----|----|----|----|-----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------| | | ICL | ASS | 3 | Α | mod | de | | Туре | 9 | U | | | | | | Pai | rse | | | | | | | | | | | | d5 | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | - | - | - | - | - | Р | Р | - | - | - | - | - | 0 | 1 | 1 | 1 | d | d | d | d | d | Rd=dmsyncht | | | ICL | ASS | 3 | Α | mod | de | | Туре | Э | U | | | | | | Pai | rse | | | | | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | - | - | - | - | - | Р | Р | - | - | - | - | - | - | - | - | - | - | - | - | - | - | syncht | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | Amode Amode Type Type UN Unsigned # Send value to ETM trace The trace instruction takes the value of register Rs and emits it to the ETM trace. The ETM block must be enabled, and the thread must have permissions to perform tracing. The contents of Rs are user-defined. | Syntax | Behavior | |-----------|--------------------------| | trace(Rs) | Send value to ETM trace; | Class: SYSTEM (slot 3) #### **Notes** ■ Group this instruction only with ALU32 or non-floating-point XTYPE instructions. ## **Encoding** | : | 1 3 | 0 2 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-----------| | Ī | IC | LAS | SS | | | sm | | | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | | | | | | | ) 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 0 | 0 | 0 | - | - | - | - | - | trace(Rs) | | Field name | | Description | |------------|----------------------------|-------------| | sm | Supervisor mode only | | | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | s5 | Field to encode register s | | # Trap The trap instruction causes a precise exception. Executing a trap instruction sets the EX bit in SSR to 1, which disables interrupts and enables Supervisor mode. The program then jumps to the vector location (either trap0 or trap1). The instruction specifies an 8-bit immediate field. This field is copied into the system status register cause field. Upon returning from the service routine with a RTE, execution resumes at the packet after the trap instruction. These instructions are for user code to request services from the operating system. Two trap instructions are provided so the OS can optimize for fast service routines and slower service routines. | Syntax | Behavior | |---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | trap0(#u8) | SSR.CAUSE = #u;<br>TRAP "0"; | | trap1(#u8) | Assembler mapped to: "trap1(R0, #u8)" | | trap1(Rx,#u8) | <pre>if (!can_handle_trap1_virtinsn(#u)) { SSR.CAUSE = #u; TRAP "1"; } else if (#u == 1) { VMRTE; } else if (#u == 3) { VMSETIE; } else if (#u == 4) { VMGETIE; } else if (#u == 6) { VMSPSWAP;</pre> | ## Class: SYSTEM (slot 2) #### **Notes** This is a solo instruction. It must not be grouped with other instructions in a packet. ## **Encoding** Field name | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|---------------| | | ICL | ASS | 3 | | | | | | | | | | | | | Pa | rse | | | | | | | | | | | | | | | | | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | - | - | - | - | - | - | Р | Р | - | i | i | i | i | i | - | - | - | i | i | i | - | - | trap0(#u8) | | | ICL. | ASS | 3 | | | | | | | | | | х5 | | | Pa | rse | | • | | | | | • | • | • | | • | • | | • | | | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | - | х | x | х | x | x | Р | Р | | i | i | i | i | i | - | - | - | i | i | i | - | - | trap1(Rx,#u8) | | • | 0 | • | | ٠. | 0 | " | 0 | " | _ | _ | _ | _ | _ | _ | • | • | _ | ٠. | ٠. | ٠. | • | • | _ | _ | _ | • | • | | _ | 1 | trapo(#do) | |----|-----|---|---|----|---|---|---|---|---|---|---|----|---|---|----|-----|---|----|----|----|---|---|---|---|---|---|---|---|---|---|---------------| | CL | ASS | | | | | | | | | | | х5 | | | Pa | rse | | | | | | | | | | | | | | | | | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | | x | x | x | x | x | P | P | - | i | i | i | i | i | - | | - | i | i | i | - | - | trap1(Rx,#u8) | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | • | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Description ICLASS Instruction class Packet/loop parse bits Parse Field to encode register x x5 # 11.10 XTYPE The XTYPE instruction class includes instructions that perform most of the data processing done by the Hexagon processor. XTYPE instructions execute on slot 2 or slot 3. # 11.10.1 XTYPE ALU The XTYPE ALU instruction subclass includes instructions that perform arithmetic and logical operations. ## Absolute value doubleword Take the absolute value of the 64-bit source register and place it in the destination register. | Syntax | Behavior | |--------------------------|-------------------------------| | Rdd=abs(Rss) | Rdd = ABS(Rss); | | | | | Class: XTYPE (slots 2,3) | | | Lateta et e | | | Intrinsics | | | | | | Rdd=abs(Rss) | Word64 Q6_P_abs_P(Word64 Rss) | # **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|----|---|---|------|----|---|---|----|---|---|---|--------------| | | ICL | ASS | ı | Reg | Тур | е | Ma | ајОр | | | | s5 | | | Pa | rse | | | | | | | N | 1inC | )p | | | d5 | | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | | 1 | 1 | 0 | d | d | d | d | d | Rdd=abs(Rss) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | MajOp | Major opcode | | | MinOp | Minor opcode | | | RegType | Register type | | ## Absolute value word Take the absolute value of the source register and place it in the destination register. The 32-bit absolute value is available with optional saturation. The single case of saturation is when the source register is equal to 0x8000\_0000, the destination saturates to 0x7fff\_ffff. | Syntax | Behavior | |------------------|--------------------------------------------| | Rd=abs(Rs)[:sat] | $Rd = [sat_{32}] (ABS(sxt_{32->64}(Rs)));$ | Class: XTYPE (slots 2,3) #### **Notes** ■ If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** | Rd=abs(Rs) | Word32 Q6_R_abs_R(Word32 Rs) | |----------------|----------------------------------| | Rd=abs(Rs):sat | Word32 Q6_R_abs_R_sat(Word32 Rs) | # **Encoding** RegType | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|-----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|----------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | Maj | jОр | | | | s5 | | | Pa | rse | | | | | | | Λ | /linC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | | - | - | - | - | 1 | 0 | 0 | d | d | d | d | d | Rd=abs(Rs) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 1 | 0 | 1 | d | d | d | d | d | Rd=abs(Rs):sat | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | MajOp | Major opcode | | | MinOp | Minor opcode | | Register type # Add and accumulate Add Rs and Rt or a signed immediate, then add or subtract the resulting value. The result is saved in Rx. | Syntax | Behavior | |---------------------------|------------------------------------------------------| | Rd=add(Rs,add(Ru,#s6)) | <pre>Rd = Rs + Ru + apply_extension(#s);</pre> | | Rd=add(Rs, sub(#s6, Ru)) | <pre>Rd = Rs - Ru + apply_extension(#s);</pre> | | Rx+=add(Rs, #s8) | <pre>apply_extension(#s); Rx = Rx + Rs + #s;</pre> | | <pre>Rx+=add(Rs,Rt)</pre> | Rx = Rx + Rs + Rt; | | Rx-=add(Rs, #s8) | <pre>apply_extension(#s); Rx = Rx - (Rs + #s);</pre> | | Rx-=add(Rs,Rt) | Rx = Rx - (Rs + Rt); | # Class: XTYPE (slots 2,3) #### **Intrinsics** ``` Rd = add(Rs,add(Ru, #s6)) Rd = add(Rs,sub(#s6, Ru)) Rd = add(Rs,sub(#s6, Ru)) Rd = add(Rs,sub(#s6, Ru)) Rx += add(Rs, #s8) Rx += add(Rs, Rt) Rx -= add(Rs, #s8) Word32 Q6_R_addacc_RI(Word32 Rx, Word32 Rs, Word32 Is8) Rx -= add(Rs, Rt) Word32 Q6_R_addacc_RI(Word32 Rx, Word32 Rs, Word32 Rt) Rx -= add(Rs, #s8) Word32 Q6_R_addacc_RI(Word32 Rx, Word32 Rs, Word32 Is8) Rx -= add(Rs, Rt) Word32 Q6_R_addacc_RI(Word32 Rx, Word32 Rs, Word32 Is8) Rx -= add(Rs, Rt) Word32 Q6_R_addacc_RR(Word32 Rx, Word32 Rs, Word32 Rt) ``` # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|------|------|---|---|----|----|---|---|------------------------| | | ICL. | ASS | 3 | F | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | d5 | | | | | | | | u5 | | | | | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | i | i | s | s | s | s | s | Р | Р | i | d | d | d | d | d | i | i | i | u | u | u | u | u | Rd=add(Rs,add(Ru,#s6)) | | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | i | i | s | s | s | s | s | Р | Р | i | d | d | d | d | d | i | i | i | u | u | u | u | u | Rd=add(Rs,sub(#s6,Ru)) | | | ICL. | ASS | 3 | F | Reg | Тур | е | N | 1ajC | )p | | • | s5 | • | | Pa | rse | | • | | • | | • | N | 1inC | р | | x5 | | | | | | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | - | - | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | i | i | х | х | х | х | х | Rx+=add(Rs,#s8) | | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | - | - | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | i | i | х | х | х | х | х | Rx-=add(Rs,#s8) | | | ICL. | ASS | 3 | F | Reg | Тур | е | N | 1ajC | )p | | • | s5 | • | | Pa | rse | | t5 | | | | N | 1inC | р | | | х5 | | • | | | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | х | х | х | х | х | Rx+=add(Rs,Rt) | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | х | x | x | х | х | Rx-=add(Rs,Rt) | | Field name | | Description | |------------|----------------------------|-------------| | RegType | Register type | | | ICLASS | Instruction class | | | MajOp | Major opcode | | | MinOp | Minor opcode | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | u5 | Field to encode register u | | | x5 | Field to encode register x | | ## Add doublewords The first form of this instruction adds two 32-bit registers. If the result overflows 32 bits, saturate the result to 0x7FFF\_FFFF for a positive result, or 0x8000\_0000 for a negative result. The 32-bit nonsaturating register add is a ALU32-class instruction and can execute on any slot. The second instruction form sign-extends a 32-bit register Rt to 64 bits and performs a 64-bit add with Rss. The result is stored in Rdd. The third instruction form adds 64-bit registers Rss and Rtt and places the result in Rdd. The final instruction form adds two 64-bit registers Rss and Rtt. If the result overflows 64 bits, it is saturated to 0x7fff\_ffff\_ffff for a positive result, or 0x8000\_0000\_0000\_0000 for a negative result. | Syntax | Behavior | |-----------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------| | <pre>Rd=add(Rs,Rt):sat:deprecated</pre> | $Rd = sat_{32}(Rs + Rt);$ | | Rdd=add(Rs,Rtt) | <pre>if ("Rs &amp; 1") { Assembler mapped to: "Rdd = add(Rss, Rtt):raw:hi"; } else { Assembler mapped to: "Rdd = add(Rss, Rtt):raw:lo"; }</pre> | | Rdd=add(Rss,Rtt) | Rdd=Rss+Rtt; | | <pre>Rdd=add(Rss,Rtt):raw:hi</pre> | $Rdd=Rtt+sxt_{32->64}(Rss.w[1]);$ | | Rdd=add(Rss,Rtt):raw:lo | $Rdd=Rtt+sxt_{32->64}(Rss.w[0]);$ | | Rdd=add(Rss,Rtt):sat | <pre>Rdd=sat<sub>64</sub>(Rss+Rtt);</pre> | ## Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rdd=add(Rs,Rtt) Word64 Q6_P_add_RP(Word32 Rs, Word64 Rtt) Rdd=add(Rss,Rtt) Word64 Q6_P_add_PP(Word64 Rss, Word64 Rtt) Rdd=add(Rss,Rtt):sat Word64 Q6_P_add_PP_sat(Word64 Rss, Word64 Rtt) ``` # **Encoding** | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-------------------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rdd=add(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rdd=add(Rss,Rtt):sat | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | P | Р | - | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rdd=add(Rss,Rtt):raw:lo | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rdd=add(Rss,Rtt):raw:hi | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | • | t | t | t | t | t | 0 | • | - | d | d | d | d | d | Rd=add(Rs,Rt):sat:depreca ted | # Field name Description | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | ## Add halfword Perform a 16-bit add with optional saturation, and place the result in either the upper or lower half of a register. If the result goes in the upper half, the sources can be any high or low halfword of Rs and Rt. The lower 16 bits of the result are zeroed. If the result is to be placed in the lower 16 bits of Rd, the Rs source can be either high or low, but the other source must be the low halfword of Rt. In this case, the upper halfword of Rd is the sign-extension of the low halfword. ## Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rd=add(Rt.H,Rs.H): Word32 Q6_R_add_RhRh_s16(Word32 Rt, Word32 Rs) Rd=add(Rt.H,Rs.H):sat:<<16</td> Word32 Q6_R_add_RhRh_sat_s16(Word32 Rt, Word32 Rs) Rd=add(Rt.H,Rs.L):<<16</td> Word32 Q6_R_add_RhRl_s16(Word32 Rt, Word32 Rs) Rd=add(Rt.H,Rs.L):sat:<<16</td> Word32 Q6_R_add_RhRl_sat_s16(Word32 Rt, Word32 Rs) Rd=add(Rt.L,Rs.H) Word32 Q6_R_add_RlRh(Word32 Rt, Word32 Rs) ``` | Rd=add(Rt.L,Rs.H):<<16 | Word32 Q6_R_add_RlRh_s16(Word32 Rt, Word32 Rs) | |----------------------------|----------------------------------------------------| | Rd=add(Rt.L,Rs.H):sat | Word32 Q6_R_add_RlRh_sat(Word32 Rt, Word32 Rs) | | Rd=add(Rt.L,Rs.H):sat:<<16 | Word32 Q6_R_add_RlRh_sat_s16(Word32 Rt, Word32 Rs) | | Rd=add(Rt.L,Rs.L) | Word32 Q6_R_add_RlRl(Word32 Rt, Word32 Rs) | | Rd=add(Rt.L,Rs.L):<<16 | Word32 Q6_R_add_RlRl_s16(Word32 Rt, Word32 Rs) | | Rd=add(Rt.L,Rs.L):sat | Word32 Q6_R_add_RlRl_sat(Word32 Rt, Word32 Rs) | | Rd=add(Rt.L,Rs.L):sat:<<16 | Word32 Q6_R_add_RlRl_sat_s16(Word32 Rt, Word32 Rs) | # Encoding | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|---------------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | - | d | d | d | d | d | Rd=add(Rt.L,Rs.L) | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | - | d | d | d | d | d | Rd=add(Rt.L,Rs.H) | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | - | d | d | d | d | d | Rd=add(Rt.L,Rs.L):sat | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | - | d | d | d | d | d | Rd=add(Rt.L,Rs.H):sat | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=add(Rt.L,Rs.L):<<16 | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rd=add(Rt.L,Rs.H):<<16 | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rd=add(Rt.H,Rs.L):<<16 | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rd=add(Rt.H,Rs.H):<<16 | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rd=add(Rt.L,Rs.L):sat:<<1 | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rd=add(Rt.L,Rs.H):sat:<<1 | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rd=add(Rt.H,Rs.L):sat:<<1 | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rd=add(Rt.H,Rs.H):sat:<<1 | | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | # Add or subtract doublewords with carry Add or subtract with carry. Predicate register Px is used as an extra input and output. For adds, the LSB of the predicate is added to the sum of the two input pairs. For subtracts, the predicate is considered a not-borrow. The LSB of the predicate is added to the first source register and the logical complement of the second argument. The carry-out from the sum is saved in predicate Px. These instructions allow efficient addition or subtraction of numbers larger than 64 bits. Syntax Rdd=add(Rss,Rtt,Px):carry Rdd=add(Rss,Rtt,Px):carry Rdd = Rss + Rtt + Px[0]; Px = carry\_from\_add(Rss,Rtt,Px[0]) ? 0xff : 0x00; Rdd=sub(Rss,Rtt,Px):carry Rdd=sub(Rss,Rtt,Px):carry PREDUSE\_TIMING; Rdd = Rss + ~Rtt + Px[0]; Px = carry\_from\_add(Rss,~Rtt,Px[0]) ? 0xff : 0x00; Class: XTYPE (slots 2,3) #### **Notes** The predicate generated by this instruction cannot be used as a .new predicate, nor can it be automatically AND'd with another predicate. # **Encoding** | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-------------------------|----|----|----|----|----|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|---------------------------| | | ICL | CLASS RegType Maj s5 Pa | | | | | | rse | | | | t5 | | | | Х | 2 | | | d5 | | | | | | | | | | | | | | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | x | x | d | d | d | d | d | Rdd=add(Rss,Rtt,Px):carry | | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | х | х | d | d | d | d | d | Rdd=sub(Rss,Rtt,Px):carry | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | - | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | x2 | Field to encode register x | | | Maj | Major opcode | | | Min | Minor opcode | | | RegType | Register type | | # Clip to unsigned Clip input to an unsigned integer. | avior | |-------| | i | Class: XTYPE (slots 2,3) #### **Notes** ■ This instruction can only execute on a core with the Hexagon audio extensions. ## **Intrinsics** Rd=clip(Rs, #u5) Word32 Q6\_R\_clip\_RI(Word32 Rs, Word32 Iu5) ## **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|--------|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|----|---|---|-------|----|---|---|----|---|---|---|-----------------| | Ī | ICLASS | | | Reg | Тур | е | Ма | jОр | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | | - | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Ρ | 0 | i | i | i | i | i | 1 | 0 | 1 | d | d | d | d | d | Rd=clip(Rs,#u5) | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | ICLASSInstruction classParsePacket/loop parse bitsd5Field to encode register ds5Field to encode register s MajOp Major opcode MinOp Minor opcode RegType Register type # Logical doublewords Perform bitwise logical AND, OR, XOR, and NOT operations. The source and destination registers are 64-bit. For 32-bit logical operations, see the ALU32 logical instructions. | Syntax | Behavior | |-------------------|------------------------------------| | Rdd=and(Rss,Rtt) | Rdd=Rss&Rtt | | Rdd=and(Rtt,~Rss) | <pre>Rdd = (Rtt &amp; ~Rss);</pre> | | Rdd=not(Rss) | Rdd=~Rss; | | Rdd=or(Rss,Rtt) | Rdd=Rss Rtt; | | Rdd=or(Rtt,~Rss) | <pre>Rdd = (Rtt ~Rss);</pre> | | Rdd=xor(Rss,Rtt) | Rdd=Rss^Rtt; | # Class: XTYPE (slots 2,3) ## **Intrinsics** | Rdd=and(Rss,Rtt) | Word64 | Q6_P_and_PP(Word64 Rss, Word64 Rtt) | |-------------------|--------|--------------------------------------| | Rdd=and(Rtt,~Rss) | Word64 | Q6_P_and_PnP(Word64 Rtt, Word64 Rss) | | Rdd=not(Rss) | Word64 | Q6_P_not_P(Word64 Rss) | | Rdd=or(Rss,Rtt) | Word64 | Q6_P_or_PP(Word64 Rss, Word64 Rtt) | | Rdd=or(Rtt,~Rss) | Word64 | Q6_P_or_PnP(Word64 Rtt, Word64 Rss) | | Rdd=xor(Rss,Rtt) | Word64 | Q6_P_xor_PP(Word64 Rss, Word64 Rtt) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|----|----|---|---|-------------------| | | ICL | ASS | 3 | | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | 1inC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | P | - | | - | - | - | - | 1 | 0 | 0 | d | d | d | d | d | Rdd=not(Rss) | | | ICL. | ASS | 6 | | Reg | Тур | е | | | | | • | s5 | • | | Pa | rse | | | • | t5 | | • | N | 1inO | )p | | d5 | | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rdd=and(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | P | - | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rdd=and(Rtt,~Rss) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=or(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rdd=or(Rtt,~Rss) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rdd=xor(Rss,Rtt) | | Field name | | Description | |------------|--|-------------| | _ | | | | RegType | Register type | |---------|----------------------------| | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | MajOp | Major opcode | | MinOp | Minor opcode | | | | # Logical-logical doublewords Perform a logical operation of the two source operands, then perform a second logical operation of the result with the destination register Rxx. The source and destination registers are 64-bit. | Syntax | | Behavior | | |--------------------|----------------|----------|--| | DyyA-yor (Dag D++) | DyyA-DagAD++ • | | | Rxx^=xor(Rss,Rtt) Rxx^=Rss^Rtt Class: XTYPE (slots 2,3) #### **Intrinsics** Rxx^=xor(Rss,Rtt) Word64 Q6\_P\_xorxacc\_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) ## **Encoding** RegType | 3 | 1 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|-------------------| | | IC | CLA | SS | ; | | Reg | Тур | е | N | 1aj | | | | s5 | | | Pa | rse | | | | t5 | | | M | 1in | | | | х5 | | | | | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | - | s | s | Ø | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | х | x | x | x | x | Rxx^=xor(Rss,Rtt) | | Field name | | |------------|----------------------------| | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | s5 | Field to encode register s | | t5 | Field to encode register t | | x5 | Field to encode register x | | Maj | Major opcode | | Min | Minor opcode | Register type ## Description # Logical-logical words Perform a logical operation of the two source operands, then perform a second logical operation of the result with the destination register Rx. The source and destination registers are 32-bit. | Syntax | Behavior | |-------------------------------------|------------------------------------------------------| | <pre>Rx=or(Ru, and(Rx, #s10))</pre> | <pre>Rx = Ru (Rx &amp; apply_extension(#s));</pre> | | $Rx[\& ^]=and(Rs,Rt)$ | $Rx [ &^] = (Rs [ &^] Rt);$ | | $Rx[\& ^]=and(Rs, \sim Rt)$ | $Rx [ &^] = (Rs [ &^] ~Rt);$ | | <pre>Rx[&amp; ^]=or(Rs,Rt)</pre> | $Rx [ &^] = (Rs [ &^] Rt);$ | | <pre>Rx[&amp; ^]=xor(Rs,Rt)</pre> | <pre>Rx[ &amp;^]=Rs[ &amp;^]Rt;</pre> | | Rx = and (Rs, #s10) | <pre>Rx = Rx (Rs &amp; apply_extension(#s));</pre> | | Rx =or(Rs,#s10) | <pre>Rx = Rx (Rs apply_extension(#s));</pre> | Class: XTYPE (slots 2,3) #### **Intrinsics** | Rx&=and(Rs,Rt) | ord32 Q6_R_andand_RR(Word32 Rx, Word32 | Rs, Word32 Rt) | |-------------------------------|-----------------------------------------|--------------------| | Rx&=and(Rs,~Rt) | ord32 Q6_R_andand_RnR(Word32 Rx, Word3 | 2 Rs, Word32 Rt) | | Rx&=or(Rs,Rt) | ord32 Q6_R_orand_RR(Word32 Rx, Word32 | Rs, Word32 Rt) | | Rx&=xor(Rs,Rt) | ord32 Q6_R_xorand_RR(Word32 Rx, Word32 | Rs, Word32 Rt) | | Rx = or(Ru, and(Rx, #s10)) | ord32 Q6_R_or_and_RRI(Word32 Ru, Word32 | 2 Rx, Word32 Is10) | | $Rx^=$ and $(Rs,Rt)$ | ord32 Q6_R_andxacc_RR(Word32 Rx, Word3 | 2 Rs, Word32 Rt) | | $Rx^=$ and $(Rs, \sim Rt)$ | ord32 Q6_R_andxacc_RnR(Word32 Rx, Word | 32 Rs, Word32 Rt) | | Rx^=or(Rs,Rt) | ord32 Q6_R_orxacc_RR(Word32 Rx, Word32 | Rs, Word32 Rt) | | <pre>Rx^=xor(Rs,Rt)</pre> | ord32 Q6_R_xorxacc_RR(Word32 Rx, Word3 | 2 Rs, Word32 Rt) | | $Rx \mid = and(Rs, #s10)$ | ord32 Q6_R_andor_RI(Word32 Rx, Word32 | Rs, Word32 Is10) | | $Rx \mid = and(Rs, Rt)$ | ord32 Q6_R_andor_RR(Word32 Rx, Word32 | Rs, Word32 Rt) | | $Rx \mid = and (Rs, \sim Rt)$ | ord32 Q6_R_andor_RnR(Word32 Rx, Word32 | Rs, Word32 Rt) | | Rx =or(Rs, #s10) | ord32 Q6_R_oror_RI(Word32 Rx, Word32 R | s, Word32 Is10) | | <pre>Rx =or(Rs,Rt)</pre> | ord32 Q6_R_oror_RR(Word32 Rx, Word32 R | s, Word32 Rt) | | <pre>Rx =xor(Rs,Rt)</pre> | ord32 Q6_R_xoror_RR(Word32 Rx, Word32 | Rs, Word32 Rt) | # **Encoding** | 3 | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|-----|-----|----|----|----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--| | | I | CLA | ٩SS | 3 | | Re | тур | е | | | | | | s5 | | | Pa | rse | | | | | | | | | | | | х5 | | | | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|------------------------| | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | i | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | х | x | х | х | x | Rx =and(Rs,#s10) | | | ICL | ASS | 3 | F | Reg | Тур | е | | | | | | х5 | | | Pa | rse | | • | • | | | • | | | | | | u5 | • | | | | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | i | х | x | x | x | x | Р | Р | i | i | i | i | i | i | i | i | i | u | u | u | u | u | Rx=or(Ru,and(Rx,#s10)) | | | ICL | ASS | 3 | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | • | | | • | | • | | | х5 | | | | | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | i | s | s | s | s | s | Р | Р | i | i | i | i | i | i | i | i | i | х | х | х | х | х | Rx =or(Rs,#s10) | | | ICL | ASS | 3 | ı | Reg | Тур | е | ٨ | /lajC | )p | | | s5 | | | Pa | rse | | | • | t5 | | • | N | 1inC | )p | | | х5 | • | | | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | x | x | x | x | x | Rx =and(Rs,~Rt) | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | х | х | х | х | x | Rx&=and(Rs,~Rt) | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | х | x | х | х | x | Rx^=and(Rs,~Rt) | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | x | x | x | x | x | Rx&=and(Rs,Rt) | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | x | x | x | x | x | Rx&=or(Rs,Rt) | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | х | x | х | х | x | Rx&=xor(Rs,Rt) | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 1 | x | x | x | x | x | Rx =and(Rs,Rt) | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 1 | x | x | x | x | x | Rx^=xor(Rs,Rt) | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | х | х | х | х | x | Rx =or(Rs,Rt) | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | х | х | х | х | x | Rx =xor(Rs,Rt) | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | х | x | х | х | x | Rx^=and(Rs,Rt) | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 1 | x | x | x | x | X | Rx^=or(Rs,Rt) | #### Field name ## Description RegType Register type ICLASS Instruction class MajOp Major opcode MinOp Minor opcode Parse Packet/loop parse bits 55 Field to encode register s t5 Field to encode register t u5 Field to encode register u x5 Field to encode register x # Maximum words Select either the signed or unsigned maximum of two source registers and place in a destination register Rdd. | Rd=max(Rs,Rt) | Rd = max(Rs,Rt); | |----------------|------------------------------------------| | Rd=maxu(Rs,Rt) | <pre>Rd = max(Rs.uw[0], Rt.uw[0]);</pre> | Class: XTYPE (slots 2,3) ## **Intrinsics** | Rd=max(Rs,Rt) | Word32 Q6_R_max_RR(Word32 Rs, Word32 Rt) | |----------------|-------------------------------------------------------| | Rd=maxu(Rs,Rt) | <pre>UWord32 Q6_R_maxu_RR(Word32 Rs, Word32 Rt)</pre> | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|----------------| | | ICL | ASS | ; | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | s | s | s | s | s | P | P | | t | t | t | t | t | 0 | - | | d | d | d | d | d | Rd=max(Rs,Rt) | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | - | - | d | d | d | d | d | Rd=maxu(Rs,Rt) | #### Field name Description | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | # Maximum doublewords Select either the signed or unsigned maximum of two 64-bit source registers and place in a destination register. | Rdd=max(Rss,Rtt) | <pre>Rdd = max(Rss,Rtt);</pre> | |-------------------|----------------------------------------| | Rdd=maxu(Rss,Rtt) | <pre>Rdd = max(Rss.u64,Rtt.u64);</pre> | Class: XTYPE (slots 2,3) ## **Intrinsics** | Rdd=max(Rss,Rtt) | Word64 Q6_P_max_PP(Word64 Rss, Word64 Rtt) | |-------------------|----------------------------------------------| | Rdd=maxu(Rss,Rtt) | UWord64 Q6_P_maxu_PP(Word64 Rss, Word64 Rtt) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-------------------| | | ICL | ASS | ; | F | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | Λ | /linC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | P | Р | - | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rdd=max(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rdd=maxu(Rss,Rtt) | ## Field name Description | i ioia iiaiiio | | |----------------|----------------------------| | RegType | Register type | | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | # Minimum words Select either the signed or unsigned minimum of two source registers and place in destination register Rd. | Rd=min(Rt,Rs) | Rd = min(Rt,Rs); | |----------------|------------------------------------------| | Rd=minu(Rt,Rs) | <pre>Rd = min(Rt.uw[0], Rs.uw[0]);</pre> | Class: XTYPE (slots 2,3) ## **Intrinsics** | Rd=min(Rt,Rs) | Word32 Q6_R_min_RR(Word32 Rt, Word32 Rs) | |----------------|--------------------------------------------| | Rd=minu(Rt,Rs) | UWord32 Q6_R_minu_RR(Word32 Rt, Word32 Rs) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|----------------| | | ICL. | ASS | 3 | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | ٨ | /linC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | - | - | d | d | d | d | d | Rd=min(Rt,Rs) | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | - | - | d | d | d | d | d | Rd=minu(Rt,Rs) | #### Field name Description | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | # Minimum doublewords Select either the signed or unsigned minimum of two 64-bit source registers and place in the destination register Rdd. | Syn | ıtax | Behavior | |-----|------|----------| | | | | | Rdd=min(Rtt,Rss) | <pre>Rdd = min(Rtt,Rss);</pre> | |-------------------|----------------------------------------| | Rdd=minu(Rtt,Rss) | <pre>Rdd = min(Rtt.u64,Rss.u64);</pre> | Class: XTYPE (slots 2,3) ## **Intrinsics** | Rdd=min(Rtt,Rss) | Word64 Q6_P_min_PP(Word64 Rtt, Word64 Rss) | |-------------------|----------------------------------------------| | Rdd=minu(Rtt,Rss) | UWord64 Q6_P_minu_PP(Word64 Rtt, Word64 Rss) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-------------------| | | ICL | ASS | ; | F | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | P | Р | - | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rdd=min(Rtt,Rss) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rdd=minu(Rtt,Rss) | ## Field name Description | i ioia iiaiiio | | |----------------|----------------------------| | RegType | Register type | | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | # Modulo wrap Wrap the Rs value into the modulo range from 0 to Rt. If Rs is greater than or equal to Rt, wrap it to the bottom of the range by subtracting Rt. If Rs is less than zero, wrap it to the top of the range by adding Rt. Otherwise, when Rs fits within the range, no adjustment is necessary. The result is returned in register Rd. ### Syntax Behavior ``` Rd=modwrap(Rs,Rt) if (Rs < 0) { Rd = Rs + Rt.uw[0]; } else if (Rs.uw[0] >= Rt.uw[0]) { Rd = Rs - Rt.uw[0]; } else { Rd = Rs; } ``` Class: XTYPE (slots 2,3) #### **Intrinsics** ``` Rd=modwrap(Rs,Rt) Word32 Q6_R_modwrap_RR(Word32 Rs, Word32 Rt) ``` # **Encoding** | 3 | 1 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|---|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-------------------| | | I | CLA | ASS | | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | • | ı | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rd=modwrap(Rs,Rt) | | Field name | | Description | |------------|---------------|-------------| | RegType | Register type | | | J 11 | 0 71 | |--------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | # Negate The first form of this instruction performs a negate on a 32-bit register with saturation. If the input is 0x80000000, the result is saturated to 0x7fffffff. The non-saturating 32-bit register negate is a ALU32-class instruction and can execute on any slot. The second form of this instruction negates a 64-bit source register and places the result in destination Rdd. | Syntax | Behavior | |----------------|-------------------------| | Rd=neg(Rs):sat | $Rd = sat_32(-Rs.s64);$ | | Rdd=neg(Rss) | Rdd = -Rss; | Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** | Rd=neg(Rs):sat | Word32 Q6_R_neg_R_sat(Word32 Rs) | |----------------|----------------------------------| | Rdd=neg(Rss) | Word64 Q6_P_neg_P(Word64 Rss) | ## **Encoding** | 3 | 1 3 | 0 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|------|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|----------------| | | IC | LAS | S | | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | P | - | - | - | - | - | - | 1 | 0 | 1 | d | d | d | d | d | Rdd=neg(Rss) | | | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 1 | 1 | 0 | d | d | d | d | d | Rd=neg(Rs):sat | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | MajOpMajor opcodeMinOpMinor opcodeRegTypeRegister type ## Round Perform either arithmetic (.5 is rounded up) or convergent (.5 is rounded towards even) rounding to any bit location. Arithmetic rounding has optional saturation. In this version, the result is saturated to a 32-bit number after adding the rounding constant. After the rounding and saturation have been performed, the final result is right shifted using a sign-extending shift. **Syntax Behavior** Rd=cround(Rs, #u5) Rd = (#u==0)?Rs:convround(Rs, 2\*\*(#u-1)) >> #u;Rd=cround(Rs,Rt) $Rd = (zxt_{5->32}(Rt) == 0) ?Rs: convround (Rs, 2** (zxt_{5->32}(Rt) = (zxt_{5->32}(R$ 1))>> $zxt_{5->32}(Rt)$ ; Rd=round(Rs, #u5)[:sat] $Rd = ([sat_{32}]((#u==0)?(Rs):round(Rs, 2**(#u-1))))>>#u;$ Rd=round(Rs,Rt)[:sat] $Rd = ([sat_{32}] ((zxt_{5->32}(Rt) == 0)?(Rs):round(Rs, 2**(zxt_{5-}))?(Rs):round(Rs, 2**(zxt_$ >32 (Rt) -1))))>>zxt<sub>5->32</sub> (Rt); Rd=round(Rss):sat $tmp=sat_{64}(Rss+0x080000000ULL);$ Rd = tmp.w[1];Rdd=cround(Rss, #u6) if (#u == 0) { Rdd = Rss;} else if ((Rss & (size8s t)((1LL << (#u - 1)) - 1LL))</pre> $src_{128} = sxt_{64->128}(Rss);$ rndbit 128 = $sxt_{64->128}$ (1LL); rndbit 128 = (rndbit 128 << #u);</pre> rndbit 128 = (rndbit 128 & src 128); rndbit 128 = (size8s t) (rndbit 128 >> 1); tmp128 = src 128+rndbit 128; $tmp128 = (size8s_t) (tmp128 >> #u);$ $Rdd = sxt_{128->64}(tmp128);$ } else { $size16s_t rndbit_128 = sxt_{64->128}((1LL << (#u - 1)));$ $size16s_t src_128 = sxt_{64->128}(Rss);$ size16s t tmp128 = src 128+rndbit 128; tmp128 = (size8s t) (tmp128 >> #u);Rdd = $sxt_{128->64}$ (tmp128); Syntax Behavior ``` Rdd=cround(Rss,Rt) if (zxt_{6->32}(Rt) == 0) { Rdd = Rss; } else if ((Rss & (size8s_t))((1LL << (zxt_{6->32}(Rt) - 1)) - 1LL)) == 0) { src_{128} = sxt_{64->128}(Rss); rndbit 128 = sxt_{64->128} (1LL); rndbit_128 = (rndbit_128 << zxt_{6->32}(Rt)); rndbit 128 = (rndbit 128 & src 128); rndbit_128 = (size8s_t) (rndbit_128 >> 1); tmp128 = src_128+rndbit_128; tmp128 = (size8s_t) (tmp128 >> zxt_{6->32}(Rt)); Rdd = sxt_{128->64}(tmp128); } else { size16s_t rndbit_128 = sxt_{64->128}((1LL << (zxt_{6-}))) >32 (Rt) - 1))); size16s_t src_128 = sxt_{64->128}(Rss); size16s t tmp128 = src 128+rndbit 128; tmp128 = (size8s t) (tmp128 >> zxt_{6->32}(Rt)); Rdd = sxt_{128->64} (tmp128); } ``` ## Class: XTYPE (slots 2,3) #### **Notes** - This instruction can only execute on a core with the Hexagon audio extensions - If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** | Rd=cround(Rs, #u5) | Word32 Q6_R_cround_RI(Word32 Rs, Word32 Iu5) | |-----------------------|-------------------------------------------------| | Rd=cround(Rs,Rt) | Word32 Q6_R_cround_RR(Word32 Rs, Word32 Rt) | | Rd=round(Rs, #u5) | Word32 Q6_R_round_RI(Word32 Rs, Word32 Iu5) | | Rd=round(Rs, #u5):sat | Word32 Q6_R_round_RI_sat(Word32 Rs, Word32 Iu5) | | Rd=round(Rs,Rt) | Word32 Q6_R_round_RR(Word32 Rs, Word32 Rt) | | Rd=round(Rs,Rt):sat | Word32 Q6_R_round_RR_sat(Word32 Rs, Word32 Rt) | | Rd=round(Rss):sat | Word32 Q6_R_round_P_sat(Word64 Rss) | | Rdd=cround(Rss,#u6) | Word64 Q6_P_cround_PI(Word64 Rss, Word32 Iu6) | | Rdd=cround(Rss,Rt) | Word64 Q6_P_cround_PR(Word64 Rss, Word32 Rt) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|------------------------|----|----|----|-------|----|----|----|----|----|----|-------|-------|----|----|----|----|----|-------|---|---|----|---|---|---|---|---|---|----------------------| | | ICL | ASS | 3 | RegType | | | Ма | МајОр | | s5 | | | | | Parse | | | | | | | MinOp | | | d5 | | | | | | | | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | | - | - | | 0 | 0 | 1 | d | d | d | d | d | Rd=round(Rss):sat | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 0 | - | d | d | d | d | d | Rd=cround(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 1 | 0 | - | d | d | d | d | d | Rd=round(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 1 | 1 | - | d | d | d | d | d | Rd=round(Rs,#u5):sat | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 1 | - | d | d | d | d | d | Rdd=cround(Rss,#u6) | | | ICL | ASS | 3 | RegType MajOp s5 Parse | | | | | | t5 | | | | | | MinOp | | | d5 | | | | | | | | | | | | | | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | - | d | d | d | d | d | Rd=cround(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | - | d | d | d | d | d | Rdd=cround(Rss,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | - | d | d | d | d | d | Rd=round(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | - | d | d | d | d | d | Rd=round(Rs,Rt):sat | | Field name | Description | |----------------|-------------| | i iciu iiaiiic | Description | ICLASSInstruction classParsePacket/loop parse bitsd5Field to encode register ds5Field to encode register st5Field to encode register t MajOpMajor opcodeMinOpMinor opcodeRegTypeRegister type ### Subtract doublewords Subtract the 64-bit register Rss from register Rtt. | Syntax | Behavior | |-----------------------------------------|---------------------------------| | <pre>Rd=sub(Rt,Rs):sat:deprecated</pre> | Rd=sat <sub>32</sub> (Rt - Rs); | | Rdd=sub(Rtt,Rss) | Rdd=Rtt-Rss; | Class: XTYPE (slots 2,3) ### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** Rdd=sub(Rtt,Rss) Word64 Q6\_P\_sub\_PP(Word64 Rtt, Word64 Rss) ### **Encoding** | 3 | 1 3 | 0 2 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-------------------------------| | Ī | IC | LA | SS | | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | Λ | /linC | )p | | | d5 | | | | | | 1 | ( | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rdd=sub(Rtt,Rss) | | | 1 | | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | - | - | d | d | d | d | d | Rd=sub(Rt,Rs):sat:depreca ted | | Field name | | Description | |------------|----------------------------|-------------| | RegType | Register type | | | MinOp | Minor opcode | | | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | ### Subtract and accumulate words Subtract Rs from Rt, then add the resulting value with Rx. The result is saved in Rx. | avior | |-------| | i | Rx += sub(Rt, Rs) Rx = Rx + Rt - Rs; Class: XTYPE (slots 2,3) ### **Intrinsics** Rx+=sub(Rt,Rs) Word32 Q6\_R\_subacc\_RR(Word32 Rx, Word32 Rt, Word32 Rs) ### **Encoding** | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|----------------| | | ICL | ASS | ; | ı | Reg | Тур | е | N | 1ajO | p | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | х5 | | | | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 1 | x | x | x | x | x | Rx+=sub(Rt,Rs) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | MajOp | Major opcode | | | MinOp | Minor opcode | | | RegType | Register type | | | Parse | Packet/loop parse bits | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | x5 | Field to encode register x | | ### Subtract halfword Perform a 16-bit subtract with optional saturation and place the result in either the upper or lower half of a register. If the result goes in the upper half, the sources can be any high or low halfword of Rs and Rt. The lower 16 bits of the result are zeroed. If the result is to be placed in the lower 16 bits of Rd, the Rs source can be either high or low, but the other source must be the low halfword of Rt. In this case, the upper halfword of Rd is the sign-extension of the low halfword. ### Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** | Rd=sub(Rt.H,Rs.L):<<16 | Word32 Q6_R_sub_RhRl_s16(Word32 Rt, Word32 Rs) | |----------------------------|----------------------------------------------------| | Rd=sub(Rt.H,Rs.L):sat:<<16 | Word32 Q6_R_sub_RhRl_sat_s16(Word32 Rt, Word32 Rs) | | Rd=sub(Rt.L,Rs.H) | Word32 Q6_R_sub_RlRh(Word32 Rt, Word32 Rs) | | Rd=sub(Rt.L,Rs.H):<<16 | Word32 Q6_R_sub_RlRh_s16(Word32 Rt, Word32 Rs) | | Rd=sub(Rt.L,Rs.H):sat | Word32 Q6_R_sub_RlRh_sat(Word32 Rt, Word32 Rs) | | Rd=sub(Rt.L,Rs.H):sat:<<16 | Word32 Q6_R_sub_RlRh_sat_s16(Word32 Rt, Word32 Rs) | | Rd=sub(Rt.L,Rs.L) | Word32 Q6_R_sub_RlRl(Word32 Rt, Word32 Rs) | | Rd=sub(Rt.L,Rs.L):<<16 | Word32 Q6_R_sub_RlRl_s16(Word32 Rt, Word32 Rs) | | Rd=sub(Rt.L,Rs.L):sat | Word32 Q6_R_sub_RlRl_sat(Word32 Rt, Word32 Rs) | | Rd=sub(Rt.L,Rs.L):sat:<<16 | Word32 Q6_R_sub_RlRl_sat_s16(Word32 Rt, Word32 Rs) | # Encoding | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|---------------------------| | | ICL | ASS | 3 | I | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | - | d | d | d | d | d | Rd=sub(Rt.L,Rs.L) | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | - | d | d | d | d | d | Rd=sub(Rt.L,Rs.H) | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | s | s | s | s | s | P | Р | | t | t | t | t | t | 1 | 0 | | d | d | d | d | d | Rd=sub(Rt.L,Rs.L):sat | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | s | s | s | s | s | P | Р | | t | t | t | t | t | 1 | 1 | | d | d | d | d | d | Rd=sub(Rt.L,Rs.H):sat | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=sub(Rt.L,Rs.L):<<16 | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | s | s | s | s | s | P | Р | | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rd=sub(Rt.L,Rs.H):<<16 | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | s | s | s | s | s | P | Р | | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rd=sub(Rt.H,Rs.L):<<16 | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | s | s | s | s | s | P | Р | | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rd=sub(Rt.H,Rs.H):<<16 | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rd=sub(Rt.L,Rs.L):sat:<<1 | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rd=sub(Rt.L,Rs.H):sat:<<1 | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rd=sub(Rt.H,Rs.L):sat:<<1 | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rd=sub(Rt.H,Rs.H):sat:<<1 | | Field name | | Description | |------------|---------------|-------------| | egType | Register type | | | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | # Sign extend word to doubleword Sign-extend a 32-bit word to a 64-bit doubleword. $Rdd=sxtw(Rs) Rdd = sxt_{32->64}(Rs);$ Class: XTYPE (slots 2,3) **Intrinsics** Rdd=sxtw(Rs) Word64 Q6\_P\_sxtw\_R(Word32 Rs) ### **Encoding** s5 | | 31 3 | 0 29 | 9 2 | 8 2 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|------|-----|-----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|--------------| | | IC | LAS | S | | F | Reg | Тур | е | Ma | ijОр | | | | s5 | | | Pa | rse | | | | | | | N | /linC | Эр | | | d5 | | | | | Ī | 1 0 | 0 | 0 | ) | 0 | 1 | 0 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 0 | 0 | - | d | d | d | d | d | Rdd=sxtw(Rs) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | Field to encode register s MajOpMajor opcodeMinOpMinor opcodeRegTypeRegister type ### Vector absolute value halfwords Take the absolute value of each of the four halfwords in the 64-bit source vector Rss. Place the result in Rdd. Saturation is optionally available. #### Syntax Behavior Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rdd=vabsh(Rss) Word64 Q6_P_vabsh_P(Word64 Rss) Rdd=vabsh(Rss):sat Word64 Q6_P_vabsh_P_sat(Word64 Rss) ``` ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----------------|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|--------------------| | ICLASS RegType | | | | | | | | Ma | јОр | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | | - | - | - | - | 1 | 0 | 0 | d | d | d | d | d | Rdd=vabsh(Rss) | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | P | • | • | - | - | - | - | 1 | 0 | 1 | d | d | d | d | d | Rdd=vabsh(Rss):sat | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | |---------|----------------------------| | d5 | Field to encode register d | | s5 | Field to encode register s | | MajOp | Major opcode | | MinOp | Minor opcode | | RegType | Register type | | | | ### Vector absolute value words Take the absolute value of each of the two words in the 64-bit source vector Rss. Place the result in Rdd. Saturation is optionally available. #### Syntax Behavior Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rdd=vabsw(Rss) Word64 Q6_P_vabsw_P(Word64 Rss) Rdd=vabsw(Rss):sat Word64 Q6_P_vabsw_P_sat(Word64 Rss) ``` ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----------------|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|--------------------| | ICLASS RegType | | | | | | | | Ma | јОр | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | | - | | - | - | 1 | 1 | 0 | d | d | d | d | d | Rdd=vabsw(Rss) | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | P | - | • | - | • | - | - | 1 | 1 | 1 | d | d | d | d | d | Rdd=vabsw(Rss):sat | # Field name Description | Parse | Packet/loop parse bits | |---------|----------------------------| | d5 | Field to encode register d | | s5 | Field to encode register s | | MajOp | Major opcode | | MinOp | Minor opcode | | RegType | Register type | | | | # Vector absolute difference bytes For each element in the source vector Rss, subtract the corresponding element in source vector Rtt. Take the absolute value of the results, and store into Rdd. #### Syntax Behavior | Rdd=vabsdiffb(Rtt,Rss) | <pre>for (i=0;i&lt;8;i++) { Rdd.b[i]=ABS(Rtt.b[i] - Rss.b[i]); }</pre> | | |-------------------------|------------------------------------------------------------------------------|--| | Rdd=vabsdiffub(Rtt,Rss) | <pre>for (i=0;i&lt;8;i++) { Rdd.b[i]=ABS(Rtt.ub[i] - Rss.ub[i]); }</pre> | | Class: XTYPE (slots 2,3) ### **Intrinsics** | Rdd=vabsdiffb(Rtt,Rss) | Word64 Q6_P_vabsdiffb_PP(Word64 Rtt, Word64 Rss) | |-------------------------|---------------------------------------------------| | Rdd=vabsdiffub(Rtt,Rss) | Word64 Q6_P_vabsdiffub_PP(Word64 Rtt, Word64 Rss) | ### **Encoding** Maj0p | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-------------------------| | | ICL | ASS | ; | ı | Reg | Тур | е | N | /lajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | P | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rdd=vabsdiffub(Rtt,Rss) | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | P | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rdd=vabsdiffb(Rtt,Rss) | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | | MinOp | winor opcode | |---------|----------------------------| | RegType | Register type | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | Major opcode ### Vector absolute difference halfwords For each element in the source vector Rss, subtract the corresponding element in source vector Rtt. Take the absolute value of the results, and store into Rdd. #### Syntax Behavior Class: XTYPE (slots 2,3) ### **Intrinsics** ``` Rdd=vabsdiffh(Rtt,Rss) Word64 Q6_P_vabsdiffh_PP(Word64 Rtt, Word64 Rss) ``` ### **Encoding** Maj0p | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|-----|-----|----|----|-----|-----|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|------------------------| | | I | CLA | ASS | ; | I | Reg | Тур | е | Ν | /lajC | р | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | Ī | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rdd=vabsdiffh(Rtt,Rss) | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | | • | MinOp | Minor opcode | |----------------------------------------------------------|---------|----------------------------| | s5 Field to encode register d Field to encode register s | RegType | Register type | | s5 Field to encode register s | Parse | Packet/loop parse bits | | • | d5 | Field to encode register d | | t5 Field to encode register t | s5 | Field to encode register s | | | t5 | Field to encode register t | Major opcode ### Vector absolute difference words For each element in the source vector Rss, subtract the corresponding element in source vector Rtt. Take the absolute value of the results, and store into Rdd. #### Syntax Behavior Class: XTYPE (slots 2,3) ### **Intrinsics** ``` Rdd=vabsdiffw(Rtt,Rss) Word64 Q6_P_vabsdiffw_PP(Word64 Rtt, Word64 Rss) ``` ### **Encoding** Maj0p | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|-----|-----|----|----|-----|-----|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|------------------------| | Ī | I | CLA | ASS | 3 | I | Reg | Тур | е | Ν | /lajC | р | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | Ī | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rdd=vabsdiffw(Rtt,Rss) | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | | MinOp | Minor opcode | |---------|----------------------------| | RegType | Register type | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | Major opcode ### Vector add compare and select maximum bytes Add each byte element in Rxx and Rtt, and compare the resulting sums with the corresponding differences between Rss and Rtt. Store the maximum value of each compare in Rxx, and set the corresponding bits in a predicate destination to '1' if the compare result is greater, '0' if not. Each sum and difference is saturated to eight bits before the compare, and the compare operation is a signed byte compare. Class: N/A ### Vector add compare and select maximum halfwords Add each halfword element in Rxx and Rtt, and compare the resulting sums with the corresponding differences between Rss and Rtt. Store the maximum value of each compare in Rxx, and set the corresponding bits in a predicate destination to '11' if the compare result is greater, '00' if not. Each sum and difference is saturated to 16 bits before the compare, and the compare operation is a signed halfword compare. #### Syntax Behavior ``` Rxx, Pe=vacsh(Rss,Rtt) for (i = 0; i < 4; i++) { xv = (int) Rxx.h[i]; sv = (int) Rss.h[i]; tv = (int) Rtt.h[i]; xv = xv + tv; sv = sv - tv; Pe.i*2 = (xv > sv); Pe.i*2+1 = (xv > sv); Rxx.h[i]=sat<sub>16</sub>(max(xv,sv)); } ``` Class: XTYPE (slots 2,3) #### **Notes** - The predicate generated by this instruction cannot be used as a .new predicate, nor can it be automatically AND'd with another predicate. - If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ### **Encoding** | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-----------------------| | Ī | I | CL | ASS | 3 | | Reg | Тур | е | Ν | ЛајС | )p | | | s5 | | | Pa | rse | | | | t5 | | | | е | 2 | | | х5 | | | | | Ī | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | е | е | х | х | х | x | х | Rxx,Pe=vacsh(Rss,Rtt) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | MajOp | Major opcode | | | MinOp | Minor opcode | | | RegType | Register type | | | Parse | Packet/loop parse bits | | | e2 | Field to encode register e | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | x5 | Field to encode register x | | ### Vector add halfwords Add each of the four halfwords in 64-bit vector Rss to the corresponding halfword in vector Rtt. Optionally saturate each 16-bit addition to either a signed or unsigned 16-bit value. Applying saturation to the yaddh instruction clamps the result to the signed range 0x8000 to 0x7fff, whereas applying saturation to the vadduh instruction ensures that the unsigned result falls within the range 0 to 0xffff. When saturation is not needed, the vaddh form should be used. For the 32-bit version of this vector operation, see the ALU32 instructions. ### **Syntax Behavior** Rdd=vaddh(Rss,Rtt)[:sat] for (i=0; i<4; i++) { $Rdd.h[i] = [sat_{16}] (Rss.h[i] + Rtt.h[i]);$ Rdd=vadduh(Rss,Rtt):sat for (i=0; i<4; i++) { $Rdd.h[i] = usat_{16}(Rss.uh[i] + Rtt.uh[i]);$ Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Word64 Q6 P vaddh PP(Word64 Rss, Word64 Rtt) Rdd=vaddh (Rss, Rtt) Rdd=vaddh(Rss,Rtt):sat Word64 Q6 P vaddh PP sat (Word64 Rss, Word64 Rtt) Rdd=vadduh(Rss,Rtt):sat Word64 Q6 P vadduh PP sat(Word64 Rss, Word64 Rtt) ``` #### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|-------------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | P | Р | - | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=vaddh(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rdd=vaddh(Rss,Rtt):sat | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rdd=vadduh(Rss,Rtt):sat | | 1 | U | 1 | U | U | 1 | 1 | U | U | U | S | S | S | S | S | Р | Р | • | t | t | τ | t | τ | U | 1 | U | a | a | a | a | a | Rdd=vaddh(Rss | |---|---|---|-----|-----|----|----|---|---|---|---|---|---|---|---|---|---|---|---|---|-----|-----|------|----|---|---|---|---|---|---|---|---------------| | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | P | - | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rdd=vaddh(Rss | | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rdd=vadduh(Rs | | | | | Fie | eld | na | me | ) | | | | | | | | | | | | D | )es | cri | ptic | on | | | | | | | | | | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | ± 5 | Field to encode register t | ### Vector add halfwords with saturate and pack to unsigned bytes Add the four 16-bit halfwords of Rss to the four 16-bit halfwords of Rtt. The results are saturated to unsigned 8-bits and packed in destination register Rd. #### Syntax Behavior Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rd=vaddhub(Rss,Rtt):sat Word32 Q6_R_vaddhub_PP_sat(Word64 Rss, Word64 Rtt) ``` ### **Encoding** Parse | ; | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|-------------------------| | Ī | ı | CL | ASS | 3 | I | Reg | Тур | е | N | Лаj | | | | s5 | | | Pa | rse | | | | t5 | | | М | lin | | | | d5 | | | | | Ī | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rd=vaddhub(Rss,Rtt):sat | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | Packet/loop parse bits Field to encode register d | s5 | Field to encode register s | |---------|----------------------------| | t5 | Field to encode register t | | Maj | Major opcode | | Min | Minor opcode | | RegType | Register type | | | | ### Vector reduce add unsigned bytes For each byte in the source vector Rss, add the corresponding byte in the source vector Rtt. Add the four upper intermediate results and optionally the upper word of the destination. Add the four lower results and optionally the lower word of the destination. Syntax Behavior ### Class: XTYPE (slots 2,3) #### **Intrinsics** ``` Rdd=vraddub(Rss,Rtt) Word64 Q6_P_vraddub_PP(Word64 Rss, Word64 Rtt) Rxx+=vraddub(Rss,Rtt) Word64 Q6_P_vraddubacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) ``` ### **Encoding** x5 | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-------------------------------|-----|----|----|-----|-----|----|----|------|----|----|------|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-----------------------| | | ICL | ASS | 3 | | Reg | Тур | е | N | 1ajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | Ø | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rdd=vraddub(Rss,Rtt) | | | ICLASS RegType MajOp s5 Parse | | | | | | | t5 | | | N | linC | )p | | | х5 | • | | | | | | | | | | | | | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | х | x | x | х | х | Rxx+=vraddub(Rss,Rtt) | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | Field to encode register x Majop Major opcode MinOp Minor opcode RegType Register type Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t5 Field to encode register t ### Vector reduce add halfwords For each halfword in the source vector Rss, add the corresponding halfword in the source vector Rtt. Add these intermediate results together, and place the result in Rd. Syntax Behavior | Rd=vraddh(Rss,Rtt) | <pre>Rd = 0; for (i=0;i&lt;4;i++) { Rd += (Rss.h[i]+Rtt.h[i]); }</pre> | |---------------------|------------------------------------------------------------------------------| | Rd=vradduh(Rss,Rtt) | <pre>Rd = 0; for (i=0;i&lt;4;i++) { Rd += (Rss.uh[i]+Rtt.uh[i]); }</pre> | Class: XTYPE (slots 2,3) ### **Intrinsics** | Rd=vraddh(Rss,Rtt) | Word32 Q6_R_vraddh_PP(Word64 Rss, Word64 Rtt) | |---------------------|------------------------------------------------| | Rd=vradduh(Rss,Rtt) | Word32 Q6_R_vradduh_PP(Word64 Rss, Word64 Rtt) | ### **Encoding** | ; | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|---|---|---|----|---|---|---------------------| | | ICL | ASS | 3 | ١ | Reg | Тур | е | N | 1ajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | Λ | /linC | р | | | d5 | | | | | Ī | 1 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | | | s | s | s | ø | s | Р | Р | 0 | t | t | t | t | t | | 0 | 1 | d | d | d | d | d | Rd=vradduh(Rss,Rtt) | | | 1 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | - | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rd=vraddh(Rss,Rtt) | | Field name | Description | |------------|-------------| |------------|-------------| ICLASS Instruction class MajOp Major opcode MinOp Minor opcode RegType Register type Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t5 Field to encode register t ### Vector add bytes Add each of the eight bytes in 64-bit vector Rss to the corresponding byte in vector Rtt. Optionally, saturate each 8-bit addition to an unsigned value between 0 and 255. The eight results are stored in destination register Rdd. #### Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rdd=vaddb(Rss,Rtt) Word64 Q6_P_vaddb_PP(Word64 Rss, Word64 Rtt) Rdd=vaddub(Rss,Rtt) Word64 Q6_P_vaddub_PP(Word64 Rss, Word64 Rtt) Rdd=vaddub(Rss,Rtt):sat Word64 Q6_P_vaddub_PP_sat(Word64 Rss, Word64 Rtt) ``` ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|------|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-------------------------| | | ICL | ASS | 3 | ı | Reg | Туре | Э | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | P | P | - | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rdd=vaddub(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rdd=vaddub(Rss,Rtt):sat | | Register type | |----------------------------| | Minor opcode | | Instruction class | | Packet/loop parse bits | | Field to encode register d | | Field to encode register s | | Field to encode register t | | | ### Description ### Vector add words Add each of the two words in 64-bit vector Rss to the corresponding word in vector Rtt. Optionally, saturate each 32-bit addition to a signed value between 0x80000000 and 0x7fffffff. The two word results are stored in destination register Rdd. #### Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rdd=vaddw(Rss,Rtt) Word64 Q6_P_vaddw_PP(Word64 Rss, Word64 Rtt) Rdd=vaddw(Rss,Rtt):sat Word64 Q6_P_vaddw_PP_sat(Word64 Rss, Word64 Rtt) ``` ### **Encoding** ICLASS Parse d5 | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|------------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inO | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rdd=vaddw(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rdd=vaddw(Rss,Rtt):sat | | Field name | | Description | |------------|---------------|-------------| | RegType | Register type | | | MinOp | Minor opcode | | Field to encode register d s5 Field to encode register s t5 Field to encode register t Instruction class Packet/loop parse bits ### Vector average halfwords Average each of the four halfwords in the 64-bit source vector Rss with the corresponding halfword in Rtt. The average operation performed on each halfword adds the two halfwords and shifts the result right by 1 bit. Unsigned average uses a logical right shift (shift in 0), whereas signed average uses an arithmetic right shift (shift in the sign bit). When using the round option, a 0x0001 is also added to each result before shifting. This operation does not overflow. When a summation (before right shift by 1) causes an overflow of 32 bits, the value shifted in is the most-significant carry out. The signed average and negative average halfwords is available with optional convergent rounding. In convergent rounding, if the two LSBs after the addition/subtraction are 11, a rounding constant of 1 is added, otherwise a 0 is added. This result is then shifted right by one bit. Convergent rounding accumulates less error than arithmetic rounding. Syntax Behavior ``` Rdd=vavgh(Rss,Rtt) for (i=0; i<4; i++) { Rdd.h[i] = (Rss.h[i] +Rtt.h[i]) >>1; Rdd=vavgh(Rss,Rtt):crnd for (i=0; i<4; i++) { Rdd.h[i]=convround(Rss.h[i]+Rtt.h[i])>>1; Rdd=vavgh(Rss,Rtt):rnd for (i=0; i<4; i++) { Rdd.h[i] = (Rss.h[i] + Rtt.h[i] + 1) >> 1; Rdd=vavquh (Rss, Rtt) for (i=0; i<4; i++) { Rdd.h[i] = (Rss.uh[i] + Rtt.uh[i]) >> 1; Rdd=vavguh (Rss, Rtt):rnd for (i=0; i<4; i++) { Rdd.h[i] = (Rss.uh[i] + Rtt.uh[i] + 1) >> 1; for (i=0; i<4; i++) { Rdd=vnavgh(Rtt,Rss) Rdd.h[i] = (Rtt.h[i] -Rss.h[i]) >> 1; Rdd=vnavgh(Rtt,Rss):crnd:sat for (i=0;i<4;i++) {</pre> Rdd.h[i]=sat<sub>16</sub>(convround(Rtt.h[i]- Rss.h[i])>>1); } for (i=0;i<4;i++) { Rdd=vnavgh(Rtt,Rss):rnd:sat Rdd.h[i] = sat_{16}((Rtt.h[i] - Rss.h[i] + 1) >> 1); ``` Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ### Intrinsics | Rdd=vavgh(Rss,Rtt) | Word64 Q6_P_vavgh_PP(Word64 Rss, Word64 Rtt) | |------------------------------|--------------------------------------------------------| | Rdd=vavgh(Rss,Rtt):crnd | Word64 Q6_P_vavgh_PP_crnd(Word64 Rss, Word64 Rtt) | | Rdd=vavgh(Rss,Rtt):rnd | Word64 Q6_P_vavgh_PP_rnd(Word64 Rss, Word64 Rtt) | | Rdd=vavguh(Rss,Rtt) | Word64 Q6_P_vavguh_PP(Word64 Rss, Word64 Rtt) | | Rdd=vavguh(Rss,Rtt):rnd | Word64 Q6_P_vavguh_PP_rnd(Word64 Rss, Word64 Rtt) | | Rdd=vnavgh(Rtt,Rss) | Word64 Q6_P_vnavgh_PP(Word64 Rtt, Word64 Rss) | | Rdd=vnavgh(Rtt,Rss):crnd:sat | Word64 Q6_P_vnavgh_PP_crnd_sat(Word64 Rtt, Word64 Rss) | | Rdd=vnavgh(Rtt,Rss):rnd:sat | Word64 Q6_P_vnavgh_PP_rnd_sat(Word64 Rtt, Word64 Rss) | ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|----------------------------------| | | ICL | ASS | 3 | F | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | P | P | - | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=vavgh(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | P | P | - | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rdd=vavgh(Rss,Rtt):rnd | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | P | Р | - | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rdd=vavgh(Rss,Rtt):crnd | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rdd=vavguh(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | - | d | d | d | d | d | Rdd=vavguh(Rss,Rtt):rnd | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | P | P | - | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rdd=vnavgh(Rtt,Rss) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rdd=vnavgh(Rtt,Rss):rnd:s at | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=vnavgh(Rtt,Rss):crnd:<br>sat | | Field name | Description | |------------|-------------| |------------|-------------| | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | ### Vector average unsigned bytes Average each of the eight unsigned bytes in the 64-bit source vector Rss with the corresponding byte in Rtt. The average operation performed on each byte is the sum of the two bytes shifted right by one bit. If the round option is used, a 0x01 is also added to each result before shifting. This operation does not overflow. When a summation (before right shift by 1) causes an overflow of 8 bits, the value shifted in is the most-significant carry out. ### Syntax Behavior | Rdd=vavgub(Rss,Rtt) | <pre>for (i = 0; i &lt; 8; i++) { Rdd.b[i]=((Rss.ub[i] + Rtt.ub[i])&gt;&gt;1); }</pre> | |-------------------------|--------------------------------------------------------------------------------------------| | Rdd=vavgub(Rss,Rtt):rnd | <pre>for (i = 0; i &lt; 8; i++) { Rdd.b[i]=((Rss.ub[i]+Rtt.ub[i]+1)&gt;&gt;1); }</pre> | Class: XTYPE (slots 2,3) #### **Intrinsics** | Rdd=vavgub(Rss,Rtt) | Word64 Q6_P_vavgub_PP(Word64 Rss, Word64 Rtt) | |-------------------------|---------------------------------------------------| | Rdd=vavgub(Rss,Rtt):rnd | Word64 Q6_P_vavgub_PP_rnd(Word64 Rss, Word64 Rtt) | ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-------------------------| | | ICL | ASS | 3 | F | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rdd=vavgub(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rdd=vavgub(Rss,Rtt):rnd | #### Field name Description | Regrype | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | ### Vector average words Average each of the two words in the 64-bit source vector Rss with the corresponding word in Rtt. The average operation performed on each halfword adds the two words and shifts the result right by 1 bit. Unsigned average uses a logical right shift (shift in 0), whereas signed average uses an arithmetic right shift (shift in the sign bit). When using the round option, then a 0x1 is also added to each result before shifting. This operation does not overflow. When a summation (before right shift by 1) causes an overflow of 32 bits, the value shifted in is the most-significant carry out. The signed average and negative average words is available with optional convergent rounding. In convergent rounding, if the two LSBs after the addition/subtraction are 11, a rounding constant of 1 is added, otherwise a 0 is added. This result is then shifted right by one bit. Convergent rounding accumulates less error than arithmetic rounding. Syntax Behavior ``` Rdd=vavguw(Rss,Rtt)[:rnd] for (i=0;i<2;i++) { Rdd.w[i] = (zxt_{32->33}(Rss.uw[i]) + zxt_{32-} >33 (Rtt.uw[i])+1)>>1; for (i=0; i<2; i++) { Rdd=vavgw(Rss,Rtt):crnd Rdd.w[i] = (convround(sxt_{32->33}(Rss.w[i])+sxt_{32-}) >33 (Rtt.w[i]))>>1); } Rdd=vavgw(Rss,Rtt)[:rnd] for (i=0; i<2; i++) { Rdd.w[i] = (sxt_{32->33}(Rss.w[i]) + sxt_{32-} >33 (Rtt.w[i])+1)>>1; Rdd=vnavgw(Rtt,Rss) for (i=0; i<2; i++) { Rdd.w[i] = (sxt_{32->33}(Rtt.w[i]) - sxt_{32-} >33 (Rss.w[i]))>>1; Rdd=vnavgw(Rtt,Rss):crnd:sat for (i=0;i<2;i++) { Rdd.w[i] = sat_{32} (convround(sxt_{32->33}(Rtt.w[i]) - sxt_{32->33}(Rss.w[i]))>>1); for (i=0;i<2;i++) { Rdd=vnavgw(Rtt,Rss):rnd:sat Rdd.w[i] = sat_{32}((sxt_{32->33}(Rtt.w[i]) - sxt_{32-}) >33 (Rss.w[i])+1)>>1); ``` Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ### Intrinsics | Rdd=vavguw(Rss,Rtt) | Word64 Q6_P_vavguw_PP(Word64 Rss, Word64 Rtt) | |------------------------------|--------------------------------------------------------| | Rdd=vavguw(Rss,Rtt):rnd | Word64 Q6_P_vavguw_PP_rnd(Word64 Rss, Word64 Rtt) | | Rdd=vavgw(Rss,Rtt) | Word64 Q6_P_vavgw_PP(Word64 Rss, Word64 Rtt) | | Rdd=vavgw(Rss,Rtt):crnd | Word64 Q6_P_vavgw_PP_crnd(Word64 Rss, Word64 Rtt) | | Rdd=vavgw(Rss,Rtt):rnd | Word64 Q6_P_vavgw_PP_rnd(Word64 Rss, Word64 Rtt) | | Rdd=vnavgw(Rtt,Rss) | Word64 Q6_P_vnavgw_PP(Word64 Rtt, Word64 Rss) | | Rdd=vnavgw(Rtt,Rss):crnd:sat | Word64 Q6_P_vnavgw_PP_crnd_sat(Word64 Rtt, Word64 Rss) | | Rdd=vnavgw(Rtt,Rss):rnd:sat | Word64 Q6_P_vnavgw_PP_rnd_sat(Word64 Rtt, Word64 Rss) | ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|---|---|---|----|---|---|----------------------------------| | | ICL | ASS | 3 | F | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | р | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rdd=vavgw(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rdd=vavgw(Rss,Rtt):rnd | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=vavgw(Rss,Rtt):crnd | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rdd=vavguw(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rdd=vavguw(Rss,Rtt):rnd | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rdd=vnavgw(Rtt,Rss) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | - | d | d | d | d | d | Rdd=vnavgw(Rtt,Rss):rnd:s at | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | - | d | d | d | d | d | Rdd=vnavgw(Rtt,Rss):crnd:<br>sat | | Description | |-------------| | | | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | # Vector clip to unsigned Clip input to an unsigned integer. | Syntax | Behavior | |--------|----------| | | | | Rdd=vclip(Rss, #u5) | tmp=MIN((1<<#u)-1,MAX(Rss.w[0],-(1<<#u))); | |---------------------|-------------------------------------------------------------| | | Rdd.w[0]=tmp; | | | tmp=MIN( $(1 << \#u) -1$ , MAX(Rss.w[1], - $(1 << \#u)$ )); | | | Rdd.w[1]=tmp; | Class: XTYPE (slots 2,3) #### **Notes** This instruction can only execute on a core with the Hexagon audio extensions ### **Intrinsics** ``` Rdd=vclip(Rss, #u5) Word64 Q6_P_vclip_PI(Word64 Rss, Word32 Iu5) ``` ### **Encoding** RegType | 3 | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|--------------------| | | 10 | CLA | ASS | | ı | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | 1inC | )p | | | d5 | | | | | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | P | 0 | i | i | i | ï | i | 1 | 1 | 0 | d | d | d | d | d | Rdd=vclip(Rss,#u5) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | MajOp | Major opcode | | | MinOp | Minor opcode | | Register type ### Vector conditional negate Based on bits in Rt, conditionally negate halves in Rss. ### Syntax Behavior ``` Rdd=vcnegh(Rss,Rt) for (i = 0; i < 4; i++) { if (Rt.i) { Rdd.h[i]=sat<sub>16</sub>(-Rss.h[i]); } else { Rdd.h[i]=Rss.h[i]; } } Rxx+=vrcnegh(Rss,Rt) for (i = 0; i < 4; i++) { if (Rt.i) { Rxx += -Rss.h[i]; } else { Rxx += Rss.h[i]; } }</pre> ``` Class: XTYPE (slots 2,3) #### **Notes** ■ If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rdd = vcnegh(Rss,Rt) Word64 Q6_P_vcnegh_PR(Word64 Rss, Word32 Rt) Rxx+=vrcnegh(Rss,Rt) Word64 Q6_P_vrcneghacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) Rt) ``` ### **Encoding** | 3 | 1 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|----------------------| | | IC | CLA | SS | ; | ı | Reg | Тур | е | M | laj | | | | s5 | | | Pa | rse | | | | t5 | | | М | lin | | | | d5 | | | | | 1 | , | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | | d | d | d | d | d | Rdd=vcnegh(Rss,Rt) | | | IC | CLA | ASS | ; | ı | Reg | Тур | е | M | laj | | | • | s5 | • | | Pa | rse | | | | t5 | | | М | lin | | | | х5 | | | | | 1 | , | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 1 | 1 | 1 | х | х | x | х | х | Rxx+=vrcnegh(Rss,Rt) | | - | - | - | | | _ | 1 | | _ | _ | _ | _ | _ | - | - | _ | - | - | - | - | | | |----|-----|-----|----|----|---|-----|-----|------|-----|------|----|---|---|---|---|---|----|-----|------|----|--| | | | | | | | | | | | | | | | | | | | | | | | | | Fie | eld | na | me | | | | | | | | | | | | D | es | cri | ptic | on | | | IC | LA | SS | | | | Ins | tru | ctic | n d | clas | ss | | | | | | | | | | | | Parse | Packet/loop parse bits | |---------|----------------------------| | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | x5 | Field to encode register x | | Maj | Major opcode | | Min | Minor opcode | | RegType | Register type | | | | # Vector maximum bytes Compare each of the eight unsigned bytes in the 64-bit source vector Rss to the corresponding byte in Rtt. For each comparison, select the maximum of the two bytes and place that byte in the corresponding location in Rdd. ### Syntax Behavior | Rdd=vmaxb(Rtt,Rss) | <pre>for (i = 0; i &lt; 8; i++) { Rdd.b[i]=max(Rtt.b[i],Rss.b[i]); }</pre> | |---------------------|----------------------------------------------------------------------------------| | Rdd=vmaxub(Rtt,Rss) | <pre>for (i = 0; i &lt; 8; i++) { Rdd.b[i]=max(Rtt.ub[i],Rss.ub[i]); }</pre> | ### Class: XTYPE (slots 2,3) ### **Intrinsics** | Rdd=vmaxb(Rtt,Rss) | Word64 Q6_P_vmaxb_PP(Word64 Rtt, Word64 Rss) | |---------------------|-----------------------------------------------| | Rdd=vmaxub(Rtt,Rss) | Word64 Q6_P_vmaxub_PP(Word64 Rtt, Word64 Rss) | ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|---------------------| | | ICL | ASS | ; | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linO | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | P | Р | - | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rdd=vmaxub(Rtt,Rss) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rdd=vmaxb(Rtt,Rss) | | Field name | | Description | |------------|---------------|-------------| | Doomroo | Pogistor type | | | Regrype | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | ### Vector maximum halfwords Compare each of the four halfwords in the 64-bit source vector Rss to the corresponding halfword in Rtt. For each comparison, select the maximum of the two halfwords and place that halfword in the corresponding location in Rdd. Comparisons are available in both signed and unsigned form. ### Syntax Behavior ### Class: XTYPE (slots 2,3) #### **Intrinsics** | Rdd=vmaxh(Rtt,Rss) | Word64 Q6_P_vmaxh_PP(Word64 Rtt, Word64 Rss) | |---------------------|-----------------------------------------------| | Rdd=vmaxuh(Rtt,Rss) | Word64 Q6_P_vmaxuh_PP(Word64 Rtt, Word64 Rss) | ### **Encoding** | ; | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|---------------------| | Ī | ICL | ASS | 3 | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linO | )p | | | d5 | | | | | Ī | 1 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | P | - | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rdd=vmaxh(Rtt,Rss) | | | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=vmaxuh(Rtt,Rss) | ### Field name Description | Reg'I'ype | Register type | |-----------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | ### Vector reduce maximum halfwords Register Rxx contains a maximum value in the low word and the address of that maximum value in the high word. Register Rss contains a vector of four halfword values, and register Ru contains the address of this data. The instruction finds the maximum halfword between the previous maximum in Rxx[0] and the four values in Rss. The address of the new maximum is stored in Rxx[1]. ### Syntax Behavior ``` Rxx=vrmaxh (Rss, Ru) max = Rxx.h[0]; addr = Rxx.w[1]; for (i = 0; i < 4; i++) { if (max < Rss.h[i]) { max = Rss.h[i]; addr = Ru \mid i << 1; Rxx.w[0]=max; Rxx.w[1] = addr; Rxx=vrmaxuh (Rss, Ru) max = Rxx.uh[0]; addr = Rxx.w[1]; for (i = 0; i < 4; i++) { if (max < Rss.uh[i]) {</pre> max = Rss.uh[i]; addr = Ru | i<<1; Rxx.w[0]=max; Rxx.w[1] = addr; ``` ### Class: XTYPE (slots 2,3) #### **Intrinsics** ``` Rxx=vrmaxh(Rss,Ru) Word64 Q6_P_vrmaxh_PR(Word64 Rxx, Word64 Rss, Word32 Ru) Rxx=vrmaxuh(Rss,Ru) Word64 Q6_P_vrmaxuh_PR(Word64 Rxx, Word64 Rss, Word32 Ru) ``` #### **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|---------------------| | | IC | LAS | 3 | | Reg | Тур | е | M | laj | | | | s5 | | | Pa | rse | | | | х5 | | | N | 1in | | | | u5 | | | | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | P | 0 | x | x | x | x | x | 0 | 0 | 1 | u | u | u | u | u | Rxx=vrmaxh(Rss,Ru) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 1 | х | х | х | х | х | 0 | 0 | 1 | u | u | u | u | u | Rxx=vrmaxuh(Rss,Ru) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | - | | Parse | Packet/loop parse bits | | | s5 | Field to encode register s | | | u5 | Field to encode register u | | | x5 | Field to encode register x | | | Maj | Major opcode | | | Min | Minor opcode | | | RegType | Register type | | | | | | ### Vector reduce maximum words Find the maximum word between the previous maximum in Rxx[0] and the two values in Rss. The address of the new maximum is stored in Rxx[1]. Register Rxx contains a maximum value in the low word and the address of that maximum value in the high word. Register Rss contains a vector of two word values, and register Ru contains the address of this data. #### **Behavior Syntax** ``` Rxx=vrmaxuw(Rss,Ru) max = Rxx.uw[0]; addr = Rxx.w[1]; for (i = 0; i < 2; i++) { if (max < Rss.uw[i]) {</pre> max = Rss.uw[i]; addr = Ru | i << 2; } } Rxx.w[0]=max; Rxx.w[1] = addr; Rxx=vrmaxw(Rss,Ru) max = Rxx.w[0]; addr = Rxx.w[1]; for (i = 0; i < 2; i++) { if (max < Rss.w[i]) {</pre> max = Rss.w[i]; addr = Ru | i << 2; } } Rxx.w[0]=max; Rxx.w[1] = addr; ``` ### Class: XTYPE (slots 2,3) #### **Intrinsics** ``` Rxx=vrmaxuw(Rss,Ru) Word64 Q6 P vrmaxuw PR(Word64 Rxx, Word64 Rss, Word32 Ru) Rxx=vrmaxw(Rss,Ru) Word64 Q6 P vrmaxw PR(Word64 Rxx, Word64 Rss, Word32 Ru) ``` #### **Encoding** Field name | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|---------------------| | | ICL | ASS | 3 | ١ | Reg | Тур | Э | М | laj | | | | s5 | | | Pa | rse | | | | х5 | | | N | 1in | | | | u5 | | | | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | x | x | x | x | x | 0 | 1 | 0 | u | u | u | u | u | Rxx=vrmaxw(Rss,Ru) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | P | Р | 1 | x | x | x | x | x | 0 | 1 | 0 | u | u | u | u | u | Rxx=vrmaxuw(Rss,Ru) | | 3 | 30 | 29 | 20 | 21 | 20 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 10 | 17 | 10 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 0 | ′ | О | J | 4 | ว | | | U | | |---|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|---------------------| | | ICL | ASS | ; | | Reg | Тур | е | М | aj | | | | s5 | | | Pa | rse | | | | х5 | | | M | lin | | | | u5 | | | | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | x | x | x | x | x | 0 | 1 | 0 | u | u | u | u | u | Rxx=vrmaxw(Rss,Ru) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | P | Ρ | 1 | x | x | x | x | x | 0 | 1 | 0 | u | u | u | u | u | Rxx=vrmaxuw(Rss,Ru) | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Description #### Instruction class ICLASS Packet/loop parse bits Parse s5 Field to encode register s u5 Field to encode register u Field to encode register x x5 Мај Major opcode Minor opcode Min RegType Register type ### Vector maximum words Compare each of the two words in the 64-bit source vector Rss to the corresponding word in Rtt. For each comparison, select the maximum of the two words and place that word in the corresponding location in Rdd. Comparisons are available in both signed and unsigned form. ### Syntax Behavior | Rdd=vmaxuw(Rtt,Rss) | <pre>for (i = 0; i &lt; 2; i++) { Rdd.w[i]=max(Rtt.uw[i],Rss.uw[i]); }</pre> | |---------------------|---------------------------------------------------------------------------------| | Rdd=vmaxw(Rtt,Rss) | <pre>for (i = 0; i &lt; 2; i++) { Rdd.w[i]=max(Rtt.w[i],Rss.w[i]); }</pre> | ### Class: XTYPE (slots 2,3) ### **Intrinsics** | Rdd=vmaxuw(Rtt,Rss) | Word64 Q6_P_vmaxuw_PP(Word64 Rtt, Word64 Rss) | |---------------------|-----------------------------------------------| | Rdd=vmaxw(Rtt,Rss) | Word64 Q6_P_vmaxw_PP(Word64 Rtt, Word64 Rss) | ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|---------------------| | | ICL | ASS | 3 | F | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | P | P | | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rdd=vmaxuw(Rtt,Rss) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rdd=vmaxw(Rtt,Rss) | | Field name | Descript | ion | |------------|----------|-----| | | | | | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | ### Vector minimum bytes Compare each of the eight unsigned bytes in the 64-bit source vector Rss to the corresponding byte in Rtt. For each comparison, select the minimum of the two bytes and place that byte in the corresponding location in Rdd. ### Syntax Behavior | Rdd, Pe=vminub (Rtt, Rss) | <pre>for (i = 0; i &lt; 8; i++) { Pe.i = (Rtt.ub[i] &gt; Rss.ub[i]); Rdd.b[i]=min(Rtt.ub[i],Rss.ub[i]); }</pre> | |---------------------------|-------------------------------------------------------------------------------------------------------------------------| | Rdd=vminb(Rtt,Rss) | <pre>for (i = 0; i &lt; 8; i++) { Rdd.b[i]=min(Rtt.b[i],Rss.b[i]); }</pre> | | Rdd=vminub(Rtt,Rss) | <pre>for (i = 0; i &lt; 8; i++) { Rdd.b[i]=min(Rtt.ub[i],Rss.ub[i]); }</pre> | Class: XTYPE (slots 2,3) #### **Notes** The predicate generated by this instruction cannot be used as a .new predicate, nor can it be automatically AND'd with another predicate. ### **Intrinsics** ``` Rdd=vminb(Rtt,Rss) Word64 Q6_P_vminb_PP(Word64 Rtt, Word64 Rss) Rdd=vminub(Rtt,Rss) Word64 Q6_P_vminub_PP(Word64 Rtt, Word64 Rss) ``` ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|----------------|-----|----|----|-----|-----|----|----|------|-------|----|----|----|-------|----|----|----|----|-------|----|----|----|---|----|---|---|---|---|---|---|---|------------------------| | | ICLASS RegType | | | | s5 | | | | | Parse | | | t5 | | | | | | MinOp | | | d5 | | | | | | | | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | P | Р | | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rdd=vminub(Rtt,Rss) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rdd=vminb(Rtt,Rss) | | | ICL | ASS | 3 | F | Reg | Тур | е | N | 1ajC | )p | s5 | | | Parse | | | t5 | | | | e2 | | | d5 | | | | | | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | е | е | d | d | d | d | d | Rdd,Pe=vminub(Rtt,Rss) | | Field name | | Description | |------------|---------------|-------------| | RegType | Register type | | | MinOp | Minor opcode | |--------|----------------------------| | ICLASS | Instruction class | | MajOp | Major opcode | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | e2 | Field to encode register e | | s5 | Field to encode register s | | t5 | Field to encode register t | ### Vector minimum halfwords Compare each of the four halfwords in the 64-bit source vector Rss to the corresponding halfword in Rtt. For each comparison, select the minimum of the two halfwords and place that halfword in the corresponding location in Rdd. Comparisons are available in both signed and unsigned form. ### Syntax Behavior | Rdd=vminh(Rtt,Rss) | <pre>for (i = 0; i &lt; 4; i++) { Rdd.h[i]=min(Rtt.h[i],Rss.h[i]); }</pre> | |---------------------|----------------------------------------------------------------------------------| | Rdd=vminuh(Rtt,Rss) | <pre>for (i = 0; i &lt; 4; i++) { Rdd.h[i]=min(Rtt.uh[i],Rss.uh[i]); }</pre> | Class: XTYPE (slots 2,3) ### **Intrinsics** | Rdd=vminh(Rtt,Rss) | Word64 Q6_P_vminh_PP(Word64 Rtt, Word64 Rss) | |---------------------|-----------------------------------------------| | Rdd=vminuh(Rtt,Rss) | Word64 Q6_P_vminuh_PP(Word64 Rtt, Word64 Rss) | ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|----------------|----|----|----|----|----|----|----|----|----|-------|----|----|----|----|----|----|----|-------|----|----|---|---|----|---|---|---|---|---|---|---|---------------------| | | ICLASS RegType | | | | | s5 | | | | | Parse | | | t5 | | | | | MinOp | | | | | d5 | | | | | | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rdd=vminh(Rtt,Rss) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=vminuh(Rtt,Rss) | | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | ### Vector reduce minimum halfwords Find the minimum halfword between the previous minimum in Rxx[0] and the four values in Rss. The address of the new minimum is stored in Rxx[1]. Register Rxx contains a minimum value in the low word and the address of that minimum value in the high word. Register Rss contains a vector of four halfword values, and register Ru contains the address of this data. #### Syntax Behavior ``` Rxx=vrminh(Rss,Ru) min = Rxx.h[0]; addr = Rxx.w[1]; for (i = 0; i < 4; i++) { if (min > Rss.h[i]) { min = Rss.h[i]; addr = Ru | i<<1; Rxx.w[0]=min; Rxx.w[1] = addr; Rxx=vrminuh(Rss,Ru) min = Rxx.uh[0]; addr = Rxx.w[1]; for (i = 0; i < 4; i++) { if (min > Rss.uh[i]) { min = Rss.uh[i]; addr = Ru | i << 1; Rxx.w[0]=min; Rxx.w[1] = addr; ``` ### Class: XTYPE (slots 2,3) #### **Intrinsics** | Rxx=vrminh(Rss,Ru) | Word64 Q6_P_vrminh_PR(Word64 Rxx, Word64 Rss, Word32 Ru) | |---------------------|-----------------------------------------------------------| | Rxx=vrminuh(Rss,Ru) | Word64 Q6_P_vrminuh_PR(Word64 Rxx, Word64 Rss, Word32 Ru) | ## **Encoding** | 3 | 1 30 | 29 | 9 2 | 8 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|----|-----|---|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|---------------------| | | ICL | AS | S | | F | Reg | Тур | е | M | laj | | | | s5 | | | Pa | rse | | | | х5 | | | M | lin | | | | u5 | | | | | 1 | 1 | 0 | 0 | ) | 1 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | P | Р | 0 | x | x | x | x | x | 1 | 0 | 1 | u | u | u | u | u | Rxx=vrminh(Rss,Ru) | | 1 | 1 | 0 | 0 | ) | 1 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | P | Р | 1 | x | x | x | x | x | 1 | 0 | 1 | u | u | u | u | u | Rxx=vrminuh(Rss,Ru) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | - | | Parse | Packet/loop parse bits | | | s5 | Field to encode register s | | | u5 | Field to encode register u | | | x5 | Field to encode register x | | | Maj | Major opcode | | | Min | Minor opcode | | | RegType | Register type | | | | | | ### Vector reduce minimum words Find the minimum word between the previous minimum in Rxx[0] and the two values in Rss. The address of the new minimum is stored in Rxx[1]. Register Rxx contains a minimum value in the low word and the address of that minimum value in the high word. Register Rss contains a vector of two word values, and register Ru contains the address of this data. #### Syntax Behavior ``` Rxx=vrminuw(Rss,Ru) min = Rxx.uw[0]; addr = Rxx.w[1]; for (i = 0; i < 2; i++) { if (min > Rss.uw[i]) { min = Rss.uw[i]; addr = Ru | i << 2; Rxx.w[0]=min; Rxx.w[1] = addr; Rxx=vrminw(Rss,Ru) min = Rxx.w[0]; addr = Rxx.w[1]; for (i = 0; i < 2; i++) { if (min > Rss.w[i]) { min = Rss.w[i]; addr = Ru | i << 2; Rxx.w[0]=min; Rxx.w[1] = addr; ``` ### Class: XTYPE (slots 2,3) #### **Intrinsics** | Rxx=vrminuw(Rss,Ru) | Word64 Q6_P_vrminuw_PR(Word64 Rxx, Word64 Rss, Word32 Ru) | |---------------------|-----------------------------------------------------------| | Rxx=vrminw(Rss,Ru) | Word64 Q6_P_vrminw_PR(Word64 Rxx, Word64 Rss, Word32 Ru) | ### **Encoding** | 3 | 1 30 | 0 2 | 9 2 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|---------------------| | | ICI | LAS | SS | | F | Reg | Тур | е | M | laj | | | | s5 | | | Pa | rse | | | | х5 | | | M | lin | | | | u5 | | | | | 1 | 1 | 0 | ) ( | ) | 1 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | х | х | х | х | х | 1 | 1 | 0 | u | u | u | u | u | Rxx=vrminw(Rss,Ru) | | 1 | 1 | 0 | ) ( | C | 1 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 1 | х | х | х | х | х | 1 | 1 | 0 | u | u | u | u | u | Rxx=vrminuw(Rss,Ru) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | - | | Parse | Packet/loop parse bits | | | s5 | Field to encode register s | | | u5 | Field to encode register u | | | x5 | Field to encode register x | | | Maj | Major opcode | | | Min | Minor opcode | | | RegType | Register type | | | | | | ## Vector minimum words Compare each of the two words in the 64-bit source vector Rss to the corresponding word in Rtt. For each comparison, select the minimum of the two words and place that word in the corresponding location in Rdd. Comparisons are available in both signed and unsigned form. ### Syntax Behavior | Rdd=vminuw(Rtt,Rss) | <pre>for (i = 0; i &lt; 2; i++) { Rdd.w[i]=min(Rtt.uw[i],Rss.uw[i]); }</pre> | |---------------------|---------------------------------------------------------------------------------| | Rdd=vminw(Rtt,Rss) | <pre>for (i = 0; i &lt; 2; i++) { Rdd.w[i]=min(Rtt.w[i],Rss.w[i]); }</pre> | Class: XTYPE (slots 2,3) ### **Intrinsics** | Rdd=vminuw(Rtt,Rss) | Word64 Q6_P_vminuw_PP(Word64 Rtt, Word64 Rss) | |---------------------|-----------------------------------------------| | Rdd=vminw(Rtt,Rss) | Word64 Q6_P_vminw_PP(Word64 Rtt, Word64 Rss) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|---------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rdd=vminw(Rtt,Rss) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rdd=vminuw(Rtt,Rss) | | Field name | Description | |------------|-------------| | | - | | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | # Vector sum of absolute differences unsigned bytes For each byte in the source vector Rss, subtract the corresponding byte in source vector Rtt. Take the absolute value of the intermediate results, and the upper four together and add the lower four together. Optionally, add the destination upper and lower words to these results. This instruction is useful in determining distance between two vectors, in applications such as motion estimation. Syntax Behavior ## Class: XTYPE (slots 2,3) #### **Intrinsics** ``` Rdd=vrsadub(Rss,Rtt) Word64 Q6_P_vrsadub_PP(Word64 Rss, Word64 Rtt) Rxx+=vrsadub(Rss,Rtt) Word64 Q6_P_vrsadubacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) ``` ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |--------|--------|----|----|-----|-----|-----|----|------|-------|----|----|----|----|----|-----|-----|-----|----|----|----|----|---|---|-------|-------|----|---|----|----|---|---|-----------------------| | ICLASS | | | | | Reg | Тур | е | N | /lajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=vrsadub(Rss,Rtt) | | | ICLASS | | | Reg | Тур | е | N | /ajC | )p | | | s5 | | | Pai | rse | | | | t5 | | | ٨ | /linC | )p | | | х5 | | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | х | x | х | х | x | Rxx+=vrsadub(Rss,Rtt) | | Field name | Description | |------------|-------------| |------------|-------------| ICLASS Instruction class MajOp Major opcode MinOp Minor opcode RegType Register type Parse Packet/loop parse Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t5 Field to encode register t x5 Field to encode register x ### Vector subtract halfwords Subtract each of the four halfwords in 64-bit vector Rss from the corresponding halfword in vector Rtt. Optionally, saturate each 16-bit addition to either a signed or unsigned 16-bit value. Applying saturation to the vsubh instruction clamps the result to the signed range 0x8000 to 0x7fff, whereas applying saturation to the vsubuh instruction ensures that the unsigned result falls within the range 0 to 0xffff. Use the vsubh instruction when saturation is not needed. ### Syntax Behavior Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rdd=vsubh(Rtt,Rss) Word64 Q6_P_vsubh_PP(Word64 Rtt, Word64 Rss) Rdd=vsubh(Rtt,Rss):sat Word64 Q6_P_vsubh_PP_sat(Word64 Rtt, Word64 Rss) Rdd=vsubh(Rtt,Rss):sat Word64 Q6 P vsubh PP sat(Word64 Rtt, Word64 Rss) ``` #### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-------------------------| | Г | ICL | ASS | ; | F | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=vsubh(Rtt,Rss) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rdd=vsubh(Rtt,Rss):sat | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rdd=vsubuh(Rtt,Rss):sat | ### Field name Description | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | ## Vector subtract bytes Subtract each of the eight bytes in 64-bit vector Rss from the corresponding byte in vector Rtt. Optionally, saturate each 8-bit subtraction to an unsigned value between 0 and 255. The eight results are stored in destination register Rdd. ## Syntax Behavior Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rdd=vsubb(Rss,Rtt) Word64 Q6_P_vsubb_PP(Word64 Rss, Word64 Rtt) Rdd=vsubub(Rtt,Rss) Word64 Q6_P_vsubub_PP(Word64 Rtt, Word64 Rss) Rdd=vsubub(Rtt,Rss):sat Word64 Q6_P_vsubub_PP_sat(Word64 Rtt, Word64 Rss) ``` ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-------------------------| | | ICL/ | ASS | 3 | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rdd=vsubub(Rtt,Rss) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | P | Р | - | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rdd=vsubub(Rtt,Rss):sat | #### Field name Description | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | ### Vector subtract words Subtract each of the two words in 64-bit vector Rss from the corresponding word in vector Rtt. Optionally, saturate each 32-bit subtraction to a signed value between 0x8000\_0000 and 0x7fff\_ffff. The two word results are stored in destination register Rdd. ## Syntax Behavior Class: XTYPE (slots 2,3) #### **Notes** ■ If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rdd=vsubw(Rtt,Rss) Word64 Q6_P_vsubw_PP(Word64 Rtt, Word64 Rss) Rdd=vsubw(Rtt,Rss):sat Word64 Q6_P_vsubw_PP_sat(Word64 Rtt, Word64 Rss) ``` ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|------------------------| | | ICL | ASS | 3 | F | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | ٨ | /linC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | P | Р | | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rdd=vsubw(Rtt,Rss) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rdd=vsubw(Rtt,Rss):sat | ### Field name Description | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | ### 11.10.2 XTYPE BIT The XTYPE BIT instruction subclass includes instructions for bit manipulation. Bit manipulation operations modify bit fields in a register or register pair. ## Count leading The count leading zeros (c10) instruction counts the number of consecutive zeros starting with the most significant bit. The count leading ones (cl1) instruction counts the number of consecutive ones starting with the most significant bit. The count leading bits (clb) instruction counts both leading ones and leading zeros and then selects the maximum. The normamt instruction returns the number of leading bits minus one. For a two's-complement number, the number of leading zeros is zero for negative numbers. The number of leading ones is zero for positive numbers. Use the number of leading bits to judge the magnitude of the value. Syntax Behavior ``` Rd=add(clb(Rs), #s6) Rd = (max(count leading ones(Rs), count_leading_ones(~Rs))) + #s; Rd=add(clb(Rss), #s6) Rd = (max(count leading ones(Rss), count leading ones(~Rss))) + #s; Rd=cl0(Rs) Rd = count_leading_ones(~Rs); Rd=cl0(Rss) Rd = count leading ones(~Rss); Rd=cl1(Rs) Rd = count leading ones(Rs); Rd = count leading ones(Rss); Rd=cl1(Rss) Rd=clb(Rs) Rd = max(count leading ones(Rs), count leading ones(~Rs)); Rd=clb(Rss) Rd = max(count leading ones(Rss), count leading ones (~Rss)); Rd=normamt(Rs) if (Rs == 0) { Rd = 0; } else { Rd = (max(count leading ones(Rs), count leading ones(~Rs))) - 1; Rd=normamt(Rss) if (Rss == 0) { Rd = 0; } else { Rd = (max(count leading ones(Rss), count_leading_ones(~Rss))) - 1; ``` ## Class: XTYPE (slots 2,3) ## **Intrinsics** | Rd=add(clb(Rs), #s6) | Word32 Q6_R_add_clb_RI(Word32 Rs, Word32 Is6) | |-----------------------|------------------------------------------------| | Rd=add(clb(Rss), #s6) | Word32 Q6_R_add_clb_PI(Word64 Rss, Word32 Is6) | | Rd=cl0(Rs) | Word32 Q6_R_cl0_R(Word32 Rs) | | Rd=cl0(Rss) | Word32 Q6_R_cl0_P(Word64 Rss) | | Rd=cl1(Rs) | Word32 Q6_R_cl1_R(Word32 Rs) | | Rd=cl1(Rss) | Word32 Q6_R_cl1_P(Word64 Rss) | | Rd=clb(Rs) | Word32 Q6_R_clb_R(Word32 Rs) | | Rd=clb(Rss) | Word32 Q6_R_clb_P(Word64 Rss) | | Rd=normamt(Rs) | Word32 Q6_R_normamt_R(Word32 Rs) | | Rd=normamt(Rss) | Word32 Q6_R_normamt_P(Word64 Rss) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|----------------------| | | ICL/ | ASS | 3 | I | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 0 | 0 | 0 | d | d | d | d | d | Rd=clb(Rss) | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 0 | 1 | 0 | d | d | d | d | d | Rd=cl0(Rss) | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 1 | 0 | 0 | d | d | d | d | d | Rd=cl1(Rss) | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 0 | 0 | 0 | d | d | d | d | d | Rd=normamt(Rss) | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | s | ø | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 1 | 0 | d | d | d | d | d | Rd=add(clb(Rss),#s6) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | s | Ø | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 0 | 0 | d | d | d | d | d | Rd=add(clb(Rs),#s6) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 1 | 0 | 0 | d | d | d | d | d | Rd=clb(Rs) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | s | ø | s | s | s | Р | Р | | - | - | | - | - | 1 | 0 | 1 | d | d | d | d | d | Rd=cl0(Rs) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 1 | 1 | 0 | d | d | d | d | d | Rd=cl1(Rs) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 1 | 1 | 1 | d | d | d | d | d | Rd=normamt(Rs) | | Field name | Description | |---------------|-------------| | rielu liaille | Description | | ICLASS | Instruction class | |--------|----------------------------| | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | MajOpMajor opcodeMinOpMinor opcodeRegTypeRegister type # Count population The population count (popcount) instruction counts the number of bits in Rss that are set. Rd = popcount(Rss) Rd = count\_ones(Rss); Class: XTYPE (slots 2,3) **Intrinsics** Rd=popcount(Rss) Word32 Q6\_R\_popcount\_P(Word64 Rss) ## **Encoding** | 3 | 3 | 0 2 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|------------------| | | IC | LAS | SS | | | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | | 1 0 | ) ( | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 0 | 1 | 1 | d | d | d | d | d | Rd=popcount(Rss) | | Field name | | | Description | |------------|--|--|-------------| | | | | | ICLASS Instruction class Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s MajOp Major opcode MinOp Minor opcode RegType Register type # Count trailing The count trailing zeros (ct0) instruction counts the number of consecutive zeros starting with the least significant bit. The count trailing ones (ct1) instruction counts the number of consecutive ones starting with the least significant bit. | Syntax | Behavior | |--------|----------| |--------|----------| | Rd = ct0(Rs) | <pre>Rd = count_leading_ones(~reverse_bits(Rs));</pre> | |---------------|---------------------------------------------------------| | Rd = ct0(Rss) | <pre>Rd = count_leading_ones(~reverse_bits(Rss));</pre> | | Rd =ct1(Rs) | <pre>Rd = count_leading_ones(reverse_bits(Rs));</pre> | | Rd = ct1(Rss) | <pre>Rd = count_leading_ones(reverse_bits(Rss));</pre> | ## Class: XTYPE (slots 2,3) #### **Intrinsics** | Rd=ct0(Rs) | Word32 Q6_R_ct0_R(Word32 Rs) | |-------------|-------------------------------| | Rd=ct0(Rss) | Word32 Q6_R_ct0_P(Word64 Rss) | | Rd=ct1(Rs) | Word32 Q6_R_ct1_R(Word32 Rs) | | Rd=ct1(Rss) | Word32 Q6_R_ct1_P(Word64 Rss) | ## **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|---|---|---|----|---|---|-------------| | | ICL | ASS | 3 | I | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | 1inC | р | | | d5 | | | | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Ρ | P | - | | - | | - | - | 0 | 1 | 0 | d | d | d | d | d | Rd=ct0(Rss) | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | - | | - | | - | - | 1 | 0 | 0 | d | d | d | d | d | Rd=ct1(Rss) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | | - | | 1 | 0 | 0 | d | d | d | d | d | Rd=ct0(Rs) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 1 | 0 | 1 | d | d | d | d | d | Rd=ct1(Rs) | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | |---------|----------------------------| | d5 | Field to encode register d | | s5 | Field to encode register s | | MajOp | Major opcode | | MinOp | Minor opcode | | RegType | Register type | ### Extract bit field Extract a bit field from the source register (or register pair) and deposit into the least significant bits of the destination register (or register pair). The other, more significant bits in the destination are either cleared or sign-extended, depending on the instruction. The width of the extracted field is obtained from the first immediate or from the most-significant word of Rtt. The field offset is obtained from either the second immediate or from the least-significant word of Rtt. For register-based extract, where Rtt supplies the offset and width, the offset value is treated as a signed 7-bit number. If this value is negative, the source register Rss is shifted left (the reverse direction). Width number of bits are then taken from the least-significant portion of this result. If the shift amount and/or offset captures data beyond the most significant end of the input, these bits are taken as zero. # Syntax Behavior ``` Rd=extract(Rs, #u5, #U5) width = \#u; offset = \#U; Rd = sxt_{width->32}((Rs >> offset)); width = zxt_{6->32}((Rtt.w[1])); Rd=extract(Rs,Rtt) offset = sxt_{7->32}((Rtt.w[0])); Rd = sxt_{width->64}((offset>0)?(zxt_{32->64}(zxt_{32->64}(Rs)) >>> offset): (zxt_{32->64}(zxt_{32->64}(Rs)) << offset)); Rd=extractu(Rs, #u5, #U5) width = \#u; offset = #U; Rd = zxt_{width->32}((Rs >> offset)); width = zxt_{6\rightarrow32}((Rtt.w[1])); Rd=extractu(Rs,Rtt) offset = sxt_{7->32}((Rtt.w[0])); Rd = zxt_{width->64}((offset>0)?(zxt_{32->64}(zxt_{32->64}(Rs)) >>> offset): (zxt_{32->64}(zxt_{32->64}(Rs)) << offset)); ``` | Syntax | Behavior | |-----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Rdd=extract(Rss, #u6, #U6) | <pre>width = #u; offset = #U; Rdd = sxt<sub>width-&gt;64</sub>((Rss &gt;&gt; offset));</pre> | | Rdd=extract(Rss,Rtt) | $\label{eq:width} \begin{array}{ll} \text{width} = \text{zxt}_{6\rightarrow32} ((\text{Rtt.w[1]})); \\ \text{offset} = \text{sxt}_{7\rightarrow32} ((\text{Rtt.w[0]})); \\ \text{Rdd} = \text{sxt}_{\text{width}\rightarrow64} ((\text{offset}>0)?(\text{Rss}>>> \text{offset}):(\text{Rss}<< \text{offset})); \\ \end{array}$ | | Rdd=extractu(Rss, #u6, #U6) | <pre>width=#u; offset=#U; Rdd = zxtwidth-&gt;64((Rss &gt;&gt; offset));</pre> | | Rdd=extractu(Rss,Rtt) | $\label{eq:width} \begin{array}{ll} \text{width} = \text{zxt}_{6\rightarrow32}((\text{Rtt.w[1]}));\\ \text{offset} = \text{sxt}_{7\rightarrow32}((\text{Rtt.w[0]}));\\ \text{Rdd} = \text{zxt}_{\text{width}\rightarrow64}((\text{offset}>0)?(\text{Rss}>>> \text{offset}):(\text{Rss}<< \text{offset})); \end{array}$ | # Class: XTYPE (slots 2,3) ## **Intrinsics** | Rd=extract(Rs, #u5, #U5) | Word32 Q6_R_extract_RII(Word32 Rs, Word32 Iu5, Word32 IU5) | |---------------------------|--------------------------------------------------------------| | Rd=extract(Rs,Rtt) | Word32 Q6_R_extract_RP(Word32 Rs, Word64 Rtt) | | Rd=extractu(Rs, #u5, #U5) | Word32 Q6_R_extractu_RII(Word32 Rs, Word32 Iu5, Word32 IU5) | | Rd=extractu(Rs,Rtt) | Word32 Q6_R_extractu_RP(Word32 Rs, Word64 Rtt) | | Rdd=extract(Rss,#u6,#U6) | Word64 Q6_P_extract_PII(Word64 Rss, Word32 Iu6, Word32 IU6) | | Rdd=extract(Rss,Rtt) | Word64 Q6_P_extract_PP(Word64 Rss, Word64 Rtt) | | Rdd=extractu(Rss,#u6,#U6) | Word64 Q6_P_extractu_PII(Word64 Rss, Word32 Iu6, Word32 IU6) | | Rdd=extractu(Rss,Rtt) | Word64 Q6_P_extractu_PP(Word64 Rss, Word64 Rtt) | # Encoding | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|-----|------|----|---|---|----|---|---|---------------------------| | | ICL. | ASS | 3 | | Reg | Тур | е | Ма | јОр | | | | s5 | | | Pa | rse | | | | | | | N | 1inC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ı | I | I | s | s | s | s | s | Р | Р | i | i | i | i | i | i | ı | ı | I | d | d | d | d | d | Rdd=extractu(Rss,#u6,#U6) | | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | ı | ı | I | s | s | s | s | s | Р | P | i | i | i | i | i | i | ı | ı | I | d | d | d | d | d | Rdd=extract(Rss,#u6,#U6) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | ı | I | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | I | I | I | d | d | d | d | d | Rd=extractu(Rs,#u5,#U5) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | ı | I | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | I | ı | I | d | d | d | d | d | Rd=extract(Rs,#u5,#U5) | | | ICL | ASS | 3 | | Reg | Тур | е | Ма | jOp | | | • | s5 | | | Pa | rse | | | | t5 | | | Mir | Ор | | | | d5 | | | | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | - | d | d | d | d | d | Rdd=extractu(Rss,Rtt) | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | - | d | d | d | d | d | Rdd=extract(Rss,Rtt) | | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | - | d | d | d | d | d | Rd=extractu(Rs,Rtt) | | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | - | d | d | d | d | d | Rd=extract(Rs,Rtt) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | MajOp | Major opcode | | | MinOp | Minor opcode | | | RegType | Register type | | | | | | ### Insert bit field Replace a bit field in the destination register (or register pair) with bits from the least significant portion of Rs/Rss. The number of bits is obtained from the first immediate or the most-significant word of Rtt. The second immediate or the least significant word of Rtt shift the bits. If register Rtt specifies the offset, the low 7 bits of Rtt are treated as a signed 7-bit value. If this value is negative, the result is zero. Shift amounts and offsets that are too large can push bits beyond the end of the destination register, in this case the bits do not appear in the destination register. #### Syntax Behavior ``` Rx=insert(Rs, #u5, #U5) width=#u; offset=#U; Rx &= \sim (((1 << width) -1) << offset); Rx \mid = ((Rs \& ((1 << width) -1)) << offset); Rx=insert(Rs,Rtt) width=zxt_{6->32}((Rtt.w[1])); offset=sxt_{7->32}((Rtt.w[0])); mask = ((1 << width) -1); if (offset < 0) { Rx = 0; } else { Rx &= \sim (mask<<offset); Rx \mid = ((Rs \& mask) << offset); Rxx=insert(Rss, #u6, #U6) width=#u; offset=#U; Rxx &= \sim (((1 << width) -1) << offset); Rxx \mid = ((Rss & ((1 << width) -1)) << offset); ``` #### **Syntax** #### **Behavior** Class: XTYPE (slots 2,3) ### **Intrinsics** | Rx=insert(Rs,#u5,#U5) | Word32 Q6_R_insert_RII(Word32 Rx, Word32 Rs, Word32 Iu5, Word32 IU5) | |--------------------------------|------------------------------------------------------------------------| | <pre>Rx=insert(Rs,Rtt)</pre> | Word32 Q6_R_insert_RP(Word32 Rx, Word32 Rs, Word64 Rtt) | | Rxx=insert(Rss, #u6, #U6) | Word64 Q6_P_insert_PII(Word64 Rxx, Word64 Rss, Word32 Iu6, Word32 IU6) | | <pre>Rxx=insert(Rss,Rtt)</pre> | Word64 Q6_P_insert_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) | ## **Encoding** | 3 | 1 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|-------------------------| | | IC | CLA | ASS | | | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | 1inC | )p | | | х5 | | | | | - | | 0 | 0 | 0 | 0 | 0 | 1 | 1 | I | ı | I | s | s | s | s | s | Р | Р | i | i | i | i | i | i | I | ı | I | x | x | x | x | x | Rxx=insert(Rss,#u6,#U6) | | 1 | | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | ı | ı | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | ı | I | I | x | x | x | х | x | Rx=insert(Rs,#u5,#U5) | | | IC | CLA | ASS | | | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | х5 | | | | | - | I | 1 | 0 | 0 | 1 | 0 | 0 | 0 | | - | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | x | x | x | x | x | Rx=insert(Rs,Rtt) | | | IC | CLA | ASS | | | Reg | Тур | е | М | laj | | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | х5 | | | | | • | | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | - | - | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | - | - | - | x | x | x | x | x | Rxx=insert(Rss,Rtt) | | Field name | | Description | |------------|-------------------|-------------| | CLASS | Instruction class | | | ICLASS | Instruction class | |---------|----------------------------| | Parse | Packet/loop parse bits | | s5 | Field to encode register s | | t5 | Field to encode register t | | x5 | Field to encode register x | | MajOp | Major opcode | | MinOp | Minor opcode | | Maj | Major opcode | | RegType | Register type | ## Interleave/deinterleave For interleave, bits I+32 of Rss (the bits from the upper source word) are placed in the odd bits (I\*2)+1 of Rdd, while bits I of Rss (the bits from the lower source word) are placed in the even bits (I\*2) of Rdd. For deinterleave, the even bits of the source register are placed in the even register of the result pair, and the odd bits of the source register are placed in the odd register of the result pair. r1:0 = deinterleave(r1:0) is the inverse of r1:0 = interleave(r1:0). | Syntax | Benavior | |-----------------------|-------------------------------------------| | Rdd=deinterleave(Rss) | <pre>Rdd = deinterleave(ODD, EVEN);</pre> | | Rdd=interleave(Rss) | Rdd = interleave(Rss w[1].Rss w[0]): | Class: XTYPE (slots 2,3) #### **Intrinsics** | Rdd=deinterleave(Rss) | Word64 Q6_P_deinterleave_P(Word64 Rss) | |-----------------------|----------------------------------------| | Rdd=interleave(Rss) | Word64 Q6_P_interleave_P(Word64 Rss) | ## **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-----------------------| | | ICL | ASS | 3 | ١ | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 1 | 0 | 0 | d | d | d | d | d | Rdd=deinterleave(Rss) | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 1 | 0 | 1 | d | d | d | d | d | Rdd=interleave(Rss) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | MajOp | Major opcode | | | MinOp | Minor opcode | | | RegType | Register type | | ## Linear feedback-shift iteration Count the number of ones of the logical AND of the two source input values, and take the least significant value of that sum. The first source value is shifted right by one bit, and the parity is placed in the MSB. ### Syntax Behavior Class: XTYPE (slots 2,3) ### **Intrinsics** Rdd=lfs(Rss,Rtt) Word64 Q6\_P\_lfs\_PP(Word64 Rss, Word64 Rtt) ### **Encoding** | 3 | 1 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|------------------| | | 10 | CLA | ASS | 3 | ı | Reg | Тур | е | N | laj | | | | s5 | | | Pa | rse | | | | t5 | | | М | lin | | | | d5 | | | | | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rdd=lfs(Rss,Rtt) | ### Field name Description | ICLASS | Instruction class | |---------|----------------------------| | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | Maj | Major opcode | | Min | Minor opcode | | RegType | Register type | # Masked parity Count the number of ones of the logical AND of the two source input values, and take the least significant bit of that sum. | Syntax | Behavior | |---------|-----------| | Cylitax | Deliavior | | Rd=parity(Rs,Rt) | <pre>Rd = 1&amp;count_ones(Rs &amp; Rt);</pre> | |--------------------|--------------------------------------------------| | Rd=parity(Rss,Rtt) | <pre>Rd = 1&amp;count_ones(Rss &amp; Rtt);</pre> | Class: XTYPE (slots 2,3) ### **Intrinsics** | Rd=parity(Rs,Rt) | Word32 | Q6_R_parity_RR(Word32 Rs, Word32 Rt) | |--------------------|----------------|--------------------------------------| | Rd=parity(Rss,Rtt) | Word32<br>Rtt) | Q6_R_parity_PP(Word64 Rss, Word64 | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------------| | | ICL. | ASS | 3 | F | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | - | - | - | s | s | s | s | s | P | Р | - | t | t | t | t | t | - | | | d | d | d | d | d | Rd=parity(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | d | d | d | d | d | Rd=parity(Rs,Rt) | # Field name Description | Reg'l'ype | Register type | |-----------|----------------------------| | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | ## Bit reverse Reverse the order of bits. The most significant bit swaps with the least significant bit, bit 30 swaps with bit 1, and so on. | Syntax | Behavior | |--------|-----------| | Symax | Deliavior | | Rd=brev(Rs) | <pre>Rd = reverse_bits(Rs);</pre> | |---------------|-------------------------------------| | Rdd=brev(Rss) | <pre>Rdd = reverse_bits(Rss);</pre> | Class: XTYPE (slots 2,3) ### **Intrinsics** | Rd=brev(Rs) | Word32 Q6_R_brev_R(Word32 Rs) | |---------------|--------------------------------| | Rdd=brev(Rss) | Word64 Q6_P_brev_P(Word64 Rss) | ## **Encoding** | 3 | 1 30 | 0 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|------|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|---------------| | | IC | LAS | 3 | | Reg | Тур | е | Ма | јОр | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | P | P | - | - | | | - | - | 1 | 1 | 0 | d | d | d | d | d | Rdd=brev(Rss) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | | - | 1 | 1 | 0 | d | d | d | d | d | Rd=brev(Rs) | #### Field name Description | ICLASS | instruction class | |--------|----------------------------| | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | | | MajOp Major opcode MinOp Minor opcode RegType Register type # Set/clear/toggle bit Set (to 1), clear (to 0), or toggle a single bit in the source, and place the resulting value in the destination. Indicate the bit to manipulate by using an immediate or register value. When using a register to indicate the bit position, and the value of the least-significant 7 bits of Rt is out of range, the destination register is unchanged. | Syntax | Behavior | |-----------------------|----------------------------------------------------------------------------------------------------------------------------------| | Rd=clrbit(Rs, #u5) | $Rd = (Rs \& (\sim (1 << \#u)));$ | | Rd=clrbit(Rs,Rt) | Rd = (Rs & (~(( $sxt_{7->32}(Rt)>0$ )?( $zxt_{32->64}(1$ )<< $sxt_{7->32}(Rt)$ ):( $zxt_{32->64}(1$ ) >>> $sxt_{7->32}(Rt)$ ))); | | Rd=setbit(Rs, #u5) | Rd = (Rs (1 << #u)); | | Rd=setbit(Rs,Rt) | Rd = (Rs $(sxt_{7->32}(Rt)>0)?(zxt_{32->64}(1) << sxt_{7->32}(Rt)):(zxt_{32->64}(1) >>> sxt_{7->32}(Rt)));$ | | Rd=togglebit(Rs, #u5) | $Rd = (Rs ^ (1 << \#u));$ | | Rd=togglebit(Rs,Rt) | Rd = (Rs $^(sxt_{7->32}(Rt)>0)?(zxt_{32->64}(1)<< sxt_{7->32}(Rt)):(zxt_{32->64}(1)>>> sxt_{7->32}(Rt)));$ | ## Class: XTYPE (slots 2,3) #### **Intrinsics** | Rd=clrbit(Rs, #u5) | Word32 Q6_R_clrbit_RI(Word32 Rs, Word32 Iu5) | |----------------------|-------------------------------------------------| | Rd=clrbit(Rs,Rt) | Word32 Q6_R_clrbit_RR(Word32 Rs, Word32 Rt) | | Rd=setbit(Rs, #u5) | Word32 Q6_R_setbit_RI(Word32 Rs, Word32 Iu5) | | Rd=setbit(Rs,Rt) | Word32 Q6_R_setbit_RR(Word32 Rs, Word32 Rt) | | Rd=togglebit(Rs,#u5) | Word32 Q6_R_togglebit_RI(Word32 Rs, Word32 Iu5) | | Rd=togglebit(Rs,Rt) | Word32 Q6_R_togglebit_RR(Word32 Rs, Word32 Rt) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|-----|-------|----|---|---|----|---|---|----------------------| | | ICL | ASS | 3 | I | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 0 | 0 | d | d | d | d | d | Rd=setbit(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 0 | 1 | d | d | d | d | d | Rd=clrbit(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | P | 0 | i | i | i | | i | 0 | 1 | 0 | d | d | d | d | d | Rd=togglebit(Rs,#u5) | | | ICL | ASS | 3 | I | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | • | t5 | | | Mir | ιОр | | | • | d5 | • | | | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | - | d | d | d | d | d | Rd=setbit(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | - | d | d | d | d | d | Rd=clrbit(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | - | d | d | d | d | d | Rd=togglebit(Rs,Rt) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | MajOp | Major opcode | | | MinOp | Minor opcode | | | RegType | Register type | | | | | | # Split bit field Split the bit field in a register into upper and lower parts of variable size. The lower part is placed in the lower word of a destination register pair, and the upper part is placed in the upper word of the destination. An immediate value or register Rt is used to determine the bit position of the split. | Syntax | benavior | |-----------------------|-------------------------------------------------------------------------------------------------------------------------| | Rdd=bitsplit(Rs, #u5) | Rdd.w[1] = (Rs>>#u);<br>Rdd.w[0] = $zxt_{#u->32}$ (Rs); | | Rdd=bitsplit(Rs,Rt) | <pre>shamt = zxt<sub>5-&gt;32</sub>(Rt); Rdd.w[1] = (Rs&gt;&gt;shamt); Rdd.w[0] = zxt<sub>shamt-&gt;32</sub>(Rs);</pre> | Class: XTYPE (slots 2,3) ### **Intrinsics** | Rdd=bitsplit(Rs, #u5) | Word64 Q6_P_bitsplit_RI(Word32 Rs, Word32 Iu5) | |-----------------------|------------------------------------------------| | Rdd=bitsplit(Rs,Rt) | Word64 Q6_P_bitsplit_RR(Word32 Rs, Word32 Rt) | ## **Encoding** | : | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|----------------------| | Ī | I | CLA | ASS | 3 | | Reg | Тур | е | Ма | јОр | | | | s5 | | | Pa | rse | | | | | | | N | 1inC | )p | | | d5 | | | | | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 1 | 0 | 0 | d | d | d | d | d | Rdd=bitsplit(Rs,#u5) | | | I | CL | ASS | 6 | | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | • | t5 | • | | | | | | | d5 | • | | | | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | - | - | 1 | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | - | - | - | d | d | d | d | d | Rdd=bitsplit(Rs,Rt) | | Field name | | Description | |------------|----------------------------|-------------| | RegType | Register type | | | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | MajOp | Major opcode | | | MinOp | Minor opcode | | ### Table index Supports fast lookup tables where the index into the table is stored in a bit-field. This instruction forms the address of a table element by extracting the bit-field and inserting it into the appropriate bits of a pointer to the table element. Tables are defined to contain entries of bytes, halfwords, words, or doublewords. The table must be aligned to a power-of-2 size greater than or equal to the table size. For example, align a 4 kB table to a 4 kB boundary. This instruction supports tables with a maximum of 32 k table entries. Register Rx contains a pointer to within the table. Register Rs contains a field to extract and use as a table index. This instruction first extracts the field from register Rs and then inserts it into register Rx. The insertion point is bit 0 for tables of bytes, bit 1 for tables of halfwords, bit 2 for tables of words, and bit 3 for tables of doublewords. In the assembly syntax, the width and offset values represent the field in Rs to extract. Use unsigned constants to specify the width and offsets in assembly. In the encoded instruction, however, these values are adjusted by the assembler as follows. - For tableidxb, no adjustment is necessary. - For tableidxh, the assembler encodes offset-1 in the signed immediate field. - For tableidxw, the assembler encodes offset-2 in the signed immediate field. - For tableidxd, the assembler encodes offset-3 in the signed immediate field. #### Rx=TABLEIDXD(Rs,#width,#offset) | Syntax | Behavior | |--------|----------| | • | | | Rx=tableidxb(Rs, #u4, #S6):raw | <pre>width=#u; offset=#S; field = Rs[(width+offset-1):offset]; Rx[(width-1+0):0]=field;</pre> | |--------------------------------|-------------------------------------------------------------------------------------------------| | Rx=tableidxb(Rs, #u4, #U5) | Assembler mapped to: "Rx=tableidxb(Rs, #u4, #U5):raw" | | Rx=tableidxd(Rs, #u4, #S6):raw | <pre>width=#u; offset=#S+3; field = Rs[(width+offset-1):offset]; Rx[(width-1+3):3]=field;</pre> | | Rx=tableidxd(Rs, #u4, #U5) | Assembler mapped to: "Rx=tableidxd(Rs, #u4, #U5-3):raw" | | Syntax | Behavior | |-------------------------------------------|---------------------------------------------------------------------------------------------------| | <pre>Rx=tableidxh(Rs, #u4, #S6):raw</pre> | <pre>width=#u; offset=#S+1; field = Rs[(width+offset-1):offset]; Rx[(width-1+1):1]=field;</pre> | | Rx=tableidxh(Rs, #u4, #U5) | Assembler mapped to: "Rx=tableidxh(Rs, #u4, #U5-1):raw" | | <pre>Rx=tableidxw(Rs, #u4, #S6):raw</pre> | <pre>width=#u; offset=#S+2; field = Rs[(width+offset-1):offset]; Rx[(width-1+2):2] = field;</pre> | | <pre>Rx=tableidxw(Rs, #u4, #U5)</pre> | Assembler mapped to: "Rx = tableidxw(Rs, #114, #115 - 2):raw" | ## Class: XTYPE (slots 2,3) ### **Intrinsics** | Rx=tableidxb(Rs, #u4, #U5) | Word32 Q6_R_tableidxb_RII(Word32 Rx, Word32 Rs, Word32 Iu4, Word32 IU5) | |---------------------------------------|-------------------------------------------------------------------------| | <pre>Rx=tableidxd(Rs, #u4, #U5)</pre> | Word32 Q6_R_tableidxd_RII(Word32 Rx, Word32 Rs, Word32 Iu4, Word32 IU5) | | Rx=tableidxh(Rs, #u4, #U5) | Word32 Q6_R_tableidxh_RII(Word32 Rx, Word32 Rs, Word32 Iu4, Word32 IU5) | | Rx=tableidxw(Rs, #u4, #U5) | Word32 Q6_R_tableidxw_RII(Word32 Rx, Word32 Rs, Word32 Iu4, Word32 IU5) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|----------------------------------| | | ICL | ASS | 3 | | Reg | Тур | е | Ма | jОр | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | х5 | | | | | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | i | s | s | s | s | s | Р | Р | ı | ı | ı | ı | ı | ı | i | i | i | x | x | x | x | x | Rx=tableidxb(Rs,#u4,#S6):r | | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | i | s | s | s | s | s | Р | Р | ı | ı | ı | ı | ı | ı | i | i | i | x | x | x | x | x | Rx=tableidxh(Rs,#u4,#S6):r | | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | i | s | s | s | s | s | Р | Р | ı | ı | ı | ı | ı | ı | i | i | i | x | x | x | x | x | Rx=tableidxw(Rs,#u4,#S6): raw | | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | i | s | s | ø | s | s | Р | Р | ı | ı | ı | ı | ı | ı | i | i | i | x | x | x | x | x | Rx=tableidxd(Rs,#u4,#S6):r<br>aw | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | · | Parse Packet/loop parse bits s5 Field to encode register s x5 Field to encode register x MajOp Major opcode MajOp Major opcode MinOp Minor opcode RegType Register type ## 11.10.3 XTYPE COMPLEX The XTYPE COMPLEX instruction subclass includes instructions that are for complex math, using imaginary values. ## Complex add/sub halfwords Cross vector add-sub or sub-add perform X + jY and X - jY complex operations. Each 16-bit result is saturated to 16 bits. | Syntax | Behavior | |--------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | <pre>Rdd = vxaddsubh(Rss, Rtt):sat</pre> | Rdd.h[0]=sat <sub>16</sub> (Rss.h[0]+Rtt.h[1]);<br>Rdd.h[1]=sat <sub>16</sub> (Rss.h[1]-Rtt.h[0]);<br>Rdd.h[2]=sat <sub>16</sub> (Rss.h[2]+Rtt.h[3]);<br>Rdd.h[3]=sat <sub>16</sub> (Rss.h[3]-Rtt.h[2]); | | <pre>Rdd = vxsubaddh(Rss, Rtt):rnd:&gt;&gt;1:sat</pre> | <pre>Rdd.h[0]=sat<sub>16</sub>((Rss.h[0]-Rtt.h[1]+1)&gt;&gt;1); Rdd.h[1]=sat<sub>16</sub>((Rss.h[1]+Rtt.h[0]+1)&gt;&gt;1); Rdd.h[2]=sat<sub>16</sub>((Rss.h[2]-Rtt.h[3]+1)&gt;&gt;1); Rdd.h[3]=sat<sub>16</sub>((Rss.h[3]+Rtt.h[2]+1)&gt;&gt;1);</pre> | | <pre>Rdd = vxsubaddh(Rss, Rtt):sat</pre> | Rdd.h[0]=sat <sub>16</sub> (Rss.h[0]-Rtt.h[1]);<br>Rdd.h[1]=sat <sub>16</sub> (Rss.h[1]+Rtt.h[0]);<br>Rdd.h[2]=sat <sub>16</sub> (Rss.h[2]-Rtt.h[3]);<br>Rdd.h[3]=sat <sub>16</sub> (Rss.h[3]+Rtt.h[2]); | Class: XTYPE (slots 2,3) ### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ### **Intrinsics** | <pre>Rdd = vxaddsubh(Rss, Rtt):rnd:&gt;&gt;1:sat</pre> | Word64 Q6_P_vxaddsubh_PP_rnd_rs1_sat(Word64 Rss, Word64 Rtt) | | | | | | | | | |----------------------------------------------------------|--------------------------------------------------------------|--|--|--|--|--|--|--|--| | Rdd = vxaddsubh(Rss,<br>Rtt):sat | Word64 Q6_P_vxaddsubh_PP_sat(Word64 Rss, Word64 Rtt) | | | | | | | | | | <pre>Rdd = vxsubaddh(Rss, Rtt):rnd: &gt;&gt; 1:sat</pre> | Word64 Q6_P_vxsubaddh_PP_rnd_rs1_sat(Word64 Rss, Word64 Rtt) | | | | | | | | | | Rdd = vxsubaddh(Rss,<br>Rtt):sat | Word64 Q6_P_vxsubaddh_PP_sat(Word64 Rss, Word64 Rtt) | | | | | | | | | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|----------------------------------------| | | ICL. | ASS | ; | ı | Reg | Тур | е | N | laj | | | | s5 | | | Pa | rse | | | | t5 | | | M | lin | | | | d5 | | | | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rdd=vxaddsubh(Rss,Rtt):s at | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | - | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rdd=vxsubaddh(Rss,Rtt):s at | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | - | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 0 | 0 | - | d | d | d | d | d | Rdd=vxaddsubh(Rss,Rtt):r<br>nd:>>1:sat | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | - | d | d | d | d | d | Rdd=vxsubaddh(Rss,Rtt):r<br>nd:>>1:sat | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | - | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | Maj | Major opcode | | | Min | Minor opcode | | | RegType | Register type | | | | | | ## Complex add/sub words Cross vector add-sub or sub-add perform X + jY and X - jY complex operations. Each 32-bit result is saturated to 32 bits. | Syntax | | | | | Behavior | |--------------|-----------|-----|------|------|----------| | addenbu (Dee | D++1 .ca+ | PAA | tr [ | /Dec | T.T [ | | Rdd=vxaddsubw(Rss,Rtt):sat | Rdd.w[0]=sat <sub>32</sub> (Rss.w[0]+Rtt.w[1]);<br>Rdd.w[1]=sat <sub>32</sub> (Rss.w[1]-Rtt.w[0]); | |----------------------------|----------------------------------------------------------------------------------------------------| | Rdd=vxsubaddw(Rss,Rtt):sat | Rdd.w[0]=sat <sub>32</sub> (Rss.w[0]-Rtt.w[1]);<br>Rdd.w[1]=sat <sub>32</sub> (Rss.w[1]+Rtt.w[0]); | Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ### **Intrinsics** ``` Rdd=vxaddsubw(Rss,Rtt):sat Word64 Q6_P_vxaddsubw_PP_sat(Word64 Rss, Word64 Rtt) Rdd=vxsubaddw(Rss,Rtt):sat Word64 Q6_P_vxsubaddw_PP_sat(Word64 Rss, Word64 Rtt) ``` # **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|--------------------------| | | ICL | ASS | 3 | I | Reg | Тур | е | M | laj | | | | s5 | | | Pa | rse | | | | t5 | | | N | lin | | | | d5 | | | | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rdd=vxaddsubw(Rss,Rtt):s | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=vxsubaddw(Rss,Rtt):s | ### Field name ## Description | ICLASS | Instruction class | |--------|----------------------------| | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | Maj Major opcode Min Minor opcode RegType Register type ## Complex multiply Multiply complex values Rs and Rt. The inputs have a real 16-bit value in the low halfword and an imaginary 16-bit value in the high halfword. Optionally, scale the result by 0 to 1 bits. Optionally, add a complex accumulator. Saturate the real and imaginary portions to 32 bits. The output has a real 32-bit value in the low word and an imaginary 32-bit value in the high word. The Rt input can be optionally conjugated. Another option is to subtract the result from the destination rather than accumulated. ## Syntax Behavior | Rdd=cmpy(Rs,Rt)[:<<1]:sat | Rdd.w[1]=sat <sub>32</sub> ((Rs.h[1] * Rt.h[0])[<<1] + (Rs.h[0] * Rt.h[1])[<<1]);<br>Rdd.w[0]=sat <sub>32</sub> ((Rs.h[0] * Rt.h[0])[<<1] - (Rs.h[1] * Rt.h[1])[<<1]); | |----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Rdd=cmpy(Rs,Rt*)[:<<1]:sat | Rdd.w[1]=sat <sub>32</sub> ((Rs.h[1] * Rt.h[0])[<<1] - (Rs.h[0] * Rt.h[1])[<<1]);<br>Rdd.w[0]=sat <sub>32</sub> ((Rs.h[0] * Rt.h[0])[<<1] + (Rs.h[1] * Rt.h[1])[<<1]); | | Syntax | Behavior | |----------------------------------------------|-------------------------------------------------------| | <pre>Rxx+=cmpy(Rs,Rt)[:&lt;&lt;1]:sat</pre> | $ \begin{array}{llllllllllllllllllllllllllllllllllll$ | | <pre>Rxx+=cmpy(Rs,Rt*)[:&lt;&lt;1]:sat</pre> | $ \begin{array}{llllllllllllllllllllllllllllllllllll$ | | <pre>Rxx-=cmpy(Rs,Rt)[:&lt;&lt;1]:sat</pre> | $ \begin{array}{llllllllllllllllllllllllllllllllllll$ | | <pre>Rxx-=cmpy(Rs,Rt*)[:&lt;&lt;1]:sat</pre> | $ \begin{array}{llllllllllllllllllllllllllllllllllll$ | ## Class: XTYPE (slots 2,3) ### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ## **Intrinsics** | Rdd=cmpy(Rs,Rt):<<1:sat | Word64 Q6_P_cmpy_RR_s1_sat(Word32 Rs, Word32 Rt) | |--------------------------------------------|----------------------------------------------------------------------| | Rdd=cmpy(Rs,Rt):sat | Word64 Q6_P_cmpy_RR_sat(Word32 Rs, Word32 Rt) | | Rdd=cmpy(Rs,Rt*):<<1:sat | Word64 Q6_P_cmpy_RR_conj_s1_sat(Word32 Rs, Word32 Rt) | | Rdd=cmpy(Rs,Rt*):sat | Word64 Q6_P_cmpy_RR_conj_sat(Word32 Rs, Word32 Rt) | | <pre>Rxx+=cmpy(Rs,Rt):&lt;&lt;1:sat</pre> | Word64 Q6_P_cmpyacc_RR_s1_sat(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+=cmpy(Rs,Rt):sat</pre> | Word64 Q6_P_cmpyacc_RR_sat(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+=cmpy(Rs,Rt*):&lt;&lt;1:sat</pre> | Word64 Q6_P_cmpyacc_RR_conj_s1_sat(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+=cmpy(Rs,Rt*):sat</pre> | Word64 Q6_P_cmpyacc_RR_conj_sat(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx-=cmpy(Rs,Rt):&lt;&lt;1:sat</pre> | Word64 Q6_P_cmpynac_RR_s1_sat(Word64 Rxx, Word32 Rs, Word32 Rt) | #### **Encoding** ICLASS Maj0p | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|---|---|---------|----|---|---|----------------------------------------------------| | | ICL | ASS | 3 | | Reg | Тур | е | N | ЛаjС | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inO | р | | | d5 | | | | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | N | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | d | d d d d | | | d | Rdd=cmpy(Rs,Rt)[:< <n]:sat< td=""></n]:sat<> | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | N | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rdd=cmpy(Rs,Rt*)[:< <n]:s at<="" td=""></n]:s> | | | ICL | ASS | 3 | I | Reg | Тур | е | ٨ | ЛаjС | )p | | | s5 | | • | Pa | rse | | | • | t5 | | • | N | linO | р | | • | х5 | • | • | | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | N | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | x | x | x | x | x | Rxx+=cmpy(Rs,Rt)[:< <n]:s< td=""></n]:s<> | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | N | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | x | x | x | x | x | Rxx-<br>=cmpy(Rs,Rt)[:< <n]:sat< td=""></n]:sat<> | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | N | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | x | x | x | x | x | Rxx+=cmpy(Rs,Rt*)[:< <n]: sat<="" td=""></n]:> | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | N | 1 | 0 | s | ø | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | x | x | x | x | x | Rxx-<br>=cmpy(Rs,Rt*)[:< <n]:sat< td=""></n]:sat<> | | Field name | Description | |----------------|-------------| | i iciu iiaiiic | Description | | Minor opcode | |----------------------------| | Register type | | Packet/loop parse bits | | Field to encode register d | | Field to encode register s | | Field to encode register t | | Field to encode register x | | | Instruction class Major opcode ## Complex multiply real or imaginary Multiply complex values Rs and Rt. The inputs have a real 16-bit value in the low halfword and an imaginary 16-bit value in the high halfword. Take either the real or imaginary result and optionally accumulate with a 64-bit destination. | Gyillax | Dellavior | |-------------------|-------------------------------------------------------------------| | Rdd=cmpyi(Rs,Rt) | Rdd = (Rs.h[1] * Rt.h[0]) + (Rs.h[0] * Rt.h[1]); | | Rdd=cmpyr(Rs,Rt) | Rdd = (Rs.h[0] * Rt.h[0]) - (Rs.h[1] * Rt.h[1]); | | Rxx+=cmpyi(Rs,Rt) | <pre>Rxx = Rxx + (Rs.h[1] * Rt.h[0]) + (Rs.h[0] * Rt.h[1]);</pre> | | Rxx+=cmpyr(Rs,Rt) | <pre>Rxx = Rxx + (Rs.h[0] * Rt.h[0]) - (Rs.h[1] * Rt.h[1]);</pre> | #### Class: XTYPE (slots 2,3) #### **Intrinsics** | Rdd=cmpyi(Rs,Rt) | Word64 Q6_P_cmpyi_RR(Word32 Rs, Word32 Rt) | |-------------------|-----------------------------------------------------------| | Rdd=cmpyr(Rs,Rt) | Word64 Q6_P_cmpyr_RR(Word32 Rs, Word32 Rt) | | Rxx+=cmpyi(Rs,Rt) | Word64 Q6_P_cmpyiacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) | | Rxx+=cmpyr(Rs,Rt) | Word64 Q6_P_cmpyracc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) | ## **Encoding** Maj0p | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|-------------|----|-----|-----|----|-------|------|----|----|----|----|----|----|-----|-----|----|----|----|----|---|---|-------|-------|---|---|----|----|---|---|-------------------| | | ICL | ASS | ASS RegType | | | е | N | /lajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | р | | | d5 | | | | | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rdd=cmpyi(Rs,Rt) | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=cmpyr(Rs,Rt) | | | ICL | ASS | 6 | ı | Reg | Тур | е | N | /ajC | )p | | | s5 | | | Pa | rse | | | • | t5 | • | • | N | /linC | р | | | х5 | • | • | | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | х | х | х | х | х | Rxx+=cmpyi(Rs,Rt) | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | х | x | х | х | х | Rxx+=cmpyr(Rs,Rt) | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | | MinOp | Minor opcode | |---------|----------------------------| | RegType | Register type | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | x5 | Field to encode register x | Major opcode ## Complex multiply with round and pack Multiply complex values Rs and Rt. The inputs have a real 16-bit value in the low halfword and an imaginary 16-bit value in the high halfword. The Rt input is optionally conjugated. The multiplier results are optionally scaled by 0 to 1 bits. A rounding constant is added to each real and imaginary sum. The real and imaginary parts are individually saturated to 32 bits. The upper 16 bits of each 32-bit results are packed in a 32-bit destination register. Syntax Behavior ## Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** | <pre>Rd=cmpy(Rs,Rt):&lt;&lt;1:rnd:sat</pre> | Word32 Q6_R_cmpy_RR_s1_rnd_sat(Word32 Rs, Word32 Rt) | |----------------------------------------------|-------------------------------------------------------------------| | Rd=cmpy(Rs,Rt):rnd:sat | Word32 Q6_R_cmpy_RR_rnd_sat(Word32 Rs, Word32 Rt) | | <pre>Rd=cmpy(Rs,Rt*):&lt;&lt;1:rnd:sat</pre> | Word32 Q6_R_cmpy_RR_conj_s1_rnd_sat(Word32 Rs, Word32 Rt) | | Rd=cmpy(Rs,Rt*):rnd:sat | <pre>Word32 Q6_R_cmpy_RR_conj_rnd_sat(Word32 Rs, Word32 Rt)</pre> | ## **Encoding** Field name ICLASS | 3 | 1 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-----------------------------------------------------| | | IC | CLA | SS | ; | I | Reg | Тур | е | ٨ | ЛајС | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | | 1 | 1 | 0 | 1 | 1 | 0 | 1 | N | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rd=cmpy(Rs,Rt)[:< <n]:rnd: sat<="" td=""></n]:rnd:> | | 1 | | 1 | 1 | 0 | 1 | 1 | 0 | 1 | N | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rd=cmpy(Rs,Rt*)[:< <n]:rnd :sat<="" td=""></n]:rnd> | Description | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sai | |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|------------| | 1 | 0 | 1 | 1 | 0 | 1 | N | 1 | 1 | s | w | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rd=cmpy(Rs | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MajOp | Major opcode | |---------|----------------------------| | MinOp | Minor opcode | | RegType | Register type | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | Instruction class ## Complex multiply 32 x 16 Multiply 32- by 16-bit complex values Rss and Rt. The inputs have a real value in the low part of a register and the imaginary value in the upper part. The multiplier results are scaled by 1 bit and accumulated with a rounding constant. The result is saturated to 32 bits. Class: XTYPE (slots 2,3) cmpyrwh(Rss,Rt\*):<<1:rnd:sat</pre> #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. (Rss.w[1] \* Rt.h[1]) + 0x4000) >> 15); #### **Intrinsics** Rd = Word32 Q6\_R\_cmpyiwh\_PR\_s1\_rnd\_sat(Word64 Rss, Word32 Rt) Rd = Word32 Q6\_R\_cmpyiwh\_PR\_s1\_rnd\_sat(Word64 Rss, Word32 Rt) Rd = Word32 Q6\_R\_cmpyiwh\_PR\_conj\_s1\_rnd\_sat(Word64 Rss, Word32 Rt) Rd = Word32 Q6\_R\_cmpyrwh\_PR\_s1\_rnd\_sat(Word64 Rss, Word32 Rt) Rd = Word32 Q6\_R\_cmpyrwh\_PR\_s1\_rnd\_sat(Word64 Rss, Word32 Rt) Rd = Word32 Q6\_R\_cmpyrwh\_PR\_s1\_rnd\_sat(Word64 Rss, Word32 Rt) ## **Encoding** | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|----|---|---|---|----|---|---|----------------------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | М | in | | | | d5 | | | | | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | - | - | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rd=cmpyiwh(Rss,Rt):<<1:r | | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | - | - | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rd=cmpyiwh(Rss,Rt*):<<1: rnd:sat | | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | - | - | - | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rd=cmpyrwh(Rss,Rt):<<1:r | | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | - | - | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rd=cmpyrwh(Rss,Rt*):<<1: rnd:sat | | Field name | | Description | |------------|-------------------|-------------| | TACC | Instruction class | | | ICLASS | instruction class | |--------|----------------------------| | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | Min | Minor opcode | Min Minor opcode RegType Register type ## Complex multiply real or imaginary 32-bit Multiply complex values Rss and Rtt. The inputs have a real 32-bit value in the low word and an imaginary 32-bit value in the high word. Take either the real or imaginary result and optionally accumulate with a 64-bit destination. Syntax Behavior ``` tmp128 = sxt_{64->128}((Rss.w[0] * Rtt.w[1])); Rd = cmpyiw(Rss,Rtt):<<1:rnd:sat</pre> acc128 = sxt_{64->128}((Rss.w[1] * Rtt.w[0])); const128 = sxt_{64->128} (0x40000000); acc128 = tmp128 + acc128; acc128 = acc128 + const128; acc128 = (size8s t) (acc128 >> 31); acc64 = sxt_{128->64}(acc128); Rd = sat_{32}(acc64); Rd = cmpyiw(Rss,Rtt): <<1:sat tmp128 = sxt_{64->128}((Rss.w[0] * Rtt.w[1])); acc128 = sxt_{64->128}((Rss.w[1] * Rtt.w[0])); acc128 = tmp128 + acc128; acc128 = (size8s t) (acc128 >> 31); acc64 = sxt_{128->64}(acc128); Rd = sat_{32}(acc64); tmp128 = sxt_{64->128}((Rss.w[1] * Rtt.w[0])); cmpyiw(Rss,Rtt*):<<1:rnd:sat</pre> acc128 = sxt_{64->128}((Rss.w[0] * Rtt.w[1])); const128 = sxt_{64->128}(0x40000000); acc128 = tmp128-acc128; acc128 = acc128 + const128; acc128 = (size8s t) (acc128 >> 31); acc64 = sxt_{128->64}(acc128); Rd = sat_{32}(acc64); tmp128 = sxt_{64->128}((Rss.w[1] * Rtt.w[0])); cmpyiw(Rss,Rtt*):<<1:sat</pre> acc128 = sxt_{64->128}((Rss.w[0] * Rtt.w[1])); acc128 = tmp128-acc128; acc128 = (size8s t) (acc128 >> 31); acc64 = sxt_{128->64} (acc128); Rd = sat_{32}(acc64); Rd = tmp128 = sxt_{64->128}((Rss.w[0] * Rtt.w[0])); cmpyrw(Rss,Rtt):<<1:rnd:sat</pre> acc128 = sxt_{64->128}((Rss.w[1] * Rtt.w[1])); const128 = sxt_{64->128}(0x40000000); acc128 = tmp128-acc128; acc128 = acc128 + const128; acc128 = (size8s t) (acc128 >> 31); acc64 = sxt_{128->64}(acc128); Rd = sat_{32}(acc64); {\tt Rd = cmpyrw(Rss,Rtt):} <<1:sat tmp128 = sxt_{64->128}((Rss.w[0] * Rtt.w[0])); acc128 = sxt_{64->128}((Rss.w[1] * Rtt.w[1])); acc128 = tmp128-acc128; acc128 = (size8s t) (acc128 >> 31); acc64 = sxt_{128->64}(acc128); Rd = sat_{32}(acc64); ``` #### **Syntax Behavior** $tmp128 = sxt_{64->128}((Rss.w[0] * Rtt.w[0]));$ Rd =cmpyrw(Rss,Rtt\*):<<1:rnd:sat</pre> $acc128 = sxt_{64->128}((Rss.w[1] * Rtt.w[1]));$ const128 = $sxt_{64->128}(0x40000000)$ ; acc128 = tmp128 + acc128;acc128 = acc128 + const128;acc128 = (size8s t) (acc128 >> 31); $acc64 = sxt_{128->64}(acc128);$ $Rd = sat_{32}(acc64);$ Rd = $tmp128 = sxt_{64->128}((Rss.w[0] * Rtt.w[0]));$ cmpyrw(Rss,Rtt\*):<<1:sat $acc128 = sxt_{64->128}((Rss.w[1] * Rtt.w[1]));$ acc128 = tmp128 + acc128;acc128 = (size8s t) (acc128 >> 31); $acc64 = sxt_{128->64}(acc128);$ $Rd = sat_{32}(acc64);$ Rdd = cmpyiw(Rss,Rtt) Rdd = ((Rss.w[0] \* Rtt.w[1]) + (Rss.w[1] \*Rtt.w[0])); Rdd = cmpyiw(Rss,Rtt\*) Rdd = ((Rss.w[1] \* Rtt.w[0]) - (Rss.w[0] \*Rtt.w[1])); Rdd = ((Rss.w[0] \* Rtt.w[0]) - (Rss.w[1] \*Rdd = cmpyrw(Rss,Rtt) Rtt.w[1])); Rdd = cmpyrw(Rss,Rtt\*) Rdd = ((Rss.w[0] \* Rtt.w[0]) + (Rss.w[1] \*Rtt.w[1])); Rxx += ((Rss.w[0] \* Rtt.w[1]) + (Rss.w[1] \*Rxx += cmpyiw(Rss,Rtt) Rtt.w[0])); Rxx += cmpyiw(Rss,Rtt\*) Rxx += ((Rss.w[1] \* Rtt.w[0]) - (Rss.w[0] \*Rtt.w[1])); Rxx += cmpyrw(Rss,Rtt) Rxx += ((Rss.w[0] \* Rtt.w[0]) - (Rss.w[1] \*Rtt.w[1])); Rxx += cmpyrw(Rss,Rtt\*) Rxx += ((Rss.w[0] \* Rtt.w[0]) + (Rss.w[1] \*Rtt.w[1])); #### Class: XTYPE (slots 3) #### **Notes** - This instruction can only execute on a core with the Hexagon audio extensions - A packet with this instruction cannot have a slot 2 multiply instruction. - If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ## Intrinsics | <pre>Rd = cmpyiw(Rss,<br/>Rtt):&lt;&lt;1:rnd:sat</pre> | Word32 Q6_R_cmpyiw_PP_s1_rnd_sat(Word64 Rss, Word64 Rtt) | |---------------------------------------------------------|---------------------------------------------------------------------| | <pre>Rd = cmpyiw(Rss,Rtt):&lt;&lt;1:sat</pre> | Word32 Q6_R_cmpyiw_PP_s1_sat(Word64 Rss, Word64 Rtt) | | <pre>Rd = cmpyiw(Rss,<br/>Rtt*):&lt;&lt;1:rnd:sat</pre> | Word32<br>Q6_R_cmpyiw_PP_conj_s1_rnd_sat(Word64 Rss,<br>Word64 Rtt) | | <pre>Rd = cmpyiw(Rss,<br/>Rtt*):&lt;&lt;1:sat</pre> | Word32 Q6_R_cmpyiw_PP_conj_s1_sat(Word64 Rss, Word64 Rtt) | | <pre>Rd = cmpyrw(Rss,<br/>Rtt):&lt;&lt;1:rnd:sat</pre> | Word32 Q6_R_cmpyrw_PP_s1_rnd_sat(Word64 Rss, Word64 Rtt) | | <pre>Rd = cmpyrw(Rss,<br/>Rtt):&lt;&lt;1:sat</pre> | Word32 Q6_R_cmpyrw_PP_s1_sat(Word64 Rss, Word64 Rtt) | | <pre>Rd = cmpyrw(Rss,<br/>Rtt*):&lt;&lt;1:rnd:sat</pre> | Word32 Q6_R_cmpyrw_PP_conj_s1_rnd_sat(Word64 Rss, Word64 Rtt) | | <pre>Rd = cmpyrw(Rss,<br/>Rtt*):&lt;&lt;1:sat</pre> | Word32 Q6_R_cmpyrw_PP_conj_s1_sat(Word64 Rss, Word64 Rtt) | | <pre>Rdd = cmpyiw(Rss,Rtt)</pre> | Word64 Q6_P_cmpyiw_PP(Word64 Rss, Word64 Rtt) | | <pre>Rdd = cmpyiw(Rss,Rtt*)</pre> | Word64 Q6_P_cmpyiw_PP_conj(Word64 Rss, Word64 Rtt) | | <pre>Rdd = cmpyrw(Rss,Rtt)</pre> | Word64 Q6_P_cmpyrw_PP(Word64 Rss, Word64 Rtt) | | <pre>Rdd = cmpyrw(Rss,Rtt*)</pre> | Word64 Q6_P_cmpyrw_PP_conj(Word64 Rss, Word64 Rtt) | | <pre>Rxx += cmpyiw(Rss,Rtt)</pre> | Word64 Q6_P_cmpyiwacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+=cmpyiw(Rss,Rtt*)</pre> | Word64 Q6_P_cmpyiwacc_PP_conj(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+=cmpyrw(Rss,Rtt)</pre> | Word64 Q6_P_cmpyrwacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+=cmpyrw(Rss,Rtt*)</pre> | Word64 Q6_P_cmpyrwacc_PP_conj(Word64 Rxx, Word64 Rss, Word64 Rtt) | | | | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|---------------------------| | | ICL | ASS | 3 | F | Reg | Тур | е | N | ЛаjС | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=cmpyiw(Rss,Rtt) | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=cmpyrw(Rss,Rtt) | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=cmpyrw(Rss,Rtt*) | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=cmpyiw(Rss,Rtt*) | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rd=cmpyiw(Rss,Rtt*):<<1:s | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|-------------------------------------| | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=cmpyiw(Rss,Rtt):<<1:s | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=cmpyrw(Rss,Rtt):<<1:s at | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=cmpyrw(Rss,Rtt*):<<1: sat | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rd=cmpyiw(Rss,Rtt*):<<1:r<br>nd:sat | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=cmpyiw(Rss,Rtt):<<1:rn<br>d:sat | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=cmpyrw(Rss,Rtt):<<1:r<br>nd:sat | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=cmpyrw(Rss,Rtt*):<<1:r<br>nd:sat | | | ICL | ASS | 3 | F | Reg | Тур | е | Ν | 1ajO | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | х5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | х | x | х | x | x | Rxx+=cmpyiw(Rss,Rtt*) | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | x | x | х | x | x | Rxx+=cmpyiw(Rss,Rtt) | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | x | x | x | X | X | Rxx+=cmpyrw(Rss,Rtt) | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | x | x | х | x | x | Rxx+=cmpyrw(Rss,Rtt*) | #### Field name #### Description ICLASS Instruction class MajOp Major opcode MinOp Minor opcode RegType Register type Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t5 Field to encode register t x5 Field to encode register x ## Vector complex multiply real or imaginary The inputs Rss and Rtt are a vector of two complex values. Each complex value is composed of a 16-bit imaginary portion in the upper halfword and a 16-bit real portion in the lower halfword. Generate two complex results, either the real result or the imaginary result. These results are optionally shifted left by 0 to 1 bits, and optionally accumulated with the destination register. #### Rxx+=vcmpyi(Rss,Rtt):sat Syntax Behavior #### **Syntax Behavior** Rxx+=vcmpyr(Rss,Rtt):sat $Rxx.w[0]=sat_{32}(Rxx.w[0] + (Rss.h[0] *$ Rtt.h[0]) - (Rss.h[1] \* Rtt.h[1])<<0);</pre> $Rxx.w[1] = sat_{32}(Rxx.w[1] + (Rss.h[2] *$ Rtt.h[2]) - (Rss.h[3] \* Rtt.h[3])<<0);</pre> Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** | Rdd=vcmpyi(Rss,Rtt):<<1:sat | Word64 Q6_P_vcmpyi_PP_s1_sat(Word64 Rss, Word64 Rtt) | |-------------------------------------|------------------------------------------------------------------| | Rdd=vcmpyi(Rss,Rtt):sat | Word64 Q6_P_vcmpyi_PP_sat(Word64 Rss, Word64 Rtt) | | Rdd=vcmpyr(Rss,Rtt):<<1:sat | Word64 Q6_P_vcmpyr_PP_s1_sat(Word64 Rss, Word64 Rtt) | | Rdd=vcmpyr(Rss,Rtt):sat | Word64 Q6_P_vcmpyr_PP_sat(Word64 Rss, Word64 Rtt) | | <pre>Rxx+=vcmpyi(Rss,Rtt):sat</pre> | Word64 Q6_P_vcmpyiacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+=vcmpyr(Rss,Rtt):sat</pre> | Word64 Q6_P_vcmpyracc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) | #### **Encoding** Field name | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|-------------------------------------------------| | | ICL | ASS | ; | | Reg | Тур | е | N | ЛајС | р | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rdd=vcmpyr(Rss,Rtt)[:< <n] :sat<="" td=""></n]> | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rdd=vcmpyi(Rss,Rtt)[:< <n] :sat<="" td=""></n]> | | | ICL | ASS | 3 | | Reg | Тур | е | N | ЛајС | р | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | х5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | x | x | х | х | х | Rxx+=vcmpyr(Rss,Rtt):sat | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | x | x | х | х | х | Rxx+=vcmpyi(Rss,Rtt):sat | | | ICL/ | ASS | 3 | ı | Reg | Тур | е | N | 1ajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | d5 | | | | |---|------|-----|---|---|-----|-----|---|---|------|----|---|---|----|---|---|----|-----|---|---|---|----|---|---|---|------|----|---|---|----|---|---|-------------------------------------------------| | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | P | 0 | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rdd=vcmpyr(Rss,Rtt)[:< <n] :sat<="" td=""></n]> | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rdd=vcmpyi(Rss,Rtt)[:< <n] :sat<="" td=""></n]> | | | ICL/ | ASS | 3 | F | Reg | Тур | е | N | 1ajC | )p | | | s5 | • | | Pa | rse | | | | t5 | | • | N | 1inC | )p | | | х5 | • | | | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | х | x | х | х | x | Rxx+=vcmpyr(Rss,Rtt):sat | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | s | s | ø | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | x | x | x | x | x | Rxx+=vcmpyi(Rss,Rtt):sat | Description | ICLASS | Instruction class | |---------|----------------------------| | MajOp | Major opcode | | MinOp | Minor opcode | | RegType | Register type | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | Field name | | Description | |------------|----------------------------|-------------| | s5 | Field to encode register s | | | t5 | Field to encode register t | | | x5 | Field to encode register x | | ## Vector complex conjugate Perform a vector complex conjugate of both complex values in vector Rss. This is done by negating the imaginary halfwords, and placing the result in destination Rdd. | Syntax | Behavior | |--------|----------| |--------|----------| | Rdd=vconj(Rss):sat | Rdd.h[1]=sat <sub>16</sub> (-Rss.h[1]); | |--------------------|-----------------------------------------| | | Rdd.h[0]=Rss.h[0]; | | | $Rdd.h[3]=sat_{16}(-Rss.h[3]);$ | | | Rdd.h[2]=Rss.h[2]; | Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rdd=vconj(Rss):sat Word64 Q6_P_vconj_P_sat(Word64 Rss) ``` ## **Encoding** MinOp RegType | 3 | 1 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|---|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|--------------------| | Ī | I | CLA | ASS | ; | | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | 1inC | )p | | | d5 | | | | | - | | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | | - | - | | - | - | 1 | 1 | 1 | d | d | d | d | d | Rdd=vconj(Rss):sat | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | MaiOp | Major opcode | | Minor opcode Register type ### Vector complex rotate Take the least significant bits of Rt, and use these bits to rotate each of the two complex values in the source vector a multiple of 90 degrees. Bits 0 and 1 control the rotation factor for word 0, and bits 2 and 3 control the rotation factor for word 1. If the rotation control bits are 0, the rotation is 0: the real and imaginary halves of the source appear unchanged and unmoved in the destination. If the rotation control bits are 1, the rotation is -pi/2: the real half of the destination gets the imaginary half of the source, and the imaginary half of the destination gets the negative real half of the source. If the rotation control bits are 2, the rotation is pi/2: the real half of the destination gets the negative imaginary half of the source, and the imaginary half of the destination gets the real half of the source. If the rotation control bits are 3, the rotation is pi: the real half of the destination gets the negative real half of the source, and the imaginary half of the destination gets the negative imaginary half of the source. Syntax Behavior ``` Rdd=vcrotate(Rss,Rt) tmp = Rt[1:0]; if (tmp == 0) { Rdd.h[0]=Rss.h[0]; Rdd.h[1]=Rss.h[1]; } else if (tmp == 1) { Rdd.h[0]=Rss.h[1]; Rdd.h[1] = sat_{16} (-Rss.h[0]); } else if (tmp == 2) { Rdd.h[0]=sat_{16}(-Rss.h[1]); Rdd.h[1]=Rss.h[0]; } else { Rdd.h[0]=sat_{16}(-Rss.h[0]); Rdd.h[1] = sat_{16}(-Rss.h[1]); tmp = Rt[3:2]; if (tmp == 0) { Rdd.h[2]=Rss.h[2]; Rdd.h[3]=Rss.h[3]; } else if (tmp == 1) { Rdd.h[2]=Rss.h[3]; Rdd.h[3] = sat_{16}(-Rss.h[2]); } else if (tmp == 2) { Rdd.h[2] = sat_{16}(-Rss.h[3]); Rdd.h[3]=Rss.h[2]; } else { Rdd.h[2] = sat_{16}(-Rss.h[2]); Rdd.h[3]=sat_{16}(-Rss.h[3]); ``` ## Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** Rdd=vcrotate(Rss,Rt) Word64 Q6\_P\_vcrotate\_PR(Word64 Rss, Word32 Rt) ## **Encoding** | 3 | 1 3 | 0 2 | 9 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|----------------------| | | IC | LAS | SS | | | Reg | Тур | е | M | laj | | | | s5 | | | Pa | rse | | | | t5 | | | M | 1in | | | | d5 | | | | | - | 1 | 0 | ) | 0 | 0 | 0 | 1 | 1 | 1 | 1 | - | s | s | w | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | - | d | d | d | d | d | Rdd=vcrotate(Rss,Rt) | | Field name | | |------------|----------------------------| | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | Maj | Major opcode | | Min | Minor opcode | | RegType | Register type | #### Description ## Vector reduce complex multiply by scalar Multiply a complex number by a scalar. Rss contains two complex numbers. The real portions are each multiplied by two scalars contained in register Rt, scaled, summed, optionally accumulated, saturated, and stored in the lower word of Rdd. A similar operation is done on the two imaginary portions of Rss. ## Syntax Behavior ``` Rdd = vrcmpys(Rss,Rt): <<</pre> if ("Rt & 1") { 1:sat Assembler mapped to: "Rdd=vrcmpys(Rss,Rtt):<<1:sat:raw:hi"; } else { Assembler mapped to: "Rdd=vrcmpys(Rss,Rtt):<<1:sat:raw:lo"; Rdd = vrcmpys(Rss,Rtt): <<</pre> Rdd.w[1] = sat_{32}((Rss.h[1] * 1:sat:raw:hi Rtt.w[1].h[0]) << 1 + (Rss.h[3] * Rtt.w[1].h[1]) << 1); Rdd.w[0] = sat_{32}((Rss.h[0] * Rtt.w[1].h[0]) << 1 + (Rss.h[2] * Rtt.w[1].h[1]) << 1); ``` #### **Syntax Behavior** $Rdd.w[1] = sat_{32}((Rss.h[1]) *$ Rdd = vrcmpys(Rss,Rtt): <<</pre> 1:sat:raw:lo Rtt.w[0].h[0]) << 1 + (Rss.h[3] \*Rtt.w[0].h[1]) << 1); $Rdd.w[0] = sat_{32}((Rss.h[0] *$ Rtt.w[0].h[0]) << 1 + (Rss.h[2] \*Rtt.w[0].h[1]) << 1);Rxx += vrcmpys(Rss,Rt): <<</pre> if ("Rt & 1") { 1:sat Assembler mapped to: "Rxx+=vrcmpys(Rss,Rtt):<<1:sat:raw:hi"; Assembler mapped to: "Rxx+=vrcmpys(Rss,Rtt):<<1:sat:raw:lo"; $Rxx.w[1] = sat_{32}(Rxx.w[1] + (Rss.h[1] *$ Rxx += vrcmpys(Rss,Rtt): <<</pre> 1:sat:raw:hi Rtt.w[1].h[0]) << 1 + (Rss.h[3] \*Rtt.w[1].h[1]) << 1); $Rxx.w[0] = sat_{32}(Rxx.w[0] + (Rss.h[0] *$ Rtt.w[1].h[0]) << 1 + (Rss.h[2] \*Rtt.w[1].h[1])<<1);</pre> $Rxx += vrcmpys(Rss,Rtt): << Rxx.w[1] = sat_{32}(Rxx.w[1] + (Rss.h[1] *$ 1:sat:raw:lo Rtt.w[0].h[0]) << 1 + (Rss.h[3] \*Rtt.w[0].h[1])<<1); $Rxx.w[0]=sat_32(Rxx.w[0] + (Rss.h[0] *$ Rtt.w[0].h[0]) << 1 + (Rss.h[2] \*Rtt.w[0].h[1]) << 1); #### Class: XTYPE (slots 2,3) #### **Notes** ■ If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rdd=vrcmpys(Rss,Rt):<<1:sat Word64 Q6_P_vrcmpys_PR_s1_sat(Word64 Rss, Word32 Rt) Rxx+=vrcmpys(Rss,Rt):<<1:sat Word64 Q6_P_vrcmpysacc_PR_s1_sat(Word64 Rxx, Word64 Rss, Word64 Rss, Word32 Rt) ``` ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|------------------------------------------| | | ICL | ASS | 3 | | Reg | Тур | е | N | ЛаjС | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rdd=vrcmpys(Rss,Rtt):<<1 :sat:raw:hi | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rdd=vrcmpys(Rss,Rtt):<<1 :sat:raw:lo | | | ICL | ASS | 3 | | Reg | Тур | е | N | ЛаjС | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | х5 | • | | | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | x | x | x | x | x | Rxx+=vrcmpys(Rss,Rtt):<<<br>1:sat:raw:hi | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | x | x | x | x | x | Rxx+=vrcmpys(Rss,Rtt):<<<br>1:sat:raw:lo | Field name Description ICLASS Instruction class MajOp Major opcode MinOp Minor opcode RegType Register type Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t5 Field to encode register t x5 Field to encode register x ## Vector reduce complex multiply by scalar with round and pack Multiply a complex number by scalar. Rss contains two complex numbers. The real portions are each multiplied by two scalars contained in register Rt, scaled, summed, rounded, and saturated. The upper 16 bits of this result are packed in the lower halfword of Rd. A similar operation is done on the two imaginary portions of Rss. ## Syntax Behavior ``` Rd = vrcmpys(Rss,Rt): <<</pre> if ("Rt & 1") { 1:rnd:sat Assembler mapped to: "Rd=vrcmpys(Rss,Rtt):<<1:rnd:sat:raw:hi"; } else { Assembler mapped to: "Rd=vrcmpys(Rss,Rtt):<<1:rnd:sat:raw:lo"; Rd = vrcmpys(Rss,Rtt): <<</pre> Rd.h[1] = sat_{32}((Rss.h[1] * Rtt.w[1].h[0]) << 1 1:rnd:sat:raw:hi + (Rss.h[3] * Rtt.w[1].h[1])<<1 + 0x8000).h[1]; Rd.h[0] = sat_{32}((Rss.h[0] * Rtt.w[1].h[0]) << 1 + (Rss.h[2] * Rtt.w[1].h[1])<<1 + 0x8000).h[1]; ``` #### Syntax Behavior Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rd=vrcmpys(Rss,Rt):<<1:rnd:s Word32 Q6_R_vrcmpys_PR_s1_rnd_sat(Word64 at Rss, Word32 Rt) ``` #### **Encoding** Maj0p | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|--------------------------------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | N | ЛаjC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | - | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rd=vrcmpys(Rss,Rtt):<<1:r<br>nd:sat:raw:hi | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | - | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rd=vrcmpys(Rss,Rtt):<<1:r<br>nd:sat:raw:lo | | <b>-</b> | | <b>5</b> | |------------|-------------------|-------------| | Field name | | Description | | ICLASS | Instruction class | | | MinOp | Minor opcode | |---------|----------------------------| | RegType | Register type | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | Major opcode ## Vector reduce complex rotate This instruction is useful for CDMA despreading. An unsigned 2-bit immediate specifies a byte to use in Rt. Each of four 2-bit fields in the specified byte selects a rotation amount for one of the four complex numbers in Rss. Accumulate the real and imaginary products and store as a 32-bit complex number in Rd. Optionally, the destination register can also be accumulated. #### **Syntax** #### **Behavior** ``` Rdd=vrcrotate(Rss,Rt,#u2) sumr = 0; sumi = 0; control = Rt.ub[#u]; for (i = 0; i < 8; i += 2) { tmpr = Rss.b[i]; tmpi = Rss.b[i+1]; switch (control & 3) { case 0: sumr += tmpr; sumi += tmpi; break; case 1: sumr += tmpi; sumi -= tmpr; break; case 2: sumr -= tmpi; sumi += tmpr; break; case 3: sumr -= tmpr; sumi -= tmpi; break; control = control >> 2; Rdd.w[0]=sumr; Rdd.w[1]=sumi; Rxx+=vrcrotate(Rss,Rt,#u2) sumr = 0; sumi = 0; control = Rt.ub[#u]; for (i = 0; i < 8; i += 2) { tmpr = Rss.b[i]; tmpi = Rss.b[i+1]; switch (control & 3) { case 0: sumr += tmpr; sumi += tmpi; break; case 1: sumr += tmpi; sumi -= tmpr; break; case 2: sumr -= tmpi; sumi += tmpr; break; case 3: sumr -= tmpr; sumi -= tmpi; break; control = control >> 2; Rxx.w[0]=Rxx.w[0] + sumr; Rxx.w[1] = Rxx.w[1] + sumi; ``` ## Class: XTYPE (slots 2,3) ## Intrinsics | Rdd=vrcrotate(Rss,Rt,#u2) | Word64 Q6_P_vrcrotate_PRI(Word64 Rss, | |---------------------------------------|------------------------------------------| | | Word32 Rt, Word32 Iu2) | | | | | <pre>Rxx+=vrcrotate(Rss,Rt,#u2)</pre> | Word64 Q6_P_vrcrotateacc_PRI(Word64 Rxx, | | | Word64 Rss, Word32 Rt, Word32 Iu2) | ## **Encoding** | 3 | 1 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|---------------------------| | | IC | CLA | ASS | ; | ı | Reg | Тур | е | M | laj | | | | s5 | | | Pa | rse | | | | t5 | | | M | lin | | | | d5 | | | | | 1 | | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | - | s | s | s | s | s | Р | Ρ | i | t | t | t | t | t | 1 | 1 | i | d | d | d | d | d | Rdd=vrcrotate(Rss,Rt,#u2) | | | IC | CLA | ASS | ; | ı | Reg | Тур | е | M | laj | | | • | s5 | | • | Pa | rse | | | | t5 | | | | | | | • | х5 | • | | | | 1 | | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | t | t | t | t | t | - | - | i | x | x | x | x | x | Rxx+=vrcrotate(Rss,Rt,#u2 | | Field name | | |------------|----------------------------| | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | x5 | Field to encode register x | | Maj | Major opcode | | Min | Minor opcode | | RegType | Register type | | | | #### Description ## 11.10.4 XTYPE FP The XTYPE FP instruction subclass includes instructions for floating point math. ## Floating point addition Add two floating point values. | Syntax | Behavior | |--------------------|--------------| | Rd=sfadd(Rs,Rt) | Rd=Rs+Rt; | | Rdd=dfadd(Rss,Rtt) | Rdd=Rss+Rtt; | Class: XTYPE (slots 2,3) #### **Intrinsics** | Rd=sfadd(Rs,Rt) | Word32 | Q6_R_sfadd_RR(Word32 Rs, Word32 Rt) | |--------------------|----------------|-------------------------------------| | Rdd=dfadd(Rss,Rtt) | Word64<br>Rtt) | Q6_P_dfadd_PP(Word64 Rss, Word64 | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|--------------------| | | ICL | ASS | 3 | I | Reg | Тур | е | N | laj0 | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | P | 0 | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rdd=dfadd(Rss,Rtt) | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | P | P | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=sfadd(Rs,Rt) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | MajOp | Major opcode | | | MinOp | Minor opcode | | | RegType | Register type | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | ## Classify floating-point value Classify floating point values. Classes are Normal, Subnormal, Zero, NaN, or Infinity. If the number is one of the specified classes, return true. #### Syntax Behavior ``` Pd=dfclass(Rss, #u5) Pd = 0; class = fpclassify(Rss); if (#u.0 && (class == FP ZERO)) Pd = 0xff; if (#u.1 && (class == FP NORMAL)) Pd = 0xff; if (#u.2 && (class == FP SUBNORMAL)) Pd = 0xff; if (#u.3 && (class == FP INFINITE)) Pd = 0xff; if (\#u.4 && (class == FP NAN)) Pd = 0xff; cancel flags(); Pd=sfclass(Rs, #u5) Pd = 0; class = fpclassify(Rs); if (#u.0 && (class == FP ZERO)) Pd = 0xff; if (#u.1 && (class == FP NORMAL)) Pd = 0xff; if (#u.2 && (class == FP SUBNORMAL)) Pd = 0xff; if (#u.3 && (class == FP INFINITE)) Pd = 0xff; if (#u.4 && (class == FP NAN)) Pd = 0xff; cancel_flags(); ``` Class: XTYPE (slots 2,3) #### **Intrinsics** | Pd=dfclass(Rss,#u5) | Byte Q6_p_dfclass_PI(Word64 Rss, Word32 Iu5) | |---------------------|--------------------------------------------------------| | Pd=sfclass(Rs,#u5) | <pre>Byte Q6_p_sfclass_RI(Word32 Rs, Word32 Iu5)</pre> | ## **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|-----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|---------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pai | rse | | | | | | | | | | | | | d | 2 | | | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | s | s | s | ø | s | Р | Р | 0 | i | i | i | i | i | - | - | - | - | | - | d | d | Pd=sfclass(Rs,#u5) | | | ICL | ASS | 3 | ı | Reg | Тур | е | | | • | | • | s5 | | | Pai | rse | | | | | | • | • | • | • | | | • | d | 2 | | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | 0 | 0 | 0 | i | i | i | i | i | 1 | 0 | - | d | d | Pd=dfclass(Rss,#u5) | #### Field name Description | RegType | Register type | |---------|----------------------------| | MajOp | Major opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d2 | Field to encode register d | | s5 | Field to encode register s | # Compare floating-point value Compare floating point values. p0 returns true if at least one value is a NaN, zero otherwise. | Syntax | Behavior | |----------------------|----------------------------------------| | Pd=dfcmp.eq(Rss,Rtt) | Pd=Rss==Rtt ? 0xff : 0x00; | | Pd=dfcmp.ge(Rss,Rtt) | Pd=Rss>=Rtt ? 0xff : 0x00; | | Pd=dfcmp.gt(Rss,Rtt) | Pd=Rss>Rtt ? 0xff : 0x00; | | Pd=dfcmp.uo(Rss,Rtt) | Pd=isunordered(Rss,Rtt) ? 0xff : 0x00; | | Pd=sfcmp.eq(Rs,Rt) | Pd=Rs==Rt ? 0xff : 0x00; | | Pd=sfcmp.ge(Rs,Rt) | Pd=Rs>=Rt ? 0xff : 0x00; | | Pd=sfcmp.gt(Rs,Rt) | Pd=Rs>Rt ? 0xff : 0x00; | | Pd=sfcmp.uo(Rs,Rt) | Pd=isunordered(Rs,Rt) ? 0xff : 0x00; | ## Class: XTYPE (slots 2,3) #### **Intrinsics** | Pd=dfcmp.eq(Rss,Rtt) | Byte Q6_p_dfcmp_eq_PP(Word64 Rss, Word64 Rtt) | |----------------------|-----------------------------------------------| | Pd=dfcmp.ge(Rss,Rtt) | Byte Q6_p_dfcmp_ge_PP(Word64 Rss, Word64 Rtt) | | Pd=dfcmp.gt(Rss,Rtt) | Byte Q6_p_dfcmp_gt_PP(Word64 Rss, Word64 Rtt) | | Pd=dfcmp.uo(Rss,Rtt) | Byte Q6_p_dfcmp_uo_PP(Word64 Rss, Word64 Rtt) | | Pd=sfcmp.eq(Rs,Rt) | Byte Q6_p_sfcmp_eq_RR(Word32 Rs, Word32 Rt) | | Pd=sfcmp.ge(Rs,Rt) | Byte Q6_p_sfcmp_ge_RR(Word32 Rs, Word32 Rt) | | Pd=sfcmp.gt(Rs,Rt) | Byte Q6_p_sfcmp_gt_RR(Word32 Rs, Word32 Rt) | | Pd=sfcmp.uo(Rs,Rt) | Byte Q6_p_sfcmp_uo_RR(Word32 Rs, Word32 Rt) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|---|---|---|---|---|---|----------------------| | | ICL | ASS | 3 | I | Reg | Тур | е | М | aj | | | | s5 | | | Pa | rse | | | | t5 | | | M | lin | | | | | d | 2 | | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | P | - | t | t | t | t | t | 0 | 0 | 0 | - | | - | d | d | Pd=sfcmp.ge(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | P | - | t | t | t | t | t | 0 | 0 | 1 | - | | - | d | d | Pd=sfcmp.uo(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | P | - | t | t | t | t | t | 0 | 1 | 1 | - | | - | d | d | Pd=sfcmp.eq(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | - | | - | d | d | Pd=sfcmp.gt(Rs,Rt) | | | ICL | ASS | 3 | F | Reg | Тур | е | | • | | | | s5 | | • | Pa | rse | | | | t5 | | | N | 1inC | đ | | | | d | 2 | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | P | - | t | t | t | t | t | 0 | 0 | 0 | - | | - | d | d | Pd=dfcmp.eq(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 1 | - | | - | d | d | Pd=dfcmp.gt(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | - | | | d | d | Pd=dfcmp.ge(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | - | - | 1 | d | d | Pd=dfcmp.uo(Rss,Rtt) | | Field name | | Description | |------------|----------------------------|-------------| | RegType | Register type | - | | MinOp | Minor opcode | | | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d2 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | Maj | Major opcode | | | Min | Minor opcode | | ## Convert floating-point value to other format Convert floating point values. If rounding is required, it occurs according to the rounding mode. #### Syntax Behavior | Rd=convert_df2sf(Rss) | <pre>Rd = conv_df_to_sf(Rss);</pre> | |-----------------------|-------------------------------------| | Rdd=convert_sf2df(Rs) | <pre>Rdd = conv_sf_to_df(Rs);</pre> | Class: XTYPE (slots 2,3) #### **Intrinsics** | Rd=convert_df2sf(Rss) | Word32 Q6_R_convert_df2sf_P(Word64 Rss) | |-----------------------|-----------------------------------------| | Rdd=convert_sf2df(Rs) | Word64 Q6_P_convert_sf2df_R(Word32 Rs) | ## **Encoding** | 3 | 1 30 | 0 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|------|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-----------------------| | Γ | ICI | LAS | S | | Reg | Тур | е | Ма | јОр | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | | - | s | s | s | s | s | Р | Р | - | | - | | - | - | 0 | 0 | 0 | d | d | d | d | d | Rdd=convert_sf2df(Rs) | | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 0 | 0 | 1 | d | d | d | d | d | Rd=convert_df2sf(Rss) | | Field name | Description | |------------|-------------| | | | ICLASS Instruction class Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s MajOpMajor opcodeMinOpMinor opcodeRegTypeRegister type # Convert integer to floating-point value Convert floating point values. If rounding is required, it occur according to the rounding mode unless the :chop option is specified. | Syntax | Behavior | |------------------------|-------------------------------------------| | Rd=convert_d2sf(Rss) | <pre>Rd = conv_8s_to_sf(Rss.s64);</pre> | | Rd=convert_ud2sf(Rss) | <pre>Rd = conv_8u_to_sf(Rss.u64);</pre> | | Rd=convert_uw2sf(Rs) | <pre>Rd = conv_4u_to_sf(Rs.uw[0]);</pre> | | Rd=convert_w2sf(Rs) | <pre>Rd = conv_4s_to_sf(Rs.s32);</pre> | | Rdd=convert_d2df(Rss) | <pre>Rdd = conv_8s_to_df(Rss.s64);</pre> | | Rdd=convert_ud2df(Rss) | <pre>Rdd = conv_8u_to_df(Rss.u64);</pre> | | Rdd=convert_uw2df(Rs) | <pre>Rdd = conv_4u_to_df(Rs.uw[0]);</pre> | | Rdd=convert_w2df(Rs) | <pre>Rdd = conv_4s_to_df(Rs.s32);</pre> | Class: XTYPE (slots 2,3) #### **Intrinsics** | Rd=convert_d2sf(Rss) | Word32 | Q6_R_convert_d2sf_P(Word64 Rss) | |------------------------|--------|----------------------------------| | Rd=convert_ud2sf(Rss) | Word32 | Q6_R_convert_ud2sf_P(Word64 Rss) | | Rd=convert_uw2sf(Rs) | Word32 | Q6_R_convert_uw2sf_R(Word32 Rs) | | Rd=convert_w2sf(Rs) | Word32 | Q6_R_convert_w2sf_R(Word32 Rs) | | Rdd=convert_d2df(Rss) | Word64 | Q6_P_convert_d2df_P(Word64 Rss) | | Rdd=convert_ud2df(Rss) | Word64 | Q6_P_convert_ud2df_P(Word64 Rss) | | Rdd=convert_uw2df(Rs) | Word64 | Q6_P_convert_uw2df_R(Word32 Rs) | | Rdd=convert_w2df(Rs) | Word64 | Q6_P_convert_w2df_R(Word32 Rs) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|---|---|---|----|---|---|------------------------| | | ICL | ASS | 3 | I | Reg | Тур | е | Ма | јОр | | | | s5 | | | Pa | rse | | | | | | | N | 1inC | р | | | d5 | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | - | - | - | - | - | 0 | 1 | 0 | d | d | d | d | d | Rdd=convert_ud2df(Rss) | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | - | | | - | - | 0 | 1 | 1 | d | d | d | d | d | Rdd=convert_d2df(Rss) | | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | - | - | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 0 | 0 | 1 | d | d | d | d | d | Rdd=convert_uw2df(Rs) | | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | | - | s | s | s | s | s | Р | Р | - | - | | | - | - | 0 | 1 | 0 | d | d | d | d | d | Rdd=convert_w2df(Rs) | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | - | - | | - | - | 0 | 0 | 1 | d | d | d | d | d | Rd=convert_ud2sf(Rss) | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 0 | 0 | 1 | d | d | d | d | d | Rd=convert_d2sf(Rss) | | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | - | - | | - | - | 0 | 0 | 0 | d | d | d | d | d | Rd=convert_uw2sf(Rs) | | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | | - | - | 0 | 0 | 0 | d | d | d | d | d | Rd=convert_w2sf(Rs) | | Field name | Description | |------------|-------------| | | | ICLASSInstruction classParsePacket/loop parse bitsd5Field to encode register ds5Field to encode register s MajOpMajor opcodeMinOpMinor opcodeRegTypeRegister type ## Convert floating-point value to integer Convert floating point values. If rounding is required, it occurs according to the rounding mode unless the :chop option is specified. If the value is out of range of the destination integer type, raise the INVALID flag and choose the closest integer, including for infinite inputs. For NaN inputs, the INVALID flag is also raised, and the output value is implementation defined. | Syntax | Behavior | |----------------------------------------|------------------------------------------------------------| | Rd=convert_df2uw(Rss) | <pre>Rd = conv_df_to_4u(Rss).uw[0];</pre> | | Rd=convert_df2uw(Rss):chop | <pre>round_to_zero(); Rd = conv_df_to_4u(Rss).uw[0];</pre> | | Rd=convert_df2w(Rss) | <pre>Rd = conv_df_to_4s(Rss).s32;</pre> | | Rd=convert_df2w(Rss):chop | <pre>round_to_zero(); Rd = conv_df_to_4s(Rss).s32;</pre> | | Rd=convert_sf2uw(Rs) | Rd = conv_sf_to_4u(Rs).uw[0]; | | Rd=convert_sf2uw(Rs):chop | <pre>round_to_zero(); Rd = conv_sf_to_4u(Rs).uw[0];</pre> | | Rd=convert_sf2w(Rs) | <pre>Rd = conv_sf_to_4s(Rs).s32;</pre> | | Rd=convert_sf2w(Rs):chop | <pre>round_to_zero(); Rd = conv_sf_to_4s(Rs).s32;</pre> | | Rdd=convert_df2d(Rss) | <pre>Rdd = conv_df_to_8s(Rss).s64;</pre> | | Rdd=convert_df2d(Rss):chop | <pre>round_to_zero(); Rdd = conv_df_to_8s(Rss).s64;</pre> | | Rdd=convert_df2ud(Rss) | <pre>Rdd = conv_df_to_8u(Rss).u64;</pre> | | <pre>Rdd=convert_df2ud(Rss):chop</pre> | <pre>round_to_zero(); Rdd = conv_df_to_8u(Rss).u64;</pre> | | Rdd=convert_sf2d(Rs) | <pre>Rdd = conv_sf_to_8s(Rs).s64;</pre> | | Rdd=convert_sf2d(Rs):chop | <pre>round_to_zero(); Rdd = conv_sf_to_8s(Rs).s64;</pre> | | Rdd=convert_sf2ud(Rs) | <pre>Rdd = conv_sf_to_8u(Rs).u64;</pre> | | Rdd=convert_sf2ud(Rs):chop | <pre>round_to_zero(); Rdd = conv_sf_to_8u(Rs).u64;</pre> | ## Class: XTYPE (slots 2,3) ### **Intrinsics** | Rd=convert_df2uw(Rss) | Word32 | Q6_R_convert_df2uw_P(Word64 Rss) | |----------------------------|----------------|--------------------------------------| | Rd=convert_df2uw(Rss):chop | Word32<br>Rss) | Q6_R_convert_df2uw_P_chop(Word64 | | Rd=convert_df2w(Rss) | Word32 | Q6_R_convert_df2w_P(Word64 Rss) | | Rd=convert_df2w(Rss):chop | Word32 | Q6_R_convert_df2w_P_chop(Word64 Rss) | | Rd=convert_sf2uw(Rs) | Word32 | Q6_R_convert_sf2uw_R(Word32 Rs) | Rd=convert sf2uw(Rs):chop Word32 Q6 R convert sf2uw R chop(Word32 Rs) Word32 Q6 R convert sf2w R(Word32 Rs) Rd=convert\_sf2w(Rs) Rd=convert sf2w(Rs):chop Word32 Q6 R convert sf2w R chop(Word32 Rs) Rdd=convert df2d(Rss) Word64 Q6 P convert df2d P(Word64 Rss) Rdd=convert df2d(Rss):chop Word64 Q6\_P\_convert\_df2d\_P\_chop(Word64 Rss) Word64 Q6\_P\_convert\_df2ud\_P(Word64 Rss) Rdd=convert df2ud(Rss) Rdd=convert df2ud(Rss):chop Word64 Q6 P convert df2ud P chop(Word64 Word64 Q6\_P\_convert\_sf2d\_R(Word32 Rs) Rdd=convert\_sf2d(Rs) Rdd=convert sf2d(Rs):chop Word64 Q6 P convert sf2d R chop (Word32 Rs) Rdd=convert sf2ud(Rs) Word64 Q6 P convert sf2ud R(Word32 Rs) Rdd=convert sf2ud(Rs):chop Word64 Q6 P convert sf2ud R chop(Word32 Rs) #### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----------|----|----|---|----------|---|------|---|---|---|----|---|----------|---------------------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | Ма | jOp | | | ı | s5 | l | 1 | Pa | rse | | <u> </u> | | | | <u> </u> | N | 1inC | р | | 1 | d5 | 1 | <u> </u> | | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | - | - | - | - | - | 0 | 0 | 0 | d | d | d | d | d | Rdd=convert_df2d(Rss) | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | - | - | - | • | - | 0 | 0 | 1 | d | d | d | d | d | Rdd=convert_df2ud(Rss) | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | - | - | - | • | - | 1 | 1 | 0 | d | d | d | d | d | Rdd=convert_df2d(Rss):ch op | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | - | - | - | - | - | 1 | 1 | 1 | d | d | d | d | d | Rdd=convert_df2ud(Rss):c<br>hop | | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | - | - | s | s | s | s | s | Р | Р | - | - | - | - | • | - | 0 | 1 | 1 | d | d | d | d | d | Rdd=convert_sf2ud(Rs) | | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | - | - | s | s | s | s | s | Р | Р | - | - | - | - | • | - | 1 | 0 | 0 | d | d | d | d | d | Rdd=convert_sf2d(Rs) | | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | - | - | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 1 | 0 | 1 | d | d | d | d | d | Rdd=convert_sf2ud(Rs):ch op | | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | - | - | s | s | s | s | s | Р | Р | - | - | - | - | • | - | 1 | 1 | 0 | d | d | d | d | d | Rdd=convert_sf2d(Rs):cho p | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | - | - | - | • | - | 0 | 0 | 1 | d | d | d | d | d | Rd=convert_df2uw(Rss) | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 0 | 0 | 1 | d | d | d | d | d | Rd=convert_df2w(Rss) | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 0 | 0 | 1 | d | d | d | d | d | Rd=convert_df2uw(Rss):ch op | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 0 | 0 | 1 | d | d | d | d | d | Rd=convert_df2w(Rss):cho | | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 0 | 0 | 0 | d | d | d | d | d | Rd=convert_sf2uw(Rs) | | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | - | - | - | • | - | 0 | 0 | 1 | d | d | d | d | d | Rd=convert_sf2uw(Rs):cho p | | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | • | - | 0 | 0 | 0 | d | d | d | d | d | Rd=convert_sf2w(Rs) | | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | | - | 0 | 0 | 1 | d | d | d | d | d | Rd=convert_sf2w(Rs):chop | #### Field name Description ICLASS Instruction class Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s MajOp Major opcode MinOp Minor opcode RegType Register type Syntax ## Floating point extreme value assistance For divide and square root routines, certain values are problematic for the default routine. These instructions appropriately fix up the numerator (fixupn), denominator (fixupn), or radicand (fixupn) for proper calculations when combined with the divide or square root approximation instructions. Dobovior | Symax | Deliavioi | |--------------------|------------------------------| | Rd=sffixupd(Rs,Rt) | (Rs,Rt,Rd,adjust)=recip_comm | | Rd=sffixupd(Rs,Rt) | <pre>(Rs,Rt,Rd,adjust)=recip_common(Rs,Rt); Rd = Rt;</pre> | |--------------------|--------------------------------------------------------------| | Rd=sffixupn(Rs,Rt) | <pre>(Rs,Rt,Rd,adjust) = recip_common(Rs,Rt); Rd = Rs;</pre> | | Rd=sffixupr(Rs) | <pre>(Rs,Rd,adjust)=invsqrt_common(Rs); Rd = Rs;</pre> | #### Class: XTYPE (slots 2,3) #### **Intrinsics** | Rd=sffixupd(Rs,Rt) | Word32 Q6_R_sffixupd_RR(Word32 Rs, Word32 Rt) | |--------------------|-----------------------------------------------| | Rd=sffixupn(Rs,Rt) | Word32 Q6_R_sffixupn_RR(Word32 Rs, Word32 Rt) | | Rd=sffixupr(Rs) | Word32 Q6_R_sffixupr_R(Word32 Rs) | ### **Encoding** ICLASS MajOp MinOp | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|--------------------| | | ICL | ASS | 3 | | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 0 | 0 | 0 | d | d | d | d | d | Rd=sffixupr(Rs) | | | ICL | ASS | 3 | | Reg | Тур | Э | Ν | /ajC | )p | | | s5 | • | | Pa | rse | | | • | t5 | | • | N | /linC | )p | | • | d5 | • | | | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=sffixupn(Rs,Rt) | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rd=sffixupd(Rs,Rt) | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=sffixupn(Rs, | |---|---|-----|-----|----|----|---|---|---------|---|---|---|---|---|---|---|---|-----|----|---|---|---|---|---|---|---|---|---|---|---|-----------------| | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rd=sffixupd(Rs, | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fie | eld | na | me | ! | | Descrip | | | | | | | | | pti | on | | | | | | | | | | | | | | RegType | Register type | |---------|----------------------------| | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | Instruction class Major opcode Minor opcode # Floating point fused multiply-add Multiply two values, and add to (or subtract from) the accumulator. Full intermediate precision is kept. | Behavior | |----------| | | | <pre>Rx+=sfmpy(Rs,Rt)</pre> | <pre>Rx=fmaf(Rs,Rt,Rx);</pre> | |----------------------------------|-------------------------------------------------------------------------------------------------| | Rx-=sfmpy(Rs,Rt) | <pre>Rx=fmaf(-Rs,Rt,Rx);</pre> | | <pre>Rxx+=dfmpyhh(Rss,Rtt)</pre> | <pre>Rxx = Rss*Rtt with partial product Rxx;</pre> | | <pre>Rxx+=dfmpylh(Rss,Rtt)</pre> | <pre>Rxx += (Rss.uw[0] * (0x00100000 zxt<sub>20-</sub> &gt;64 (Rtt.uw[1]))) &lt;&lt; 1;</pre> | ## Class: XTYPE (slots 2,3) #### **Intrinsics** | Rx+=sfmpy(Rs,Rt) | Word32 Q6_R_sfmpyacc_RR(Word32 Rx, Word32 Rs, Word32 Rt) | |----------------------------------|---------------------------------------------------------------| | Rx-=sfmpy(Rs,Rt) | Word32 Q6_R_sfmpynac_RR(Word32 Rx, Word32 Rs, Word32 Rt) | | Rxx+=dfmpyhh(Rss,Rtt) | Word64 Q6_P_dfmpyhhacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+=dfmpylh(Rss,Rtt)</pre> | Word64 Q6_P_dfmpylhacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) | ## **Encoding** ICLASS MajOp | 3 | 1 : | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|----|-----|-----|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|-----------------------| | | I | CLA | SS | ; | F | Reg | Тур | е | N | /lajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | х5 | | | | | 1 | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | P | 0 | t | t | t | t | t | 0 | 1 | 1 | x | x | x | x | x | Rxx+=dfmpylh(Rss,Rtt) | | 1 | | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | P | 0 | t | t | t | t | t | 0 | 1 | 1 | x | x | x | x | x | Rxx+=dfmpyhh(Rss,Rtt) | | 1 | | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | P | 0 | t | t | t | t | t | 1 | 0 | 0 | x | x | x | x | x | Rx+=sfmpy(Rs,Rt) | | 1 | | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | P | 0 | t | t | t | t | t | 1 | 0 | 1 | x | x | x | x | x | Rx-=sfmpy(Rs,Rt) | | Field name | Description | |------------|-------------| | | | | MinOp | Minor opcode | |---------|----------------------------| | RegType | Register type | | Parse | Packet/loop parse bits | | s5 | Field to encode register s | | t5 | Field to encode register t | | x5 | Field to encode register x | Instruction class Major opcode ## Floating point fused multiply-add with scaling Multiply two values, and add to (or subtract from) the accumulator. Full intermediate precision is kept. Additionally, scale the output. This instruction has special handling of corner cases. If a multiplicand source is zero and a NaN is not produced, the accumulator is left unchanged; this means the sign of a zero accumulator does not change if the product is a true zero. The scaling factor is the predicate taken as a two's compliment number for single precision. The scaling factor is twice the predicate taken as a two's compliment number for double precision. The implementation can change denormal accumulator values to zero for positive scale factors. #### Syntax Behavior Class: XTYPE (slots 2,3) #### **Intrinsics** ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|----|---|---|----|---|---|----------------------------| | | ICL | ASS | ; | F | Reg | Тур | е | N | 1ajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | | u | 12 | | | х5 | | | | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | P | Р | 0 | t | t | t | t | t | 1 | u | u | x | x | x | x | x | Rx+=sfmpy(Rs,Rt,Pu):scal e | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | | MajOp | Major opcode | |---------|----------------------------| | MinOp | Minor opcode | | RegType | Register type | | Parse | Packet/loop parse bits | | s5 | Field to encode register s | | t5 | Field to encode register t | | u2 | Field to encode register u | | x5 | Field to encode register x | ## Floating point reciprocal square root approximation Provides an approximation of the reciprocal square root of the radicand (Rs), when combined with the appropriate fixup instruction. Certain values (such as infinities or zeros) in the numerator or denominator can yield values that are not reciprocal approximations, but yield the correct answer when combined with fixup instructions and the appropriate routines. For compatibility, exact results of these instructions cannot be relied on. The precision of the approximation for this architecture and later is at least 6.6 bits. Syntax Behavior ``` Rd, Pe = sfinvsqrta(Rs) if ((Rs,Rd,adjust)=invsqrt_common(Rs)) { Pe = adjust; idx = (Rs >> 17) & 0x7f; mant = (invsqrt_lut[idx] << 15); exp = 127 - ((exponent(Rs) - 127) >> 1) - 1; Rd = -1**Rs.31 * 1.MANT * 2**(exp-BIAS); } ``` Class: XTYPE (slots 2,3) #### **Notes** - This instruction provides a certain amount of accuracy. In future versions the accuracy might increase. For future compatibility, dependence on exact values must be avoided. - The predicate generated by this instruction cannot be used as a .new predicate, nor can it be automatically AND'd with another predicate. #### **Encoding** Parse d5 | : | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|----------------------| | Ī | I | CLA | ASS | ; | ı | Reg | Тур | е | Ма | јОр | | | | s5 | | | Pa | rse | | | | | | | | е | 2 | | | d5 | | | | | | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | - | - | | | - | - | 0 | е | е | d | d | d | d | d | Rd,Pe=sfinvsqrta(Rs) | | Field | | Description | |------------|-------------------|-------------| | Field name | | Description | | ICLASS | Instruction class | | Packet/loop parse bits Field to encode register d e2 Field to encode register e s5 Field to encode register s MajOp Major opcode MinOp Minor opcode RegType Register type ## Floating point fused multiply-add for library routines Multiply two values, and add to (or subtract from) the accumulator. Full intermediate precision is kept. This instruction has special handling of corner cases. Addition of infinities with opposite signs, or subtraction of infinities with like signs, is defined as (positive) zero. Rounding is always nearest-even, except that overflows to infinity round to maximal finite values. If a multiplicand source is zero and a NaN is not produced, the accumulator is left unchanged; this means the sign of a zero accumulator does not change if the product is a true zero. Flags and exceptions are not generated. ``` Syntax Behavior Rx+=sfmpy(Rs,Rt):lib round to nearest(); infminusinf = ((isinf(Rx)) && (isinf(Rs*Rt)) && (Rs ^ Rx ^ Rt.31 != 0)); infinp = (isinf(Rx)) || (isinf(Rt)) || (isinf(Rs)); if (isnan(Rx) || isnan(Rs) || isnan(Rt)) Rx = NaN; tmp=fmaf(Rs,Rt,Rx); if (!((Rx == 0.0) \&\& is true zero(Rs*Rt))) Rx = tmp; cancel flags(); if (isinf(Rx) \&\& !infinp) Rx = Rx - 1; if (infminusinf) Rx = 0; Rx-=sfmpy(Rs,Rt):lib round to nearest(); infminusinf = ((isinf(Rx)) && (isinf(Rs*Rt)) && (Rs ^ Rx ^{\text{Rt.31}} == 0)); infinp = (isinf(Rx)) || (isinf(Rt)) || (isinf(Rs)); if (isnan(Rx) || isnan(Rs) || isnan(Rt)) Rx = NaN; tmp=fmaf(-Rs,Rt,Rx); if (!((Rx == 0.0) \&\& is true zero(Rs*Rt))) Rx = tmp; cancel flags(); if (isinf(Rx) \&\& !infinp) Rx = Rx - 1; if (infminusinf) Rx = 0; ``` #### Class: XTYPE (slots 2,3) #### **Intrinsics** ``` Rx+=sfmpy(Rs,Rt):lib Word32 Q6_R_sfmpyacc_RR_lib(Word32 Rx, Word32 Rs, Word32 Rt) Rx-=sfmpy(Rs,Rt):lib Word32 Q6_R_sfmpynac_RR_lib(Word32 Rx, Word32 Rs, Word32 Rt) ``` ### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|---|---|---|----|---|---|----------------------| | | ICL | ASS | 3 | F | Reg | Тур | Э | N | /lajO | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | р | | | х5 | | | | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | s | s | S | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | x | x | x | x | x | Rx+=sfmpy(Rs,Rt):lib | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | х | х | х | x | х | Rx-=sfmpy(Rs,Rt):lib | | Field name | Description | |------------|-------------| |------------|-------------| ICLASS Instruction class MajOp Major opcode MinOp Minor opcode RegType Register type Parse Packet/loop parse bits s5 Field to encode register s t5 Field to encode register t x5 Field to encode register x # Create floating-point constant Using ten bits of immediate, form a floating-point constant. | Syntax | <b>Behavior</b> | |--------|-----------------| |--------|-----------------| | Rd=sfmake(#u10):neg | Rd = (127 - 6) << 23;<br>Rd += (#u << 17);<br>Rd = (1 << 31); | |----------------------|----------------------------------------------------------------------------| | Rd=sfmake(#u10):pos | Rd = $(127 - 6) \ll 23$ ;<br>Rd += $\#u \ll 17$ ; | | Rdd=dfmake(#u10):neg | Rdd = (1023ULL - 6) << 52;<br>Rdd += (#u) << 46;<br>Rdd = ((1ULL) << 63); | | Rdd=dfmake(#u10):pos | Rdd = (1023ULL - 6) << 52;<br>Rdd += (#u) << 46; | ## Class: XTYPE (slots 2,3) ### **Intrinsics** | Rd=sfmake(#u10):neg | Word32 | Q6_R_sfmake_I_neg(Word32 | Iu10) | |----------------------|--------|--------------------------|-------| | Rd=sfmake(#u10):pos | Word32 | Q6_R_sfmake_I_pos(Word32 | Iu10) | | Rdd=dfmake(#u10):neg | Word64 | Q6_P_dfmake_I_neg(Word32 | Iu10) | | Rdd=dfmake(#u10):pos | Word64 | Q6_P_dfmake_I_pos(Word32 | Iu10) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|---|---|---|----|---|---|----------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | | | | | | | | | Pa | rse | | | | | | | N | 1inO | р | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | i | - | | | | - | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=sfmake(#u10):pos | | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | i | - | | | | - | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=sfmake(#u10):neg | | | ICL. | ASS | 6 | ı | Reg | Тур | е | | | | | | | | | | rse | | • | • | | • | • | | | | | | d5 | | | | | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | i | - | | | | - | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rdd=dfmake(#u10):pos | | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | i | - | - | • | | - | Р | Р | i | i | i | i | i | i | i | i | i | d | d | d | d | d | Rdd=dfmake(#u10):neg | ### Field name Description | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | # Floating point maximum Maximum of two floating point values. If one value is a NaN, the other is chosen. | Syntax | Behavior | |--------|----------| |--------|----------| | Rd=sfmax(Rs,Rt) | <pre>Rd = fmaxf(Rs,Rt);</pre> | |--------------------|---------------------------------| | Rdd=dfmax(Rss,Rtt) | <pre>Rdd = fmax(Rss,Rtt);</pre> | Class: XTYPE (slots 2,3) ### **Intrinsics** | Rd=sfmax(Rs,Rt) | Word32 | Q6_R_sfmax_RR(Word32 Rs, Word32 Rt) | |--------------------|----------------|-------------------------------------| | Rdd=dfmax(Rss,Rtt) | Word64<br>Rtt) | Q6_P_dfmax_PP(Word64 Rss, Word64 | ## **Encoding** ICLASS Maj0p MinOp | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|-----|-----|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|--------------------| | | ICL | ASS | 3 | | Reg | Тур | е | N | /lajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | Λ | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rdd=dfmax(Rss,Rtt) | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=sfmax(Rs,Rt) | ### Field name Description | Register type | |----------------------------| | Packet/loop parse bits | | Field to encode register d | | Field to encode register s | | Field to encode register t | | | Instruction class Major opcode Minor opcode # Floating point minimum Minimum of two floating point values. If one value is a NaN, the other is chosen. | Behavior | |----------| | | | Rd=sfmin(Rs,Rt) | <pre>Rd = fmin(Rs,Rt);</pre> | |--------------------|---------------------------------| | Rdd=dfmin(Rss,Rtt) | <pre>Rdd = fmin(Rss,Rtt);</pre> | Class: XTYPE (slots 2,3) ### **Intrinsics** | Rd=sfmin(Rs,Rt) | Word32 | Q6_R_sfmin_RR(Word32 Rs, Word32 Rt) | |--------------------|----------------|-------------------------------------| | Rdd=dfmin(Rss,Rtt) | Word64<br>Rtt) | Q6_P_dfmin_PP(Word64 Rss, Word64 | ## **Encoding** ICLASS MajOp | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|--------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | Ν | /lajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rdd=dfmin(Rss,Rtt) | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rd=sfmin(Rs,Rt) | | Field name | Description | |------------|-------------| | Fleid name | Description | | MinOp | Minor opcode | |---------|----------------------------| | RegType | Register type | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | Instruction class Major opcode # Floating point multiply Multiply two floating point values. | ehavior | |---------| | ( | | Rd=sfmpy(Rs,Rt) | Rd=Rs*Rt; | |-----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Rdd=dfmpyfix(Rss,Rtt) | <pre>if (is_denormal(Rss) &amp;&amp; (df_exponent(Rtt) &gt;= 512) &amp;&amp; is_normal(Rtt)) Rdd = Rss * 0x1.0p52; else if (is_denormal(Rtt) &amp;&amp; (df_exponent(Rss) &gt;= 512) &amp;&amp; is_normal(Rss)) Rdd = Rss * 0x1.0p-52; else Rdd = Rss;</pre> | | Rdd=dfmpyll(Rss,Rtt) | <pre>prod = (Rss.uw[0] * Rtt.uw[0]); Rdd = (prod &gt;&gt; 32) &lt;&lt; 1; if (prod.uw[0] != 0) Rdd.0 = 1;</pre> | Class: XTYPE (slots 2,3) ### **Intrinsics** | Rd=sfmpy(Rs,Rt) | Word32 Q6_R_sfmpy_RR(Word32 Rs, Word32 Rt) | |-----------------------|-------------------------------------------------| | Rdd=dfmpyfix(Rss,Rtt) | Word64 Q6_P_dfmpyfix_PP(Word64 Rss, Word64 Rtt) | | Rdd=dfmpyll(Rss,Rtt) | Word64 Q6_P_dfmpyll_PP(Word64 Rss, Word64 Rtt) | ## **Encoding** Maj0p MinOp | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|-----------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | N | /lajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | P | Р | 0 | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rdd=dfmpyfix(Rss,Rtt) | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | P | Р | 0 | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rdd=dfmpyll(Rss,Rtt) | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=sfmpy(Rs,Rt) | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | RegType Register type Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t5 Field to encode register t Major opcode Minor opcode ### Floating point reciprocal approximation Provides an approximation of the reciprocal of the denominator (Rt), if combined with the appropriate fixup instructions. Certain values (such as infinities or zeros) in the numerator or denominator might yield values that are not reciprocal approximations, but yield the correct answer when combined with fixup instructions and the appropriate routines. For compatibility, exact results of these instructions cannot be relied on. The precision of the approximation for this architecture and later is at least 6.6 bits. Syntax Behavio ``` Rd, Pe = sfrecipa(Rs, Rt) if ((Rs,Rt,Rd,adjust)=recip_common(Rs,Rt)) { Pe = adjust; idx = (Rt >> 16) & 0x7f; mant = (recip_lut[idx] << 15) | 1; exp = 127 - (exponent(Rt) - 127) - 1; Rd = -1**Rt.31 * 1.MANT * 2**(exp-BIAS); }</pre> ``` Class: XTYPE (slots 2,3) #### **Notes** - This instruction provides a certain amount of accuracy. In future versions the accuracy might increase. For future compatibility, avoid dependence on exact values. - The predicate generated by this instruction cannot be used as a .new predicate, nor can it be automatically AND'd with another predicate. ### **Encoding** | 3 | 1 30 | 0 2 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-----------------------| | | IC | LA | SS | | F | Reg | Тур | е | ٨ | ЛаjС | )p | | | s5 | | | Pa | rse | | | | t5 | | | | е | 2 | | | d5 | | | | | 1 | 1 | | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | е | е | d | d | d | d | d | Rd,Pe=sfrecipa(Rs,Rt) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | • | | MajOp | Major opcode | | | MinOp | Minor opcode | | | RegType | Register type | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | e2 | Field to encode register e | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | # Floating point subtraction Subtract two floating point values. | ehavior | |---------| | е | | Rd=sfsub(Rs,Rt) | Rd=Rs-Rt; | |--------------------|--------------| | Rdd=dfsub(Rss,Rtt) | Rdd=Rss-Rtt; | Class: XTYPE (slots 2,3) ### **Intrinsics** | Rd=sfsub(Rs,Rt) | Word32 | Q6_R_sfsub_RR(Word32 Rs, Word32 Rt) | |--------------------|----------------|-------------------------------------| | Rdd=dfsub(Rss,Rtt) | Word64<br>Rtt) | Q6_P_dfsub_PP(Word64 Rss, Word64 | ## **Encoding** ICLASS Maj0p MinOp | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|-----|-----|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|--------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | N | /lajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | P | 0 | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rdd=dfsub(Rss,Rtt) | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rd=sfsub(Rs,Rt) | ### Field name Description | RegType | Register type | |---------|----------------------------| | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | _ | Instruction class Major opcode Minor opcode ### 11.10.5 XTYPE MPY The XTYPE MPY instruction subclass includes instructions that perform multiplication. ## Multiply and use lower result Multiply the signed 32-bit integer in Rs by either the signed 32-bit integer in Rt or an unsigned immediate value. The 64-bit result is optionally accumulated with the 32-bit destination, or added to an immediate. The least-significant 32-bits of the result are written to the single destination register. This multiply produces the correct results for the ANSI C multiplication of two signed or unsigned integers with an integer result. Syntax Behavior | Rd=+mpyi(Rs,#u8) | <pre>apply_extension(#u); Rd=Rs*#u;</pre> | |---------------------------|--------------------------------------------------| | Rd=-mpyi(Rs,#u8) | Rd=Rs*-#u; | | Rd=add(#u6,mpyi(Rs,#U6)) | <pre>apply_extension(#u); Rd = #u + Rs*#U;</pre> | | Rd=add(#u6,mpyi(Rs,Rt)) | <pre>apply_extension(#u); Rd = #u + Rs*Rt;</pre> | | Rd=add(Ru,mpyi(#u6:2,Rs)) | Rd = Ru + Rs*#u; | | Rd=add(Ru, mpyi(Rs, #u6)) | <pre>apply_extension(#u); Rd = Ru + Rs*#u;</pre> | | Syntax | Behavior | |------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Rd=mpyi(Rs,#m9) | <pre>if ("((#m9&lt;0) &amp;&amp; (#m9&gt;-256))") { Assembler mapped to: "Rd=-mpyi(Rs, #m9*(-1))"; } else { Assembler mapped to: "Rd=+mpyi(Rs, #m9)"; }</pre> | | Rd=mpyi(Rs,Rt) | Rd=Rs*Rt; | | Rd=mpyui(Rs,Rt) | Assembler mapped to: "Rd=mpyi(Rs,Rt)" | | Rx+=mpyi(Rs,#u8) | <pre>apply_extension(#u); Rx=Rx + (Rs*#u);</pre> | | Rx+=mpyi(Rs,Rt) | <pre>Rx=Rx + Rs*Rt;</pre> | | Rx-=mpyi(Rs,#u8) | <pre>apply_extension(#u); Rx=Rx - (Rs*#u);</pre> | | Rx-=mpyi(Rs,Rt) | Rx=Rx - Rs*Rt; | | Ry=add(Ru,mpyi(Ry,Rs)) | Ry = Ru + Rs*Ry; | ### Class: XTYPE (slots 2,3) #### **Intrinsics** ``` Word32 Q6 R add mpyi IRI(Word32 Iu6, Word32 Rs, Word32 Rd = add(#u6, mpyi(Rs, #U6)) IU6) Word32 Q6 R add mpyi IRR(Word32 Iu6, Word32 Rs, Word32 Rd = add(\#u6, mpyi(Rs, Rt)) Rd = add(Ru, Word32 Q6 R add mpyi RIR(Word32 Ru, Word32 Iu6 2, mpyi(#u6:2, Rs)) Word32 Rs) Rd = add(Ru, mpyi(Rs, Word32 Q6 R add mpyi RRI (Word32 Ru, Word32 Rs, Word32 #u6)) Rd = mpyi(Rs, #m9) Word32 Q6 R mpyi RI(Word32 Rs, Word32 Im9) Rd = mpyi(Rs, Rt) Word32 Q6 R mpyi RR(Word32 Rs, Word32 Rt) Rd = mpyui(Rs, Rt) Word32 Q6 R mpyui RR(Word32 Rs, Word32 Rt) Rx += mpyi(Rs, #u8) Word32 Q6 R mpyiacc RI(Word32 Rx, Word32 Rs, Word32 Iu8) Word32 Q6 R mpyiacc RR(Word32 Rx, Word32 Rs, Word32 Rt) Rx += mpyi(Rs, Rt) Rx -= mpyi(Rs, #u8) Word32 Q6 R mpyinac RI(Word32 Rx, Word32 Rs, Word32 Rx -= mpyi(Rs, Rt) Word32 Q6_R_mpyinac_RR(Word32 Rx, Word32 Rs, Word32 Rt) Ry = add(Ru, mpyi(Ry, Word32 Q6 R add mpyi RRR(Word32 Ru, Word32 Ry, Word32 Rs)) Rs) ``` # Encoding | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|-----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|--------------------------| | | ICL | ASS | 3 | F | Reg | Тур | е | | | | | | s5 | | | Par | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | i | i | s | s | s | s | s | Р | Р | i | t | t | t | t | t | i | i | i | d | d | d | d | d | Rd=add(#u6,mpyi(Rs,Rt)) | | | ICL | ASS | 3 | F | Reg | Тур | е | | | | | | s5 | | | Pai | rse | | | | d5 | | | | • | | | | | | | | | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | I | i | i | s | s | s | s | s | Р | Р | i | d | d | d | d | d | i | i | i | ı | I | I | I | I | Rd=add(#u6,mpyi(Rs,#U6)) | | | ICL | ASS | 3 | F | Reg | Тур | е | | | • | | | s5 | • | | Pai | rse | | | | d5 | | • | | | | | • | u5 | | | | | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | i | i | s | s | s | s | s | Р | P | i | d | d | d | d | d | i | i | i | u | u | u | u | u | Rd=add(Ru,mpyi(#u6:2,Rs) | | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | i | i | s | s | s | s | s | Р | Р | i | d | d | d | d | d | i | i | i | u | u | u | u | u | Rd=add(Ru,mpyi(Rs,#u6)) | | | ICL | ASS | 6 | F | Reg | Тур | е | N | 1ajO | р | | | s5 | • | | Par | rse | | | | у5 | | • | | | | | • | u5 | | | | | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | у | у | у | у | у | - | - | - | u | u | u | u | u | Ry=add(Ru,mpyi(Ry,Rs)) | | | ICL | ASS | 3 | ı | Reg | Тур | е | N | 1ajC | р | | | s5 | | | Pai | rse | | | | | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | - | - | s | s | s | s | s | Р | Ρ | 0 | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=+mpyi(Rs,#u8) | | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | | - | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | i | i | d | d | d | d | d | Rd=-mpyi(Rs,#u8) | | | ICL | ASS | 3 | F | Reg | Тур | е | N | 1ajO | р | | | s5 | | | Par | rse | | | | | | | N | /linC | )p | | | х5 | | | | | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | | - | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | i | i | x | x | x | x | x | Rx+=mpyi(Rs,#u8) | | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | | - | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | i | i | i | x | x | x | x | x | Rx-=mpyi(Rs,#u8) | | | ICL | ASS | 3 | ı | Reg | Тур | е | N | 1ajC | р | | | s5 | | | Pai | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=mpyi(Rs,Rt) | | | ICL | ASS | 3 | F | Reg | Тур | e | N | 1ajO | р | | | s5 | | | Pai | rse | | | | t5 | | | N | /linC | )p | | | х5 | | | | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | ø | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | X | x | x | x | x | Rx+=mpyi(Rs,Rt) | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | x | x | x | x | x | Rx-=mpyi(Rs,Rt) | | Field name | Description | |------------|-------------| | | | | RegType | Register type | |---------|----------------------------| | MajOp | Major opcode | | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | u5 | Field to encode register u | | x5 | Field to encode register x | | у5 | Field to encode register y | ## Vector multiply word by signed half (32×16) Perform mixed precision vector multiply operations. A 32-bit word from vector Rss is multiplied by a 16-bit halfword (either even or odd) from vector Rtt. The multiplication is performed as a signed 32×16, which produces a 48-bit result. This result is optionally scaled left by one bit. This result is then shifted right by 16 bits, optionally accumulated and then saturated to 32 bits. This operation is available in vector form (vmpyweh/vmpywoh) and non-vector form (multiply and use upper result). Rdd = vmpyweh(Rss, Rtt)[:<<1]:rnd:sat Rtt.h[2])[<<1]+0x8000)>>16); Rdd.w[0]=sat<sub>32</sub>(((Rss.w[0] \* Rtt.h[0])[<<1]+0x8000)>>16); | Syntax | Behavior | |--------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | <pre>Rdd = vmpyweh(Rss,Rtt)[:&lt;&lt;1]:sat</pre> | Rdd.w[1]=sat <sub>32</sub> (((Rss.w[1] * Rtt.h[2])[<<1])>>16); Rdd.w[0]=sat <sub>32</sub> (((Rss.w[0] * Rtt.h[0])[<<1])>>16); | | <pre>Rdd = vmpywoh(Rss,Rtt)[:&lt;&lt;1]:rnd:sat</pre> | <pre>Rdd.w[1]=sat<sub>32</sub>(((Rss.w[1] * Rtt.h[3])[&lt;&lt;1]+0x8000)&gt;&gt;16); Rdd.w[0]=sat<sub>32</sub>(((Rss.w[0] * Rtt.h[1])[&lt;&lt;1]+0x8000)&gt;&gt;16);</pre> | | <pre>Rdd = vmpywoh(Rss,Rtt)[:&lt;&lt;1]:sat</pre> | <pre>Rdd.w[1]=sat<sub>32</sub>(((Rss.w[1] * Rtt.h[3])[&lt;&lt;1])&gt;&gt;16); Rdd.w[0]=sat<sub>32</sub>(((Rss.w[0] * Rtt.h[1])[&lt;&lt;1])&gt;&gt;16);</pre> | | <pre>Rxx += vmpyweh(Rss,Rtt)[:&lt;&lt;1]:rnd:sat</pre> | $ \begin{array}{llllllllllllllllllllllllllllllllllll$ | | <pre>Rxx += vmpyweh(Rss,Rtt)[:&lt;&lt;1]:sat</pre> | <pre>Rxx.w[1] = sat<sub>32</sub>(Rxx.w[1] + (((Rss.w[1] * Rtt.h[2])[&lt;&lt;1])&gt;&gt;16)); Rxx.w[0] = sat<sub>32</sub>(Rxx.w[0] + (((Rss.w[0] * Rtt.h[0])[&lt;&lt;1])&gt;&gt;16));</pre> | | <pre>Rxx += vmpywoh(Rss,Rtt)[:&lt;&lt;1]:rnd:sat</pre> | $ \begin{array}{llllllllllllllllllllllllllllllllllll$ | | <pre>Rxx += vmpywoh(Rss,Rtt)[:&lt;&lt;1]:sat</pre> | <pre>Rxx.w[1] = sat<sub>32</sub>(Rxx.w[1] + (((Rss.w[1] * Rtt.h[3])[&lt;&lt;1])&gt;&gt;16)); Rxx.w[0] = sat<sub>32</sub>(Rxx.w[0] + (((Rss.w[0] * Rtt.h[1])[&lt;&lt;1])&gt;&gt;16 ));</pre> | ### Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rdd = Word64 Q6 P vmpyweh PP s1 rnd sat (Word64 vmpyweh(Rss,Rtt):<<1:rnd:sat Rss, Word64 Rtt)</pre> Word64 Q6_P_vmpyweh_PP_s1_sat(Word64 Rss, Word64 Rtt) vmpyweh(Rss,Rtt):<<1:sat</pre> Word64 Q6 P vmpyweh PP rnd sat (Word64 Rss, Word64 Rtt) vmpyweh(Rss,Rtt):rnd:sat ``` Rdd = vmpyweh(Rss,Rtt):sat Word64 Q6 P vmpyweh PP sat (Word64 Rss, Word64 Rtt) Word64 Q6\_P\_vmpywoh\_PP\_s1\_rnd\_sat(Word64 Rdd =vmpywoh(Rss,Rtt):<<1:rnd:sat Rss, Word64 Rtt)</pre> Word64 Q6\_P\_vmpywoh\_PP\_s1\_sat(Word64 Rss, Word64 Rtt) vmpywoh(Rss,Rtt):<<1:sat</pre> Word64 Q6 P vmpywoh PP rnd sat (Word64 Rss, vmpywoh(Rss,Rtt):rnd:sat Word64 Rtt) Rdd = vmpywoh(Rss,Rtt):sat Word64 Q6 P vmpywoh PP sat (Word64 Rss, Word64 Rtt) Word64 Q6 P vmpywehacc PP s1 rnd sat (Word64 Rxx += vmpyweh(Rss,Rtt):<<1:rnd:sat Rxx, Word64 Rss, Word64 Rtt)</pre> Word64 Q6 P vmpywehacc PP s1 sat(Word64 vmpyweh(Rss,Rtt):<<1:sat</pre> Rxx, Word64 Rss, Word64 Rtt) Rxx += Word64 Q6\_P\_vmpywehacc\_PP\_rnd\_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) vmpyweh(Rss,Rtt):rnd:sat Rxx += vmpyweh(Rss,Rtt):sat Word64 Q6 P vmpywehacc PP sat (Word64 Rxx, Word64 Rss, Word64 Rtt) Word64 Q6 P vmpywohacc PP s1 rnd sat (Word64 Rxx += vmpywoh(Rss,Rtt):<<1:rnd:sat Rxx, Word64 Rss, Word64 Rtt)</pre> Rxx += vmpywoh(Rss,Word64 Q6 P vmpywohacc PP s1 sat(Word64 Rtt):<<1:sat Rxx, Word64 Rss, Word64 Rtt) Word64 Q6 P vmpywohacc PP rnd sat (Word64 Rxx += vmpywoh(Rss, Rtt):rnd:sat Rxx, Word64 Rss, Word64 Rtt) Rxx += vmpywoh(Rss, Rtt):sat Word64 Q6 P vmpywohacc PP sat(Word64 Rxx, Word64 Rss, Word64 Rtt) #### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|---------------------------------------------------------------| | | ICL | ASS | 3 | | Reg | Тур | е | N | 1ajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | Ν | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rdd=vmpyweh(Rss,Rtt)[:<<<br>N]:sat | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rdd=vmpywoh(Rss,Rtt)[:<<<br>N]:sat | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rdd=vmpyweh(Rss,Rtt)[:<<<br>N]:rnd:sat | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rdd=vmpywoh(Rss,Rtt)[:<<<br>N]:rnd:sat | | | ICL | ASS | 6 | | Reg | Тур | е | N | 1ajO | )p | | • | s5 | • | • | Pa | rse | | | • | t5 | • | | ٨ | /linC | )p | | | х5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 1 | x | x | x | x | x | Rxx+=vmpyweh(Rss,Rtt)[:< <n]:sat< td=""></n]:sat<> | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | x | x | x | x | x | Rxx+=vmpywoh(Rss,Rtt)[:<<br><n]:sat< td=""></n]:sat<> | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 1 | x | x | x | x | x | Rxx+=vmpyweh(Rss,Rtt)[:<<br><n]:rnd:sat< td=""></n]:rnd:sat<> | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | N | 0 | 1 | s | s | ø | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | x | x | x | x | x | Rxx+=vmpywoh(Rss,Rtt)[:<<br><n]:rnd:sat< td=""></n]:rnd:sat<> | t5 x5 | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | - | | MajOp | Major opcode | | | MinOp | Minor opcode | | | RegType | Register type | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | Field to encode register t Field to encode register x ## Vector multiply word by unsigned half (32×16) Perform mixed precision vector multiply operations. A 32-bit signed word from vector Rss is multiplied by a 16-bit unsigned halfword (either odd or even) from vector Rtt. This multiplication produces a 48-bit result. This result is optionally scaled left by one bit, and then a rounding constant is optionally added to the lower 16 bits. This result is then shifted right by 16 bits, optionally accumulated and then saturated to 32 bits. This is a dual vector operation and is performed for both high and low word of Rss. Syntax Behavior Rdd = Rdd.w[1]=sat<sub>32</sub>(((Rss.w[1] \* rt.uh[2])[<<1]+0x8000)>>16); Rdd.w[0]=sat<sub>32</sub>(((Rss.w[0] \* rt.uh[0])[<<1]+0x8000)>>16); | Syntax | Behavior | |---------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | <pre>Rdd = vmpyweuh(Rss,Rtt)[:&lt;&lt;1]:sat</pre> | Rdd.w[1]=sat <sub>32</sub> (((Rss.w[1] * Rtt.uh[2])[<<1])>>16); Rdd.w[0]=sat <sub>32</sub> (((Rss.w[0] * Rtt.uh[0])[<<1])>>16); | | <pre>Rdd = vmpywouh(Rss,Rtt)[:&lt;&lt;1]:rnd:sat</pre> | <pre>Rdd.w[1]=sat<sub>32</sub>(((Rss.w[1] * Rtt.uh[3])[&lt;&lt;1]+0x8000)&gt;&gt;16); Rdd.w[0]=sat<sub>32</sub>(((Rss.w[0] * Rtt.uh[1])[&lt;&lt;1]+0x8000)&gt;&gt;16);</pre> | | <pre>Rdd = vmpywouh(Rss,Rtt)[:&lt;&lt;1]:sat</pre> | <pre>Rdd.w[1] = sat<sub>32</sub>(((Rss.w[1] * Rtt.uh[3])[&lt;&lt;1])&gt;&gt;16); Rdd.w[0] = sat<sub>32</sub>(((Rss.w[0] * Rtt.uh[1])[&lt;&lt;1])&gt;&gt;16);</pre> | | <pre>Rxx += vmpyweuh(Rss,Rtt)[:&lt;&lt;1]:rnd:sat</pre> | $ \begin{array}{llllllllllllllllllllllllllllllllllll$ | | <pre>Rxx += vmpyweuh(Rss,Rtt)[:&lt;&lt;1]:sat</pre> | $ \begin{array}{llllllllllllllllllllllllllllllllllll$ | | <pre>Rxx += vmpywouh(Rss,Rtt)[:&lt;&lt;1]:rnd:sat</pre> | $ \begin{array}{llllllllllllllllllllllllllllllllllll$ | | <pre>Rxx += vmpywouh(Rss,Rtt)[:&lt;&lt;1]:sat</pre> | <pre>Rxx.w[1] = sat<sub>32</sub>(Rxx.w[1] + (((Rss.w[1] * Rtt.uh[3])[&lt;&lt;1])&gt;&gt;16)); Rxx.w[0] = sat<sub>32</sub>(Rxx.w[0] + (((Rss.w[0] * Rtt.uh[1])[&lt;&lt;1])&gt;&gt;16 ));</pre> | ## Class: XTYPE (slots 2,3) ### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ### **Intrinsics** | Rdd= vmpyweuh(Rss,<br>Rtt):<<1:rnd:sat | Word64 Q6_P_vmpyweuh_PP_s1_rnd_sat(Word64 Rss, Word64 Rtt) | |----------------------------------------|------------------------------------------------------------| | Rdd= vmpyweuh(Rss,<br>Rtt):<<1:sat | Word64 Q6_P_vmpyweuh_PP_s1_sat(Word64 Rss, Word64 Rtt) | | Rdd= vmpyweuh(Rss,<br>Rtt):rnd:sat | Word64 Q6_P_vmpyweuh_PP_rnd_sat(Word64 Rss, Word64 Rtt) | | Rdd= vmpyweuh(Rss, Rtt):sat | Word64 Q6_P_vmpyweuh_PP_sat(Word64 Rss, Word64 Rtt) | |-----------------------------------------------------------|---------------------------------------------------------------------------| | <pre>Rdd= vmpywouh(Rss, Rtt):&lt;&lt;1:rnd:sat</pre> | Word64 Q6_P_vmpywouh_PP_s1_rnd_sat(Word64 Rss, Word64 Rtt) | | Rdd= vmpywouh(Rss, Rtt):<<1:sat | Word64 Q6_P_vmpywouh_PP_s1_sat(Word64 Rss, Word64 Rtt) | | Rdd= vmpywouh(Rss,<br>Rtt):rnd:sat | Word64 Q6_P_vmpywouh_PP_rnd_sat(Word64 Rss, Word64 Rtt) | | Rdd= vmpywouh(Rss, Rtt):sat | Word64 Q6_P_vmpywouh_PP_sat(Word64 Rss, Word64 Rtt) | | <pre>Rxx+= vmpyweuh(Rss, Rtt):&lt;&lt;1:rnd:sat</pre> | Word64 Q6_P_vmpyweuhacc_PP_s1_rnd_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+= vmpyweuh(Rss,<br/>Rtt):&lt;&lt;1:sat</pre> | Word64 Q6_P_vmpyweuhacc_PP_s1_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+= vmpyweuh(Rss, Rtt):rnd:sat</pre> | Word64 Q6_P_vmpyweuhacc_PP_rnd_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+= vmpyweuh(Rss, Rtt):sat</pre> | Word64 Q6_P_vmpyweuhacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+= vmpywouh(Rss,<br/>Rtt):&lt;&lt;1:rnd:sat</pre> | Word64 Q6_P_vmpywouhacc_PP_s1_rnd_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+= vmpywouh(Rss,<br/>Rtt):&lt;&lt;1:sat</pre> | Word64 Q6_P_vmpywouhacc_PP_s1_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+= vmpywouh(Rss, Rtt):rnd:sat</pre> | Word64 Q6_P_vmpywouhacc_PP_rnd_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+= vmpywouh(Rss, Rtt):sat</pre> | Word64 Q6_P_vmpywouhacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|---------------------------------------------------------------| | | ICL | ASS | ; | ı | Reg | Тур | е | N | 1ajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | Ν | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 1 | 0 | s | ø | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rdd=vmpyweuh(Rss,Rtt)[:<<br><n]:sat< td=""></n]:sat<> | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 1 | 0 | s | ø | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rdd=vmpywouh(Rss,Rtt)[:<<br><n]:sat< td=""></n]:sat<> | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rdd=vmpyweuh(Rss,Rtt)[:<<br><n]:rnd:sat< td=""></n]:rnd:sat<> | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 1 | 1 | s | ø | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rdd=vmpywouh(Rss,Rtt)[:<<br><n]:rnd:sat< td=""></n]:rnd:sat<> | | | ICL | ASS | ; | F | Reg | Тур | е | N | 1ajO | )p | | | s5 | | | Pa | rse | | | | t5 | | | Ν | /linC | )p | | | х5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 1 | x | x | x | x | x | Rxx+=vmpyweuh(Rss,Rtt)[: < <n]:sat< td=""></n]:sat<> | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | x | x | x | x | x | Rxx+=vmpywouh(Rss,Rtt)[: < <n]:sat< td=""></n]:sat<> | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | N | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 1 | x | x | x | x | x | Rxx+=vmpyweuh(Rss,Rtt)[: < <n]:rnd:sat< td=""></n]:rnd:sat<> | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | N | 1 | 1 | s | s | ø | s | ø | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | x | x | x | x | x | Rxx+=vmpywouh(Rss,Rtt)[: < <n]:rnd:sat< td=""></n]:rnd:sat<> | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | | MajOp | Major opcode | | RegType Register type Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t5 Field to encode register t x5 Field to encode register x Minor opcode MinOp ## Multiply signed halfwords Multiply two signed halfwords. Optionally shift the multiplier result by 1 bit. This result can be accumulated or rounded. The destination/accumulator can be either 32 or 64 bits. For 32-bit results, saturation is optional. ## Class: XTYPE (slots 2,3) #### **Notes** ■ If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ### **Intrinsics** | Rd=mpy(Rs.H,Rt.H) | Word32 Q6_R_mpy_RhRh(Word32 Rs, Word32 Rt) | |-------------------------------------------------|-------------------------------------------------------| | Rd=mpy(Rs.H,Rt.H):<<1 | Word32 Q6_R_mpy_RhRh_s1(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.H,Rt.H):<< 1:rnd | Word32 Q6_R_mpy_RhRh_s1_rnd(Word32 Rs, Word32 Rt) | | <pre>Rd=mpy(Rs.H,Rt.H):&lt;&lt; 1:rnd:sat</pre> | Word32 Q6_R_mpy_RhRh_s1_rnd_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.H,Rt.H):<< 1:sat | Word32 Q6_R_mpy_RhRh_s1_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.H,Rt.H):rnd | Word32 Q6_R_mpy_RhRh_rnd(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.H,Rt.H):rnd:sat | Word32 Q6_R_mpy_RhRh_rnd_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.H,Rt.H):sat | Word32 Q6_R_mpy_RhRh_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.H,Rt.L) | Word32 Q6_R_mpy_RhRl(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.H,Rt.L):<<1 | Word32 Q6_R_mpy_RhRl_s1(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.H,Rt.L):<< 1:rnd | Word32 Q6_R_mpy_RhRl_s1_rnd(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.H,Rt.L):<<<br>1:rnd:sat | Word32 Q6_R_mpy_RhRl_s1_rnd_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.H,Rt.L):<< 1:sat | Word32 Q6_R_mpy_RhRl_s1_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.H,Rt.L):rnd | Word32 Q6_R_mpy_RhRl_rnd(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.H,Rt.L):rnd:sat | Word32 Q6_R_mpy_RhRl_rnd_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.H,Rt.L):sat | Word32 Q6_R_mpy_RhRl_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.L,Rt.H) | Word32 Q6_R_mpy_RlRh(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.L,Rt.H):<<1 | Word32 Q6_R_mpy_RlRh_s1(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.L,Rt.H):<<1:rnd | Word32 Q6_R_mpy_RlRh_s1_rnd(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.L,Rt.H):<<<br>1:rnd:sat | Word32 Q6_R_mpy_RlRh_s1_rnd_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.L,Rt.H):<<1:sat | Word32 Q6_R_mpy_RlRh_s1_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.L,Rt.H):rnd | Word32 Q6_R_mpy_RlRh_rnd(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.L,Rt.H):rnd:sat | Word32 Q6_R_mpy_RlRh_rnd_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.L,Rt.H):sat | Word32 Q6_R_mpy_RlRh_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.L,Rt.L) | Word32 Q6_R_mpy_RlRl(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.L,Rt.L):<<1 | Word32 Q6_R_mpy_RlRl_s1(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.L,Rt.L):<<1:rnd | Word32 Q6_R_mpy_RlRl_s1_rnd(Word32 Rs, Word32 Rt) | |---------------------------------------------|-----------------------------------------------------------------| | Rd=mpy(Rs.L,Rt.L):<<<br>1:rnd:sat | Word32 Q6_R_mpy_RlRl_s1_rnd_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.L,Rt.L):<< 1:sat | Word32 Q6_R_mpy_RlRl_s1_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.L,Rt.L):rnd | Word32 Q6_R_mpy_RlRl_rnd(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs.L,Rt.L):rnd:sat | <pre>Word32 Q6_R_mpy_RlRl_rnd_sat(Word32 Rs, Word32 Rt)</pre> | | Rd=mpy(Rs.L,Rt.L):sat | Word32 Q6_R_mpy_RlRl_sat(Word32 Rs, Word32 Rt) | | Rdd=mpy(Rs.H,Rt.H) | Word64 Q6_P_mpy_RhRh(Word32 Rs, Word32 Rt) | | Rdd=mpy(Rs.H,Rt.H):<<1 | Word64 Q6_P_mpy_RhRh_s1(Word32 Rs, Word32 Rt) | | Rdd=mpy(Rs.H,Rt.H):<<1:rnd | Word64 Q6_P_mpy_RhRh_s1_rnd(Word32 Rs, Word32 Rt) | | Rdd=mpy(Rs.H,Rt.H):rnd | Word64 Q6_P_mpy_RhRh_rnd(Word32 Rs, Word32 Rt) | | Rdd=mpy(Rs.H,Rt.L) | Word64 Q6_P_mpy_RhRl(Word32 Rs, Word32 Rt) | | Rdd=mpy(Rs.H,Rt.L):<<1 | Word64 Q6_P_mpy_RhRl_s1(Word32 Rs, Word32 Rt) | | Rdd=mpy(Rs.H,Rt.L):<<1:rnd | Word64 Q6_P_mpy_RhRl_s1_rnd(Word32 Rs, Word32 Rt) | | Rdd=mpy(Rs.H,Rt.L):rnd | Word64 Q6_P_mpy_RhRl_rnd(Word32 Rs, Word32 Rt) | | Rdd=mpy(Rs.L,Rt.H) | Word64 Q6_P_mpy_RlRh(Word32 Rs, Word32 Rt) | | Rdd=mpy(Rs.L,Rt.H):<<1 | Word64 Q6_P_mpy_RlRh_s1(Word32 Rs, Word32 Rt) | | <pre>Rdd=mpy(Rs.L,Rt.H):&lt;&lt;1:rnd</pre> | Word64 Q6_P_mpy_RlRh_s1_rnd(Word32 Rs, Word32 Rt) | | Rdd=mpy(Rs.L,Rt.H):rnd | Word64 Q6_P_mpy_RlRh_rnd(Word32 Rs, Word32 Rt) | | Rdd=mpy(Rs.L,Rt.L) | Word64 Q6_P_mpy_RlRl(Word32 Rs, Word32 Rt) | | Rdd=mpy(Rs.L,Rt.L):<<1 | Word64 Q6_P_mpy_RlRl_s1(Word32 Rs, Word32 Rt) | | <pre>Rdd=mpy(Rs.L,Rt.L):&lt;&lt;1:rnd</pre> | Word64 Q6_P_mpy_RlRl_s1_rnd(Word32 Rs, Word32 Rt) | | Rdd=mpy(Rs.L,Rt.L):rnd | Word64 Q6_P_mpy_RlRl_rnd(Word32 Rs, Word32 Rt) | | Rx+=mpy(Rs.H,Rt.H) | Word32 Q6_R_mpyacc_RhRh(Word32 Rx, Word32 Rs, Word32 Rt) | | <pre>Rx+=mpy(Rs.H,Rt.H):&lt;&lt;1</pre> | Word32 Q6_R_mpyacc_RhRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) | | <pre>Rx+=mpy(Rs.H,Rt.H):&lt;&lt;1:sat</pre> | Word32 Q6_R_mpyacc_RhRh_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | <pre>Rx+=mpy(Rs.H,Rt.H):sat</pre> | Word32 Q6_R_mpyacc_RhRh_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx+=mpy(Rs.H,Rt.L) | Word32 Q6_R_mpyacc_RhRl(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx+=mpy(Rs.H,Rt.L):<<1 | Word32 Q6_R_mpyacc_RhRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) | | <pre>Rx+=mpy(Rs.H,Rt.L):&lt;&lt;1:sat</pre> | Word32 Q6_R_mpyacc_RhRl_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | <pre>Rx+=mpy(Rs.H,Rt.L):sat</pre> | Word32 Q6_R_mpyacc_RhRl_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx+=mpy(Rs.L,Rt.H) | Word32 Q6_R_mpyacc_RlRh(Word32 Rx, Word32 Rs, Word32 Rt) | | | | | <pre>Rx+=mpy(Rs.L,Rt.H):&lt;&lt;1</pre> | Word32 Q6_R_mpyacc_RlRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) | |---------------------------------------------|------------------------------------------------------------------------| | <pre>Rx+=mpy(Rs.L,Rt.H):&lt;&lt;1:sat</pre> | Word32 Q6_R_mpyacc_RlRh_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | <pre>Rx+=mpy(Rs.L,Rt.H):sat</pre> | Word32 Q6_R_mpyacc_RlRh_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | <pre>Rx+=mpy(Rs.L,Rt.L)</pre> | Word32 Q6_R_mpyacc_RlRl(Word32 Rx, Word32 Rs, Word32 Rt) | | <pre>Rx+=mpy(Rs.L,Rt.L):&lt;&lt;1</pre> | <pre>Word32 Q6_R_mpyacc_RlRl_s1(Word32 Rx, Word32 Rs, Word32 Rt)</pre> | | <pre>Rx+=mpy(Rs.L,Rt.L):&lt;&lt;1:sat</pre> | Word32 Q6_R_mpyacc_RlRl_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | <pre>Rx+=mpy(Rs.L,Rt.L):sat</pre> | Word32 Q6_R_mpyacc_RlRl_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-=mpy(Rs.H,Rt.H) | Word32 Q6_R_mpynac_RhRh(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-=mpy(Rs.H,Rt.H):<<1 | Word32 Q6_R_mpynac_RhRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-=mpy(Rs.H,Rt.H):<<1:sat | Word32 Q6_R_mpynac_RhRh_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-=mpy(Rs.H,Rt.H):sat | Word32 Q6_R_mpynac_RhRh_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-=mpy(Rs.H,Rt.L) | Word32 Q6_R_mpynac_RhRl(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-=mpy(Rs.H,Rt.L):<<1 | Word32 Q6_R_mpynac_RhRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) | | <pre>Rx-=mpy(Rs.H,Rt.L):&lt;&lt;1:sat</pre> | Word32 Q6_R_mpynac_RhRl_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-=mpy(Rs.H,Rt.L):sat | Word32 Q6_R_mpynac_RhRl_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-=mpy(Rs.L,Rt.H) | Word32 Q6_R_mpynac_RlRh(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-=mpy(Rs.L,Rt.H):<<1 | Word32 Q6_R_mpynac_RlRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-=mpy(Rs.L,Rt.H):<<1:sat | Word32 Q6_R_mpynac_RlRh_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-=mpy(Rs.L,Rt.H):sat | Word32 Q6_R_mpynac_RlRh_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-=mpy(Rs.L,Rt.L) | Word32 Q6_R_mpynac_RlRl(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-=mpy(Rs.L,Rt.L):<<1 | Word32 Q6_R_mpynac_RlRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-=mpy(Rs.L,Rt.L):<<1:sat | Word32 Q6_R_mpynac_RlRl_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-=mpy(Rs.L,Rt.L):sat | Word32 Q6_R_mpynac_RlRl_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | | | | Rxx+=mpy(Rs.H,Rt.H) | Word64 Q6_P_mpyacc_RhRh(Word64 Rxx, Word32 Rs, Word32 Rt) | |------------------------------------------|--------------------------------------------------------------| | <pre>Rxx+=mpy(Rs.H,Rt.H):&lt;&lt;1</pre> | Word64 Q6_P_mpyacc_RhRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+=mpy(Rs.H,Rt.L)</pre> | Word64 Q6_P_mpyacc_RhRl(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+=mpy(Rs.H,Rt.L):&lt;&lt;1</pre> | Word64 Q6_P_mpyacc_RhRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+=mpy(Rs.L,Rt.H)</pre> | Word64 Q6_P_mpyacc_RlRh(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+=mpy(Rs.L,Rt.H):&lt;&lt;1</pre> | Word64 Q6_P_mpyacc_RlRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+=mpy(Rs.L,Rt.L)</pre> | Word64 Q6_P_mpyacc_RlRl(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+=mpy(Rs.L,Rt.L):&lt;&lt;1</pre> | Word64 Q6_P_mpyacc_RlRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) | | Rxx-=mpy(Rs.H,Rt.H) | Word64 Q6_P_mpynac_RhRh(Word64 Rxx, Word32 Rs, Word32 Rt) | | Rxx-=mpy(Rs.H,Rt.H):<<1 | Word64 Q6_P_mpynac_RhRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) | | Rxx-=mpy(Rs.H,Rt.L) | Word64 Q6_P_mpynac_RhRl(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx-=mpy(Rs.H,Rt.L):&lt;&lt;1</pre> | Word64 Q6_P_mpynac_RhRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) | | Rxx-=mpy(Rs.L,Rt.H) | Word64 Q6_P_mpynac_RlRh(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx-=mpy(Rs.L,Rt.H):&lt;&lt;1</pre> | Word64 Q6_P_mpynac_RlRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx-=mpy(Rs.L,Rt.L)</pre> | Word64 Q6_P_mpynac_RlRl(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx-=mpy(Rs.L,Rt.L):&lt;&lt;1</pre> | Word64 Q6_P_mpynac_RlRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|--------------------------------------------| | | ICL | ASS | 3 | ļ | Reg | Тур | е | N | ЛаjC | Эр | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | 0 | 0 | d | d | d | d | d | Rdd=mpy(Rs.L,Rt.L)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | 0 | 1 | d | d | d | d | d | Rdd=mpy(Rs.L,Rt.H)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | 1 | 0 | d | d | d | d | d | Rdd=mpy(Rs.H,Rt.L)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | 1 | 1 | d | d | d | d | d | Rdd=mpy(Rs.H,Rt.H)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | 0 | 0 | d | d | d | d | d | Rdd=mpy(Rs.L,Rt.L)[:< <n]:<br>rnd</n]:<br> | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | 0 | 1 | d | d | d | d | d | Rdd=mpy(Rs.L,Rt.H)[:< <n]:< td=""></n]:<> | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | 1 | 0 | d | d | d | d | d | Rdd=mpy(Rs.H,Rt.L)[:< <n]:< td=""></n]:<> | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-----------|-----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|------------------------------------------------| | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | 1 | 1 | d | d | d | d | d | Rdd=mpy(Rs.H,Rt.H)[:< <n]<br>:rnd</n]<br> | | | ICL | ASS | 3 | F | Reg | Тур | e | N | l<br>1ajO | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | х5 | | | | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 0 | x | x | x | x | x | Rxx+=mpy(Rs.L,Rt.L)[:< <n< td=""></n<> | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 1 | x | x | x | x | x | Rxx+=mpy(Rs.L,Rt.H)[:< <n< td=""></n<> | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | x | x | x | x | x | Rxx+=mpy(Rs.H,Rt.L)[:< <n< td=""></n<> | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | x | x | x | x | x | Rxx+=mpy(Rs.H,Rt.H)[:<<<br>N] | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | N | 0 | 1 | s | s | s | s | s | P | P | - | t | t | t | t | t | 0 | 0 | 0 | x | x | x | x | x | Rxx-=mpy(Rs.L,Rt.L)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 1 | x | x | x | x | x | Rxx-=mpy(Rs.L,Rt.H)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | x | x | x | x | x | Rxx-=mpy(Rs.H,Rt.L)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | x | x | x | x | x | Rxx-<br>=mpy(Rs.H,Rt.H)[:< <n]< td=""></n]<> | | | ICL | ASS | | F | Reg | Тур | е | N | 1ajC | )p | | r | s5 | | r | Pa | rse | | | r | t5 | | | N | 1inC | )p | | r | d5 | 1 | 1 | | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 0 | 0 | s | s | s | s | s | Р | P | - | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=mpy(Rs.L,Rt.L)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 0 | 0 | s | S | s | S | S | P | P - | - | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rd=mpy(Rs.L,Rt.H)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 0 | 0 | S | S | S | S | S | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rd=mpy(Rs.H,Rt.L)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 0 | 0 | S | S | S | S | S | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rd=mpy(Rs.H,Rt.H)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rd=mpy(Rs.L,Rt.L)[:< <n]:s<br>at</n]:s<br> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rd=mpy(Rs.L,Rt.H)[:< <n]:s<br>at</n]:s<br> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rd=mpy(Rs.H,Rt.L)[:< <n]:s<br>at</n]:s<br> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rd=mpy(Rs.H,Rt.H)[:< <n]:<br>sat</n]:<br> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=mpy(Rs.L,Rt.L)[:< <n]:r< td=""></n]:r<> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rd=mpy(Rs.L,Rt.H)[:< <n]:r< td=""></n]:r<> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rd=mpy(Rs.H,Rt.L)[:< <n]:r<br>nd</n]:r<br> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rd=mpy(Rs.H,Rt.H)[:< <n]:r< td=""></n]:r<> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rd=mpy(Rs.L,Rt.L)[:< <n]:r<br>nd:sat</n]:r<br> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rd=mpy(Rs.L,Rt.H)[:< <n]:r<br>nd:sat</n]:r<br> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rd=mpy(Rs.H,Rt.L)[:< <n]:r<br>nd:sat</n]:r<br> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | P | - | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rd=mpy(Rs.H,Rt.H)[:< <n]:r<br>nd:sat</n]:r<br> | | | ICL | ASS | | | Reg | Тур | | | 1ajO | · · | | r | s5 | | r | | rse | | | r | t5 | | | | 1inO | • | | r | х5 | 1 | 1 | | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 0 | 0 | s | s | s | s | s | P | P | - | t | t | t | t | t | 0 | 0 | 0 | X | X | X | X | X | Rx+=mpy(Rs.L,Rt.L)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 0 | 0 | s | s | s | s | s | P | P | - | t | t | t | t | t | 0 | 0 | 1 | X | x | х | x | x | Rx+=mpy(Rs.L,Rt.H)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 0 | 0 | s | s | s | S | s | P | P | - | t | t | t | t | t | 0 | 1 | 0 | X | X | х | X | x | Rx+=mpy(Rs.H,Rt.L)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | X | х | х | x | x | Rx+=mpy(Rs.H,Rt.H)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | x | x | x | x | x | Rx+=mpy(Rs.L,Rt.L)[:< <n]:<br>sat</n]:<br> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 1 | X | x | x | x | x | Rx+=mpy(Rs.L,Rt.H)[:< <n]:<br>sat</n]:<br> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 0 | X | x | x | x | x | Rx+=mpy(Rs.H,Rt.L)[:< <n]:<br>sat</n]:<br> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 1 | x | x | x | x | x | Rx+=mpy(Rs.H,Rt.H)[:< <n]<br>:sat</n]<br> | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-----------------------------------------------------| | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 0 | х | x | x | x | х | Rx-=mpy(Rs.L,Rt.L)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 1 | х | х | x | х | х | Rx-=mpy(Rs.L,Rt.H)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | х | x | x | x | х | Rx-=mpy(Rs.H,Rt.L)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | х | х | x | х | х | Rx-=mpy(Rs.H,Rt.H)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | x | x | x | x | x | Rx-<br>=mpy(Rs.L,Rt.L)[:< <n]:sat< td=""></n]:sat<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 1 | x | x | x | x | x | Rx-<br>=mpy(Rs.L,Rt.H)[:< <n]:sat< td=""></n]:sat<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 0 | x | x | x | x | x | Rx-<br>=mpy(Rs.H,Rt.L)[:< <n]:sat< td=""></n]:sat<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 1 | x | x | x | x | x | Rx-<br>=mpy(Rs.H,Rt.H)[:< <n]:sat< td=""></n]:sat<> | ### Field name Description ICLASS Instruction class MajOp Major opcode MinOp Minor opcode RegType Register type sH Rs is high tH Rt is high Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t5 Field to encode register t x5 Field to encode register x # Multiply unsigned halfwords Multiply two unsigned halfwords. Scale the result by 0 to 3 bits. Optionally, add or subtract the result from the accumulator. | Rd= mpyu(Rs.[HL],Rt.[HL])[:<<1] | Rd=(Rs.uh[01] * Rt.uh[01])[<<1]; | |----------------------------------------------------|---------------------------------------------------------| | Rdd= mpyu(Rs.[HL],Rt.[HL])[:<<1] | Rdd=(Rs.uh[01] * Rt.uh[01])[<<1]; | | <pre>Rx+= mpyu(Rs.[HL],Rt.[HL])[:&lt;&lt;1]</pre> | <pre>Rx=Rx+ (Rs.uh[01] * Rt.uh[01])[&lt;&lt;1];</pre> | | <pre>Rx-= mpyu(Rs.[HL],Rt.[HL])[:&lt;&lt;1]</pre> | <pre>Rx=Rx- (Rs.uh[01] * Rt.uh[01])[&lt;&lt;1];</pre> | | <pre>Rxx+= mpyu(Rs.[HL],Rt.[HL])[:&lt;&lt;1]</pre> | <pre>Rxx=Rxx+ (Rs.uh[01] * Rt.uh[01])[&lt;&lt;1];</pre> | | <pre>Rxx-= mpyu(Rs.[HL],Rt.[HL])[:&lt;&lt;1]</pre> | <pre>Rxx=Rxx- (Rs.uh[01] * Rt.uh[01])[&lt;&lt;1];</pre> | Class: XTYPE (slots 2,3) ### **Intrinsics** | Rd= mpyu(Rs.H,Rt.H) | UWord32 Q6_R_mpyu_RhRh(Word32 Rs, Word32 Rt) | |--------------------------|----------------------------------------------------------------------| | Rd= mpyu(Rs.H,Rt.H):<<1 | UWord32 Q6_R_mpyu_RhRh_s1(Word32 Rs, Word32 Rt) | | Rd= mpyu(Rs.H,Rt.L) | UWord32 Q6_R_mpyu_RhRl(Word32 Rs, Word32 Rt) | | Rd= mpyu(Rs.H,Rt.L):<<1 | UWord32 Q6_R_mpyu_RhRl_s1(Word32 Rs, Word32 Rt) | | Rd= mpyu(Rs.L,Rt.H) | UWord32 Q6_R_mpyu_RlRh(Word32 Rs, Word32 Rt) | | Rd= mpyu(Rs.L,Rt.H):<<1 | UWord32 Q6_R_mpyu_RlRh_s1(Word32 Rs, Word32 Rt) | | Rd= mpyu(Rs.L,Rt.L) | UWord32 Q6_R_mpyu_RlRl(Word32 Rs, Word32 Rt) | | Rd= mpyu(Rs.L,Rt.L):<<1 | UWord32 Q6_R_mpyu_RlRl_s1(Word32 Rs, Word32 Rt) | | Rdd= mpyu(Rs.H,Rt.H) | UWord64 Q6_P_mpyu_RhRh(Word32 Rs, Word32 Rt) | | Rdd= mpyu(Rs.H,Rt.H):<<1 | UWord64 Q6_P_mpyu_RhRh_s1(Word32 Rs, Word32 Rt) | | Rdd= mpyu(Rs.H,Rt.L) | UWord64 Q6_P_mpyu_RhRl(Word32 Rs, Word32 Rt) | | Rdd= mpyu(Rs.H,Rt.L):<<1 | UWord64 Q6_P_mpyu_RhRl_s1(Word32 Rs, Word32 Rt) | | Rdd= mpyu(Rs.L,Rt.H) | UWord64 Q6_P_mpyu_RlRh(Word32 Rs, Word32 Rt) | | Rdd= mpyu(Rs.L,Rt.H):<<1 | UWord64 Q6_P_mpyu_RlRh_s1(Word32 Rs, Word32 Rt) | | Rdd= mpyu(Rs.L,Rt.L) | UWord64 Q6_P_mpyu_RlRl(Word32 Rs, Word32 Rt) | | Rdd= mpyu(Rs.L,Rt.L):<<1 | UWord64 Q6_P_mpyu_RlRl_s1(Word32 Rs, Word32 Rt) | | Rx+= mpyu(Rs.H,Rt.H) | Word32 Q6_R_mpyuacc_RhRh(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx+= mpyu(Rs.H,Rt.H):<<1 | Word32 Q6_R_mpyuacc_RhRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx+= mpyu(Rs.H,Rt.L) | Word32 Q6_R_mpyuacc_RhRl(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx+= mpyu(Rs.H,Rt.L):<<1 | Word32 Q6_R_mpyuacc_RhRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx+= mpyu(Rs.L,Rt.H) | Word32 Q6_R_mpyuacc_RlRh(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx+=mpyu(Rs.L,Rt.H):<<1 | Word32 Q6_R_mpyuacc_RlRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx+= mpyu(Rs.L,Rt.L) | Word32 Q6_R_mpyuacc_RlRl(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx+= mpyu(Rs.L,Rt.L):<<1 | Word32 Q6_R_mpyuacc_RlRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-= mpyu(Rs.H,Rt.H) | Word32 Q6_R_mpyunac_RhRh(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-= mpyu(Rs.H,Rt.H):<<1 | Word32 Q6_R_mpyunac_RhRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-= mpyu(Rs.H,Rt.L) | Word32 Q6_R_mpyunac_RhRl(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-= mpyu(Rs.H,Rt.L):<<1 | Word32 Q6_R_mpyunac_RhRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-= mpyu(Rs.L,Rt.H) | <pre>Word32 Q6_R_mpyunac_RlRh(Word32 Rx, Word32 Rs, Word32 Rt)</pre> | | Rx-= mpyu(Rs.L,Rt.H):<<1 | Word32 Q6_R_mpyunac_RlRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) | |--------------------------------------------|---------------------------------------------------------------| | Rx-= mpyu(Rs.L,Rt.L) | Word32 Q6_R_mpyunac_R1R1(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx-= mpyu(Rs.L,Rt.L):<<1 | Word32 Q6_R_mpyunac_RlRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) | | <pre>Rxx+= mpyu(Rs.H,Rt.H)</pre> | Word64 Q6_P_mpyuacc_RhRh(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+= mpyu(Rs.H,Rt.H):&lt;&lt;1</pre> | Word64 Q6_P_mpyuacc_RhRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+= mpyu(Rs.H,Rt.L)</pre> | Word64 Q6_P_mpyuacc_RhRl(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+= mpyu(Rs.H,Rt.L):&lt;&lt;1</pre> | Word64 Q6_P_mpyuacc_RhRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+= mpyu(Rs.L,Rt.H)</pre> | Word64 Q6_P_mpyuacc_RlRh(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+= mpyu(Rs.L,Rt.H):&lt;&lt;1</pre> | Word64 Q6_P_mpyuacc_RlRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+= mpyu(Rs.L,Rt.L)</pre> | Word64 Q6_P_mpyuacc_RlRl(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+= mpyu(Rs.L,Rt.L):&lt;&lt;1</pre> | Word64 Q6_P_mpyuacc_RlRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) | | Rxx-= mpyu(Rs.H,Rt.H) | Word64 Q6_P_mpyunac_RhRh(Word64 Rxx, Word32 Rs, Word32 Rt) | | Rxx-= mpyu(Rs.H,Rt.H):<<1 | Word64 Q6_P_mpyunac_RhRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) | | Rxx-= mpyu(Rs.H,Rt.L) | Word64 Q6_P_mpyunac_RhRl(Word64 Rxx, Word32 Rs, Word32 Rt) | | Rxx-= mpyu(Rs.H,Rt.L):<<1 | Word64 Q6_P_mpyunac_RhRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) | | Rxx-= mpyu(Rs.L,Rt.H) | Word64 Q6_P_mpyunac_R1Rh(Word64 Rxx, Word32 Rs, Word32 Rt) | | Rxx-= mpyu(Rs.L,Rt.H):<<1 | Word64 Q6_P_mpyunac_RlRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) | | Rxx-= mpyu(Rs.L,Rt.L) | Word64 Q6_P_mpyunac_R1R1(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx-= mpyu(Rs.L,Rt.L):&lt;&lt;1</pre> | Word64 Q6_P_mpyunac_RlRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|----|----|---|---|----|---|---|--------------------------------------------| | | ICL | ASS | 3 | F | Reg | Тур | е | Λ | ЛајС | р | | | s5 | | | Pa | rse | | | | t5 | | | | sH | tH | | | d5 | | | | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | 0 | 0 | d | d | d | d | d | Rdd=mpyu(Rs.L,Rt.L)[:< <n ]<="" td=""></n> | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | 0 | 1 | d | d | d | d | d | Rdd=mpyu(Rs.L,Rt.H)[:< <n ]<="" td=""></n> | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|----|----|---|---|----|---|---|-----------------------------------------------| | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | 1 | 0 | d | d | d | d | d | Rdd=mpyu(Rs.H,Rt.L)[:< <n< td=""></n<> | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | 1 | 1 | d | d | d | d | d | Rdd=mpyu(Rs.H,Rt.H)[:<<<br>N] | | | ICL | ASS | 3 | F | Reg | Тур | е | Ν | ЛајС | )p | | | s5 | | | Pa | rse | | | | t5 | | | | sH | tH | | | х5 | | | | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 0 | x | x | x | x | x | Rxx+=mpyu(Rs.L,Rt.L)[:<< N] | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 0 | 0 | 1 | x | x | x | x | x | Rxx+=mpyu(Rs.L,Rt.H)[:<<<br>N] | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | x | x | x | x | x | Rxx+=mpyu(Rs.H,Rt.L)[:<<<br>N] | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | x | x | x | x | x | Rxx+=mpyu(Rs.H,Rt.H)[:<<<br>N] | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | N | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 0 | x | x | x | x | x | Rxx-<br>=mpyu(Rs.L,Rt.L)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | N | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 1 | x | x | x | x | x | Rxx-<br>=mpyu(Rs.L,Rt.H)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | N | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | x | x | x | x | x | Rxx-<br>=mpyu(Rs.H,Rt.L)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | N | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | x | x | x | x | x | Rxx-<br>=mpyu(Rs.H,Rt.H)[:< <n]< td=""></n]<> | | | ICL | ASS | 3 | F | Reg | Тур | е | Λ | ЛаjС | )p | | | s5 | | | Pa | rse | | | | t5 | | | | sH | tH | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 1 | 0 | s | s | s | s | s | Р | P | • | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=mpyu(Rs.L,Rt.L)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rd=mpyu(Rs.L,Rt.H)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rd=mpyu(Rs.H,Rt.L)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rd=mpyu(Rs.H,Rt.H)[:< <n]< td=""></n]<> | | | ICL | ASS | 3 | F | Reg | Тур | е | ٨ | ЛаjС | )p | | • | s5 | • | • | Pa | rse | | | • | t5 | | | | sH | tH | | | х5 | | | | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 0 | x | x | x | x | x | Rx+=mpyu(Rs.L,Rt.L)[:< <n< td=""></n<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 1 | x | x | x | x | x | Rx+=mpyu(Rs.L,Rt.H)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | x | x | x | x | x | Rx+=mpyu(Rs.H,Rt.L)[:< <n< td=""></n<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | x | x | x | x | x | Rx+=mpyu(Rs.H,Rt.H)[:<<<br>N] | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 1 | 1 | s | s | s | s | s | P | Р | - | t | t | t | t | t | 0 | 0 | 0 | x | x | x | x | x | Rx-=mpyu(Rs.L,Rt.L)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 1 | 1 | s | s | s | s | s | Р | Р | | t | t | t | t | t | 0 | 0 | 1 | x | x | х | х | х | Rx-=mpyu(Rs.L,Rt.H)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | x | x | x | х | х | Rx-=mpyu(Rs.H,Rt.L)[:< <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | N | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | x | x | x | x | x | Rx-<br>=mpyu(Rs.H,Rt.H)[:< <n]< td=""></n]<> | #### Field name #### Description ICLASS Instruction class MajOp Major opcode MinOp Minor opcode RegType Register type sH Rs is high tH Rt is high Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t5 Field to encode register t x5 Field to encode register x ## Polynomial multiply words Perform a $32 \times 32$ carryless polynomial multiply using 32-bit source registers Rs and Rt. The 64-bit result is optionally accumulated (XOR'd) with the destination register. Finite field multiply instructions are useful for many algorithms including scramble code generation, cryptographic algorithms, convolutional, and Reed Solomon codes. Syntax Behavior ### Class: XTYPE (slots 2,3) #### **Intrinsics** ``` Rdd=pmpyw(Rs,Rt) Word64 Q6_P_pmpyw_RR(Word32 Rs, Word32 Rt) Rxx^=pmpyw(Rs,Rt) Word64 Q6_P_pmpywxacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) ``` ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|-----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-------------------| | | ICL. | ASS | 3 | | Reg | Тур | е | N | 1ajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rdd=pmpyw(Rs,Rt) | | | ICL. | ASS | 3 | | Reg | Тур | e | N | 1ajC | )p | | | s5 | | | Pai | rse | | | | t5 | | | N | linC | )p | | | х5 | | | | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | х | x | х | х | х | Rxx^=pmpyw(Rs,Rt) | | Field name | Description | |------------|-------------| |------------|-------------| ICLASS Instruction class MajOp Major opcode MinOp Minor opcode RegType Register type Packet/loop page Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t5 Field to encode register t x5 Field to encode register x ## Vector reduce multiply word by signed half (32×16) Perform mixed precision vector multiply operations and accumulate the results. A 32-bit word from vector Rss is multiplied by a 16-bit halfword (either even or odd) from vector Rtt. The multiplication is performed as a signed 32 × 16, which produces a 48-bit result. This result is optionally scaled left by one bit. A similar operation is performed for both words in Rss, and the two results are accumulated. The final result is optionally accumulated with Rxx. Syntax Behavior ``` Rdd=vrmpyweh(Rss,Rtt)[:<<1] Rdd = (Rss.w[1] * Rtt.h[2])[<<1] + (Rss.w[0] * Rtt.h[0])[<<1]; Rdd=vrmpywoh(Rss,Rtt)[:<<1] Rdd = (Rss.w[1] * Rtt.h[3])[<<1] + (Rss.w[0] * Rtt.h[1])[<<1]; Rxx+=vrmpyweh(Rss,Rtt)[:<<1] Rxx += (Rss.w[1] * Rtt.h[2])[<<1] + (Rss.w[0] * Rtt.h[0])[<<1]; Rxx+=vrmpywoh(Rss,Rtt)[:<<1] Rxx += (Rss.w[1] * Rtt.h[3])[<<1] + (Rss.w[0] * Rtt.h[3])[<<1]; ``` ## Class: XTYPE (slots 2,3) ## Intrinsics | Rdd= vrmpyweh(Rss,Rtt) | Word64 Q6_P_vrmpyweh_PP(Word64 Rss, Word64 Rtt) | |----------------------------------------------|-------------------------------------------------------------------| | <pre>Rdd= vrmpyweh(Rss,Rtt):&lt;&lt;1</pre> | Word64 Q6_P_vrmpyweh_PP_s1(Word64 Rss, Word64 Rtt) | | Rdd= vrmpywoh(Rss,Rtt) | Word64 Q6_P_vrmpywoh_PP(Word64 Rss, Word64 Rtt) | | <pre>Rdd= vrmpywoh(Rss,Rtt):&lt;&lt;1</pre> | Word64 Q6_P_vrmpywoh_PP_s1(Word64 Rss, Word64 Rtt) | | <pre>Rxx+= vrmpyweh(Rss,Rtt)</pre> | Word64 Q6_P_vrmpywehacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+= vrmpyweh(Rss,Rtt):&lt;&lt;1</pre> | Word64 Q6_P_vrmpywehacc_PP_s1(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+= vrmpywoh(Rss,Rtt)</pre> | Word64 Q6_P_vrmpywohacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+= vrmpywoh(Rss,Rtt):&lt;&lt;1</pre> | Word64 Q6_P_vrmpywohacc_PP_s1(Word64 Rxx, Word64 Rss, Word64 Rtt) | ## **Encoding** Field name | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|-----------------------------------------------| | | ICL | ASS | 3 | 1 | Reg | Тур | е | N | ЛајС | р | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=vrmpywoh(Rss,Rtt)[:<<br><n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rdd=vrmpyweh(Rss,Rtt)[:<<br><n]< td=""></n]<> | | | ICL | ASS | 6 | ı | Reg | Тур | е | ٨ | ЛајС | )p | | | s5 | • | | Pa | rse | | | | t5 | • | | N | 1inC | )p | | • | х5 | • | | | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | x | x | x | x | x | Rxx+=vrmpyweh(Rss,Rtt)[: < <n]< td=""></n]<> | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | N | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | x | x | x | x | x | Rxx+=vrmpywoh(Rss,Rtt)[: < <n]< td=""></n]<> | | 1 | 0 | 1 | 0 | 0 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | <n]< th=""></n]<> | |----|---|---|-----|-----|---|---|-------|----|---|---|----|---|---|----|-----|---|---|---|----|---|---|---|------|---|---|---|----|---|---|-----------------------------------------------| | 1 | 0 | 1 | 0 | 0 | 0 | N | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rdd=vrmpyweh(Rss,Rtt)[:<<br><n]< td=""></n]<> | | SS | | F | Reg | Тур | е | N | /lajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | р | | | х5 | | | | | 1 | 0 | 1 | 0 | 1 | 0 | N | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | x | x | x | x | x | Rxx+=vrmpyweh(Rss,Rtt)[: < <n]< td=""></n]<> | | 1 | 0 | 1 | 0 | 1 | 0 | N | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | x | x | x | x | x | Rxx+=vrmpywoh(Rss,Rtt)[: < <n]< td=""></n]<> | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Description | Instruction class | |----------------------------| | Major opcode | | Minor opcode | | Register type | | Packet/loop parse bits | | Field to encode register d | | Field to encode register s | | Field to encode register t | | Field to encode register x | | | # Multiply and use upper result Multiply two signed or unsigned 32-bit words. Store the upper 32 bits of this result to a single destination register. Optional rounding is available. | Syntax | Behavior | |----------------------------------------------|---------------------------------------------------------------| | Rd=mpy(Rs,Rt.H):<<1:rnd:sat | $Rd = sat_{32}(((Rs * Rt.h[1]) << 1+0x8000) >> 16);$ | | Rd=mpy(Rs,Rt.H):<<1:sat | $Rd = sat_{32}(((Rs * Rt.h[1]) << 1) >> 16);$ | | <pre>Rd=mpy(Rs,Rt.L):&lt;&lt;1:rnd:sat</pre> | $Rd = sat_{32}(((Rs * Rt.h[0]) << 1 + 0 \times 8000) >> 16);$ | | Rd=mpy(Rs,Rt.L):<<1:sat | $Rd = sat_{32}(((Rs * Rt.h[0]) << 1) >> 16);$ | | Rd=mpy(Rs,Rt) | Rd=(Rs * Rt)>>32; | | Rd=mpy(Rs,Rt):<<1 | Rd=(Rs * Rt)>>31; | | Rd=mpy(Rs,Rt):<<1:sat | $Rd=sat_{32}((Rs * Rt)>>31);$ | | Rd=mpy(Rs,Rt):rnd | Rd=((Rs * Rt)+0x80000000)>>32; | | Rd=mpysu(Rs,Rt) | Rd=(Rs * Rt.uw[0])>>32; | | Rd=mpyu(Rs,Rt) | Rd=(Rs.uw[0]) * Rt.uw[0])>>32; | | <pre>Rx+=mpy(Rs,Rt):&lt;&lt;1:sat</pre> | $Rx = sat_{32}((Rx) + ((Rs * Rt) >> 31));$ | | <pre>Rx-=mpy(Rs,Rt):&lt;&lt;1:sat</pre> | $Rx = sat_{32}((Rx) - ((Rs * Rt) >> 31));$ | # Class: XTYPE (slots 2,3) ## **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ## **Intrinsics** | Rd= mpy(Rs,Rt.H):<<1:rnd:sat | Word32 Q6_R_mpy_RRh_s1_rnd_sat(Word32 Rs, Word32 Rt) | |-----------------------------------------------|---------------------------------------------------------------| | Rd= mpy(Rs,Rt.H):<<1:sat | Word32 Q6_R_mpy_RRh_s1_sat(Word32 Rs, Word32 Rt) | | <pre>Rd= mpy(Rs,Rt.L):&lt;&lt;1:rnd:sat</pre> | Word32 Q6_R_mpy_RRl_s1_rnd_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs,Rt.L):<<1:sat | Word32 Q6_R_mpy_RRl_s1_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs,Rt) | Word32 Q6_R_mpy_RR(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs,Rt):<<1 | Word32 Q6_R_mpy_RR_s1(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs,Rt):<<1:sat | Word32 Q6_R_mpy_RR_s1_sat(Word32 Rs, Word32 Rt) | | Rd=mpy(Rs,Rt):rnd | Word32 Q6_R_mpy_RR_rnd(Word32 Rs, Word32 Rt) | | Rd=mpysu(Rs,Rt) | Word32 Q6_R_mpysu_RR(Word32 Rs, Word32 Rt) | | Rd=mpyu(Rs,Rt) | UWord32 Q6_R_mpyu_RR(Word32 Rs, Word32 Rt) | | <pre>Rx+=mpy(Rs,Rt):&lt;&lt;1:sat</pre> | Word32 Q6_R_mpyacc_RR_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) | | <pre>Rx-=mpy(Rs,Rt):&lt;&lt;1:sat</pre> | Word32 Q6_R_mpynac_RR_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|------------------------------------| | | ICL | ASS | ; | I | Reg | Тур | е | N | ЛаjС | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rd=mpy(Rs,Rt):rnd | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rd=mpyu(Rs,Rt) | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rd=mpysu(Rs,Rt) | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=mpy(Rs,Rt.H):<<1:sat | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | s | s | s | s | s | P | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rd=mpy(Rs,Rt.L):<<1:sat | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rd=mpy(Rs,Rt.H):<<1:rnd:<br>sat | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=mpy(Rs,Rt):<<1:sat | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rd=mpy(Rs,Rt.L):<<1:rnd:s | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | N | 0 | N | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | N | N | d | d | d | d | d | Rd=mpy(Rs,Rt)[:< <n]< td=""></n]<> | | | ICL | ASS | ; | I | Reg | Тур | е | ٨ | ЛаjС | )p | | | s5 | • | • | Pa | rse | | | • | t5 | | • | N | 1inC | )p | | | х5 | | • | | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | x | x | x | x | х | Rx+=mpy(Rs,Rt):<<1:sat | | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | х | х | х | х | х | Rx-=mpy(Rs,Rt):<<1:sat | ## Field name ICLASS t5 x5 ## Description | MajOp | Major opcode | |---------|----------------------------| | MinOp | Minor opcode | | RegType | Register type | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | Instruction class Field to encode register t Field to encode register x # Multiply and use full result Multiply two signed or unsigned 32-bit words. Optionally, add or subtract this value from the 64-bit accumulator. The result is a full-precision 64-bit value. | Syntax | Behavior | |---------|-----------| | Cylitax | Deliavioi | | Rdd=mpy(Rs,Rt) | Rdd=(Rs * Rt); | |--------------------------------|-------------------------------------------------| | Rdd=mpyu(Rs,Rt) | Rdd=(Rs.uw[0] * Rt.uw[0]); | | Rxx[+-] = mpy(Rs,Rt) | <pre>Rxx= Rxx [+-] (Rs * Rt);</pre> | | <pre>Rxx[+-]=mpyu(Rs,Rt)</pre> | <pre>Rxx= Rxx [+-] (Rs.uw[0] * Rt.uw[0]);</pre> | Class: XTYPE (slots 2,3) ## **Intrinsics** | Rdd=mpy(Rs,Rt) | Word64 Q6_P_mpy_RR(Word32 Rs, Word32 Rt) | |----------------------------|----------------------------------------------------------| | Rdd=mpyu(Rs,Rt) | UWord64 Q6_P_mpyu_RR(Word32 Rs, Word32 Rt) | | <pre>Rxx+=mpy(Rs,Rt)</pre> | Word64 Q6_P_mpyacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) | | Rxx+=mpyu(Rs,Rt) | Word64 Q6_P_mpyuacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) | | Rxx-=mpy(Rs,Rt) | Word64 Q6_P_mpynac_RR(Word64 Rxx, Word32 Rs, Word32 Rt) | | Rxx-=mpyu(Rs,Rt) | Word64 Q6_P_mpyunac_RR(Word64 Rxx, Word32 Rs, Word32 Rt) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|------------------| | | ICL | ASS | 3 | I | Reg | Тур | е | N | /lajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rdd=mpy(Rs,Rt) | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rdd=mpyu(Rs,Rt) | | | ICL | ASS | 3 | ı | Reg | Тур | е | N | /ajC | )p | | | s5 | | | Pa | rse | | | • | t5 | | | ٨ | /linC | )p | | | х5 | | • | | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | х | x | х | х | х | Rxx+=mpy(Rs,Rt) | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | х | x | х | х | х | Rxx-=mpy(Rs,Rt) | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | x | x | x | x | х | Rxx+=mpyu(Rs,Rt) | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | x | x | x | x | х | Rxx-=mpyu(Rs,Rt) | Field name Description ICLASS Instruction class MajOp Major opcode MinOp Minor opcode RegType Register type Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t5 Field to encode register t x5 Field to encode register x # Vector dual multiply Multiply four 16-bit halfwords in Rss by the corresponding 16-bit halfwords in Rtt. The two lower results are scaled and added. The lower word of the accumulator is optionally added. This result is saturated to 32 bits and stored in the lower word of the accumulator. The same operation is performed on the upper two products using the upper word of the accumulator. Syntax Behavior ``` Rdd = Rdd.w[0] = sat_{32}((Rss.h[0] * Rtt.h[0]) << 1 + vdmpy(Rss,Rtt):<<1:sat (Rss.h[1] * Rtt.h[1])<<1); Rdd.w[1] = sat_{32}((Rss.h[2] * Rtt.h[2]) << 1 + (Rss.h[3] * Rtt.h[3])<<1); Rdd = vdmpy(Rss,Rtt):sat Rdd.w[0]=sat_{32}((Rss.h[0] * Rtt.h[0]) << 0 + (Rss.h[1] * Rtt.h[1])<<0); Rdd.w[1] = sat_{32}((Rss.h[2] * Rtt.h[2]) << 0 + (Rss.h[3] * Rtt.h[3])<<0); Rxx += Rxx.w[0] = sat_{32}(Rxx.w[0] + (Rss.h[0] * vdmpy(Rss,Rtt):<<1:sat Rtt.h[0])<<1 + (Rss.h[1] * Rtt.h[1])<<1);</pre> Rxx.w[1] = sat_{32}(Rxx.w[1] + (Rss.h[2] * Rtt.h[2])<<1 + (Rss.h[3] * Rtt.h[3])<<1);</pre> ``` #### **Syntax Behavior** $Rxx.w[0]=sat_{32}(Rxx.w[0] + (Rss.h[0] *$ Rxx += ``` vdmpy(Rss,Rtt):sat Rtt.h[0])<<0 + (Rss.h[1] * Rtt.h[1])<<0);</pre> Rxx.w[1] = sat_{32}(Rxx.w[1] + (Rss.h[2] * Rtt.h[2])<<0 + (Rss.h[3] * Rtt.h[3])<<0);</pre> ``` Class: XTYPE (slots 2,3) ## **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ## **Intrinsics** | <pre>Rdd=vdmpy(Rss,Rtt):&lt;&lt;1:sat</pre> | Word64 Q6_P_vdmpy_PP_s1_sat(Word64 Rss, Word64 Rtt) | |----------------------------------------------|--------------------------------------------------------------------| | Rdd=vdmpy(Rss,Rtt):sat | Word64 Q6_P_vdmpy_PP_sat(Word64 Rss, Word64 Rtt) | | <pre>Rxx+=vdmpy(Rss,Rtt):&lt;&lt;1:sat</pre> | Word64 Q6_P_vdmpyacc_PP_s1_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+=vdmpy(Rss,Rtt):sat</pre> | Word64 Q6_P_vdmpyacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|-------------------------------------------------| | | ICL | ASS | 3 | | Reg | Тур | е | Λ | /lajC | р | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rdd=vdmpy(Rss,Rtt)[:< <n]: sat<="" td=""></n]:> | | | ICL | ASS | 3 | | Reg | Тур | е | Λ | /ajO | р | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | х5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | x | x | x | x | x | Rxx+=vdmpy(Rss,Rtt)[:< <n<br>]:sat</n<br> | | Ī | IC | CL/ | ٩SS | ; | I | Reg | Тур | е | Ν | 1ajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | d5 | | | | |---|-----|-----|-----|---|---|-----|-----|---|---|------|----|---|---|----|---|---|----|-----|---|---|---|----|---|---|---|------|----|---|---|----|---|---|-------------------------------------------------| | | 1 - | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rdd=vdmpy(Rss,Rtt)[:< <n]: sat<="" td=""></n]:> | | | IC | CL/ | ASS | ; | - | Reg | Тур | е | N | 1ajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | р | | | х5 | | | | | - | ۱ . | 1 | 1 | 0 | 1 | 0 | 1 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | x | x | x | x | x | Rxx+=vdmpy(Rss,Rtt)[:< <n<br>]:sat</n<br> | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | | MajOp | Major opcode | | | MinOp | Minor opcode | |---------|----------------------------| | RegType | Register type | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | x5 | Field to encode register x | | | | # Vector dual multiply with round and pack Multiply four 16-bit halfwords in Rss by the corresponding 16-bit halfwords in Rtt. The two lower results are scaled and added together with a rounding constant. This result is saturated to 32 bits, and the upper 16 bits of this result are stored in the lower 16 bits of the destination register. The same operation is performed on the upper two products and the result is stored in the upper 16-bit halfword of the destination. Class: XTYPE (slots 2,3) ### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ## **Intrinsics** ``` Rd = Word32 Q6_R_vdmpy_PP_s1_rnd_sat(Word64 Rss, vdmpy(Rss,Rtt):<<1:rnd:sat</pre> Word64 Rtt) Rd = vdmpy(Rss,Rtt):rnd:sat Word32 Q6_R_vdmpy_PP_rnd_sat(Word64 Rss, Word64 Rtt) ``` # **Encoding** ICLASS Maj0p | 3 | 1 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|----|-----|------|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|------------------------------------------------| | | IC | CLA | SS | | ı | Reg | Туре | Э | N | 1ajC | р | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | d5 | | | | | 1 | , | 1 | 1 | 0 | 1 | 0 | 0 | 1 | N | 0 | 0 | s | w | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rd=vdmpy(Rss,Rtt)[:< <n]:r<br>nd:sat</n]:r<br> | #### Field name Description Instruction class MinOp Minor opcode Register type RegType Packet/loop parse bits Parse Field to encode register d d5 Major opcode Field to encode register s s5 Field to encode register t t5 # Vector reduce multiply bytes Multiply eight 8-bit bytes in Rss by the corresponding 8-bit bytes in Rtt. The four lower results are accumulated. The lower word of the accumulator is optionally added. This result is stored in the lower 32 bits of the accumulator. The same operation is performed on the upper four products using the upper word of the accumulator. The eight bytes of Rss can be treated as either signed or unsigned. Syntax Behavior ``` Rdd = Rdd.w[0] = ((Rss.b[0] * Rtt.ub[0]) + (Rss.b[1] * vrmpybsu (Rss, Rtt) Rtt.ub[1]) + (Rss.b[2] * Rtt.ub[2]) + (Rss.b[3] * Rtt.ub[3])); Rdd.w[1] = ((Rss.b[4] * Rtt.ub[4]) + (Rss.b[5] * Rtt.ub[5]) + (Rss.b[6] * Rtt.ub[6]) + (Rss.b[7] * Rtt.ub[7])); Rdd.w[0] = ((Rss.ub[0] * Rtt.ub[0]) + (Rss.ub[1] * Rdd = vrmpybu (Rss, Rtt) Rtt.ub[1]) + (Rss.ub[2] * Rtt.ub[2]) + (Rss.ub[3] * Rtt.ub[3])); Rdd.w[1] = ((Rss.ub[4] * Rtt.ub[4]) + (Rss.ub[5] * Rtt.ub[5]) + (Rss.ub[6] * Rtt.ub[6]) + (Rss.ub[7] * Rtt.ub[7])); Rxx.w[0] = (Rxx.w[0] + (Rss.b[0] * Rtt.ub[0]) + Rxx += vrmpybsu (Rss, Rtt) (Rss.b[1] * Rtt.ub[1]) + (Rss.b[2] * Rtt.ub[2]) + (Rss.b[3] * Rtt.ub[3])); Rxx.w[1] = (Rxx.w[1] + (Rss.b[4] * Rtt.ub[4]) + (Rss.b[5] * Rtt.ub[5]) + (Rss.b[6] * Rtt.ub[6]) + (Rss.b[7] * Rtt.ub[7])); ``` ## Syntax Behavior Class: XTYPE (slots 2,3) ## **Intrinsics** | Rdd=vrmpybsu(Rss,Rtt) | Word64 Q6_P_vrmpybsu_PP(Word64 Rss, Word64 Rtt) | |------------------------|----------------------------------------------------------------| | Rdd=vrmpybu(Rss,Rtt) | Word64 Q6_P_vrmpybu_PP(Word64 Rss, Word64 Rtt) | | Rxx+=vrmpybsu(Rss,Rtt) | Word64 Q6_P_vrmpybsuacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) | | Rxx+=vrmpybu(Rss,Rtt) | Word64 Q6_P_vrmpybuacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) | # **Encoding** Maj0p | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|------------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | Ν | ЛајС | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | P | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rdd=vrmpybu(Rss,Rtt) | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rdd=vrmpybsu(Rss,Rtt) | | | ICL | ASS | 3 | ı | Reg | Тур | е | ٨ | ЛајС | )p | | | s5 | • | • | Pa | rse | | | • | t5 | | | N | /linC | )p | | | х5 | • | | | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | х | х | х | x | х | Rxx+=vrmpybu(Rss,Rtt) | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | x | x | x | x | x | Rxx+=vrmpybsu(Rss,Rtt) | | | | <b>-</b> | |------------|-------------------|-------------| | Field name | | Description | | ICLASS | Instruction class | | | MinOp | Minor opcode | |---------|----------------------------| | RegType | Register type | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | x5 | Field to encode register x | Major opcode # Vector dual multiply signed by unsigned bytes Multiply eight 8-bit signed bytes in Rss by the corresponding 8-bit unsigned bytes in Rtt. Add the results in pairs, and optionally add the accumulator. The results are saturated to signed 16 bits and stored in the four halfwords of the destination register. Syntax Behavior ``` Rdd = Rdd.h[0] = sat_{16}(((Rss.b[0] * Rtt.ub[0]) + vdmpybsu(Rss,Rtt):sat (Rss.b[1] * Rtt.ub[1]))); Rdd.h[1] = sat_{16}(((Rss.b[2] * Rtt.ub[2]) + (Rss.b[3] * Rtt.ub[3]))); Rdd.h[2] = sat_{16}(((Rss.b[4] * Rtt.ub[4]) + (Rss.b[5] * Rtt.ub[5]))); Rdd.h[3] = sat_{16}(((Rss.b[6] * Rtt.ub[6]) + (Rss.b[7] * Rtt.ub[7]))); Rxx.h[0] = sat_{16}((Rxx.h[0] + (Rss.b[0] * Rxx += vdmpybsu(Rss,Rtt):sat Rtt.ub[0]) + (Rss.b[1] * Rtt.ub[1]))); Rtt.ub[2]) + (Rss.b[3] * Rtt.ub[3]))); Rxx.h[2] = sat_{16}((Rxx.h[2] + (Rss.b[4] * Rtt.ub[4]) + (Rss.b[5] * Rtt.ub[5]))); Rxx.h[3] = sat_{16}((Rxx.h[3] + (Rss.b[6] * Rtt.ub[6]) + (Rss.b[7] * Rtt.ub[7]))); ``` # Class: XTYPE (slots 2,3) ### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ## **Intrinsics** | Rdd=vdmpybsu(Rss,Rtt):sat | Word64<br>Word64 | Q6_P_vdmpybsu_PP_sat(Word64 Rss, Rtt) | |---------------------------------------|------------------|------------------------------------------------------| | <pre>Rxx+=vdmpybsu(Rss,Rtt):sat</pre> | | Q6_P_vdmpybsuacc_PP_sat(Word64 Rxx, Rss, Word64 Rtt) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-----------------------------| | | ICL | ASS | 3 | 1 | Reg | Тур | е | N | /lajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | Λ | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | s | s | s | s | s | Р | P | 0 | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rdd=vdmpybsu(Rss,Rtt):sa t | | | ICL | ASS | 3 | ı | Reg | Тур | е | Λ | /lajC | )p | | | s5 | | • | Pa | rse | | | | t5 | • | | ٨ | /linC | )p | | | х5 | | • | | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | s | s | s | ø | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | x | x | x | x | x | Rxx+=vdmpybsu(Rss,Rtt):s at | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | | MajOp | Major opcode | |---------|----------------------------| | MinOp | Minor opcode | | RegType | Register type | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | x5 | Field to encode register x | # Vector multiply even halfwords Multiply the even 16-bit halfwords from Rss and Rtt separately. Optionally accumulate with the low and high words of the destination register pair and optionally saturate. | Syntax | Behavior | |-----------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Rdd=vmpyeh(Rss,Rtt):<<1:sat | Rdd.w[0]=sat <sub>32</sub> ((Rss.h[0] * Rtt.h[0])<<1);<br>Rdd.w[1]=sat <sub>32</sub> (((Rss.h[2] * Rtt.h[2])<<1); | | Rdd=vmpyeh(Rss,Rtt):sat | Rdd.w[0]=sat <sub>32</sub> (((Rss.h[0] * Rtt.h[0])<<0);<br>Rdd.w[1]=sat <sub>32</sub> (((Rss.h[2] * Rtt.h[2])<<0); | | <pre>Rxx+=vmpyeh(Rss,Rtt)</pre> | <pre>Rxx.w[0]=Rxx.w[0] + (Rss.h[0] * Rtt.h[0]); Rxx.w[1]=Rxx.w[1] + (Rss.h[2] * Rtt.h[2]);</pre> | | <pre>Rxx+=vmpyeh(Rss,Rtt):&lt;&lt;1:sat</pre> | <pre>Rxx.w[0] = sat<sub>32</sub>((Rxx.w[0] + (Rss.h[0] * Rtt.h[0]) &lt;&lt; 1); Rxx.w[1] = sat<sub>32</sub>((Rxx.w[1] + (Rss.h[2] * Rtt.h[2]) &lt;&lt; 1);</pre> | | <pre>Rxx+=vmpyeh(Rss,Rtt):sat</pre> | <pre>Rxx.w[0]=sat<sub>32</sub>((Rxx.w[0] + (Rss.h[0] * Rtt.h[0])&lt;&lt;0); Rxx.w[1]=sat<sub>32</sub>((Rxx.w[1] + (Rss.h[2] * Rtt.h[2])&lt;&lt;0);</pre> | # Class: XTYPE (slots 2,3) ### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ### **Intrinsics** | <pre>Rdd=vmpyeh(Rss,Rtt):&lt;&lt;1:sat</pre> | Word64 Q6_P_vmpyeh_PP_s1_sat(Word64 Rss, Word64 Rtt) | |-----------------------------------------------|---------------------------------------------------------------------| | Rdd=vmpyeh(Rss,Rtt):sat | Word64 Q6_P_vmpyeh_PP_sat(Word64 Rss, Word64 Rtt) | | Rxx+=vmpyeh(Rss,Rtt) | Word64 Q6_P_vmpyehacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+=vmpyeh(Rss,Rtt):&lt;&lt;1:sat</pre> | Word64 Q6_P_vmpyehacc_PP_s1_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) | | <pre>Rxx+=vmpyeh(Rss,Rtt):sat</pre> | Word64 Q6_P_vmpyehacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) | ## **Encoding** Parse d5 s5 t5 x5 | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|---|---|---|----|---|---|--------------------------------------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | Λ | /lajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | Λ | /linC | р | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | d | d | d | d | d | Rdd=vmpyeh(Rss,Rtt)[:< <n]:sat< td=""></n]:sat<> | | | ICL | ASS | 3 | ı | Reg | Тур | е | ٨ | /lajC | )p | | | s5 | • | | Pa | rse | | | | t5 | | • | ٨ | /linC | р | | | х5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | х | х | х | х | x | Rxx+=vmpyeh(Rss,Rtt) | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | N | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | x | x | x | x | x | Rxx+=vmpyeh(Rss,Rtt)[:<< N]:sat | | • | | | | | | | | | | | • | | • | | | | | | | • | |----|-----|-----|----|----|-----|-----|------|-----|------|----|---|--|---|---|----|-----|-----|----|--|---| | | Fie | ld | na | me | | | | | | | | | | [ | es | cri | pti | on | | | | IC | LAS | SS | | | Ins | tru | ctic | n o | clas | SS | | | | | | | | | | | | Ma | j0 | 0 | | | Ма | jor | ор | CO | de | | | | | | | | | | | | | Mi | nOn | 0 | | | Mir | nor | ор | CO | de | | | | | | | | | | | | | Re | gΤ | уре | 9 | | Re | gis | ter | typ | е | | | | | | | | | | | | Packet/loop parse bits Field to encode register d Field to encode register s Field to encode register t Field to encode register x # Vector multiply halfwords Multiply two 16-bit halfwords separately, and optionally accumulate with the low and high words of the destination. Optionally saturate, and store the results back to the destination register pair. | Syntax | Behavior | |----------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------| | Rdd=vmpyh(Rs,Rt)[:<<1]:sat | Rdd.w[0]=sat <sub>32</sub> ((Rs.h[0] * Rt.h[0])[<<1]);<br>Rdd.w[1]=sat <sub>32</sub> ((Rs.h[1] * Rt.h[1])[<<1]); | | <pre>Rxx+=vmpyh(Rs,Rt)</pre> | <pre>Rxx.w[0] = Rxx.w[0] + (Rs.h[0] * Rt.h[0]); Rxx.w[1] = Rxx.w[1] + (Rs.h[1] * Rt.h[1]);</pre> | | <pre>Rxx+=vmpyh(Rs,Rt)[:&lt;&lt;1]:sat</pre> | <pre>Rxx.w[0] = sat<sub>32</sub>(Rxx.w[0] + (Rs.h[0] * Rt.h[0])[&lt;&lt;1]); Rxx.w[1] = sat<sub>32</sub>(Rxx.w[1] + (Rs.h[1] * Rt.h[1])[&lt;&lt;1]);</pre> | # Class: XTYPE (slots 2,3) ## **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. # Intrinsics | Rdd=vmpyh(Rs,Rt):<<1:sat | Word64 Q6_P_vmpyh_RR_s1_sat(Word32 Rs, Word32 Rt) | |--------------------------------------------|------------------------------------------------------------------| | Rdd=vmpyh(Rs,Rt):sat | Word64 Q6_P_vmpyh_RR_sat(Word32 Rs, Word32 Rt) | | Rxx+=vmpyh(Rs,Rt) | Word64 Q6_P_vmpyhacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+=vmpyh(Rs,Rt):&lt;&lt;1:sat</pre> | Word64 Q6_P_vmpyhacc_RR_s1_sat(Word64 Rxx, Word32 Rs, Word32 Rt) | | <pre>Rxx+=vmpyh(Rs,Rt):sat</pre> | Word64 Q6_P_vmpyhacc_RR_sat(Word64 Rxx, Word32 Rs, Word32 Rt) | # **Encoding** Field name x5 | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|------------------------------------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | N | 1ajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | N | 0 | 0 | s | s | s | s | s | Р | P | 0 | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rdd=vmpyh(Rs,Rt)[:< <n]:s at<="" td=""></n]:s> | | | ICL | ASS | 3 | F | Reg | Тур | е | N | 1ajO | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | х5 | | | | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | х | x | х | х | x | Rxx+=vmpyh(Rs,Rt) | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | N | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 1 | x | x | x | x | x | Rxx+=vmpyh(Rs,Rt)[:< <n]: sat<="" td=""></n]:> | | ICLASS | Instruction class | |---------|----------------------------| | MajOp | Major opcode | | MinOp | Minor opcode | | RegType | Register type | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | Field to encode register $\boldsymbol{x}$ ## Description # Vector multiply halfwords with round and pack Multiply two 16-bit halfwords separately. Round the results, and store the high halfwords packed in a single register destination. Class: XTYPE (slots 2,3) ### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ## **Intrinsics** Rd=vmpyh(Rs,Rt):<<1:rnd:sat Word32 Q6\_R\_vmpyh\_RR\_s1\_rnd\_sat(Word32 Rs, Word32 Rt) Rd=vmpyh(Rs,Rt):rnd:sat Word32 Q6\_R\_vmpyh\_RR\_rnd\_sat(Word32 Rs, Word32 Rt) # **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|----------------------------------------------------| | | ICI | ASS | 3 | ı | Reg | Тур | Э | N | 1ajC | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | N | 0 | 1 | s | Ø | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rd=vmpyh(Rs,Rt)[:< <n]:rn d:sat<="" td=""></n]:rn> | ## Field name Description ICLASS Instruction class MajOp Major opcode MinOp Minor opcode RegType Register type Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t5 Field to encode register t # Vector multiply halfwords, signed by unsigned Multiply two 16-bit halfwords. Rs is considered signed, Ru unsigned. #### **Syntax Behavior** ``` Rdd = Rdd.w[0] = sat_{32}((Rs.h[0] * Rt.uh[0])[<<1]); vmpyhsu(Rs,Rt)[:<<1]:sat</pre> Rdd.w[1] = sat_{32}((Rs.h[1] * Rt.uh[1])[<<1]); Rxx += Rxx.w[0]=sat_{32}(Rxx.w[0] + (Rs.h[0] * vmpyhsu(Rs,Rt)[:<<1]:sat</pre> Rt.uh[0])[<<1]); Rxx.w[1] = sat_{32}(Rxx.w[1] + (Rs.h[1] * Rt.uh[1]) [<<1]); ``` Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ### **Intrinsics** ``` Rdd= Word64 Q6 P vmpyhsu RR s1 sat (Word32 Rs, Word32 Rt) vmpyhsu(Rs,Rt):<<1:sat</pre> Word64 Q6 P vmpyhsu RR sat(Word32 Rs, Word32 Rt) Rdd= vmpyhsu(Rs,Rt):sat Rxx+= Word64 Q6 P vmpyhsuacc RR s1 sat(Word64 Rxx, Word32 Rs, Word32 Rt) vmpyhsu(Rs,Rt):<<1:sat Rxx+= vmpyhsu(Rs,Rt):sat Word64 Q6_P_vmpyhsuacc_RR_sat(Word64 Rxx, Word32 Rs, Word32 Rt) ``` ## **Encoding** Maj0p | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|------------------------------------------------| | | ICL | ASS | 3 | I | Reg | Тур | е | N | ЛаjС | )p | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | N | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rdd=vmpyhsu(Rs,Rt)[:< <n] :sat<="" td=""></n]> | | | ICL | ASS | 3 | ı | Reg | Тур | е | ٨ | ЛаjС | )p | | • | s5 | • | • | Pa | rse | | | | t5 | • | | N | 1inC | )p | | | х5 | | | | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | N | 1 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 1 | x | x | x | x | x | Rxx+=vmpyhsu(Rs,Rt)[:<<<br>N]:sat | | 3 | I 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|-----|-----|----|----|------|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|------------------------------------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | N | 1ajC | р | | | s5 | | | Pa | rse | | | | t5 | | | Λ | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | N | 0 | 0 | s | ø | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rdd=vmpyhsu(Rs,Rt)[:< <n] :sat<="" th=""></n]> | | | ICL | ASS | 3 | F | Reg | Тур | е | N | 1ajO | р | | | s5 | | • | Pa | rse | | | • | t5 | • | | ٨ | /linC | )p | | | х5 | | • | | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | N | 1 | 1 | s | ø | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 1 | x | x | x | x | x | Rxx+=vmpyhsu(Rs,Rt)[:<<<br>N]:sat | #### Field name Description ICLASS Instruction class MinOp Minor opcode RegType Register type Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t5 Field to encode register t Field to encode register x x5 Major opcode # Vector reduce multiply halfwords Multiply each halfword of Rss by the corresponding halfword in Rtt. Add the intermediate products together and then optionally add the accumulator. Store the full 64-bit result in the destination register pair. This instruction is known as "big mac". #### ## Class: XTYPE (slots 2,3) ### **Intrinsics** ``` Rdd=vrmpyh(Rss,Rtt) Word64 Q6_P_vrmpyh_PP(Word64 Rss, Word64 Rtt) Rxx+=vrmpyh(Rss,Rtt) Word64 Q6_P_vrmpyhacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) ``` # **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|-----|-----|----|----|------|------------|----|----|----|----|----|-----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|----------------------| | | ICL | ASS | 3 | | Reg | Тур | е | N | ЛаjС | <b>)</b> p | | | s5 | | | Pa | rse | | | | t5 | | | Ν | /linC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=vrmpyh(Rss,Rtt) | | | ICL | ASS | 3 | | Reg | Тур | е | N | ЛаjС | )p | | | s5 | | | Pai | rse | | | | t5 | | | ٨ | /linC | )p | | | х5 | | | | | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | x | х | х | х | х | Rxx+=vrmpyh(Rss,Rtt) | | Field name | | Description | |------------|-------------------|-------------| | CLASS | Instruction class | | ICLASS Instruction class MajOp Major opcode MinOp Minor opcode RegType Register type Parse Packet/loop parse bits d5 Field to encode register Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t5 Field to encode register t x5 Field to encode register x # Vector multiply bytes Four 8-bit bytes from register Rs are multiplied by four 8-bit bytes from Rt. The product is optionally accumulated with the 16-bit value from the destination register. The 16-bit results are packed in the destination register pair. The bytes of Rs can be treated as either signed or unsigned. #### **Syntax Behavior** Rdd=vmpybsu(Rs,Rt) Rdd.h[0]=((Rs.b[0] \* Rt.ub[0])); Rdd.h[1] = ((Rs.b[1] \* Rt.ub[1]));Rdd.h[2] = ((Rs.b[2] \* Rt.ub[2]));Rdd.h[3] = ((Rs.b[3] \* Rt.ub[3]));Rdd.h[0]=((Rs.ub[0] \* Rt.ub[0])); Rdd=vmpybu(Rs,Rt) Rdd.h[1]=((Rs.ub[1] \* Rt.ub[1])); Rdd.h[2]=((Rs.ub[2] \* Rt.ub[2])); Rdd.h[3] = ((Rs.ub[3] \* Rt.ub[3]));Rxx+=vmpybsu(Rs,Rt) Rxx.h[0] = (Rxx.h[0] + (Rs.b[0] \* Rt.ub[0]));Rxx.h[1] = (Rxx.h[1] + (Rs.b[1] \* Rt.ub[1]));Rxx.h[2] = (Rxx.h[2] + (Rs.b[2] \* Rt.ub[2]));Rxx.h[3] = (Rxx.h[3] + (Rs.b[3] \* Rt.ub[3]));Rxx+=vmpybu(Rs,Rt) Rxx.h[0] = (Rxx.h[0] + (Rs.ub[0] \* Rt.ub[0]));Rxx.h[1] = (Rxx.h[1] + (Rs.ub[1] \* Rt.ub[1]));Rxx.h[2] = (Rxx.h[2] + (Rs.ub[2] \* Rt.ub[2])); Rxx.h[3] = (Rxx.h[3] + (Rs.ub[3] \* Rt.ub[3])); # Class: XTYPE (slots 2,3) # Intrinsics | Rdd=vmpybsu(Rs,Rt) | Word64 Q6_P_vmpybsu_RR(Word32 Rs, Word32 Rt) | |---------------------|-------------------------------------------------------------| | Rdd=vmpybu(Rs,Rt) | Word64 Q6_P_vmpybu_RR(Word32 Rs, Word32 Rt) | | Rxx+=vmpybsu(Rs,Rt) | Word64 Q6_P_vmpybsuacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) | | Rxx+=vmpybu(Rs,Rt) | Word64 Q6_P_vmpybuacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) | # **Encoding** MajOp | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|------|--------|----|----|----|----|----|----|------|----|----|----|----|---|---|---|------|----|---|---|----|---|---|---------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | Ν | ЛајС | jOp s5 | | | | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | d5 | | | | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | P | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rdd=vmpybsu(Rs,Rt) | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | d | d | d | d | d | Rdd=vmpybu(Rs,Rt) | | | ICL | ASS | 6 | ı | Reg | Тур | е | ٨ | ЛајС | )p | | | s5 | • | • | Pa | arse | | | | t5 | | • | N | 1inC | )p | | | х5 | | • | | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | х | x | х | х | х | Rxx+=vmpybu(Rs,Rt) | | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | х | x | х | х | х | Rxx+=vmpybsu(Rs,Rt) | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | | MinOp | Minor opcode | |---------|----------------------------| | RegType | Register type | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | x5 | Field to encode register x | Major opcode # Vector polynomial multiply halfwords Perform a vector $16 \times 16$ carryless polynomial multiply using 32-bit source registers Rs and Rt. The 64-bit result is stored in packed H, H, L, L format in the destination register. The destination register can optionally be accumulated (XOR'd). Finite field multiply instructions are useful for many algorithms including scramble code generation, cryptographic algorithms, convolutional, and Reed Solomon codes. Syntax Behavior ``` Rdd=vpmpyh(Rs,Rt) x0 = Rs.uh[0]; x1 = Rs.uh[1]; y0 = Rt.uh[0]; y1 = Rt.uh[1]; prod0 = prod1 = 0; for(i=0; i < 16; i++) { if((y0 >> i) & 1) prod0 ^= (x0 << i); if((y1 >> i) & 1) prod1 ^= (x1 << i); } Rdd.h[0]=prod0.uh[0]; Rdd.h[1]=prod1.uh[0]; Rdd.h[2]=prod0.uh[1]; Rdd.h[3]=prod1.uh[1];</pre> ``` ### Syntax Behavior ``` Rxx^=vpmpyh(Rs,Rt) x0 = Rs.uh[0]; x1 = Rs.uh[1]; y0 = Rt.uh[0]; y1 = Rt.uh[1]; prod0 = prod1 = 0; for(i=0; i < 16; i++) { if((y0 >> i) & 1) prod0 ^= (x0 << i); if((y1 >> i) & 1) prod1 ^= (x1 << i); } Rxx.h[0]=Rxx.uh[0] ^ prod0.uh[0]; Rxx.h[1]=Rxx.uh[2] ^ prod0.uh[1]; Rxx.h[3]=Rxx.uh[3] ^ prod1.uh[1];</pre> ``` Class: XTYPE (slots 2,3) #### **Intrinsics** ``` Rdd=vpmpyh(Rs,Rt) Word64 Q6_P_vpmpyh_RR(Word32 Rs, Word32 Rt) Rxx^=vpmpyh(Rs,Rt) Word64 Q6_P_vpmpyhxacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) ``` ## **Encoding** Maj0p | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|----|----|---------------------------|-----|-----|----|----|------|----|-------|----|----|----|----|----|----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|--------------------| | Γ | ICL | AS | S | | Reg | Тур | е | N | 1ajC | р | | | s5 | | | Pa | se | | | | t5 | | | N | /linC | )p | | | d5 | | | | | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | s | Ø | Ø | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | d | d | d | d | d | Rdd=vpmpyh(Rs,Rt) | | | ICL | AS | S | RegType MajOp s5 Parse t5 | | | | | | N | /linC | )p | | | х5 | • | | | | | | | | | | | | | | | | | | | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | х | x | х | х | х | Rxx^=vpmpyh(Rs,Rt) | | Field name | | Description | |------------|-------------------|-------------| | TCLASS | Instruction class | Docomption | MinOp Minor opcode RegType Register type Parse Packet/loop parse bits d5 Field to encode register d s5 Field to encode register s t5 Field to encode register t x5 Field to encode register x Major opcode ## **11.10.6 XTYPE PERM** The XTYPE PERM instruction subclass includes instructions that perform permutations. Permute operations perform various operations on vector data, including arithmetic, format conversion, and rearrangement of vector elements. ## CABAC decode bin This is a special-purpose instruction to support H.264 context adaptive binary arithmetic coding (CABAC). See Section Section 4.8.1 for a complete description. Syntax Behavior ``` Rdd=decbin(Rss,Rtt) state = Rtt.w[1][5:0]; valMPS = Rtt.w[1][8:8]; bitpos = Rtt.w[0][4:0]; range = Rss.w[0]; offset = Rss.w[1]; range <<= bitpos; offset <<= bitpos; rLPS = rLPS table 64x4[state][ (range >>29)&3]; rLPS = rLPS << 23; rMPS= (range&0xff800000) - rLPS; if (offset < rMPS) { Rdd = AC_next_state_MPS_64[state]; Rdd[8:8]=valMPS; Rdd[31:23] = (rMPS >> 23); Rdd.w[1]=offset; P0=valMPS; } else { Rdd = AC_next_state_LPS_64[state]; Rdd[8:8] = ((!state)?(1-valMPS):(valMPS)); Rdd[31:23] = (rLPS >> 23); Rdd.w[1] = (offset-rMPS); P0=(valMPS^1); ``` Class: XTYPE (slots 2,3) ### **Notes** The predicate generated by this instruction cannot be used as a .new predicate, nor can it be automatically AND'd with another predicate. ## **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|------|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|---------------------| | | ICL | .ASS | 3 | | Reg | Тур | е | M | laj | | | | s5 | | | Pa | rse | | | | t5 | | | М | lin | | | | d5 | | | | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | - | s | s | w | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | - | d | d | d | d | d | Rdd=decbin(Rss,Rtt) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | - | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | Maj | Major opcode | | | Min | Minor opcode | | | RegType | Register type | | | | | | ## Saturate Saturate a single scalar value. The sath instruction saturates a signed 32-bit number to a signed 16-bit number, which is signextended back to 32 bits and placed in the destination register. The minimum negative value of the result is 0xffff8000 and the maximum positive value is 0x00007fff. The satuh instruction saturates a signed 32-bit number to an unsigned 16-bit number, which is zero-extended back to 32 bits and placed in the destination register. The minimum value of the result is 0 and the maximum value is 0x0000ffff. The satb instruction saturates a signed 32-bit number to an signed 8-bit number, which is signextended back to 32 bits and placed in the destination register. The minimum value of the result is 0xffffff80 and the maximum value is 0x0000007f. The satub instruction saturates a signed 32-bit number to an unsigned 8-bit number, which is zero-extended back to 32 bits and placed in the destination register. The minimum value of the result is 0 and the maximum value is 0x000000ff. | Syntax | Behavior | |--------------|------------------------------| | Rd=sat(Rss) | $Rd = sat_{32}(Rss);$ | | Rd=satb(Rs) | $Rd = sat_8(Rs);$ | | Rd=sath(Rs) | $Rd = sat_{16}(Rs);$ | | Rd=satub(Rs) | Rd = usat <sub>8</sub> (Rs); | | Rd=satuh(Rs) | $Rd = usat_{16}(Rs);$ | ## Class: XTYPE (slots 2,3) ### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** | Rd=sat(Rss) | Word32 | Q6_R_sat_P(Word64 Rss) | |--------------|--------|-------------------------| | Rd=satb(Rs) | Word32 | Q6_R_satb_R(Word32 Rs) | | Rd=sath(Rs) | Word32 | Q6_R_sath_R(Word32 Rs) | | Rd=satub(Rs) | Word32 | Q6_R_satub_R(Word32 Rs) | | Rd=satuh(Rs) | Word32 | Q6_R_satuh_R(Word32 Rs) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|---|---|---|----|---|---|--------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | 1inC | р | | | d5 | | | | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | P | Р | - | - | | - | - | - | 0 | 0 | 0 | d | d | d | d | d | Rd=sat(Rss) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | | - | - | - | 1 | 0 | 0 | d | d | d | d | d | Rd=sath(Rs) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | | 1 | 0 | 1 | d | d | d | d | d | Rd=satuh(Rs) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | | 1 | 1 | 0 | d | d | d | d | d | Rd=satub(Rs) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | | 1 | 1 | 1 | d | d | d | d | d | Rd=satb(Rs) | | Field name | Description | |------------|-------------| |------------|-------------| ICLASSInstruction classParsePacket/loop parse bitsd5Field to encode register ds5Field to encode register s MajOpMajor opcodeMinOpMinor opcodeRegTypeRegister type # Swizzle bytes Swizzle the bytes of a word. This instruction is useful in converting between little and big endian formats. Syntax Behavior | o[0]=Rs.b[3]; | |---------------| | o[1]=Rs.b[2]; | | o[2]=Rs.b[1]; | | o[3]=Rs.b[0]; | | | Class: XTYPE (slots 2,3) ## **Intrinsics** ``` Rd=swiz(Rs) Word32 Q6_R_swiz_R(Word32 Rs) ``` ## **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|------|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|---|---|---|----|---|---|-------------| | | ICI | LASS | 3 | | Reg | Тур | е | Ма | јОр | | | | s5 | | | Pa | rse | | | | | | | N | /linC | р | | | d5 | | | | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | | - | - | | - | - | 1 | 1 | 1 | d | d | d | d | d | Rd=swiz(Rs) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | MajOp | Major opcode | | | MinOp | Minor opcode | | | RegType | Register type | | # Vector align Align a vector. Use the immediate amount, or the least significant three bits of a Predicate register, as the number of bytes to align. Shift the Rss register pair right by this number of bytes. Fill the vacated positions with the least significant elements from Rtt. ## Syntax Behavior | <pre>Rdd = valignb(Rtt,Rss,#u3)</pre> | Rdd = $(Rss >>> #u*8) (Rtt << ((8-#u)*8));$ | |---------------------------------------|--------------------------------------------------------------------------| | <pre>Rdd = valignb(Rtt,Rss,Pu)</pre> | PREDUSE_TIMING;<br>Rdd = Rss >>> (Pu&0x7)*8 (Rtt << (8-<br>(Pu&0x7))*8); | # Class: XTYPE (slots 2,3) ### **Intrinsics** | Rdd=valignb(Rtt,Rss,#u3) | Word64 Q6_P_valignb_PPI(Word64 Rtt, Word64 Rss, Word32 Iu3) | |--------------------------|-------------------------------------------------------------| | Rdd=valignb(Rtt,Rss,Pu) | Word64 Q6_P_valignb_PPp(Word64 Rtt, Word64 Rss, Byte Pu) | ## **Encoding** | 3 | 1 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|--------|-----|-----|----|-----|-----|-----|----|-----|-----|----|----|----|----|----|----|-----|-----|----|----|----|----|---|---|---|-----|---|---|----|----|---|---|--------------------------| | | IC | CLA | ASS | | ı | Reg | Тур | е | M | laj | | | | s5 | | | Pa | rse | | | | t5 | | | М | lin | | | | d5 | | | | | 1 | , | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | - | - | s | s | s | s | s | P | Р | - | t | t | t | t | t | i | i | i | d | d | d | d | d | Rdd=valignb(Rtt,Rss,#u3) | | | ICLASS | | | ı | Reg | Тур | е | M | laj | | | | s5 | | | Pa | rse | | | | t5 | | | | u | 2 | | | d5 | | | | | | 1 | | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | - | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | u | u | d | d | d | d | d | Rdd=valignb(Rtt,Rss,Pu) | | SS | 3 | I | Reg | Тур | е | M | aj | | | | s5 | | | Pa | rse | | | | t5 | | | | u | 2 | | | d5 | | | | |----------------|---|-----|-----|-----|----|---|----|---|---|-------------------|----|---|---|----|-----|--|---|---|----|-----|-----|----|---|---|---|---|----|---|---|-------------| | ) | 0 | 0 | 0 | 1 | 0 | 0 | | - | ø | ø | s | s | s | Р | P | | t | t | t | t | t | - | u | u | d | d | d | d | d | Rdd=valignb | | | | Fie | eld | na | me | , | | | | | | | | | | | | D | es | cri | pti | on | | | | | | | | | | ICLASS Instruc | | | | | | | | | | Instruction class | | | | | | | | | | | | | | | | | | | | | | ICHADD | motraction class | |---------|----------------------------| | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | u2 | Field to encode register u | | Maj | Major opcode | | Min | Minor opcode | | RegType | Register type | | | | # Vector round and pack Add the constant 0x00008000 to each word in the 64-bit source vector Rss. Optionally saturate this addition to 32 bits. Pack the high halfwords of the result into the corresponding halfword of the 32-bit destination register. | Rd=vrndwh(Rss) | <pre>for (i=0;i&lt;2;i++) { Rd.h[i]=(Rss.w[i]+0x08000).h[1]; }</pre> | |--------------------|---------------------------------------------------------------------------------------| | Rd=vrndwh(Rss):sat | <pre>for (i=0;i&lt;2;i++) { Rd.h[i]=sat<sub>32</sub>(Rss.w[i]+0x08000).h[1];</pre> | Class: XTYPE (slots 2,3) ### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ## **Intrinsics** | Rd=vrndwh(Rss) | Word32 Q6_R_vrndwh_P(Word64 Rss) | |--------------------|--------------------------------------| | Rd=vrndwh(Rss):sat | Word32 Q6_R_vrndwh_P_sat(Word64 Rss) | # **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|--------------------| | Ī | ICL | ASS | 3 | | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | Ν | /linC | )p | | | d5 | | | | | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 1 | 0 | 0 | d | d | d | d | d | Rd=vrndwh(Rss) | | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 1 | 1 | 0 | d | d | d | d | d | Rd=vrndwh(Rss):sat | ICLASSInstruction classParsePacket/loop parse bitsd5Field to encode register ds5Field to encode register s MajOpMajor opcodeMinOpMinor opcodeRegTypeRegister type # Vector saturate and pack For each element in the vector, saturate the value to the next smaller size. The <code>vsathub</code> instruction saturates signed halfwords to unsigned bytes, while the <code>vsathb</code> instruction saturates signed halfwords to signed bytes. The vsatwh instruction saturates signed words to signed halfwords, while the vsatwuh instruction saturates signed words to unsigned halfwords. The resulting values are packed together into destination register Rd. **Syntax** | - <b>,</b> | | |-----------------|-----------------------------------------------------------------------------| | Rd=vsathub(Rss) | <pre>for (i=0;i&lt;4;i++) { Rd.b[i]=usat<sub>8</sub>(Rss.h[i]); }</pre> | | Rd=vsatwh(Rss) | <pre>for (i=0;i&lt;2;i++) { Rd.h[i]=sat<sub>16</sub>(Rss.w[i]); }</pre> | | Rd=vsatwuh(Rss) | <pre>for (i=0;i&lt;2;i++) { Rd.h[i]=usat<sub>16</sub>(Rss.w[i]); }</pre> | **Behavior** Class: XTYPE (slots 2,3) ### **Notes** ■ If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ## **Intrinsics** | Rd=vsathb(Rs) | Word32 | Q6_R_vsathb_R(Word32 Rs) | |-----------------|--------|----------------------------| | Rd=vsathb(Rss) | Word32 | Q6_R_vsathb_P(Word64 Rss) | | Rd=vsathub(Rs) | Word32 | Q6_R_vsathub_R(Word32 Rs) | | Rd=vsathub(Rss) | Word32 | Q6_R_vsathub_P(Word64 Rss) | | Rd=vsatwh (Rss) | Word32 | Q6_R_vsatwh_P(Word64 Rss) | | Rd=vsatwuh(Rss) | Word32 | Q6_R_vsatwuh_P(Word64 Rss) | # **Encoding** RegType | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-----------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | Ма | јОр | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 0 | 0 | 0 | d | d | d | d | d | Rd=vsathub(Rss) | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | | 0 | 1 | 0 | d | d | d | d | d | Rd=vsatwh(Rss) | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | P | Р | | - | - | - | - | | 1 | 0 | 0 | d | d | d | d | d | Rd=vsatwuh(Rss) | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | P | Р | | - | - | - | - | | 1 | 1 | 0 | d | d | d | d | d | Rd=vsathb(Rss) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | | s | s | s | s | s | Р | Р | | - | - | - | - | - | 0 | 0 | - | d | d | d | d | d | Rd=vsathb(Rs) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | • | s | s | s | s | s | P | Р | - | - | - | - | - | • | 0 | 1 | - | d | d | d | d | d | Rd=vsathub(Rs) | | Field name | | Decemention | |------------|-------------------|-------------| | Field name | | Description | | 7T 7 C C | Instruction class | | | ICLASS | instruction class | |--------|----------------------------| | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | MajOp | Major opcode | | MinOp | Minor opcode | Register type # Vector saturate without pack Saturate each element of source vector Rss to the next smaller size. The vsathub instruction saturates signed halfwords to unsigned bytes. The vsatwh instruction saturates signed words to signed halfwords, and the vsatwh instruction saturates signed words to unsigned halfwords. The resulting values are placed in destination register Rdd in unpacked form. ## Rdd=vsathub(Rss) ## Rdd=vsathb(Rss) ## Syntax Behavior | Rdd=vsathb(Rss) | <pre>for (i=0;i&lt;4;i++) { Rdd.h[i]=sat<sub>8</sub>(Rss.h[i]); }</pre> | |------------------|-------------------------------------------------------------------------------| | Rdd=vsathub(Rss) | <pre>for (i=0;i&lt;4;i++) { Rdd.h[i]=usat<sub>8</sub>(Rss.h[i]); }</pre> | | Rdd=vsatwh(Rss) | <pre>for (i=0;i&lt;2;i++) { Rdd.w[i]=sat<sub>16</sub>(Rss.w[i]); }</pre> | | Rdd=vsatwuh(Rss) | <pre>for (i=0;i&lt;2;i++) { Rdd.w[i]=usat<sub>16</sub>(Rss.w[i]); }</pre> | ## Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** | Rdd=vsathb (Rss) | Word64 | Q6_P_vsathb_P(Word64 Rss) | |------------------|--------|----------------------------| | Rdd=vsathub(Rss) | Word64 | Q6_P_vsathub_P(Word64 Rss) | | Rdd=vsatwh(Rss) | Word64 | Q6_P_vsatwh_P(Word64 Rss) | | Rdd=vsatwuh(Rss) | Word64 | Q6_P_vsatwuh_P(Word64 Rss) | ## **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | | | | | | | | | | | | | |---|------|----|-----|-----|----|-------|----|-------|----|-------|----|-------|----|-------|----|-------|----|-------|----|-------|----|---|---|---|----|---|---|----|-----|---|---|------------------|--|--|--|---|-------|----|--|--|----|--|--|--| | | | | Reg | Тур | е | МајОр | | МајОр | | МајОр | | MajOp | | MajOp | | MajOp | | MajOp | | MajOp | | | | | s5 | | | Pa | rse | | | | | | | Λ | /linC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | P | Р | | - | - | - | - | • | 1 | 0 | 0 | d | d | d | d | d | Rdd=vsathub(Rss) | | | | | | | | | | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | P | Р | | - | - | - | - | - | 1 | 0 | 1 | d | d | d | d | d | Rdd=vsatwuh(Rss) | | | | | | | | | | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | | 1 | 1 | 0 | d | d | d | d | d | Rdd=vsatwh(Rss) | | | | | | | | | | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 1 | 1 | 1 | d | d | d | d | d | Rdd=vsathb(Rss) | | | | | | | | | | | | | | Field name | Description | |------------|-------------| | ICLASS | Instruction class | |--------|----------------------------| | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | MajOp Major opcode MinOp Minor opcode RegType Register type ## Vector shuffle Shuffle odd halfwords (shuffoh) takes the odd halfwords from Rtt and the odd halfwords from Rss and merges them together into vector Rdd. Shuffle even halfwords (shuffeh) performs the same operation on every even halfword in Rss and Rtt. The same operation is available for odd and even bytes. | Rdd=shuffeb(Rss,Rtt) | <pre>for (i=0;i&lt;4;i++) { Rdd.b[i*2]=Rtt.b[i*2]; Rdd.b[i*2+1]=Rss.b[i*2]; }</pre> | |----------------------|-----------------------------------------------------------------------------------------------| | Rdd=shuffeh(Rss,Rtt) | <pre>for (i=0;i&lt;2;i++) { Rdd.h[i*2]=Rtt.h[i*2]; Rdd.h[i*2+1]=Rss.h[i*2]; }</pre> | | Rdd=shuffob(Rtt,Rss) | <pre>for (i=0;i&lt;4;i++) { Rdd.b[i*2]=Rss.b[i*2+1]; Rdd.b[i*2+1]=Rtt.b[i*2+1]; }</pre> | | Rdd=shuffoh(Rtt,Rss) | <pre>for (i=0;i&lt;2;i++) { Rdd.h[i*2]=Rss.h[i*2+1]; Rdd.h[i*2+1]=Rtt.h[i*2+1]; }</pre> | # Class: XTYPE (slots 2,3) ## Intrinsics | Rdd=shuffeb(Rss,Rtt) | Word64 Q6_P_shuffeb_PP(Word64 Rss,<br>Rtt) | Word64 | |----------------------|--------------------------------------------|--------| | Rdd=shuffeh(Rss,Rtt) | Word64 Q6_P_shuffeh_PP(Word64 Rss,<br>Rtt) | Word64 | | Rdd=shuffob(Rtt,Rss) | Word64 Q6_P_shuffob_PP(Word64 Rtt,<br>Rss) | Word64 | | Rdd=shuffoh(Rtt,Rss) | Word64 Q6_P_shuffoh_PP(Word64 Rtt,<br>Rss) | Word64 | ## **Encoding** | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|------|----|----|----|----|---|---|---|-----|---|---|---|---|---|---|----------------------| | | ICL | ASS | 3 | | Reg | Тур | е | M | laj | | | | s5 | | | Pa | arse | | | t5 | | | | | Min | | | | | | | | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | | s | s | s | s | s | P | Р | | t | t | t | t | t | 0 | 1 | - | d | d | d | d | d | Rdd=shuffeb(Rss,Rtt) | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | - | d | d | d | d | d | Rdd=shuffob(Rtt,Rss) | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | - | d | d | d | d | d | Rdd=shuffeh(Rss,Rtt) | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | 0 | d | d | d | d | d | Rdd=shuffoh(Rtt,Rss) | | ield name | Description | |-----------|-------------| | icia name | Description | | ICLASS | Instruction class | |---------|----------------------------| | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | Maj | Major opcode | | Min | Minor opcode | | RegType | Register type | | | | # Vector splat bytes Replicate the low 8 bits from register Rs into each of the four bytes of destination register Rd. #### **Syntax Behavior** | Rd=vsplatb(Rs) | <pre>for (i=0;i&lt;4;i++) { Rd.b[i]=Rs.b[0]; }</pre> | |-----------------|-----------------------------------------------------------| | Rdd=vsplatb(Rs) | <pre>for (i=0;i&lt;8;i++) { Rdd.b[i]=Rs.b[0]; }</pre> | ## Class: XTYPE (slots 2,3) ## **Intrinsics** | Rd=vsplatb(Rs) | Word32 Q6_R | _vsplatb_R(Word32 | Rs) | |-----------------|-------------|-------------------|-----| | Rdd=vsplatb(Rs) | Word64 Q6_P | _vsplatb_R(Word32 | Rs) | ## **Encoding** Parse d5 s5 | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|---|---|---|----|---|---|-----------------| | | ICL | ASS | 3 | | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | 1inO | р | | | d5 | | | | | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | - | - | - | | - | - | 1 | 0 | - | d | d | d | d | d | Rdd=vsplatb(Rs) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 1 | 1 | 1 | d | d | d | d | d | Rd=vsplatb(Rs) | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | • | Packet/loop parse bits Field to encode register d Field to encode register s MajOp Major opcode Minor opcode MinOp Register type RegType # Vector splat halfwords Replicate the low 16 bits from register Rs into each of the four halfwords of destination Rdd. #### Class: XTYPE (slots 2,3) #### **Intrinsics** Rdd=vsplath(Rs) Word64 Q6\_P\_vsplath\_R(Word32 Rs) ## **Encoding** RegType | * | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|------|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|---|---|---|----|---|---|-----------------| | | ICL | .ASS | 3 | ١ | Reg | Тур | е | Ма | јОр | | | | s5 | | | Pa | rse | | | | | | | N | 1inO | р | | | d5 | | | | | Ī | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | - | s | s | ø | s | s | Р | Р | | - | - | - | - | - | 0 | 1 | - | d | d | d | d | d | Rdd=vsplath(Rs) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | MajOp | Major opcode | | | MinOp | Minor opcode | | Register type # Vector splice Concatenate the low (8-N) bytes of vector Rtt with the low N bytes of vector Rss. This instruction is helpful to vectorize unaligned stores. | Rdd=vspliceb(Rss,Rtt,#u3) | $Rdd = Rtt << #u*8 zxt_{#u*8->64}(Rss);$ | |---------------------------|--------------------------------------------------------------------------------------| | Rdd=vspliceb(Rss,Rtt,Pu) | PREDUSE_TIMING;<br>Rdd = Rtt << (Pu&7)*8 zxt <sub>(Pu&amp;7)*8-&gt;64</sub> (Rss); | Class: XTYPE (slots 2,3) #### **Intrinsics** | Rdd=vspliceb(Rss,Rtt,#u3) | Word64 Q6_P_vspliceb_PPI(Word64 Rss, Word64 Rtt, Word32 Iu3) | |---------------------------|--------------------------------------------------------------| | Rdd=vspliceb(Rss,Rtt,Pu) | Word64 Q6_P_vspliceb_PPp(Word64 Rss, Word64 Rtt, Byte Pu) | ## **Encoding** RegType | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|--------|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|----|---|---|----|---|---|---------------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | М | laj | | | | s5 | | | Pa | rse | | | | t5 | | | M | lin | | | | d5 | | | | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | | | s | s | s | s | s | Р | Р | - | t | t | t | t | t | i | i | i | d | d | d | d | d | Rdd=vspliceb(Rss,Rtt,#u3) | | | ICLASS | | | | Reg | Тур | е | М | aj | | | | s5 | | | Pa | rse | | | | t5 | | | | u | 12 | | | d5 | | | | | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | u | u | d | d | d | d | d | Rdd=vspliceb(Rss,Rtt,Pu) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | u2 | Field to encode register u | | | Maj | Major opcode | | | Min | Minor opcode | | Register type # Vector sign extend The vsxtbh instruction sign-extends each byte of a single register source to halfwords, and places the result in the destination register pair. The vsxthw instruction sign-extends each halfword of a single register source to words, and places the result in the destination register pair. #### **Syntax Behavior** | Rdd=vsxtbh(Rs) | <pre>for (i=0;i&lt;4;i++) { Rdd.h[i]=Rs.b[i]; }</pre> | |----------------|-----------------------------------------------------------| | Rdd=vsxthw(Rs) | <pre>for (i=0;i&lt;2;i++) { Rdd.w[i]=Rs.h[i]; }</pre> | ## Class: XTYPE (slots 2,3) #### **Intrinsics** | Rdd=vsxtbh(Rs) | Word64 Q6_P_vsxtbh_R(Word32 Rs) | |----------------|---------------------------------| | Rdd=vsxthw(Rs) | Word64 Q6_P_vsxthw_R(Word32 Rs) | ## **Encoding** Field name | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|-----|------|----|-----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|----------------| | | ICL | ASS | 3 | ı | Reg | Туре | Э | Maj | Юp | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | • | s | ß | s | Ø | s | Р | Р | - | - | - | | - | - | 0 | 0 | | d | d | d | d | d | Rdd=vsxtbh(Rs) | | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | • | s | s | s | s | ø | Р | Ρ | • | - | - | - | - | - | 1 | 0 | - | d | d | d | d | d | Rdd=vsxthw(Rs) | | 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | CL | ۱SS | 3 | ı | Reg | Тур | е | Ма | jОр | | | | s5 | | | Pa | rse | | | | | | | N | 1inO | р | | | d5 | | | | |---------------------------------------------------------------------------------|----|-----|---|---|-----|-----|---|----|-----|---|---|---|----|---|---|----|-----|---|---|---|---|---|---|---|------|---|---|---|----|---|---|----------------| | 0 0 0 0 1 0 0 0 0 0 - <b>s s s s P P 1</b> 0 - <b>d d d d d R</b> dd=vsxthw(Rs) | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | | s | s | s | s | s | P | Р | - | - | - | - | - | - | 0 | 0 | • | d | d | d | d | d | Rdd=vsxtbh(Rs) | | | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | - | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 1 | 0 | - | d | d | d | d | d | Rdd=vsxthw(Rs) | Description | ICLASS | Instruction class | |--------|----------------------------| | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | RegType Register type | Field name | | Description | |------------|--------------|-------------| | MajOp | Major opcode | | | MinOp | Minor opcode | | #### Vector truncate In the vtrunehb instruction, for each halfword in a vector, take the even (lower) byte and ignore the other byte. The resulting values are packed into destination register Rd. The vtrunohb instruction takes each odd byte of the source vector. The vtrunewh instruction uses two source register pairs, Rss and Rtt. The even (lower) halfwords of Rss are packed in the upper word of Rdd, while the lower halfwords of Rtt are packed in the lower word of Rdd. The vtrunowh instruction performs the same operation as vtrunewh, but uses the odd (upper) halfwords of the source vectors instead. #### Syntax Behavior | Rd=vtrunehb(Rss) | <pre>for (i=0;i&lt;4;i++) { Rd.b[i]=Rss.b[i*2]; }</pre> | |-----------------------|---------------------------------------------------------------------------------------------| | Rd=vtrunohb(Rss) | <pre>for (i=0;i&lt;4;i++) { Rd.b[i]=Rss.b[i*2+1]; }</pre> | | Rdd=vtrunehb(Rss,Rtt) | <pre>for (i=0;i&lt;4;i++) { Rdd.b[i]=Rtt.b[i*2]; Rdd.b[i+4]=Rss.b[i*2]; }</pre> | | Rdd=vtrunewh(Rss,Rtt) | <pre>Rdd.h[0]=Rtt.h[0]; Rdd.h[1]=Rtt.h[2]; Rdd.h[2]=Rss.h[0]; Rdd.h[3]=Rss.h[2];</pre> | | Rdd=vtrunohb(Rss,Rtt) | <pre>for (i=0;i&lt;4;i++) { Rdd.b[i]=Rtt.b[i*2+1]; Rdd.b[i+4]=Rss.b[i*2+1]; }</pre> | | Syntax | Behavior | |-----------------------|--------------------| | Rdd=vtrunowh(Rss,Rtt) | Rdd.h[0]=Rtt.h[1]; | | | Rdd.h[1]=Rtt.h[3]; | | | Rdd.h[2]=Rss.h[1]; | | | Rdd.h[3]=Rss.h[3]: | Class: XTYPE (slots 2,3) #### **Intrinsics** | Rd=vtrunehb(Rss) | Word32 | Q6_R_vtrunehb_P(Word64 Rss) | |-----------------------|----------------|-------------------------------------| | Rd=vtrunohb(Rss) | Word32 | Q6_R_vtrunohb_P(Word64 Rss) | | Rdd=vtrunehb(Rss,Rtt) | Word64<br>Rtt) | Q6_P_vtrunehb_PP(Word64 Rss, Word64 | | Rdd=vtrunewh(Rss,Rtt) | Word64<br>Rtt) | Q6_P_vtrunewh_PP(Word64 Rss, Word64 | | Rdd=vtrunohb(Rss,Rtt) | Word64<br>Rtt) | Q6_P_vtrunohb_PP(Word64 Rss, Word64 | | Rdd=vtrunowh(Rss,Rtt) | Word64<br>Rtt) | Q6_P_vtrunowh_PP(Word64 Rss, Word64 | ## **Encoding** d5 s5 | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|-------|-----|----|----|----|----|---|---|---|-------|---|---|---|----|---|---|-----------------------| | | ICL | ASS | 3 | | Reg | Тур | е | Ма | јОр | | | | s5 | s5 | | Parse | | | | | | | | N | MinOp | | | | d5 | | | | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | P | Р | - | - | - | - | - | - | 0 | 0 | 0 | d | d | d | d | d | Rd=vtrunohb(Rss) | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | - | | - | - | | 0 | 1 | 0 | d | d | d | d | d | Rd=vtrunehb(Rss) | | | ICL | ASS | 3 | | Reg | Тур | е | М | aj | | | | s5 | | | Pa | rse | | | | t5 | | | M | lin | | | | d5 | | | | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rdd=vtrunewh(Rss,Rtt) | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | d | d | d | d | d | Rdd=vtrunehb(Rss,Rtt) | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | d | d | d | d | d | Rdd=vtrunowh(Rss,Rtt) | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 1 | d | d | d | d | d | Rdd=vtrunohb(Rss,Rtt) | | Field name | | Description | |------------|------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | Field to encode register d Field to encode register s t5 Field to encode register t MajOp Major opcode MinOp Minor opcode Maj Major opcode Min Minor opcode RegType Register type ## Vector zero extend The wzxtbh instruction zero-extends each byte of a single register source to halfwords, and places the result in the destination register pair. The vzxthw instruction zero-extends each halfword of a single register source to words, and places the result in the destination register pair. | Syntax | Behavior | |----------------|------------------------------------------------------------| | Rdd=vzxtbh(Rs) | <pre>for (i=0;i&lt;4;i++) { Rdd.h[i]=Rs.ub[i]; }</pre> | | Rdd=vzxthw(Rs) | <pre>for (i=0;i&lt;2;i++) { Rdd.w[i]=Rs.uh[i]; }</pre> | ## Class: XTYPE (slots 2,3) #### **Intrinsics** | Rdd=vzxtbh(Rs) | Word64 Q6_P_vzxtbh_R(Word32 Rs) | |----------------|---------------------------------| | Rdd=vzxthw(Rs) | Word64 Q6_P_vzxthw_R(Word32 Rs) | ## **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|------|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|----------------| | Ī | ICL | .ASS | 3 | ı | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | 1inC | )p | | | d5 | | | | | | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | - | s | s | s | s | s | Р | Р | - | | - | - | - | - | 0 | 1 | - | d | d | d | d | d | Rdd=vzxtbh(Rs) | | - | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | - | s | s | s | s | s | Р | Р | - | - | - | - | - | - | 1 | 1 | - | d | d | d | d | d | Rdd=vzxthw(Rs) | | Field name De | scription | |---------------|-----------| |---------------|-----------| ICLASSInstruction classParsePacket/loop parse bitsd5Field to encode register ds5Field to encode register s MajOpMajor opcodeMinOpMinor opcodeRegTypeRegister type Syntax #### **11.10.7 XTYPE PRED** The XTYPE PRED instruction subclass includes instructions that perform miscellaneous operations on predicates, including mask generation, predicate transfers, and the Viterbi pack operation. Predicate operations modify predicate source data. #### Bounds check Determine whether Rs falls in the range defined by Rtt. Rtt.w0 is set by the user to the lower bound, and Rtt.w1 is set by the user to the upper bound. All bits of the destination predicate are set when the value falls within the range, or all cleared otherwise. **Behavior** #### Class: XTYPE (slots 2,3) #### **Intrinsics** Pd=boundscheck(Rs,Rtt) Byte Q6 p boundscheck RP(Word32 Rs, Word64 Rtt) #### **Encoding** | 3 | 1 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|---|-----|-----|----|----|-----|------|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|---|---|---|------------------------------------| | | I | CLA | ASS | ; | F | Reg | Туре | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | | d | 2 | | | 1 | | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | - | | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 1 | 0 | 0 | - | - | - | d | d | Pd=boundscheck(Rss,Rtt):r<br>aw:lo | | 1 | | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | - | - | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 1 | 0 | 1 | - | - | - | d | d | Pd=boundscheck(Rss,Rtt):r<br>aw:hi | # Field name Description | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d2 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | # Compare byte These instructions sign- or zero-extend the low eight bits of the source registers and perform 32-bit comparisons on the result. For an extended 32-bit immediate operand, the full 32 immediate bits are used for the comparison. | Syntax | Behavior | |---------------------|--------------------------------------------------------------------------| | Pd=cmpb.eq(Rs, #u8) | Pd=Rs.ub[0] == #u ? 0xff : 0x00; | | Pd=cmpb.eq(Rs,Rt) | Pd=Rs.b[0] == Rt.b[0] ? 0xff : 0x00; | | Pd=cmpb.gt(Rs, #s8) | Pd=Rs.b[0] > #s ? 0xff : 0x00; | | Pd=cmpb.gt(Rs,Rt) | Pd=Rs.b[0] > Rt.b[0] ? 0xff : 0x00; | | Pd=cmpb.gtu(Rs,#u7) | <pre>apply_extension(#u); Pd=Rs.ub[0] &gt; #u.uw[0] ? 0xff : 0x00;</pre> | | Pd=cmpb.gtu(Rs,Rt) | Pd=Rs.ub[0] > Rt.ub[0] ? 0xff : 0x00; | ## Class: XTYPE (slots 2,3) #### **Intrinsics** | Pd=cmpb.eq(Rs, #u8) | Byte Q6_p_cmpb_eq_RI(Word32 Rs, Word32 Iu8) | |---------------------|----------------------------------------------| | Pd=cmpb.eq(Rs,Rt) | Byte Q6_p_cmpb_eq_RR(Word32 Rs, Word32 Rt) | | Pd=cmpb.gt(Rs, #s8) | Byte Q6_p_cmpb_gt_RI(Word32 Rs, Word32 Is8) | | Pd=cmpb.gt(Rs,Rt) | Byte Q6_p_cmpb_gt_RR(Word32 Rs, Word32 Rt) | | Pd=cmpb.gtu(Rs,#u7) | Byte Q6_p_cmpb_gtu_RI(Word32 Rs, Word32 Iu7) | | Pd=cmpb.gtu(Rs,Rt) | Byte Q6_p_cmpb_gtu_RR(Word32 Rs, Word32 Rt) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|---|---|---|---------------------| | | ICL | ASS | 3 | | Reg | Тур | е | N | laj | | | | s5 | | | Pa | rse | | | | t5 | | | M | lin | | | | | d | 2 | | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | - | - | - | d | d | Pd=cmpb.gt(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 0 | - | - | - | d | d | Pd=cmpb.eq(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | 1 | - | - | - | d | d | Pd=cmpb.gtu(Rs,Rt) | | | ICL | ASS | 3 | ı | Reg | Тур | е | | | | | | s5 | • | | Pa | rse | | • | • | • | | | • | | | | | • | d | 2 | | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | - | 0 | 0 | s | s | s | s | s | Р | Р | - | i | i | i | i | i | i | i | i | 0 | 0 | - | d | d | Pd=cmpb.eq(Rs,#u8) | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | - | 0 | 1 | s | s | s | s | s | Р | Р | - | i | i | i | i | i | i | i | i | 0 | 0 | - | d | d | Pd=cmpb.gt(Rs,#s8) | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | - | 1 | 0 | s | s | s | s | s | Р | Р | - | 0 | i | i | i | i | i | i | i | 0 | 0 | - | d | d | Pd=cmpb.gtu(Rs,#u7) | | Field name | | Description | |------------|---------------|-------------| | RegType | Register type | | | MajOp | Major opcode | | ICLASSInstruction classParsePacket/loop parse bitsd2Field to encode register ds5Field to encode register s | Field name | | Description | |------------|----------------------------|-------------| | t5 | Field to encode register t | | | Maj | Major opcode | | | Min | Minor opcode | | # Compare half These instructions sign- or zero-extend the low 16 bits of the source registers and perform 32-bit comparisons on the result. For an extended 32-bit immediate operand, the full 32 immediate bits are for the comparison. | Syntax | Behavior | |--------|----------| |--------|----------| | Pd=cmph.eq(Rs, #s8) | <pre>apply_extension(#s); Pd=Rs.h[0] == #s ? 0xff : 0x00;</pre> | |---------------------|--------------------------------------------------------------------------| | Pd=cmph.eq(Rs,Rt) | Pd=Rs.h[0] == Rt.h[0] ? 0xff : 0x00; | | Pd=cmph.gt(Rs, #s8) | <pre>apply_extension(#s); Pd=Rs.h[0] &gt; #s ? 0xff : 0x00;</pre> | | Pd=cmph.gt(Rs,Rt) | Pd=Rs.h[0] > Rt.h[0] ? 0xff : 0x00; | | Pd=cmph.gtu(Rs,#u7) | <pre>apply_extension(#u); Pd=Rs.uh[0] &gt; #u.uw[0] ? 0xff : 0x00;</pre> | | Pd=cmph.gtu(Rs,Rt) | Pd=Rs.uh[0] > Rt.uh[0] ? 0xff : 0x00; | ## Class: XTYPE (slots 2,3) #### **Intrinsics** | Pd=cmph.eq(Rs, #s8) | Byte Q6_p_cmph_eq_RI(Word32 Rs, Word32 Is8) | |---------------------|----------------------------------------------| | Pd=cmph.eq(Rs,Rt) | Byte Q6_p_cmph_eq_RR(Word32 Rs, Word32 Rt) | | Pd=cmph.gt(Rs, #s8) | Byte Q6_p_cmph_gt_RI(Word32 Rs, Word32 Is8) | | Pd=cmph.gt(Rs,Rt) | Byte Q6_p_cmph_gt_RR(Word32 Rs, Word32 Rt) | | Pd=cmph.gtu(Rs,#u7) | Byte Q6_p_cmph_gtu_RI(Word32 Rs, Word32 Iu7) | | Pd=cmph.gtu(Rs,Rt) | Byte Q6_p_cmph_gtu_RR(Word32 Rs, Word32 Rt) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|---|---|---|---------------------| | | ICL | ASS | 3 | | Reg | Тур | е | N | laj | | | | s5 | | | Pa | rse | | | | t5 | | | M | lin | | | | | d | 2 | | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 1 | - | - | - | d | d | Pd=cmph.eq(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | - | - | | d | d | Pd=cmph.gt(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 1 | - | - | | d | d | Pd=cmph.gtu(Rs,Rt) | | | ICL | ASS | 3 | | Reg | Тур | е | | | | | | s5 | • | | Pa | rse | | • | | | | | • | | | | • | | d | 2 | | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | - | 0 | 0 | s | s | s | s | s | Р | Р | - | i | i | i | i | i | i | i | i | 0 | 1 | - | d | d | Pd=cmph.eq(Rs,#s8) | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | - | 0 | 1 | s | s | s | s | s | Р | Р | - | i | i | i | i | i | i | i | i | 0 | 1 | • | d | d | Pd=cmph.gt(Rs,#s8) | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | - | 1 | 0 | s | s | s | s | s | Р | Р | - | 0 | i | i | i | i | i | i | i | 0 | 1 | - | d | d | Pd=cmph.gtu(Rs,#u7) | | Etalal access | D | |---------------|-------------| | Field name | Description | | RegType | Register type | |---------|------------------------| | MajOp | Major opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | Field name | | Description | |------------|----------------------------|-------------| | d2 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | Maj | Major opcode | | | Min | Minor opcode | | # Compare doublewords Compare two 64-bit register pairs for unsigned greater than, greater than, or equal. The 8-bit predicate register Pd is set to all 1's or all 0's, depending on the result. | S۱ | /ntax | Behavior | |----|-------|----------| | | | | | Pd=cmp.eq(Rss,Rtt) | Pd=Rss==Rtt ? 0xff : 0x00; | |---------------------|-----------------------------------| | Pd=cmp.gt(Rss,Rtt) | Pd=Rss>Rtt ? 0xff : 0x00; | | Pd=cmp.gtu(Rss,Rtt) | Pd=Rss.u64>Rtt.u64 ? 0xff : 0x00; | ## Class: XTYPE (slots 2,3) #### **Intrinsics** | Pd=cmp.eq(Rss,Rtt) | Byte Q6_p_cmp_eq_PP(Word64 Rss, Word64 Rtt) | |---------------------|----------------------------------------------| | Pd=cmp.gt(Rss,Rtt) | Byte Q6_p_cmp_gt_PP(Word64 Rss, Word64 Rtt) | | Pd=cmp.gtu(Rss,Rtt) | Byte Q6_p_cmp_gtu_PP(Word64 Rss, Word64 Rtt) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|---|---|---|---------------------| | | ICL | ASS | 3 | F | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | | d | 2 | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | s | s | s | s | s | P | P | | t | t | t | t | t | 0 | 0 | 0 | - | | - | d | d | Pd=cmp.eq(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | 0 | - | - | - | d | d | Pd=cmp.gt(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | 0 | - | - | - | d | d | Pd=cmp.gtu(Rss,Rtt) | | Field name Desc | cription | |-----------------|----------| |-----------------|----------| | RegType | Register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d2 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | # Compare bit mask When all the bits in the mask in Rt or a short immediate are set (bitsset) or clear (bitsclr) in Rs, set the Pd to true. Otherwise, set the bits in Pd to false. | Syntax | <b>Behavior</b> | |--------|-----------------| | | | | Pd=[!]bitsclr(Rs, #u6) | Pd=(Rs&#u)[!]=0 ? 0xff : 0x00; | |------------------------|---------------------------------| | Pd=[!]bitsclr(Rs,Rt) | Pd=(Rs&Rt)[!]=0 ? 0xff : 0x00; | | Pd=[!]bitsset(Rs,Rt) | Pd=(Rs&Rt)[!]=Rt ? 0xff : 0x00; | ## Class: XTYPE (slots 2,3) #### **Intrinsics** | Pd=!bitsclr(Rs, #u6) | Byte Q6_p_not_bitsclr_RI(Word32 Rs, Word32 Iu6) | |----------------------|-----------------------------------------------------------| | Pd=!bitsclr(Rs,Rt) | <pre>Byte Q6_p_not_bitsclr_RR(Word32 Rs, Word32 Rt)</pre> | | Pd=!bitsset(Rs,Rt) | <pre>Byte Q6_p_not_bitsset_RR(Word32 Rs, Word32 Rt)</pre> | | Pd=bitsclr(Rs,#u6) | Byte Q6_p_bitsclr_RI(Word32 Rs, Word32 Iu6) | | Pd=bitsclr(Rs,Rt) | Byte Q6_p_bitsclr_RR(Word32 Rs, Word32 Rt) | | Pd=bitsset(Rs,Rt) | Byte Q6_p_bitsset_RR(Word32 Rs, Word32 Rt) | ## **Encoding** RegType | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|------|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|---------------------| | | ICL. | ASS | 3 | F | Reg | Туре | е | Ма | јОр | | | | s5 | | | Pa | rse | | | | | | | | | | | | | d | 2 | | | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | P | i | i | i | i | i | i | - | - | - | - | - | - | d | d | Pd=bitsclr(Rs,#u6) | | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | - | - | - | - | - | - | d | d | Pd=!bitsclr(Rs,#u6) | | | ICL | ASS | 3 | ı | Reg | Туре | е | М | aj | | | | s5 | | | Pa | rse | | | | t5 | | | | | | | | | d | 2 | | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | P | - | t | t | t | t | t | - | - | - | - | - | - | d | d | Pd=bitsset(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | - | - | - | d | d | Pd=!bitsset(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | - | - | - | d | d | Pd=bitsclr(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | - | - | - | d | d | Pd=!bitsclr(Rs,Rt) | | Field name | | | Description | |------------|--|--|-------------| | | | | | | ICLASS | Instruction class | |--------|----------------------------| | Parse | Packet/loop parse bits | | d2 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | MajOp | Major opcode | | Maj | Major opcode | Register type ## Mask generate from predicate For each of the low eight bits in predicate register Pt, when the bit is set, set the corresponding byte in 64-bit register pair Rdd to 0xff, otherwise, set the corresponding byte to 0x00. #### Syntax Behavior Class: XTYPE (slots 2,3) #### **Intrinsics** ``` Rdd=mask(Pt) Word64 Q6_P_mask_p(Byte Pt) ``` ## **Encoding** t2 | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|--------------| | Ī | I | CL | ASS | 3 | ı | Reg | Тур | е | | | | | | | | | Pa | rse | | | | | ť | 2 | | | | | | d5 | | | | | Ī | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | - | - | - | - | - | - | - | - | Р | Р | - | - | - | - | t | t | - | - | - | d | d | d | d | d | Rdd=mask(Pt) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | Field to encode register t RegType Register type ## Check for TLB match Determine whether the TLB entry in Rss matches the ASID:PPN in Rt. ## Syntax Behavior Class: XTYPE (slots 2,3) #### **Notes** ■ The predicate generated by this instruction cannot be used as a .new predicate, nor can it be automatically AND'd with another predicate. #### **Intrinsics** ``` Pd=tlbmatch(Rss,Rt) Byte Q6_p_tlbmatch_PR(Word64 Rss, Word32 Rt) ``` ## **Encoding** | I | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|---|---|---|---------------------| | Ī | | CLA | ASS | 3 | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inC | )p | | | | d | 2 | | | Ī | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | - | - | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 0 | 1 | 1 | | | - | d | d | Pd=tlbmatch(Rss,Rt) | | Field name | | Description | |------------|---------------|-------------| | ReaType | Register type | 2000p | | Kedilbe | register type | |---------|----------------------------| | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d2 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | | | ## Predicate transfer Pd=Rs transfers a predicate to the 8 least-significant bits of a general register and zeros the other bits. Rd=Ps transfers the 8 least-significant bits of a general register to a predicate. | Pd=Rs | Pd = Rs.ub[0]; | |-------|-------------------------------------------------------------| | Rd=Ps | <pre>PREDUSE_TIMING; Rd = zxt<sub>8-&gt;32</sub>(Ps);</pre> | Class: XTYPE (slots 2,3) #### **Intrinsics** | Pd=Rs | Byte Q6_p_equals_R(Word32 Rs) | |-------|-------------------------------| | Rd=Ps | Word32 Q6_R_equals_p(Byte Ps) | ## **Encoding** | 3 | 1 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|---|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------| | | ŀ | CLA | ASS | 3 | | Reg | Тур | е | Ма | јОр | | | | s5 | | | Pa | rse | | | | | | | | | | | | | d | 2 | | | Ţ. | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | - | - | | - | | - | - | - | - | - | | d | d | Pd=Rs | | | I | CL/ | ASS | 6 | | Reg | Тур | е | Ма | јОр | | | • | • | s | 2 | Pa | rse | | • | • | | | | | • | | | • | d5 | | • | | | Ŀ | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | - | 1 | - | - | - | - | s | s | Р | Р | - | - | - | - | - | - | - | - | - | d | d | d | d | d | Rd=Ps | | Field name | | Description | |-------------|--------------------|-------------| | T 0 T 7 0 0 | landarintina alama | | | ICLASS | Instruction class | |---------|----------------------------| | Parse | Packet/loop parse bits | | d2 | Field to encode register d | | d5 | Field to encode register d | | s2 | Field to encode register s | | s5 | Field to encode register s | | MajOp | Major opcode | | RegType | Register type | #### Test bit Extract a bit from a register. When the bit is true (1), set all the bits of the predicate register destination to 1. When the bit is false (0), set all the bits of the predicate register destination to 0. Indicate the bit to test can using an immediate or register value. When using a register to indicate the bit to test, and the value specified is out of range, the predicate result is zero. # Syntax Behavior Class: XTYPE (slots 2,3) #### **Intrinsics** | Pd=!tstbit(Rs,#u5) | Byte Q6_p_not_tstbit_RI(Word32 Rs, Word32 Iu5) | |--------------------|------------------------------------------------| | Pd=!tstbit(Rs,Rt) | Byte Q6_p_not_tstbit_RR(Word32 Rs, Word32 Rt) | | Pd=tstbit(Rs,#u5) | Byte Q6_p_tstbit_RI(Word32 Rs, Word32 Iu5) | | Pd=tstbit(Rs,Rt) | Byte Q6_p_tstbit_RR(Word32 Rs, Word32 Rt) | #### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-----|----|------|----|----|----|----|-----|-----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|--------------------| | | ICL | ASS | 3 | F | Reg | Тур | е | Ма | jОр | | | | s5 | | | Pa | rse | | | | | | | | | | | | | d | 2 | | | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | - | - | - | - | - | - | d | d | Pd=tstbit(Rs,#u5) | | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | - | - | - | - | - | - | d | d | Pd=!tstbit(Rs,#u5) | | | ICL | ASS | 3 | ı | Reg | Тур | е | N | 1aj | | s5 F | | | | Pa | rse | | | | t5 | | • | | | • | • | | • | d | 2 | | | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | - | - | - | d | d | Pd=tstbit(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | - | - | - | - | - | - | d | d | Pd=!tstbit(Rs,Rt) | #### Field name Description | ICLASS | Instruction class | |--------|----------------------------| | Parse | Packet/loop parse bits | | d2 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | MajOp | Major opcode | | | | Major opcode Register type Maj RegType ## Vector compare halfwords Compare each of four 16-bit halfwords in two 64-bit vectors and set the corresponding bits in a predicate destination to '11' if true, '00' if false. Halfword comparisons are for equal, signed greater than, or unsigned greater than. Syntax Behavior ``` for (i = 0; i < 4; i++) { Pd=vcmph.eq(Rss, #s8) Pd.i*2 = (Rss.h[i] == #s); Pd.i*2+1 = (Rss.h[i] == #s); for (i = 0; i < 4; i++) { Pd=vcmph.eq(Rss,Rtt) Pd.i*2 = (Rss.h[i] == Rtt.h[i]); Pd.i*2+1 = (Rss.h[i] == Rtt.h[i]); Pd=vcmph.gt(Rss, #s8) for (i = 0; i < 4; i++) { Pd.i*2 = (Rss.h[i] > #s); Pd.i*2+1 = (Rss.h[i] > #s); Pd=vcmph.gt(Rss,Rtt) for (i = 0; i < 4; i++) { Pd.i*2 = (Rss.h[i] > Rtt.h[i]); Pd.i*2+1 = (Rss.h[i] > Rtt.h[i]); for (i = 0; i < 4; i++) { Pd=vcmph.gtu(Rss, #u7) Pd.i*2 = (Rss.uh[i] > #u); Pd.i*2+1 = (Rss.uh[i] > #u); for (i = 0; i < 4; i++) { Pd=vcmph.gtu(Rss,Rtt) Pd.i*2 = (Rss.uh[i] > Rtt.uh[i]); Pd.i*2+1 = (Rss.uh[i] > Rtt.uh[i]); ``` # Class: XTYPE (slots 2,3) ## Intrinsics | Pd=vcmph.eq(Rss, #s8) | Byte Q6_p_vcmph_eq_PI(Word64 Rss, Word32 Is8) | |-----------------------|------------------------------------------------| | Pd=vcmph.eq(Rss,Rtt) | Byte Q6_p_vcmph_eq_PP(Word64 Rss, Word64 Rtt) | | Pd=vcmph.gt(Rss, #s8) | Byte Q6_p_vcmph_gt_PI(Word64 Rss, Word32 Is8) | | Pd=vcmph.gt(Rss,Rtt) | Byte Q6_p_vcmph_gt_PP(Word64 Rss, Word64 Rtt) | | Pd=vcmph.gtu(Rss,#u7) | Byte Q6_p_vcmph_gtu_PI(Word64 Rss, Word32 Iu7) | | Pd=vcmph.gtu(Rss,Rtt) | Byte Q6_p_vcmph_gtu_PP(Word64 Rss, Word64 Rtt) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|-----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|---|---|---|-----------------------| | | ICL | ASS | 3 | - 1 | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | | d | 2 | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | - | - | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 1 | | - | - | d | d | Pd=vcmph.eq(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | - | - | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 0 | - | - | - | d | d | Pd=vcmph.gt(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | - | - | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 0 | 1 | - | - | - | d | d | Pd=vcmph.gtu(Rss,Rtt) | | | ICL | ASS | 3 | I | Reg | Тур | е | | | • | | | s5 | • | • | Pa | rse | | • | • | | | | • | | | • | | • | d | 2 | | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | i | i | i | i | i | i | i | i | 0 | 1 | - | d | d | Pd=vcmph.eq(Rss,#s8) | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | i | i | i | i | i | i | i | i | 0 | 1 | - | d | d | Pd=vcmph.gt(Rss,#s8) | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | 0 | i | i | i | i | i | i | i | 0 | 1 | - | d | d | Pd=vcmph.gtu(Rss,#u7) | | Field name | | Description | |------------|---------------|-------------| | eattane | Register type | | | RegType | Register type | |---------|----------------------------| | MajOp | Major opcode | | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d2 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | ## Vector compare bytes for any match Compare each byte in two 64-bit source vectors and set a predicate if any of the eight bytes are equal. This instruction can quickly find the null terminator in a string. #### **Syntax** #### **Behavior** ## Class: XTYPE (slots 2,3) #### **Intrinsics** ``` Pd=!any8(vcmpb.eq(Rss,Rtt)) Byte Q6_p_not_any8_vcmpb_eq_PP(Word64 Rss, Word64 Rtt) Pd=any8(vcmpb.eq(Rss,Rtt)) Byte Q6_p_any8_vcmpb_eq_PP(Word64 Rss, Word64 Rtt) ``` #### **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|---|---|---|------------------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | | d | 2 | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | - | - | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 0 | 0 | 0 | - | - | - | d | d | Pd=any8(vcmpb.eq(Rss,Rtt )) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | - | - | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 0 | 0 | 1 | - | - | - | d | d | Pd=!any8(vcmpb.eq(Rss,Rt t)) | #### Field name #### Description | Register type | |----------------------------| | Minor opcode | | Instruction class | | Packet/loop parse bits | | Field to encode register d | | Field to encode register s | | Field to encode register t | | | ## Vector compare bytes Compare each of eight bytes in two 64-bit vectors and set the corresponding bit in a predicate destination to 1 if true, 0 if false. Byte comparisons are for equal or unsigned greater than. In the following example, every other comparison is true. #### Syntax Behavior ``` Pd=vcmpb.eq(Rss, #u8) for (i = 0; i < 8; i++) { Pd.i = (Rss.ub[i] == #u); Pd=vcmpb.eq(Rss,Rtt) for (i = 0; i < 8; i++) { Pd.i = (Rss.b[i] == Rtt.b[i]); Pd=vcmpb.gt(Rss, #s8) for (i = 0; i < 8; i++) { Pd.i = (Rss.b[i] > #s); Pd=vcmpb.gt(Rss,Rtt) for (i = 0; i < 8; i++) { Pd.i = (Rss.b[i] > Rtt.b[i]); Pd=vcmpb.gtu(Rss, #u7) for (i = 0; i < 8; i++) { Pd.i = (Rss.ub[i] > #u); Pd=vcmpb.gtu(Rss,Rtt) for (i = 0; i < 8; i++) { Pd.i = (Rss.ub[i] > Rtt.ub[i]); ``` # Class: XTYPE (slots 2,3) ## Intrinsics | Pd=vcmpb.eq(Rss,#u8) | Byte Q6_p_vcmpb_eq_PI(Word64 Rss, Word32 Iu8) | |-----------------------|------------------------------------------------| | Pd=vcmpb.eq(Rss,Rtt) | Byte Q6_p_vcmpb_eq_PP(Word64 Rss, Word64 Rtt) | | Pd=vcmpb.gt(Rss,#s8) | Byte Q6_p_vcmpb_gt_PI(Word64 Rss, Word32 Is8) | | Pd=vcmpb.gt(Rss,Rtt) | Byte Q6_p_vcmpb_gt_PP(Word64 Rss, Word64 Rtt) | | Pd=vcmpb.gtu(Rss,#u7) | Byte Q6_p_vcmpb_gtu_PI(Word64 Rss, Word32 Iu7) | | Pd=vcmpb.gtu(Rss,Rtt) | Byte Q6_p_vcmpb_gtu_PP(Word64 Rss, Word64 Rtt) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|---|---|---|-----------------------| | | ICL | ASS | 3 | - | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | 1inO | )p | | | | d | 2 | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | - | | s | s | s | s | s | P | Р | 0 | t | t | t | t | t | 1 | 1 | 0 | | | - | d | d | Pd=vcmpb.eq(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | - | - | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 1 | 1 | 1 | - | - | - | d | d | Pd=vcmpb.gtu(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | - | - | s | s | s | s | s | Р | Р | 1 | t | t | t | t | t | 0 | 1 | 0 | - | - | - | d | d | Pd=vcmpb.gt(Rss,Rtt) | | | ICL | ASS | 6 | ı | Reg | Тур | е | | | | | | s5 | • | • | Pa | rse | | • | • | | • | • | | | | • | | • | d | 2 | | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | i | i | i | i | i | i | i | i | 0 | 0 | - | d | d | Pd=vcmpb.eq(Rss,#u8) | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | i | i | i | i | i | i | i | i | 0 | 0 | - | d | d | Pd=vcmpb.gt(Rss,#s8) | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | 0 | i | i | i | i | i | i | i | 0 | 0 | - | d | d | Pd=vcmpb.gtu(Rss,#u7) | | Field name | Description | |------------|-------------| | | | | RegType | Register type | |------------|----------------------------| | MajOp | Major opcode | | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d2 | Field to encode register d | | <b>s</b> 5 | Field to encode register s | | t5 | Field to encode register t | # Vector compare words Compare each of two 32-bit words in two 64-bit vectors and set the corresponding bits in a predicate destination to '1111' if true, '0000' if false. Word comparisons are for equal, signed greater than, or unsigned greater than. ## Syntax Behavior | Pd[3:0] = (Rss.w[0]==#s);<br>Pd[7:4] = (Rss.w[1]==#s); | |----------------------------------------------------------------------| | Pd[3:0] = (Rss.w[0]==Rtt.w[0]);<br>Pd[7:4] = (Rss.w[1]==Rtt.w[1]); | | Pd[3:0] = (Rss.w[0]>#s);<br>Pd[7:4] = (Rss.w[1]>#s); | | Pd[3:0] = (Rss.w[0]>Rtt.w[0]);<br>Pd[7:4] = (Rss.w[1]>Rtt.w[1]); | | Pd[3:0] = (Rss.uw[0]>#u.uw[0]);<br>Pd[7:4] = (Rss.uw[1]>#u.uw[0]); | | Pd[3:0] = (Rss.uw[0]>Rtt.uw[0]);<br>Pd[7:4] = (Rss.uw[1]>Rtt.uw[1]); | | | # Class: XTYPE (slots 2,3) ## Intrinsics | Pd=vcmpw.eq(Rss, #s8) | Byte Q6_p_vcmpw_eq_PI(Word64 Rss, Word32 Is8) | |-----------------------|------------------------------------------------| | Pd=vcmpw.eq(Rss,Rtt) | Byte Q6_p_vcmpw_eq_PP(Word64 Rss, Word64 Rtt) | | Pd=vcmpw.gt(Rss,#s8) | Byte Q6_p_vcmpw_gt_PI(Word64 Rss, Word32 Is8) | | Pd=vcmpw.gt(Rss,Rtt) | Byte Q6_p_vcmpw_gt_PP(Word64 Rss, Word64 Rtt) | | Pd=vcmpw.gtu(Rss,#u7) | Byte Q6_p_vcmpw_gtu_PI(Word64 Rss, Word32 Iu7) | | Pd=vcmpw.gtu(Rss,Rtt) | Byte Q6_p_vcmpw_gtu_PP(Word64 Rss, Word64 Rtt) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|---|---|---|-----------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | N | /linC | )p | | | | d | 2 | | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | - | - | s | s | s | s | s | P | Р | 0 | t | t | t | t | t | 0 | 0 | 0 | | | - | d | d | Pd=vcmpw.eq(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | - | - | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 0 | 1 | - | - | - | d | d | Pd=vcmpw.gt(Rss,Rtt) | | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | - | - | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | 0 | 1 | 0 | - | - | - | d | d | Pd=vcmpw.gtu(Rss,Rtt) | | | ICL | ASS | 6 | ı | Reg | Тур | е | | | • | | | s5 | | | Pa | rse | | • | • | | • | • | | | | • | | • | d | 2 | | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | i | i | i | i | i | i | i | i | 1 | 0 | - | d | d | Pd=vcmpw.eq(Rss,#s8) | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | - | i | i | i | i | i | i | i | i | 1 | 0 | - | d | d | Pd=vcmpw.gt(Rss,#s8) | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | - | 0 | i | į | i | i | i | i | i | 1 | 0 | - | d | d | Pd=vcmpw.gtu(Rss,#u7) | | Field name | Description | |------------|-------------| | | | | RegType | Register type | |---------|----------------------------| | MajOp | Major opcode | | MinOp | Minor opcode | | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d2 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | ## Viterbi pack even and odd predicate bits Pack the even and odd bits of two predicate registers into a single destination register. A variant of this instruction is R3:2 |= vitpack(P1, P0). This places the packed predicate bits into the lower 8 bits of the register pair, which has been preshifted by 8 bits. This instruction is useful in Viterbi decoding. Repeated use of the push version enables storing a history for traceback purposes. | Syntax | Behavior | |---------------------------------|----------------------------------------------------------------| | · · · · · · · · · · · · · · · · | <pre>PREDUSE_TIMING; Rd = (Ps&amp;0x55) (Pt&amp;0xAA);</pre> | Class: XTYPE (slots 2,3) #### **Intrinsics** Rd=vitpack(Ps,Pt) Word32 Q6 R vitpack pp(Byte Ps, Byte Pt) ## **Encoding** | 3 | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|------|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|-------------------| | Ī | ı | ICL/ | ASS | 3 | ١ | Reg | Тур | е | Ма | јОр | | | | | s | 2 | Pa | rse | | | | | t | 2 | | | | | | d5 | | | | | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | - | 0 | 0 | - | | | s | s | Р | Р | - | - | - | - | t | t | - | - | - | d | d | d | d | d | Rd=vitpack(Ps,Pt) | | Field name | D | escription | |------------|----------------------------|------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s2 | Field to encode register s | | | t2 | Field to encode register t | | | MajOp | Major opcode | | | RegType | Register type | | ## Vector mux Perform an element-wise byte selection between two vectors. For each of the low 8 bits of predicate register Pu, if the bit is set, the corresponding byte in Rdd is set to the corresponding byte from Rss. Otherwise, set the byte in Rdd to the byte from Rtt. #### Syntax Behavior Class: XTYPE (slots 2,3) #### **Intrinsics** ``` Rdd=vmux(Pu,Rss,Rtt) Word64 Q6_P_vmux_pPP(Byte Pu, Word64 Rss, Word64 Rtt) ``` ## **Encoding** | 3 | 1 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|---|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|----|---|---|----------------------| | | I | CLA | ASS | | F | Reg | Тур | е | | | | | | s5 | | | Pa | rse | | | | t5 | | | | u | 2 | | | d5 | | | | | 1 | | 1 | 0 | 1 | 0 | 0 | 0 | 1 | - | | - | s | s | s | s | s | P | Р | - | t | t | t | t | t | - | u | u | d | d | d | d | d | Rdd=vmux(Pu,Rss,Rtt) | | Field name | | Description | |------------|---------------|-------------| | RegType | Register type | | | MinOp | Minor opcode | |--------|----------------------------| | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | u2 | Field to encode register u | ## **11.10.8 XTYPE SHIFT** The XTYPE SHIFT instruction subclass includes instructions that perform shifts. ## Mask generate from immediate Generate a mask from two immediate values. | Syntax | Behavior | |------------------|----------------------------| | Rd=mask(#u5,#U5) | Rd = ((1 << #u) -1) << #U; | Class: XTYPE (slots 2,3) #### **Intrinsics** Rd=mask(#u5, #U5) Word32 Q6\_R\_mask\_II(Word32 Iu5, Word32 IU5) ## **Encoding** RegType | ; | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|------------------| | Ī | I | CLA | ASS | ; | F | Reg | Тур | е | Ма | jОр | | | | | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | Ī | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | ı | I | - | - | - | - | - | Р | Р | 1 | i | i | i | i | i | ı | I | I | d | d | d | d | d | Rd=mask(#u5,#U5) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | MajOp | Major opcode | | | MinOp | Minor opcode | | Register type # Shift by immediate Shift the source register value right or left based on the type of instruction. In these instructions, contain the shift amount in an unsigned immediate (5 bits for 32-bit shifts, 6 bits for 64-bit shifts) and the shift instruction gives the shift direction. Arithmetic right shifts place the sign bit of the source value in the vacated positions, while logical right shifts place zeros in the vacated positions. Left shifts always zero-fill the vacated bits. | Syntax | Benavior | |-------------------|----------------------| | Rd=asl(Rs, #u5) | Rd = Rs << #u; | | Rd=asr(Rs,#u5) | Rd = Rs >> #u; | | Rd=lsr(Rs, #u5) | Rd = Rs >>> #u; | | Rd=rol(Rs, #u5) | $Rd = Rs \ll_R #u;$ | | Rdd=asl(Rss, #u6) | Rdd = Rss << #u; | | Rdd=asr(Rss,#u6) | Rdd = Rss >> #u; | | Rdd=lsr(Rss,#u6) | Rdd = Rss >>> #u; | | Rdd=rol(Rss,#u6) | $Rdd = Rss <<_R #u;$ | ## Class: XTYPE (slots 2,3) ## **Intrinsics** | Rd=asl(Rs, #u5) | Word32 | Q6_R_asl_RI(Word32 | Rs, | Word32 Iu5) | |-------------------|--------|--------------------|-----|---------------| | Rd=asr(Rs,#u5) | Word32 | Q6_R_asr_RI(Word32 | Rs, | Word32 Iu5) | | Rd=lsr(Rs, #u5) | Word32 | Q6_R_lsr_RI(Word32 | Rs, | Word32 Iu5) | | Rd=rol(Rs,#u5) | Word32 | Q6_R_rol_RI(Word32 | Rs, | Word32 Iu5) | | Rdd=asl(Rss, #u6) | Word64 | Q6_P_asl_PI(Word64 | Rss | , Word32 Iu6) | | Rdd=asr(Rss,#u6) | Word64 | Q6_P_asr_PI(Word64 | Rss | , Word32 Iu6) | Rdd=lsr(Rss, #u6) Word64 Q6\_P\_lsr\_PI(Word64 Rss, Word32 Iu6) Rdd=rol(Rss, #u6) Word64 Q6\_P\_rol\_PI(Word64 Rss, Word32 Iu6) ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|------|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|------------------| | | ICL | ASS | ; | F | Reg | Туре | е | Ма | јОр | | | | s5 | | | Pa | rse | | | | | | | N | 1inC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 0 | 0 | d | d | d | d | d | Rdd=asr(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 0 | 1 | d | d | d | d | d | Rdd=lsr(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 1 | 0 | d | d | d | d | d | Rdd=asl(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 1 | 1 | d | d | d | d | d | Rdd=rol(Rss,#u6) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 0 | 0 | d | d | d | d | d | Rd=asr(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 0 | 1 | d | d | d | d | d | Rd=lsr(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 1 | 0 | d | d | d | d | d | Rd=asl(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 1 | 1 | d | d | d | d | d | Rd=rol(Rs,#u5) | Field name Description ICLASSInstruction classParsePacket/loop parse bitsd5Field to encode register ds5Field to encode register s MajOpMajor opcodeMinOpMinor opcodeRegTypeRegister type ## Shift by immediate and accumulate Shift the source register value right or left based on the type of instruction. In these instructions, the shift amount is contained in an unsigned immediate (5 bits for 32-bit shifts, 6 bits for 64-bit shifts) and the shift instruction gives the shift direction. Arithmetic right shifts place the sign bit of the source value in the vacated positions, while logical right shifts place zeros in the vacated positions. Left shifts always zero-fill the vacated bits. After shifting, add or subtract the shifted value from the destination register or register pair. | Syntax | Behavior | |------------------------------------|-------------------------------------------------------------------| | <pre>Rx=add(#u8,asl(Rx,#U5))</pre> | $Rx=apply_extension(#u)+(Rx<<#U);$ | | <pre>Rx=add(#u8,lsr(Rx,#U5))</pre> | <pre>Rx=apply_extension(#u)+(((unsigned int)Rx)&gt;&gt;#U);</pre> | | <pre>Rx=sub(#u8,asl(Rx,#U5))</pre> | <pre>Rx=apply_extension(#u)-(Rx&lt;&lt;#U);</pre> | | <pre>Rx=sub(#u8,lsr(Rx,#U5))</pre> | <pre>Rx=apply_extension(#u)-(((unsigned int)Rx)&gt;&gt;#U);</pre> | | Rx[+-]=asl(Rs, #u5) | Rx = Rx [+-] Rs << #u; | | Rx[+-]=asr(Rs, #u5) | Rx = Rx [+-] Rs >> #u; | | Rx[+-]=lsr(Rs, #u5) | Rx = Rx [+-] Rs >>> #u; | | <pre>Rx[+-]=rol(Rs, #u5)</pre> | $Rx = Rx [+-] Rs <<_R #u;$ | | <pre>Rxx[+-]=asl(Rss, #u6)</pre> | Rxx = Rxx [+-] Rss << #u; | | <pre>Rxx[+-]=asr(Rss, #u6)</pre> | Rxx = Rxx [+-] Rss >> #u; | | <pre>Rxx[+-]=lsr(Rss, #u6)</pre> | Rxx = Rxx [+-] Rss >>> #u; | | <pre>Rxx[+-]=rol(Rss, #u6)</pre> | $Rxx = Rxx [+-] Rss <<_R #u;$ | # Class: XTYPE (slots 2,3) ## Intrinsics | Rx+=asr(Rs, #u5) Word32 Q6_R_asracc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) Rx+=lsr(Rs, #u5) Word32 Q6_R_lsracc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) Rx+=rol(Rs, #u5) Word32 Q6_R_rolacc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) Rx-=asl(Rs, #u5) Word32 Q6_R_aslnac_RI(Word32 Rx, Word32 Rs, Word32 Iu5) Rx-=lsr(Rs, #u5) Word32 Q6_R_lsrnac_RI(Word32 Rx, Word32 Rs, Word32 Iu5) Rx-=rol(Rs, #u5) Word32 Q6_R_lsrnac_RI(Word32 Rx, Word32 Rs, Word32 Iu5) Rx-=rol(Rs, #u5) Word32 Q6_R_rolnac_RI(Word32 Rx, Word32 Rs, Word32 Iu5) Rx=add(#u8, asl(Rx, #U5)) Word32 Q6_R_add_asl_IRI(Word32 Iu8, Word32 Rx, Word32 Iu5) Rx=add(#u8, lsr(Rx, #U5)) Word32 Q6_R_add_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 Iu5) Rx=sub(#u8, asl(Rx, #U5)) Word32 Q6_R_sub_asl_IRI(Word32 Iu8, Word32 Rx, Word32 Iu5) Rx=sub(#u8, lsr(Rx, #U5)) Word32 Q6_R_sub_asl_IRI(Word32 Iu8, Word32 Rx, Word32 Iu5) Rx=sub(#u8, lsr(Rx, #U5)) Word32 Q6_R_sub_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 Iu5) Rx=sub(#u8, lsr(Rx, #U5)) Word32 Q6_R_sub_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 Iu5) Rx=sub(#u8, lsr(Rx, #U5)) Word32 Q6_R_sub_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 Iu5) | |-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Rx+=rol(Rs, #u5) Word32 Q6_R_rolacc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) Rx-=asl(Rs, #u5) Word32 Q6_R_aslnac_RI(Word32 Rx, Word32 Rs, Word32 Iu5) Rx-=asr(Rs, #u5) Word32 Q6_R_asrnac_RI(Word32 Rx, Word32 Rs, Word32 Iu5) Rx-=lsr(Rs, #u5) Word32 Q6_R_lsrnac_RI(Word32 Rx, Word32 Rs, Word32 Iu5) Rx-=rol(Rs, #u5) Word32 Q6_R_rolnac_RI(Word32 Rx, Word32 Rs, Word32 Iu5) Rx=add(#u8,asl(Rx, #u5)) Word32 Q6_R_add_asl_IRI(Word32 Iu8, Word32 Rx, Word32 Iu5) Rx=add(#u8,lsr(Rx, #u5)) Word32 Q6_R_add_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 Iu5) Rx=sub(#u8,asl(Rx, #u5)) Word32 Q6_R_sub_asl_IRI(Word32 Iu8, Word32 Rx, Word32 Iu5) Rx=sub(#u8,lsr(Rx, #u5)) Word32 Q6_R_sub_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 Iu5) Rx=sub(#u8,lsr(Rx, #u5)) Word32 Q6_R_sub_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 Iu5) Rx=sub(#u8,lsr(Rx, #u5)) Word32 Q6_R_sub_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 Iu5) Rx=sub(#u8,lsr(Rx, #u5)) Word32 Q6_R_sub_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 Rx, Word32 Iu5) | | Tu5 | | Iu5) Rx-=asr(Rs, #u5) Word32 Q6_R_asrnac_RI(Word32 Rx, Word32 Rs, Word32 Iu5) Rx-=lsr(Rs, #u5) Word32 Q6_R_lsrnac_RI(Word32 Rx, Word32 Rs, Word32 Iu5) Rx-=rol(Rs, #u5) Word32 Q6_R_rolnac_RI(Word32 Rx, Word32 Rs, Word32 Iu5) Rx=add(#u8, asl(Rx, #U5)) Word32 Q6_R_add_asl_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) Rx=add(#u8, lsr(Rx, #U5)) Word32 Q6_R_add_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) Rx=sub(#u8, asl(Rx, #U5)) Word32 Q6_R_sub_asl_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) Rx=sub(#u8, lsr(Rx, #U5)) Word32 Q6_R_sub_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) Rx=sub(#u8, lsr(Rx, #U5)) Word32 Q6_R_sub_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) Rxx+=asl(Rss, #u6) Word64 Q6_P_aslacc_PI(Word64 Rxx, Word64 Rss, Word32 | | Rx-=lsr(Rs, #u5) Word32 Q6_R_lsrnac_RI (Word32 Rx, Word32 Rs, Word32 Iu5) Rx-=rol(Rs, #u5) Word32 Q6_R_rolnac_RI (Word32 Rx, Word32 Rs, Word32 Iu5) Rx=add(#u8, asl(Rx, #U5)) Word32 Q6_R_add_asl_IRI (Word32 Iu8, Word32 Rx, Word32 IU5) Rx=add(#u8, lsr(Rx, #U5)) Word32 Q6_R_add_lsr_IRI (Word32 Iu8, Word32 Rx, Word32 IU5) Rx=sub(#u8, asl(Rx, #U5)) Word32 Q6_R_sub_asl_IRI (Word32 Iu8, Word32 Rx, Word32 IU5) Rx=sub(#u8, lsr(Rx, #U5)) Word32 Q6_R_sub_lsr_IRI (Word32 Iu8, Word32 Rx, Word32 IU5) Rx=sub(#u8, lsr(Rx, #U5)) Word32 Q6_R_sub_lsr_IRI (Word32 Iu8, Word32 Rx, Word32 IU5) Rxx+=asl(Rss, #u6) Word64 Q6_P_aslacc_PI (Word64 Rxx, Word64 Rss, Word32 | | Rx-=rol (Rs, #u5) Word32 Q6_R_rolnac_RI (Word32 Rx, Word32 Rs, Word32 Iu5) Rx=add(#u8,asl(Rx, #U5)) Word32 Q6_R_add_asl_IRI (Word32 Iu8, Word32 Rx, Word32 IU5) Rx=add(#u8,lsr(Rx, #U5)) Word32 Q6_R_add_lsr_IRI (Word32 Iu8, Word32 Rx, Word32 IU5) Rx=sub(#u8,asl(Rx, #U5)) Word32 Q6_R_sub_asl_IRI (Word32 Iu8, Word32 Rx, Word32 IU5) Rx=sub(#u8,lsr(Rx, #U5)) Word32 Q6_R_sub_lsr_IRI (Word32 Iu8, Word32 Rx, Word32 IU5) Rx=sub(#u8,lsr(Rx, #U5)) Word32 Q6_R_sub_lsr_IRI (Word32 Iu8, Word32 Rx, Word32 IU5) Rxx+=asl(Rss, #u6) Word64 Q6_P_aslacc_PI (Word64 Rxx, Word64 Rss, Word32 | | Iu5) Rx=add(#u8,as1(Rx,#U5)) Word32 Q6_R_add_as1_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) Rx=add(#u8,lsr(Rx,#U5)) Word32 Q6_R_add_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) Rx=sub(#u8,as1(Rx,#U5)) Word32 Q6_R_sub_as1_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) Rx=sub(#u8,lsr(Rx,#U5)) Word32 Q6_R_sub_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) Rxx+=as1(Rss,#u6) Word64 Q6_P_aslacc_PI(Word64 Rxx, Word64 Rss, Word32 | | IU5) Rx=add(#u8,lsr(Rx,#U5)) Word32 Q6_R_add_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) Rx=sub(#u8,asl(Rx,#U5)) Word32 Q6_R_sub_asl_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) Rx=sub(#u8,lsr(Rx,#U5)) Word32 Q6_R_sub_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) Rx=sub(#u8,lsr(Rx,#U5)) Word32 Q6_R_sub_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) Rxx+=asl(Rss,#u6) Word64 Q6_P_aslacc_PI(Word64 Rxx, Word64 Rss, Word32 | | IU5) Rx=sub(#u8,asl(Rx,#U5)) Word32 Q6_R_sub_asl_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) Rx=sub(#u8,lsr(Rx,#U5)) Word32 Q6_R_sub_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) Rxx+=asl(Rss,#u6) Word64 Q6_P_aslacc_PI(Word64 Rxx, Word64 Rss, Word32 | | IU5) Rx=sub(#u8,lsr(Rx,#U5)) Word32 Q6_R_sub_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) Rxx+=asl(Rss,#u6) Word64 Q6_P_aslacc_PI(Word64 Rxx, Word64 Rss, Word32 | | IU5) Rxx+=asl(Rss, #u6) Word64 Q6_P_aslacc_PI(Word64 Rxx, Word64 Rss, Word32 | | | | | | Rxx+=asr(Rss, #u6) Word64 Q6_P_asracc_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | | Rxx+=lsr(Rss, #u6) Word64 Q6_P_lsracc_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | | Rxx+=rol(Rss, #u6) Word64 Q6_P_rolacc_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | | Rxx-=asl(Rss, #u6) Word64 Q6_P_aslnac_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | | Rxx-=asr(Rss, #u6) Word64 Q6_P_asrnac_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | | Rxx-=lsr(Rss, #u6) Word64 Q6_P_lsrnac_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | | Rxx-=rol(Rss, #u6) Word64 Q6_P_rolnac_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|------|---|---|-------------------------| | | ICL | ASS | 3 | I | Reg | Тур | е | Ма | јОр | | | | s5 | | | Pa | rse | | | | | | | ٨ | /linC | )p | | | х5 | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | - | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 0 | 0 | х | x | x | х | x | Rxx-=asr(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | - | s | ß | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 0 | 1 | x | x | x | x | x | Rxx-=lsr(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | - | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 1 | 0 | х | х | х | х | х | Rxx-=asl(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | - | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 1 | 1 | х | х | х | х | x | Rxx-=rol(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | - | s | ß | s | s | s | Р | Р | i | i | i | i | i | i | 1 | 0 | 0 | x | x | x | x | x | Rxx+=asr(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | - | s | s | s | s | s | Р | Р | i | i | - | i | i | i | 1 | 0 | 1 | x | x | x | x | x | Rxx+=lsr(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | - | s | s | s | s | s | P | Р | i | i | ï | i | i | i | 1 | 1 | 0 | x | x | x | x | x | Rxx+=asl(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | - | s | s | s | s | s | P | Р | i | i | ï | i | i | i | 1 | 1 | 1 | x | x | x | x | x | Rxx+=rol(Rss,#u6) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | - | s | ß | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 0 | 0 | x | x | x | x | x | Rx-=asr(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | - | s | ß | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 0 | 1 | x | x | x | x | x | Rx-=lsr(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | - | s | ß | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 1 | 0 | x | x | x | x | x | Rx-=asl(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | - | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 1 | 1 | х | х | х | х | х | Rx-=rol(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | - | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 1 | 0 | 0 | х | х | х | х | х | Rx+=asr(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | - | s | S | s | s | s | Р | Р | 0 | i | i | i | i | i | 1 | 0 | 1 | x | x | x | x | x | Rx+=lsr(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | - | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 1 | 1 | 0 | х | х | х | х | х | Rx+=asl(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | - | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 1 | 1 | 1 | х | х | х | х | х | Rx+=rol(Rs,#u5) | | | ICL | ASS | 3 | I | Reg | Тур | е | | | | | | х5 | | | Pa | rse | | | | | | | | | | | N | /ajC | p | | | | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | i | i | i | х | x | x | х | х | Р | Р | i | I | ı | I | I | I | i | i | i | 0 | i | 1 | 0 | - | Rx=add(#u8,asl(Rx,#U5)) | | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | i | i | i | x | x | x | х | x | Р | Р | i | ı | - | ı | ı | ı | i | i | i | 0 | i | 1 | 1 | - | Rx=sub(#u8,asl(Rx,#U5)) | | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | i | i | i | х | x | x | х | x | Р | Р | i | I | - | ı | I | ı | i | i | i | 1 | i | 1 | 0 | - | Rx=add(#u8,lsr(Rx,#U5)) | | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | i | i | i | x | x | x | x | x | Р | Р | i | ı | ı | ı | I | I | i | i | i | 1 | i | 1 | 1 | - | Rx=sub(#u8,lsr(Rx,#U5)) | #### Field name ### Description RegType Register type ICLASS Instruction class Parse Packet/loop parse bits s5 Field to encode register s x5 Field to encode register x MajOp Major opcode MinOp Minor opcode # Shift by immediate and add Shift Rs left by 0-7 bits, add to Rt, and place the result in Rd. This instruction is useful for calculating array pointers, where destruction of the base pointer is undesirable. | Syntax | Behavior | |--------|----------| | | | Rd=addasl(Rt,Rs,#u3) $Rd = Rt + Rs \ll \#u;$ Class: XTYPE (slots 2,3) ### **Intrinsics** Rd=addasl(Rt,Rs,#u3) Word32 Q6\_R\_addasl\_RRI(Word32 Rt, Word32 Rs, Word32 Iu3) ## **Encoding** | 3 | 1 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|--------|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|---|-----|---|---|---|----|---|---|---|---|----------------------| | Г | ICLASS | | | Reg | Тур | е | ٨ | Лаj | | | | s5 | | | Pa | rse | | | | t5 | | | М | lin | | | | d5 | | | | | | | 1 | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | t | t | t | t | t | i | i | i | d | d | d | d | d | Rd=addasl(Rt,Rs,#u3) | | Field name | | Description | |------------|-------------------|-------------| | ICLASS | Instruction class | | | ICHADD | instruction class | |---------|----------------------------| | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | Maj | Major opcode | | Min | Minor opcode | | RegType | Register type | | | | ## Shift by immediate and logical Shift the source register value right or left based on the type of instruction. In these instructions, the shift amount is contained in an unsigned immediate (five bits for 32-bit shifts, six bits for 64-bit shifts) and the shift instruction gives the shift direction. Arithmetic right shifts place the sign bit of the source value in the vacated positions, while logical right shifts place zeros in the vacated positions. Left shifts always zero-fill the vacated bits. After shifting, take the logical AND, OR, or XOR of the shifted amount and the destination register or register pair, and place the result back in the destination register or register pair. Saturation is not available for these instructions. ### Syntax Behavior | <pre>Rx=and(#u8,asl(Rx,#U5))</pre> | <pre>Rx=apply_extension(#u)&amp;(Rx&lt;&lt;#U);</pre> | |--------------------------------------|-----------------------------------------------------------------------| | <pre>Rx=and(#u8,lsr(Rx,#U5))</pre> | <pre>Rx=apply_extension(#u)&amp;(((unsigned int)Rx)&gt;&gt;#U);</pre> | | <pre>Rx=or(#u8,asl(Rx,#U5))</pre> | <pre>Rx=apply_extension(#u) (Rx&lt;&lt;#U);</pre> | | Rx=or(#u8,lsr(Rx,#U5)) | <pre>Rx=apply_extension(#u) (((unsigned int)Rx)&gt;&gt;#U);</pre> | | <pre>Rx[&amp; ]=asl(Rs, #u5)</pre> | Rx = Rx [ &] Rs << #u; | | <pre>Rx[&amp; ]=asr(Rs, #u5)</pre> | Rx = Rx [ &] Rs >> #u; | | <pre>Rx[&amp; ]=lsr(Rs, #u5)</pre> | Rx = Rx [ &] Rs >>> #u; | | <pre>Rx[&amp; ]=rol(Rs, #u5)</pre> | $Rx = Rx [ \&] Rs <<_R #u;$ | | Rx^=asl(Rs, #u5) | $Rx = Rx ^ Rs << #u;$ | | Rx^=lsr(Rs, #u5) | Rx = Rx ^ Rs >>> #u; | | <pre>Rx^=rol(Rs, #u5)</pre> | $Rx = Rx ^ Rs <<_R #u;$ | | <pre>Rxx[&amp; ]=asl(Rss, #u6)</pre> | <pre>Rxx = Rxx [ &amp;] Rss &lt;&lt; #u;</pre> | | Syntax | Benavior | |--------------------------------------|-------------------------------| | <pre>Rxx[&amp; ]=asr(Rss, #u6)</pre> | Rxx = Rxx [ &] Rss >> #u; | | <pre>Rxx[&amp; ]=lsr(Rss, #u6)</pre> | Rxx = Rxx [ &] Rss >>> #u; | | <pre>Rxx[&amp; ]=rol(Rss, #u6)</pre> | $Rxx = Rxx [ &] Rss <<_R #u;$ | | <pre>Rxx^=asl(Rss, #u6)</pre> | Rxx = Rxx ^ Rss << #u; | | <pre>Rxx^=lsr(Rss,#u6)</pre> | Rxx = Rxx ^ Rss >>> #u; | | <pre>Rxx^=rol(Rss, #u6)</pre> | $Rxx = Rxx ^ Rss <<_R #u;$ | # Class: XTYPE (slots 2,3) ## Intrinsics | Rx&=asl(Rs,#u5) | Word32 Q6_R_asland_RI(Word32 Rx, Word32 Rs, Word32 Iu5) | |------------------------------------|------------------------------------------------------------| | Rx&=asr(Rs,#u5) | Word32 Q6_R_asrand_RI(Word32 Rx, Word32 Rs, Word32 Iu5) | | Rx&=lsr(Rs,#u5) | Word32 Q6_R_lsrand_RI(Word32 Rx, Word32 Rs, Word32 Iu5) | | Rx&=rol(Rs,#u5) | Word32 Q6_R_roland_RI(Word32 Rx, Word32 Rs, Word32 Iu5) | | <pre>Rx=and(#u8,asl(Rx,#U5))</pre> | Word32 Q6_R_and_asl_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) | | <pre>Rx=and(#u8,lsr(Rx,#U5))</pre> | Word32 Q6_R_and_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) | | <pre>Rx=or(#u8,asl(Rx,#U5))</pre> | Word32 Q6_R_or_asl_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) | | <pre>Rx=or(#u8,lsr(Rx,#U5))</pre> | Word32 Q6_R_or_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) | | Rx^=asl(Rs,#u5) | Word32 Q6_R_aslxacc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) | | Rx^=lsr(Rs, #u5) | Word32 Q6_R_lsrxacc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) | | Rx^=rol(Rs,#u5) | Word32 Q6_R_rolxacc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) | | $Rx \mid =asl(Rs, #u5)$ | Word32 Q6_R_aslor_RI(Word32 Rx, Word32 Rs, Word32 Iu5) | | <pre>Rx =asr(Rs, #u5)</pre> | Word32 Q6_R_asror_RI(Word32 Rx, Word32 Rs, Word32 Iu5) | | Rx =lsr(Rs,#u5) | Word32 Q6_R_lsror_RI(Word32 Rx, Word32 Rs, Word32 Iu5) | | <pre>Rx =rol(Rs, #u5)</pre> | Word32 Q6_R_rolor_RI(Word32 Rx, Word32 Rs, Word32 Iu5) | | Rxx&=asl(Rss,#u6) | Word64 Q6_P_asland_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | | Rxx&=asr(Rss,#u6) | Word64 Q6_P_asrand_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | | Rxx&=lsr(Rss,#u6) | Word64 Q6_P_lsrand_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | |-------------------------------|------------------------------------------------------------| | Rxx&=rol(Rss,#u6) | Word64 Q6_P_roland_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | | <pre>Rxx^=asl(Rss, #u6)</pre> | Word64 Q6_P_aslxacc_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | | <pre>Rxx^=lsr(Rss, #u6)</pre> | Word64 Q6_P_lsrxacc_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | | <pre>Rxx^=rol(Rss, #u6)</pre> | Word64 Q6_P_rolxacc_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | | <pre>Rxx =asl(Rss, #u6)</pre> | Word64 Q6_P_aslor_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | | <pre>Rxx =asr(Rss, #u6)</pre> | Word64 Q6_P_asror_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | | <pre>Rxx =lsr(Rss, #u6)</pre> | Word64 Q6_P_lsror_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | | <pre>Rxx =rol(Rss,#u6)</pre> | Word64 Q6_P_rolor_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|-------------------| | | ICL/ | ASS | 3 | F | Reg | Тур | е | Ма | јОр | | | | s5 | | | Pa | rse | | | | | | | N | 1inC | )p | | | х5 | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 0 | 0 | х | x | x | x | х | Rxx&=asr(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 0 | 1 | х | x | x | x | х | Rxx&=lsr(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 1 | 0 | х | x | x | x | х | Rxx&=asl(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 1 | 1 | х | x | x | x | x | Rxx&=rol(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 1 | 0 | 0 | х | x | х | х | х | Rxx =asr(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 1 | 0 | 1 | х | x | х | х | х | Rxx =lsr(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 1 | 1 | 0 | х | x | x | x | х | Rxx =asl(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Ρ | i | i | i | i | i | i | 1 | 1 | 1 | x | x | x | x | x | Rxx =rol(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | - | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 0 | 1 | х | x | х | х | х | Rxx^=lsr(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | - | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 1 | 0 | х | x | х | x | х | Rxx^=asl(Rss,#u6) | | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | - | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 0 | 1 | 1 | x | x | x | x | x | Rxx^=rol(Rss,#u6) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 0 | 0 | х | x | х | х | х | Rx&=asr(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 0 | 1 | х | x | x | x | х | Rx&=lsr(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | Р | P | 0 | i | i | i | i | i | 0 | 1 | 0 | x | x | x | x | x | Rx&=asl(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Ρ | 0 | i | i | i | i | i | 0 | 1 | 1 | x | x | x | x | x | Rx&=rol(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 1 | 0 | 0 | x | x | x | x | x | Rx =asr(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | Р | P | 0 | i | i | i | i | ÷ | 1 | 0 | 1 | x | x | x | x | x | Rx =lsr(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | 0 | i | i | i | i | - | 1 | 1 | 0 | x | x | x | x | x | Rx =asl(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 1 | 1 | 1 | x | x | х | x | x | Rx =rol(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | - | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 0 | 1 | x | x | x | x | x | Rx^=lsr(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | - | s | s | s | s | s | Р | Р | 0 | i | i | i | i | - | 0 | 1 | 0 | x | x | x | x | x | Rx^=asl(Rs,#u5) | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | - | s | s | s | s | s | P | Р | 0 | i | i | i | i | i | 0 | 1 | 1 | x | x | x | x | x | Rx^=rol(Rs,#u5) | | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|---|---|---|---|-------|----|---|-------------------------| | | ICL | ASS | 3 | | Reg | Тур | е | | | | | | х5 | | | Pa | rse | | | | | | | | | | | Λ | /lajC | )p | | | | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | i | i | i | x | x | x | x | x | Р | P | i | ı | I | I | ı | I | i | i | i | 0 | i | 0 | 0 | - | Rx=and(#u8,asl(Rx,#U5)) | | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | i | i | i | х | х | х | х | х | Р | Р | i | ı | I | I | ı | I | i | i | i | 0 | i | 0 | 1 | - | Rx=or(#u8,asl(Rx,#U5)) | | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | i | i | i | х | х | х | х | х | Р | Р | i | ı | I | I | ı | ı | i | i | i | 1 | i | 0 | 0 | - | Rx=and(#u8,lsr(Rx,#U5)) | | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | i | i | i | х | х | х | х | х | Р | Р | i | ı | I | I | ı | ı | i | i | i | 1 | i | 0 | 1 | - | Rx=or(#u8,lsr(Rx,#U5)) | ### Field name ## Description RegType Register type ICLASS Instruction class Parse Packet/loop parse bits s5 Field to encode register s x5 Field to encode register x MajOp Major opcode MinOp Minor opcode ## Shift right by immediate with rounding Perform an arithmetic right shift by an immediate amount, and then round the result. This instruction first shifts right, then adds the value +1 to the result, and finally shifts right again by one bit. The right shifts always inserts the sign-bit in the vacated position. When using the asrrnd instruction, the assembler adjusts the immediate appropriately. Syntax Behavior # Class: XTYPE (slots 2,3) ## Intrinsics | Rd=asr(Rs, #u5):rnd | Word32<br>Iu5) | Q6_R_asr_RI_rnd(Word32 Rs, Word32 | |----------------------|----------------|------------------------------------| | Rd=asrrnd(Rs,#u5) | Word32<br>Iu5) | Q6_R_asrrnd_RI(Word32 Rs, Word32 | | Rdd=asr(Rss,#u6):rnd | Word64<br>Iu6) | Q6_P_asr_PI_rnd(Word64 Rss, Word32 | | Rdd=asrrnd(Rss,#u6) | Word64<br>Iu6) | Q6_P_asrrnd_PI(Word64 Rss, Word32 | ## **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|------|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|---|---|---|----|---|---|----------------------| | Ī | ICL | .ASS | 3 | ı | Reg | Тур | е | Ma | jOp | | | | s5 | | | Pa | rse | | | | | | | N | /linC | р | | | d5 | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | s | s | Р | Р | i | i | i | i | i | i | 1 | 1 | 1 | d | d | d | d | d | Rdd=asr(Rss,#u6):rnd | | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 0 | 0 | d | d | d | d | d | Rd=asr(Rs,#u5):rnd | Description | Field name | | |------------|----------------------------| | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | MajOp | Major opcode | | MinOp | Minor opcode | | RegType | Register type | ## Shift left by immediate with saturation Perform a left shift of the 32-bit source register value by an immediate amount and saturate. Saturation first sign-extends the 32-bit Rs register to 64 bits. It is then left shifted by the immediate amount. If this 64-bit value cannot fit in a signed 32-bit number (the upper word is not the sign-extension of bit 31), saturation is performed based on the sign of the original value. Saturation clamps the 32-bit result to the range 0x8000\_0000 to 0x7fff\_ffff. | Syntax | Behavior | |---------------------|------------------------------------------| | Rd=asl(Rs, #u5):sat | $Rd = sat_{32}(sxt_{32->64}(Rs) << #u);$ | Class: XTYPE (slots 2,3) ### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ### **Intrinsics** ### **Encoding** | ; | 31 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-------|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|--------------------| | Ī | ICI | ASS | 3 | | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | Ī | 1 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | s | w | ø | ø | s | Р | Р | 0 | i | i | i | i | i | 0 | 1 | 0 | d | d | d | d | d | Rd=asl(Rs,#u5):sat | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | MajOp | Major opcode | | | MinOp | Minor opcode | | | RegType | Register type | | # Shift by register The shift amount is the least significant seven bits of Rt, treated as a two's complement value. If the shift amount is negative (bit six of Rt is set), the direction of the shift indicated in the opcode is reversed (see figure). The source data to shift is always performed as a 64-bit shift. When the Rs source register is a 32-bit register, this register is first sign or zero-extended to 64 bits. Arithmetic shifts sign-extend the 32-bit source to 64 bits, while logical shifts zero extend. The 64-bit source value is then right or left shifted based on the shift amount and the type of instruction. Arithmetic right shifts place the sign bit of the source value in the vacated positions. Logical right shifts place zeros in the vacated positions. | Syntax | Behavior | |-----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Rd=asl(Rs,Rt) | | | Rd=asr(Rs,Rt) | | | Rd=lsl(#s6,Rt) | $ \begin{array}{lll} {\rm shamt} & = & {\rm sxt}_{7->32} ({\rm Rt}) ; \\ {\rm Rd} & = & ({\rm shamt}>0) ? ({\rm zxt}_{32->64} (\#{\rm s}) << {\rm shamt}) ; \\ \end{array} $ | | Rd=lsl(Rs,Rt) | | | Rd=lsr(Rs,Rt) | | | Rdd=asl(Rss,Rt) | <pre>shamt=sxt<sub>7-&gt;32</sub> (Rt); Rdd = (shamt&gt;0)?(Rss&lt;<shamt):(rss>&gt;shamt);</shamt):(rss></pre> | | Rdd=asr(Rss,Rt) | <pre>shamt=sxt<sub>7-&gt;32</sub> (Rt); Rdd = (shamt&gt;0)?(Rss&gt;&gt;shamt):(Rss&lt;<shamt);< pre=""></shamt);<></pre> | ### **Syntax** $\label{eq:rdd=lsl} \mbox{Rdd=lsl(Rss,Rt)} \quad \mbox{shamt=sxt}_{7->32} \, \mbox{(Rt)} \; ;$ Rdd = (shamt>0)?(Rss<<shamt):(Rss>>>shamt); Rdd=lsr(Rss,Rt) $shamt=sxt_{7->32}(Rt);$ Rdd = (shamt>0)?(Rss>>>shamt):(Rss<<shamt);</pre> **Behavior** ## Class: XTYPE (slots 2,3) ### **Intrinsics** | Rd=asl(Rs,Rt) | Word32 | Q6_R_asl_RR(Word32 | Rs, Word32 Rt) | |-----------------|--------|--------------------|-----------------| | Rd=asr(Rs,Rt) | Word32 | Q6_R_asr_RR(Word32 | Rs, Word32 Rt) | | Rd=ls1(#s6,Rt) | Word32 | Q6_R_lsl_IR(Word32 | Is6, Word32 Rt) | | Rd=lsl(Rs,Rt) | Word32 | Q6_R_lsl_RR(Word32 | Rs, Word32 Rt) | | Rd=lsr(Rs,Rt) | Word32 | Q6_R_lsr_RR(Word32 | Rs, Word32 Rt) | | Rdd=asl(Rss,Rt) | Word64 | Q6_P_asl_PR(Word64 | Rss, Word32 Rt) | | Rdd=asr(Rss,Rt) | Word64 | Q6_P_asr_PR(Word64 | Rss, Word32 Rt) | | Rdd=1s1(Rss,Rt) | Word64 | Q6_P_lsl_PR(Word64 | Rss, Word32 Rt) | | Rdd=lsr(Rss,Rt) | Word64 | Q6_P_lsr_PR(Word64 | Rss, Word32 Rt) | ## **Encoding** Маj Min RegType | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|-----------------| | | ICL | ASS | ; | | Reg | Тур | е | М | laj | | | | s5 | | | Pa | rse | | | | t5 | | | M | lin | | | | d5 | | | | | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | - | d | d | d | d | d | Rdd=asr(Rss,Rt) | | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | - | d | d | d | d | d | Rdd=lsr(Rss,Rt) | | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | - | d | d | d | d | d | Rdd=asl(Rss,Rt) | | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | - | s | s | s | s | s | P | Р | | t | t | t | t | t | 1 | 1 | - | d | d | d | d | d | Rdd=lsl(Rss,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | P | Р | | t | t | t | t | t | 0 | 0 | - | d | d | d | d | d | Rd=asr(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | - | s | s | s | s | s | P | Р | | t | t | t | t | t | 0 | 1 | • | d | d | d | d | d | Rd=lsr(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | | s | s | s | s | s | P | Р | | t | t | t | t | t | 1 | 0 | - | d | d | d | d | d | Rd=asl(Rs,Rt) | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | | s | s | s | s | s | P | Р | | t | t | t | t | t | 1 | 1 | - | d | d | d | d | d | Rd=IsI(Rs,Rt) | | | ICL | ASS | ; | | Reg | Тур | е | М | laj | | | | | | | Pa | rse | | | | t5 | | | M | lin | | | | d5 | | | | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | - | i | i | i | i | i | P | Р | - | t | t | t | t | t | 1 | 1 | i | d | d | d | d | d | Rd=IsI(#s6,Rt) | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | Major opcode Minor opcode Register type Syntay ## Shift by register and accumulate The shift amount is the least significant seven bits of Rt, treated as a two's complement value. When the shift amount is negative (bit six of Rt is set), reverse the direction of the shift indicated in the opcode. Shift the source register value right or left based on the shift amount and the type of instruction. Arithmetic right shifts place the sign bit of the source value in the vacated positions. Logical right shifts place zeros in the vacated positions. The shift operation is always performed as a 64-bit shift. When Rs is a 32-bit register, this register is first sign- or zero-extended to 64 bits. Arithmetic shifts sign-extend the 32-bit source to 64 bits, while logical shifts zero extend. After shifting, add or subtract the 64-bit shifted amount from the destination register or register pair. Rehavior | Syntax | Benavior | |------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | <pre>Rx[+-]=asl(Rs,Rt)</pre> | | | Rx[+-]=asr(Rs,Rt) | | | Rx[+-]=lsl(Rs,Rt) | | | Rx[+-]=lsr(Rs,Rt) | $\begin{split} & \text{shamt=sxt}_{7->32} (\text{Rt}) ; \\ & \text{Rx = Rx [+-] (shamt>0)?(zxt}_{32->64} (\text{Rs})>>> \text{shamt}) ; \\ & \text{>>>} (\text{Rs}) << \text{shamt}) ; \end{split}$ | | Rxx[+-] = asl(Rss,Rt) | <pre>shamt=sxt<sub>7-&gt;32</sub>(Rt); Rxx = Rxx [+-] (shamt&gt;0)?(Rss&lt;<shamt):(rss>&gt;shamt);</shamt):(rss></pre> | | Syntax | Behavior | |----------------------------------|----------------------------------------------------------------------------------------------------------------------------------| | <pre>Rxx[+-] = asr(Rss,Rt)</pre> | <pre>shamt=sxt<sub>7-&gt;32</sub>(Rt); Rxx = Rxx [+-] (shamt&gt;0)?(Rss&gt;&gt;shamt):(Rss&lt;<shamt);< pre=""></shamt);<></pre> | | <pre>Rxx[+-] =lsl(Rss,Rt)</pre> | | | <pre>Rxx[+-] = lsr(Rss,Rt)</pre> | $shamt=sxt_{7->32}(Rt);$ $Rxx = Rxx [+-] (shamt>0)?(Rss>>>shamt):(Rss<$ | # Class: XTYPE (slots 2,3) ## Intrinsics | <pre>Rx+=asl(Rs,Rt)</pre> | Word32<br>Rt) | Q6_R_ | _aslacc | _RR(Wo | rd32 | Rx, | Word32 | Rs, | Word32 | | |-----------------------------|---------------|-------|---------|----------|------|------|--------|-----|----------|--| | Rx+=asr(Rs,Rt) | Word32<br>Rt) | Q6_R_ | _asracc | _RR (Wo | rd32 | Rx, | Word32 | Rs, | Word32 | | | Rx+=lsl(Rs,Rt) | Word32<br>Rt) | Q6_R_ | lslacc | _RR (Wo | rd32 | Rx, | Word32 | Rs, | Word32 | | | Rx+=lsr(Rs,Rt) | Word32<br>Rt) | Q6_R_ | _lsracc | _RR (Wo | rd32 | Rx, | Word32 | Rs, | Word32 | | | Rx-=asl(Rs,Rt) | Word32<br>Rt) | Q6_R_ | _aslnac | _RR(Wo | rd32 | Rx, | Word32 | Rs, | Word32 | | | Rx-=asr(Rs,Rt) | Word32<br>Rt) | Q6_R_ | _asrnac | _RR (Wo | rd32 | Rx, | Word32 | Rs, | Word32 | | | Rx-=lsl(Rs,Rt) | Word32<br>Rt) | Q6_R_ | lslnac | _RR (Wo | rd32 | Rx, | Word32 | Rs, | Word32 | | | Rx-=lsr(Rs,Rt) | Word32<br>Rt) | Q6_R_ | _lsrnac | _RR (Wo | rd32 | Rx, | Word32 | Rs, | Word32 | | | <pre>Rxx+=asl(Rss,Rt)</pre> | Word64<br>Rt) | Q6_P_ | aslacc | _PR (Wo: | rd64 | Rxx, | Word64 | Rss | , Word32 | | | <pre>Rxx+=asr(Rss,Rt)</pre> | Word64<br>Rt) | Q6_P_ | asracc | _PR (Wo: | rd64 | Rxx, | Word64 | Rss | , Word32 | | | <pre>Rxx+=lsl(Rss,Rt)</pre> | Word64<br>Rt) | Q6_P_ | lslacc | _PR (Wo: | rd64 | Rxx, | Word64 | Rss | , Word32 | | | <pre>Rxx+=lsr(Rss,Rt)</pre> | Word64<br>Rt) | Q6_P_ | lsracc | _PR (Wo: | rd64 | Rxx, | Word64 | Rss | , Word32 | | | <pre>Rxx-=asl(Rss,Rt)</pre> | Word64<br>Rt) | Q6_P_ | aslnac | _PR (Wo: | rd64 | Rxx, | Word64 | Rss | , Word32 | | | <pre>Rxx-=asr(Rss,Rt)</pre> | Word64<br>Rt) | Q6_P_ | asrnac | _PR (Wo: | rd64 | Rxx, | Word64 | Rss | , Word32 | | | Rxx-=lsl(Rss,Rt) | Word64<br>Rt) | Q6_P_ | lslnac | _PR (Wo: | rd64 | Rxx, | Word64 | Rss | , Word32 | | | Rxx-=lsr(Rss,Rt) | Word64<br>Rt) | Q6_P_ | lsrnac | _PR (Wo: | rd64 | Rxx, | Word64 | Rss | , Word32 | | | | | | | | | | | | | | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|------------------| | | ICL/ | ASS | 3 | I | Reg | Тур | е | M | laj | | | | s5 | | | Pa | rse | | | | t5 | | | N | lin | | | | х5 | | | | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | - | х | x | x | x | х | Rxx-=asr(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | - | х | x | х | х | х | Rxx-=lsr(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | s | ß | s | s | s | P | Р | - | t | t | t | t | t | 1 | 0 | - | x | x | x | x | x | Rxx-=asl(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | - | х | х | х | х | х | Rxx-=IsI(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | s | ß | s | s | s | P | Р | - | t | t | t | t | t | 0 | 0 | - | x | x | x | x | x | Rxx+=asr(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | s | ß | s | s | s | P | Р | - | t | t | t | t | t | 0 | 1 | - | x | x | x | x | x | Rxx+=lsr(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | s | ß | s | s | s | P | Р | - | t | t | t | t | t | 1 | 0 | - | x | x | x | x | x | Rxx+=asl(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | s | ß | s | s | s | P | Р | - | t | t | t | t | t | 1 | 1 | - | x | x | x | x | x | Rxx+=lsl(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | | s | ß | s | s | s | P | Р | - | t | t | t | t | t | 0 | 0 | - | x | x | x | x | x | Rx-=asr(Rs,Rt) | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | - | х | х | х | х | х | Rx-=lsr(Rs,Rt) | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | | s | ß | s | s | s | P | Р | - | t | t | t | t | t | 1 | 0 | - | x | x | x | x | x | Rx-=asl(Rs,Rt) | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | | s | S | s | s | s | P | Р | - | t | t | t | t | t | 1 | 1 | - | x | x | x | x | x | Rx-=IsI(Rs,Rt) | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | - | х | х | х | х | х | Rx+=asr(Rs,Rt) | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | | s | ß | s | s | s | P | Р | - | t | t | t | t | t | 0 | 1 | - | x | x | x | x | x | Rx+=lsr(Rs,Rt) | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | - | х | x | х | х | х | Rx+=asl(Rs,Rt) | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | - | x | x | x | x | x | Rx+=IsI(Rs,Rt) | | E | اما | ы | na | me | |---|-----|----|----|----| | г | ıe | ıu | Ha | me | ## Description | ICLASS | Instruction class | |--------|----------------------------| | Parse | Packet/loop parse bits | | s5 | Field to encode register s | | t5 | Field to encode register t | | x5 | Field to encode register x | Maj Major opcode Min Minor opcode RegType Register type ## Shift by register and logical The shift amount is the least significant seven bits of Rt, treated as a two's complement value. If the shift amount is negative (bit 6 of Rt is set), reverse the direction of the shift indicated in the opcode. Shift the source register value right or left based on the shift amount and the type of instruction. Arithmetic right shifts place the sign bit of the source value in the vacated positions. Logical right shifts place zeros in the vacated positions. The shift operation is always performed as a 64-bit shift. When the Rs source register is a 32-bit register, this register is first sign or zero-extended to 64 bits. Arithmetic shifts sign-extend the 32-bit source to 64 bits, while logical shifts zero extend. After shifting, take the logical AND or OR of the shifted amount and the destination register or register pair, and place the result back in the destination register or register pair. Saturation is not available for these instructions. | Syntax | Behavior | |----------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | <pre>Rx[&amp; ]=asl(Rs,Rt)</pre> | | | <pre>Rx[&amp; ]=asr(Rs,Rt)</pre> | $ \begin{array}{l} {\rm shamt} = {\rm sxt}_{7 - > 32} ({\rm Rt}) ; \\ {\rm Rx} = {\rm Rx} \left[ \& \right] \left( {\rm shamt} {>} 0 \right) ? \left( {\rm sxt}_{32 - > 64} ({\rm Rs}) \right) > > {\rm shamt}) ; \\ {\rm < shamt}) ; \\ \end{array} $ | | <pre>Rx[&amp; ]=ls1(Rs,Rt)</pre> | | | <pre>Rx[&amp; ]=lsr(Rs,Rt)</pre> | | | Syntax | Behavior | |------------------------------------|------------------------------------------------------------------------------------------------------------------------------------| | <pre>Rxx[&amp; ]=asl(Rss,Rt)</pre> | | | <pre>Rxx[&amp; ]=asr(Rss,Rt)</pre> | | | <pre>Rxx[&amp; ]=lsl(Rss,Rt)</pre> | | | <pre>Rxx[&amp; ]=lsr(Rss,Rt)</pre> | | | <pre>Rxx^=asl(Rss,Rt)</pre> | <pre>shamt=sxt<sub>7-&gt;32</sub> (Rt); Rxx = Rxx ^ (shamt&gt;0)?(Rss&lt;<shamt):(rss>&gt;shamt);</shamt):(rss></pre> | | <pre>Rxx^=asr(Rss,Rt)</pre> | <pre>shamt=sxt<sub>7-&gt;32</sub> (Rt); Rxx = Rxx ^ (shamt&gt;0)?(Rss&gt;&gt;shamt):(Rss&lt;<shamt);< pre=""></shamt);<></pre> | | <pre>Rxx^=lsl(Rss,Rt)</pre> | <pre>shamt=sxt<sub>7-&gt;32</sub> (Rt); Rxx = Rxx ^ (shamt&gt;0)?(Rss&lt;<shamt):(rss>&gt;&gt;shamt);</shamt):(rss></pre> | | <pre>Rxx^=lsr(Rss,Rt)</pre> | <pre>shamt=sxt<sub>7-&gt;32</sub> (Rt); Rxx = Rxx ^ (shamt&gt;0)?(Rss&gt;&gt;&gt;shamt):(Rss&lt;<shamt);< pre=""></shamt);<></pre> | # Class: XTYPE (slots 2,3) ## Intrinsics | Rx&=asl(Rs,Rt) | Word32 Q6_R_asland_RR(Word32 Rx, Word32 Rs, Word32 Rt) | |---------------------------------|-----------------------------------------------------------| | Rx&=asr(Rs,Rt) | Word32 Q6_R_asrand_RR(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx&=lsl(Rs,Rt) | Word32 Q6_R_lsland_RR(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx&=lsr(Rs,Rt) | Word32 Q6_R_lsrand_RR(Word32 Rx, Word32 Rs, Word32 Rt) | | <pre>Rx =asl(Rs,Rt)</pre> | Word32 Q6_R_aslor_RR(Word32 Rx, Word32 Rs, Word32 Rt) | | <pre>Rx =asr(Rs,Rt)</pre> | Word32 Q6_R_asror_RR(Word32 Rx, Word32 Rs, Word32 Rt) | | Rx =lsl(Rs,Rt) | Word32 Q6_R_lslor_RR(Word32 Rx, Word32 Rs, Word32 Rt) | | <pre>Rx =lsr(Rs,Rt)</pre> | Word32 Q6_R_lsror_RR(Word32 Rx, Word32 Rs, Word32 Rt) | | <pre>Rxx&amp;=asl(Rss,Rt)</pre> | Word64 Q6_P_asland_PR(Word64 Rxx, Word64 Rss, Word32 Rt) | | Rxx&=asr(Rss,Rt) | Word64 Q6_P_asrand_PR(Word64 Rxx, Word64 Rss, Word32 Rt) | | <pre>Rxx&amp;=lsl(Rss,Rt)</pre> | Word64 Q6_P_lsland_PR(Word64 Rxx, Word64 Rss, Word32 Rt) | | <pre>Rxx&amp;=lsr(Rss,Rt)</pre> | Word64 Q6_P_lsrand_PR(Word64 Rxx, Word64 Rss, Word32 Rt) | | <pre>Rxx^=asl(Rss,Rt)</pre> | Word64 Q6_P_aslxacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) | | <pre>Rxx^=asr(Rss,Rt)</pre> | Word64 Q6_P_asrxacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) | | <pre>Rxx^=lsl(Rss,Rt)</pre> | Word64 Q6_P_lslxacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) | | <pre>Rxx^=lsr(Rss,Rt)</pre> | Word64 Q6_P_lsrxacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) | |-----------------------------|-----------------------------------------------------------| | <pre>Rxx =asl(Rss,Rt)</pre> | Word64 Q6_P_aslor_PR(Word64 Rxx, Word64 Rss, Word32 Rt) | | <pre>Rxx =asr(Rss,Rt)</pre> | Word64 Q6_P_asror_PR(Word64 Rxx, Word64 Rss, Word32 Rt) | | <pre>Rxx =lsl(Rss,Rt)</pre> | Word64 Q6_P_lslor_PR(Word64 Rxx, Word64 Rss, Word32 Rt) | | <pre>Rxx =lsr(Rss,Rt)</pre> | Word64 Q6_P_lsror_PR(Word64 Rxx, Word64 Rss, Word32 Rt) | # **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | М | aj | | | | s5 | | | Pa | rse | | | | t5 | | | М | lin | | | | х5 | | | | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | - | х | x | х | х | х | Rxx =asr(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | - | х | x | х | х | х | Rxx =lsr(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | - | х | x | x | х | x | Rxx =asl(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | s | s | s | s | s | Р | P | - | t | t | t | t | t | 1 | 1 | - | x | x | x | x | x | Rxx =lsl(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | P | - | t | t | t | t | t | 0 | 0 | - | x | x | x | x | x | Rxx&=asr(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Ρ | P | - | t | t | t | t | t | 0 | 1 | - | x | x | x | x | x | Rxx&=lsr(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | P | - | t | t | t | t | t | 1 | 0 | - | x | x | x | x | x | Rxx&=asl(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | s | s | s | s | s | Р | P | - | t | t | t | t | t | 1 | 1 | - | x | x | x | x | x | Rxx&=lsl(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Ρ | P | - | t | t | t | t | t | 0 | 0 | - | x | x | x | x | x | Rxx^=asr(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | P | - | t | t | t | t | t | 0 | 1 | - | x | x | x | x | x | Rxx^=lsr(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | P | - | t | t | t | t | t | 1 | 0 | - | x | x | x | x | x | Rxx^=asl(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | - | х | x | x | х | x | Rxx^=lsl(Rss,Rt) | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | | s | s | s | s | s | Р | P | - | t | t | t | t | t | 0 | 0 | - | x | x | x | x | x | Rx =asr(Rs,Rt) | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | - | s | s | s | s | s | Р | P | - | t | t | t | t | t | 0 | 1 | - | x | x | x | x | x | Rx =lsr(Rs,Rt) | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | - | s | s | s | s | s | Ρ | P | - | t | t | t | t | t | 1 | 0 | - | x | x | x | x | x | Rx =asl(Rs,Rt) | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | - | s | s | s | s | s | Р | P | - | t | t | t | t | t | 1 | 1 | - | x | x | x | x | x | Rx =IsI(Rs,Rt) | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 0 | - | х | x | x | х | х | Rx&=asr(Rs,Rt) | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | - | х | x | x | x | x | Rx&=lsr(Rs,Rt) | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | - | х | x | x | x | х | Rx&=asl(Rs,Rt) | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | - | х | x | x | x | х | Rx&=lsl(Rs,Rt) | | Field name | Description | |------------|-------------| | | | | ICLASS | Instruction class | |---------|----------------------------| | Parse | Packet/loop parse bits | | s5 | Field to encode register s | | t5 | Field to encode register t | | x5 | Field to encode register x | | Maj | Major opcode | | Min | Minor opcode | | RegType | Register type | | | | # Shift by register with saturation The shift amount is the least significant seven bits of Rt, treated as a two's complement value. If the shift amount is negative (bit six of Rt is set), reverse the direction of the shift indicated in the opcode. Saturation is available for 32-bit arithmetic left shifts. This can be either an ASL instruction with positive Rt, or an ASR instruction with negative Rt. Saturation first sign-extends the 32-bit Rs register to 64 bits. It is then shifted by the shift amount. If this 64-bit value cannot fit in a signed 32-bit number (the upper word is not the sign-extension of bit 31), saturation is performed based on the sign of the original value. Saturation clamps the 32-bit result to the range 0x80000000 to 0x7fffffff. | Syntax | Behavior | |--------|----------| | Syntax | Denavior | | Rd=asl(Rs,Rt):sat | <pre>shamt=sxt<sub>7-&gt;32</sub> (Rt); Rd = bidir_shiftl(Rs,shamt);</pre> | |-------------------|----------------------------------------------------------------------------| | Rd=asr(Rs,Rt):sat | <pre>shamt=sxt<sub>7-&gt;32</sub>(Rt); Rd = bidir_shiftr(Rs,shamt);</pre> | Class: XTYPE (slots 2,3) #### **Notes** If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. #### **Intrinsics** ``` Rd=asl(Rs,Rt):sat Word32 Q6_R_asl_RR_sat(Word32 Rs, Word32 Rt) Rd=asr(Rs,Rt):sat Word32 Q6_R_asr_RR_sat(Word32 Rs, Word32 Rt) ``` ### **Encoding** RegType | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|------|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|-------------------| | | ICL. | ASS | 3 | | Reg | Тур | е | M | laj | | | | s5 | | | Pa | rse | | | | t5 | | | M | lin | | | | d5 | | | | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | - | s | s | s | s | s | P | Р | - | t | t | t | t | t | 0 | 0 | - | d | d | d | d | d | Rd=asr(Rs,Rt):sat | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | - | s | s | s | s | s | P | Р | - | t | t | t | t | t | 1 | 0 | - | d | d | d | d | d | Rd=asl(Rs,Rt):sat | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | Maj | Major opcode | | | Min | Minor opcode | | Register type # Vector shift halfwords by immediate Shift individual halfwords of the source vector. Arithmetic right shifts place the sign bit of the source values in the vacated positions. Logical right shifts place zeros in the vacated positions. # Syntax Behavior | Rdd=vaslh(Rss, #u4) | <pre>for (i=0;i&lt;4;i++) { Rdd.h[i]=(Rss.h[i]&lt;&lt;#u); }</pre> | |---------------------|------------------------------------------------------------------------| | Rdd=vasrh(Rss, #u4) | <pre>for (i=0;i&lt;4;i++) { Rdd.h[i]=(Rss.h[i]&gt;&gt;#u); }</pre> | | Rdd=vlsrh(Rss, #u4) | <pre>for (i=0;i&lt;4;i++) { Rdd.h[i]=(Rss.uh[i]&gt;&gt;#u); }</pre> | ## Class: XTYPE (slots 2,3) ### **Intrinsics** | Rdd=vaslh(Rss,#u4) | Word64 Q6_P_vaslh_PI(Word64 Rss, Word32 Iu4) | |--------------------|----------------------------------------------| | Rdd=vasrh(Rss,#u4) | Word64 Q6_P_vasrh_PI(Word64 Rss, Word32 Iu4) | | Rdd=vlsrh(Rss,#u4) | Word64 Q6_P_vlsrh_PI(Word64 Rss, Word32 Iu4) | ## **Encoding** | 3 | 31 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-------|------|----|----|-----|-----|----|----|-----|----|----|---------|----|----|----|----|-----|----|----|----|----|---|---|---|------|---|---|---|----|--------------------|---|--------------------| | Ī | IC | LASS | 3 | ı | Reg | Тур | е | Ма | јОр | | | | s5 | | | Pa | rse | | | | | | | N | 1inC | р | | | d5 | | | | | | 1 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | 0 | i | i | i | i | 0 | 0 | 0 | d | d | d | d | d | Rdd=vasrh(Rss,#u4) | | | 1 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | s | s s s s | | Р | Р | 0 | 0 | i | i | i | i | 0 | 0 | 1 | d | d | d | d | d | Rdd=vlsrh(Rss,#u4) | | | | Γ. | 1 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | s | s | s | s | s | Р | Р | 0 | 0 | i | i | i | i | 0 | 1 | 0 | d | d | d | d | d | Rdd=vaslh(Rss,#u4) | ## Field name Description | ICLASS | Instruction class | |--------|----------------------------| | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | MajOp | Major opcode | Majop Major opcode MinOp Minor opcode RegType Register type ## Vector arithmetic shift halfwords with round For each halfword in the vector, round then arithmetic shift right by an immediate amount. Store the results in the destination register. ### Syntax Behavior ``` Rdd = vasrh(Rss, #u4):raw Rdd.h[i]=( ((Rss.h[i] >> #u)+1)>>1 ); } Rdd = vasrh(Rss, #u4):rnd if ("#u4==0") { Assembler mapped to: "Rdd=Rss"; } else { Assembler mapped to: "Rdd=vasrh(Rss, #u4-1):raw"; } ``` ### Class: XTYPE (slots 2,3) #### **Intrinsics** ``` Rdd=vasrh(Rss, #u4):rnd Word64 Q6_P_vasrh_PI_rnd(Word64 Rss, Word32 Iu4) ``` ## **Encoding** | 3 | 1 3 | 0 2 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|------------------------| | | IC | LA | SS | | ı | Reg | Тур | е | Ма | jОр | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | 1 | 0 | ) ( | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | s | s | s | s | s | Р | Р | 0 | 0 | i | i | i | i | 0 | 0 | 0 | d | d | d | d | d | Rdd=vasrh(Rss,#u4):raw | | Field name | | Description | |------------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | MajOp | Major opcode | | | MinOp | Minor opcode | | | ReaType | Register type | | # Vector arithmetic shift halfwords with saturate and pack For each halfword in the vector, optionally round, then arithmetic shift right by an immediate amount. Saturate the results to unsigned [0-255] and then pack in the destination register. Syntax Behavior Class: XTYPE (slots 2,3) ### Notes If saturation occurs during execution of this instruction (a result is clamped to either maximum or minimum values), the OVF bit in the status register is set. OVF remains set until explicitly cleared by a transfer to SR. ### **Intrinsics** ``` Rd = Word32 Q6_R_vasrhub_PI_rnd_sat(Word64 Rss, word32 Iu4) Rd = vasrhub(Rss, #u4):sat Word32 Q6_R_vasrhub_PI_sat(Word64 Rss, Word32 Iu4) ``` ## **Encoding** | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |----|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-------------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | 0 | i | i | i | i | 1 | 0 | 0 | d | d | d | d | d | Rd=vasrhub(Rss,#u4):raw | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | s | s | s | s | s | Р | Р | 0 | 0 | i | i | i | i | 1 | 0 | 1 | d | d | d | d | d | Rd=vasrhub(Rss,#u4):sat | Description | Field name | | |------------|----------------------------| | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | MajOp | Major opcode | MinOp Minor opcode RegType Register type ## Vector shift halfwords by register The shift amount is the least significant seven bits of Rt, treated as a two's complement value. If the shift amount is negative, reverse the direction of the shift. Shift the source values right or left based on the shift amount and the type of instruction. Arithmetic right shifts place the sign bit of the source value in the vacated positions. Logical right shifts place zeros in the vacated positions. ``` Syntax Behavior Rdd=vaslh(Rss,Rt) for (i=0; i<4; i++) { Rdd.h[i] = (sxt_{7->32}(Rt)>0)?(sxt_{16->64}(Rss.h[i]) << sxt_{7-} _{>32} (Rt)): (sxt<sub>16->64</sub> (Rss.h[i])>>sxt<sub>7->32</sub> (Rt)); Rdd=vasrh(Rss,Rt) for (i=0;i<4;i++) { Rdd.h[i] = (sxt_{7->32}(Rt)>0)?(sxt_{16->64}(Rss.h[i])>> sxt_{7-} _{>32} (Rt)): (sxt<sub>16->64</sub> (Rss.h[i]) << sxt<sub>7->32</sub> (Rt)); for (i=0;i<4;i++) { Rdd=vlslh(Rss,Rt) \texttt{Rdd.h[i]} = (\texttt{sxt}_{7->32}(\texttt{Rt}) > \texttt{0}) ? (\texttt{zxt}_{16->64}(\texttt{Rss.uh[i]}) < < \texttt{sxt}_{7-} _{>32} (Rt)): (zxt_{16->64} (Rss.uh[i])>>>sxt_{7->32} (Rt)); Rdd=vlsrh(Rss,Rt) for (i=0; i<4; i++) { Rdd.h[i] = (sxt_{7->32}(Rt)>0)?(zxt_{16->64}(Rss.uh[i])>>> sxt_{7-} ``` $_{>32}$ (Rt)): (zxt $_{16->64}$ (Rss.uh[i]) << sxt $_{7->32}$ (Rt)); Class: XTYPE (slots 2,3) ### **Notes** • If the number of bits to shift is greater than the width of the vector element, the result is either sign-bits (for arithmetic right shifts) or zeros for logical and left shifts. ## Intrinsics | Rdd=vaslh(Rss,Rt) | Word64 | Q6_P_vaslh_PR(Word64 | Rss, | Word32 | Rt) | |-------------------|--------|----------------------|------|--------|-----| | Rdd=vasrh(Rss,Rt) | Word64 | Q6_P_vasrh_PR(Word64 | Rss, | Word32 | Rt) | | Rdd=vlslh(Rss,Rt) | Word64 | Q6_P_vlslh_PR(Word64 | Rss, | Word32 | Rt) | | Rdd=vlsrh(Rss,Rt) | Word64 | Q6_P_vlsrh_PR(Word64 | Rss, | Word32 | Rt) | # **Encoding** | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|-------------------| | Ī | ICL | ASS | 3 | | Reg | Тур | е | M | laj | | | | s5 | | | Pa | rse | | | | t5 | | | M | lin | | | | d5 | | | | | | 1 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | | s | s | s | s | s | Р | Р | | t | t | t | t | t | 0 | 0 | - | d | d | d | d | d | Rdd=vasrh(Rss,Rt) | | | 1 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | | s | s | s | s | s | P | Ρ | | t | t | t | t | t | 0 | 1 | - | d | d | d | d | d | Rdd=vlsrh(Rss,Rt) | | | 1 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | | s | s | s | s | s | Р | Р | | t | t | t | t | t | 1 | 0 | - | d | d | d | d | d | Rdd=vaslh(Rss,Rt) | | Ī | 1 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | - | d | d | d | d | d | Rdd=vlslh(Rss,Rt) | | Field name | | |------------|----------------------------| | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | Maj | Major opcode | | Min | Minor opcode | | RegType | Register type | ## Description # Vector shift words by immediate Shift individual words of the source vector. Arithmetic right shifts place the sign bit of the source values in the vacated positions. Logical right shifts place zeros in the vacated positions. ### Syntax Behavior | Rdd=vaslw(Rss, #u5) | <pre>for (i=0;i&lt;2;i++) { Rdd.w[i]=(Rss.w[i]&lt;&lt;#u); }</pre> | |---------------------|------------------------------------------------------------------------| | Rdd=vasrw(Rss, #u5) | <pre>for (i=0;i&lt;2;i++) { Rdd.w[i]=(Rss.w[i]&gt;&gt;#u); }</pre> | | Rdd=vlsrw(Rss, #u5) | <pre>for (i=0;i&lt;2;i++) { Rdd.w[i]=(Rss.uw[i]&gt;&gt;#u); }</pre> | ## Class: XTYPE (slots 2,3) ### **Intrinsics** | Rdd=vaslw(Rss, #u5) | Word64 Q6_P_vaslw_PI(Word64 Rss, Word32 Iu5) | | |---------------------|----------------------------------------------|--| | Rdd=vasrw(Rss,#u5) | Word64 Q6_P_vasrw_PI(Word64 Rss, Word32 Iu5) | | | Rdd=vlsrw(Rss,#u5) | Word64 Q6_P_vlsrw_PI(Word64 Rss, Word32 Iu5) | | ## **Encoding** | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|------|----|---|---|----|---|---|--------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | Ма | јОр | | | | s5 | | | Pa | rse | | | | | | | N | 1inC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 0 | 0 | d | d | d | d | d | Rdd=vasrw(Rss,#u5) | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | i | i | i | i | i | 0 | 0 | 1 | d | d | d | d | d | Rdd=vlsrw(Rss,#u5) | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | s | s | s | s | s | Р | Р | 0 | i | ï | i | - | i | 0 | 1 | 0 | d | d | d | d | d | Rdd=vaslw(Rss,#u5) | | Field name | | | Description | |------------|--|--|-------------| | | | | | | ICLASS | instruction class | |--------|----------------------------| | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | MaiOp | Major opcode | MinOp Minor opcode RegType Register type ## Vector shift words by register The shift amount is the least significant seven bits of Rt, treated as a two's complement value. If the shift amount is negative, the direction of the shift is reversed. Shift the source values right or left based on the shift amount and the type of instruction. Arithmetic right shifts place the sign bit of the source value in the vacated positions. Logical right shifts place zeros in the vacated positions. ``` Syntax Behavior Rdd=vaslw(Rss,Rt) for (i=0;i<2;i++) { Rdd.w[i] = (sxt_{7->32}(Rt)>0)?(sxt_{32->64}(Rss.w[i]) << sxt_{7-} >32 (Rt)): (sxt<sub>32->64</sub> (Rss.w[i])>>sxt<sub>7->32</sub> (Rt)); Rdd=vasrw(Rss,Rt) for (i=0;i<2;i++) { Rdd.w[i] = (sxt_{7->32}(Rt)>0)?(sxt_{32->64}(Rss.w[i])>> sxt_{7-} _{>32} (Rt)): (sxt_{32->64} (Rss.w[i]) <<sxt_{7->32} (Rt)); for (i=0;i<2;i++) { Rdd=vlslw(Rss,Rt) Rdd.w[i] = (sxt_{7->32}(Rt)>0)?(zxt_{32->64}(Rss.uw[i]) << sxt_{7-} >32 (Rt)): (zxt<sub>32->64</sub> (Rss.uw[i])>>>sxt<sub>7->32</sub> (Rt)); Rdd=vlsrw(Rss,Rt) for (i=0;i<2;i++) { Rdd.w[i] = (sxt_{7->32}(Rt)>0)?(zxt_{32->64}(Rss.uw[i])>>> sxt_{7-} _{>32} (Rt)): (zxt_{32->64} (Rss.uw[i]) << sxt_{7->32} (Rt)); ``` ### Class: XTYPE (slots 2,3) #### **Notes** • If the number of bits to shift is greater than the width of the vector element, the result is either all sign-bits (for arithmetic right shifts) or all zeros for logical and left shifts. ## Intrinsics | Rdd=vaslw(Rss,Rt) | Word64 | Q6_P_vaslw_PR(Word64 Rs | s, Word32 Rt) | |-------------------|--------|-------------------------|---------------| | Rdd=vasrw(Rss,Rt) | Word64 | Q6_P_vasrw_PR(Word64 Rs | s, Word32 Rt) | | Rdd=vlslw(Rss,Rt) | Word64 | Q6_P_vlslw_PR(Word64 Rs | s, Word32 Rt) | | Rdd=vlsrw(Rss,Rt) | Word64 | Q6_P_vlsrw_PR(Word64 Rs | s, Word32 Rt) | ## **Encoding** | 3 | 1 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|------|-----|----|----|-----|-----|----|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-----|---|---|---|----|---|---|-------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | М | aj | | | | s5 | | | Pa | rse | | | | t5 | | | M | lin | | | | d5 | | | | | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | | s | s | s | s | s | P | Р | | t | t | t | t | t | 0 | 0 | - | d | d | d | d | d | Rdd=vasrw(Rss,Rt) | | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 0 | 1 | - | d | d | d | d | d | Rdd=vlsrw(Rss,Rt) | | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 0 | - | d | d | d | d | d | Rdd=vaslw(Rss,Rt) | | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | - | s | s | s | s | s | Р | Р | - | t | t | t | t | t | 1 | 1 | - | d | d | d | d | d | Rdd=vlslw(Rss,Rt) | | Field name | | |------------|----------------------------| | ICLASS | Instruction class | | Parse | Packet/loop parse bits | | d5 | Field to encode register d | | s5 | Field to encode register s | | t5 | Field to encode register t | | Maj | Major opcode | | Min | Minor opcode | | RegType | Register type | | | | Description # Vector shift words with truncate and pack Shift individual words of the source vector Rss right by a register or immediate amount. The low 16 bits of each word are packed into destination register Rd. ### Syntax Behavior ## Class: XTYPE (slots 2,3) ### **Intrinsics** ``` Rd=vasrw(Rss,#u5) Word32 Q6_R_vasrw_PI(Word64 Rss, Word32 Iu5) Rd=vasrw(Rss,Rt) Word32 Q6 R vasrw PR(Word64 Rss, Word32 Rt) ``` ### **Encoding** | 3 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | | |---|-----|-----|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|----|----|----|---|---|---|-------|----|---|---|----|---|---|-------------------| | | ICL | ASS | 3 | ı | Reg | Тур | е | Ма | jOp | | | | s5 | | | Pa | rse | | | | | | | N | /linC | )p | | | d5 | | | | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | s | s | s | ø | s | Р | Р | 0 | i | i | i | i | i | 0 | 1 | 0 | d | d | d | d | d | Rd=vasrw(Rss,#u5) | | | ICL | ASS | 3 | ı | Reg | Тур | е | | | | | • | s5 | | | Pa | rse | | | | t5 | | | М | lin | | | | d5 | • | • | | | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | - | - | - | s | s | s | s | s | Р | Р | | t | t | t | t | t | 0 | 1 | 0 | d | d | d | d | d | Rd=vasrw(Rss,Rt) | | Field na | me | Description | |----------|----------------------------|-------------| | ICLASS | Instruction class | | | Parse | Packet/loop parse bits | | | d5 | Field to encode register d | | | s5 | Field to encode register s | | | t5 | Field to encode register t | | | MajOp | Major opcode | | | MinOp | Minor opcode | | | Min | Minor opcode | | | RegType | Register type | | | | | | ``` Α abs Rd=abs(Rs)[:sat] 303 Rdd=abs(Rss) 302 add if ([!]Pu[.new]) Rd=add(Rs, \#s8) 158 if ([!]Pu[.new]) Rd=add(Rs,Rt) 158 Rd=add(#u6, mpyi(Rs, #U6)) 443 Rd=add(\#u6,mpyi(Rs,Rt)) 443 Rd=add(Rs, #s16) 137 Rd=add(Rs, add(Ru, #s6)) 304 Rd=add(Rs,Rt) 137 Rd=add(Rs,Rt):sat 137 Rd=add(Rs,Rt):sat:deprecated 306 Rd = add(Rt.[HL], Rs.[HL])[:sat]: << 16308 Rd=add(Rt.L,Rs.[HL])[:sat] 308 Rd=add(Ru, mpyi(#u6:2, Rs)) 443 Rd=add(Ru, mpyi(Rs, #u6)) 443 \texttt{Rdd=add}\,(\texttt{Rs,Rtt})\ 306 Rdd=add(Rss,Rtt,Px):carry 310 Rdd=add(Rss,Rtt) 306 Rdd=add(Rss,Rtt):raw:hi 306 Rdd=add(Rss,Rtt):raw:lo306 Rdd=add(Rss,Rtt):sat 306 Rx += add(Rs, #s8) 304 Rx + = add(Rs, Rt) 304 Rx-=add(Rs, #s8) 304 Rx-=add(Rs,Rt) 304 Ry=add(Ru,mpyi(Ry,Rs)) 444 addasl Rd=addasl(Rt,Rs,#u3) 543 al18 Pd=all8(Ps) 174 allocframe allocframe (\#u11:3) 283 allocframe (Rx, \#u11:3):raw 283 and if ([!]Pu[.new]) Rd=and(Rs,Rt) 163 Pd=and(Ps, and(Pt, [!]Pu)) 180 Pd=and(Pt,[!]Ps) 180 Rd=and(Rs, #s10) 139 Rd=and(Rs,Rt) 139 Rd=and(Rt, \sim Rs) 139 Rdd=and(Rss,Rtt) 312 Rdd=and(Rtt,~Rss) 312 Rx[\&|^]=and(Rs, \sim Rt) 314 Rx[\&|^]=and(Rs,Rt) 314 Rx \mid = and (Rs, #s10) 314 any8 Pd=any8 (Ps) 174 asl Rd=asl(Rs, #u5) 538 Rd=asl(Rs, #u5):sat 550 ``` ``` Rd=asl(Rs,Rt) 551 Rd=asl(Rs,Rt):sat 559 Rdd=asl(Rss, \#u6) 538 Rdd=asl(Rss,Rt) 551 Rx[\&|]=asl(Rs, #u5) 544 Rx[\&|]=asl(Rs,Rt) 556 Rx[+-]=asl(Rs, #u5) 540 Rx[+-]=asl(Rs,Rt) 553 Rx^=asl(Rs, #u5) 544 Rx = add(\#u8, asl(Rx, \#U5)) 540 Rx = and(\#u8, asl(Rx, \#U5)) 544 Rx=or(\#u8,asl(Rx,\#U5)) 544 Rx = sub(#u8, asl(Rx, #U5)) 540 Rxx[\&|]=asl(Rss, #u6) 544 Rxx[\&|]=asl(Rss,Rt) 557 Rxx[+-]=asl(Rss, #u6) 540 Rxx[+-]=asl(Rss,Rt) 553 Rxx^=asl(Rss, #u6) 545 Rxx^=asl(Rss,Rt) 557 aslh if ([!]Pu[.new]) Rd=aslh(Rs) 160 Rd=aslh(Rs) 156 asr Rd=asr(Rs, #u5) 538 Rd=asr(Rs, #u5):rnd 548 Rd=asr(Rs,Rt) 551 Rd=asr(Rs,Rt):sat 559 Rdd=asr(Rss, #u6) 538 Rdd=asr(Rss, #u6):rnd 548 Rdd=asr(Rss,Rt) 551 Rx[\&|] = asr(Rs, #u5) 544 Rx[\&|]=asr(Rs,Rt) 556 Rx[+-] = asr(Rs, #u5) 540 Rx[+-]=asr(Rs,Rt) 553 Rxx[\&|]=asr(Rss, #u6) 545 Rxx[\&|]=asr(Rss,Rt) 557 Rxx[+-]=asr(Rss, #u6) 540 Rxx[+-]=asr(Rss,Rt) 554 Rxx^=asr(Rss,Rt) 557 if ([!]Pu[.new]) Rd=asrh(Rs) 160 Rd=asrh(Rs) 156 Rd=asrrnd(Rs, #u5) 548 Rdd=asrrnd(Rss, #u6) 548 barrier barrier 288 bitsclr Pd=[!]bitsclr(Rs, #u6) 523 Pd=[!]bitsclr(Rs,Rt) 523 bitsplit Rdd=bitsplit(Rs, #u5) 386 ``` ``` Rdd=bitsplit(Rs,Rt) 386 bitsset Pd=[!]bitsset(Rs,Rt) 523 boundscheck Pd=boundscheck(Rs,Rtt) 517 Pd=boundscheck(Rss,Rtt):raw:hi 517 Pd=boundscheck(Rss,Rtt):raw:lo517 brev Rd=brev(Rs) 383 Rdd=brev(Rss) 383 brkpt brkpt 289 C call call #r22:2 186 if ([!]Pu) call #r15:2 186 callr callr Rs 183 if ([!]Pu) callr Rs 183 c10 Rd=cl0(Rs) 371 Rd=cl0(Rss) 371 cl1 Rd=cl1(Rs) 371 Rd=cl1(Rss) 371 clb Rd=add(clb(Rs), #s6) 371 Rd=add(clb(Rss), #s6) 371 Rd=clb(Rs) 371 Rd=clb(Rss) 371 clip Rd=clip(Rs, #u5) 311 clrbit memb (Rs+\#u6:0) = clrbit (\#U5) 242 memh(Rs+#u6:1)=clrbit(#U5) 244 memw(Rs+#u6:2)=clrbit(#U5) 245 Rd=clrbit(Rs, #u5) 384 Rd=clrbit(Rs,Rt) 384 cmp.eq if ([!]cmp.eq(Ns.new,#-1)) jump:<hint> #r9:2 246 if ([!]cmp.eq(Ns.new, #U5)) jump: <hint> #r9:2 246 if ([!]cmp.eq(Ns.new,Rt)) jump:\langle hint \rangle \#r9:2246 p[01] = cmp.eq(Rs, \#-1) 188 p[01] = cmp.eq(Rs, #U5) 188 p[01] = cmp.eq(Rs,Rt) 188 Pd=[!]cmp.eq(Rs, #s10) 169 Pd=[!]cmp.eq(Rs,Rt) 169 Pd=cmp.eq(Rss,Rtt) 522 Rd=[!]cmp.eq(Rs, #s8) 171 Rd=[!]cmp.eq(Rs,Rt) 171 cmp.ge Pd=cmp.ge(Rs, #s8) 169 ``` ``` cmp.qeu Pd=cmp.geu(Rs, #u8) 169 cmp.qt if ([!]cmp.qt(Ns.new, \#-1)) jump:<hint> \#r9:2 246 if ([!]cmp.gt(Ns.new, #U5)) jump: <hint> #r9:2 246 if ([!]cmp.gt(Ns.new,Rt)) jump:<hint> #r9:2 246 if ([!]cmp.gt(Rt, Ns.new)) jump:<hint> #r9:2 247 p[01] = cmp.gt(Rs, #-1) 188 p[01] = cmp.qt(Rs, #U5) 188 p[01] = cmp.gt(Rs,Rt) 188 Pd=[!]cmp.gt(Rs, #s10) 169 Pd=[!]cmp.gt(Rs,Rt) 169 Pd=cmp.qt(Rss,Rtt) 522 cmp.gtu if ([!]cmp.gtu(Ns.new, #U5)) jump:<hint> #r9:2 247 if ([!]cmp.gtu(Ns.new,Rt)) jump:<hint> #r9:2 247 if ([!]cmp.qtu(Rt, Ns.new)) jump:<hint> #r9:2 247 p[01] = cmp.qtu(Rs, #U5) 189 p[01]=cmp.qtu(Rs,Rt) 189 Pd=[!]cmp.qtu(Rs, #u9) 169 Pd=[!]cmp.gtu(Rs,Rt) 169 Pd=cmp.gtu(Rss,Rtt) 522 cmp.lt Pd=cmp.lt(Rs,Rt) 169 cmp.ltu Pd=cmp.ltu(Rs,Rt) 169 cmpb.ea Pd=cmpb.eq(Rs, \#u8) 518 Pd=cmpb.eq(Rs,Rt) 518 cmpb.qt Pd=cmpb.gt(Rs, #s8) 518 Pd=cmpb.qt(Rs,Rt) 518 cmpb.gtu Pd=cmpb.gtu(Rs, #u7) 518 Pd=cmpb.gtu(Rs,Rt) 518 cmph.eq Pd=cmph.eq(Rs, \#s8) 520 Pd=cmph.eq(Rs,Rt) 520 cmph.at Pd=cmph.gt(Rs, #s8) 520 Pd=cmph.gt(Rs,Rt) 520 cmph.gtu Pd=cmph.gtu(Rs, \#u7) 520 Pd=cmph.gtu(Rs,Rt) 520 cmpy Rd=cmpy(Rs,Rt)[:<<1]:rnd:sat 400 Rd=cmpy(Rs,Rt*)[:<<1]:rnd:sat 400 Rdd=cmpy(Rs,Rt)[:<<1]:sat 395 Rdd=cmpy(Rs,Rt*)[:<<1]:sat 395 Rxx+=cmpy(Rs,Rt)[:<<1]:sat 396 Rxx+=cmpy(Rs,Rt*)[:<<1]:sat 396 Rxx-=cmpy(Rs,Rt)[:<<1]:sat 396 Rxx-=cmpy(Rs,Rt*)[:<<1]:sat 396 ``` ``` cmpyi Rdd=cmpyi(Rs,Rt) 398 Rxx+=cmpyi(Rs,Rt) 398 cmpyiw Rd=cmpyiw(Rss,Rtt):<<1:rnd:sat 404 Rd=cmpyiw(Rss,Rtt):<<1:sat 404 Rd=cmpyiw(Rss,Rtt*):<<1:rnd:sat 404 Rd=cmpyiw(Rss,Rtt*):<<1:sat 404 Rdd=cmpyiw(Rss,Rtt) 405 Rdd=cmpyiw(Rss,Rtt*) 405 Rxx+=cmpyiw(Rss,Rtt) 405 Rxx+=cmpyiw(Rss,Rtt*) 405 cmpyiwh Rd=cmpyiwh(Rss,Rt):<<1:rnd:sat 402 Rd=cmpyiwh(Rss,Rt*):<<1:rnd:sat 402 cmpyr Rdd=cmpyr(Rs,Rt) 398 Rxx+=cmpyr(Rs,Rt) 398 cmpyrw Rd=cmpyrw(Rss,Rtt):<<1:rnd:sat 404 Rd=cmpyrw(Rss,Rtt):<<1:sat 404 Rd=cmpyrw(Rss,Rtt*):<<1:rnd:sat 405 Rd=cmpyrw(Rss,Rtt*):<<1:sat 405 Rdd=cmpyrw(Rss,Rtt) 405 Rdd=cmpyrw(Rss,Rtt*) 405 Rxx+=cmpyrw(Rss,Rtt) 405 Rxx+=cmpyrw(Rss,Rtt*) 405 cmpyrwh Rd=cmpyrwh(Rss,Rt):<<1:rnd:sat 402 Rd=cmpyrwh(Rss,Rt*):<<1:rnd:sat 402 combine if ([!]Pu[.new]) Rdd=combine(Rs,Rt) 162 Rd=combine(Rt.[HL],Rs.[HL]) 152 Rdd=combine(#s8, #S8) 152 Rdd=combine(\#s8,\#U6) 152 Rdd=combine(\#s8,Rs) 152 Rdd=combine (Rs, \#s8) 152 Rdd=combine(Rs,Rt) 152 convert d2df Rdd=convert d2df(Rss) 427 convert d2sf Rd=convert d2sf(Rss) 427 convert df2d Rdd=convert df2d(Rss) 429 Rdd=convert df2d(Rss):chop 429 convert df2sf Rd=convert df2sf(Rss) 426 convert df2ud Rdd=convert df2ud(Rss) 429 Rdd=convert df2ud(Rss):chop 429 convert df2uw Rd=convert df2uw(Rss) 429 Rd=convert df2uw(Rss):chop 429 ``` ``` convert df2w Rd=convert df2w(Rss) 429 Rd=convert df2w(Rss):chop 429 convert sf2d Rdd=convert_sf2d(Rs) 429 Rdd=convert_sf2d(Rs):chop 429 convert sf2df Rdd=convert sf2df(Rs) 426 convert sf2ud Rdd=convert sf2ud(Rs) 429 Rdd=convert sf2ud(Rs):chop 429 convert sf2uw Rd=convert sf2uw(Rs) 429 Rd=convert_sf2uw(Rs):chop 429 convert sf2w Rd=convert sf2w(Rs) 429 Rd=convert sf2w(Rs):chop 429 convert ud2df Rdd=convert ud2df(Rss) 427 convert ud2sf Rd=convert ud2sf(Rss) 427 convert uw2df Rdd=convert uw2df(Rs) 427 convert uw2sf Rd=convert uw2sf(Rs) 427 convert w2df Rdd=convert_w2df(Rs) 427 convert_w2sf Rd=convert w2sf(Rs) 427 cround Rd=cround(Rs, #u5) 322 Rd=cround(Rs,Rt) 322 Rdd=cround(Rss, #u6) 322 Rdd=cround(Rss,Rt) 323 ct0 Rd=ct0(Rs) 374 Rd=ct0(Rss) 374 ct1 Rd=ct1(Rs) 374 Rd=ct1(Rss) 374 D dccleana dccleana(Rs) 291 dccleaninva dccleaninva (Rs) 291 dcfetch \texttt{dcfetch}\,(\texttt{Rs})\ 290 dcfetch(Rs+#u11:3) 290 dcinva dcinva(Rs) 291 dczeroa dczeroa (Rs) 287 ``` ``` deallocframe deallocframe 230 Rdd=deallocframe(Rs):raw 230 dealloc return dealloc return 232 if ([!]Pv.new) Rdd=dealloc return(Rs):nt:raw 232 if ([!]Pv.new) Rdd=dealloc return(Rs):t:raw 232 if ([!]Pv) dealloc return 232 if ([!]Pv) Rdd=dealloc return(Rs):raw 232 if ([!]Pv.new) dealloc return:nt 232 Rdd=dealloc_return(Rs):raw 232 if ([!]Pv.new) dealloc return:t 232 decbin Rdd=decbin(Rss,Rtt) 494 deinterleave Rdd=deinterleave(Rss) 380 dfadd Rdd=dfadd(Rss,Rtt) 422 dfclass Pd=dfclass(Rss, #u5) 423 dfcmp.eq Pd=dfcmp.eq(Rss,Rtt) 424 dfcmp.ge Pd=dfcmp.ge(Rss,Rtt) 424 dfcmp.gt Pd=dfcmp.gt(Rss,Rtt) 424 dfcmp.uo Pd=dfcmp.uo(Rss,Rtt) 424 dfmake Rdd=dfmake(#u10):neg 437 Rdd=dfmake(#u10):pos 437 dfmax Rdd=dfmax(Rss,Rtt) 438 dfmin Rdd=dfmin(Rss,Rtt) 439 dfmpyfix Rdd=dfmpyfix(Rss,Rtt) 440 dfmpyhh Rxx+=dfmpyhh(Rss,Rtt) 432 dfmpylh Rxx+=dfmpylh(Rss,Rtt) 432 dfmpyll Rdd=dfmpyll(Rss,Rtt) 440 dfsub Rdd=dfsub(Rss,Rtt) 442 diag diag(Rs) 293 diag0 diag0(Rss,Rtt) 293 diag1 ``` ``` diag1(Rss,Rtt) 293 dmsyncht Rd=dmsyncht 299 Ε endloop0 endloop0 172 endloop01 endloop01 172 endloop1 endloop1 172 extract Rd=extract(Rs, #u5, #U5) 375 Rd=extract(Rs,Rtt) 375 Rdd=extract(Rss, #u6, #U6) 376 Rdd=extract(Rss,Rtt) 376 extractu Rd=extractu(Rs, #u5, #U5) 375 Rd=extractu(Rs,Rtt) 375 Rdd=extractu(Rss, #u6, #U6) 376 Rdd=extractu(Rss,Rtt) 376 fastcorner9 Pd=[!]fastcorner9(Ps,Pt) 173 Н hintjr hintjr(Rs) 184 icinva icinva(Rs) 294 if ([!]p[01].new) jump:<hint> #r9:2 188, 189 insert Rx=insert(Rs, #u5, #U5) 378 Rx=insert(Rs,Rtt) 378 Rxx=insert (Rss, \#u6, \#u6) 378 Rxx=insert(Rss,Rtt) 379 interleave Rdd=interleave(Rss) 380 isync isync 295 jump if ([!]Pu.new) jump:<hint> #r15:2 193 if ([!]Pu) jump #r15:2 192 if ([!]Pu) jump:<hint> #r15:2 192 jump #r22:2 192 nt if (Rs!=#0) jump:nt #r13:2 194 if (Rs<=#0) jump:nt #r13:2 194 if (Rs==#0) jump:nt #r13:2 194 if (Rs>=#0) jump:nt #r13:2 194 Rd=#U6 196 ``` ``` Rd=Rs 196 if (Rs!=#0) jump:t #r13:2 194 if (Rs<=#0) jump:t #r13:2 194 if (Rs==#0) jump:t #r13:2 194 if (Rs \ge \#0) jump:t \#r13:2194 jump #r9:2 196 jumpr if ([!]Pu) jumpr Rs 185 if ([!]Pu[.new]) jumpr:<hint> Rs 185 jumpr Rs 185 L 12fetch 12fetch (Rs, Rt) 297 12fetch (Rs, Rtt) 297 lfs Rdd=lfs(Rss,Rtt) 381 loop0 loop0(#r7:2, #U10) 175 loop0(#r7:2,Rs) 175 loop1(#r7:2, #U10) 175 loop1(#r7:2,Rs) 175 lsl Rd=ls1(#s6,Rt) 551 Rd=lsl(Rs,Rt) 551 Rdd=lsl(Rss,Rt) 552 Rx[\&|]=lsl(Rs,Rt) 556 Rx[+-]=lsl(Rs,Rt) 553 Rxx[\&|]=lsl(Rss,Rt) 557 Rxx[+-]=lsl(Rss,Rt) 554 Rxx^=lsl(Rss,Rt) 557 lsr Rd=lsr(Rs, #u5) 538 Rd=lsr(Rs,Rt) 551 Rdd=lsr(Rss, \#u6) 538 Rdd=lsr(Rss,Rt) 552 Rx[\&|] = lsr(Rs, #u5) 544 Rx[\&|]=lsr(Rs,Rt) 556 Rx[+-]=lsr(Rs, #u5) 540 Rx[+-]=lsr(Rs,Rt) 553 Rx^=lsr(Rs, #u5) 544 Rx = add(\#u8, lsr(Rx, \#U5)) 540 Rx = and(\#u8, lsr(Rx, \#U5)) 544 Rx=or(\#u8,lsr(Rx,\#U5)) 544 Rx = sub(#u8, lsr(Rx, #U5)) 540 Rxx[\&|]=lsr(Rss, #u6) 545 Rxx[\&|]=lsr(Rss,Rt) 557 Rxx[+-]=lsr(Rss, #u6) 540 Rxx[+-]=lsr(Rss,Rt) 554 Rxx^=lsr(Rss, #u6) 545 Rxx^=lsr(Rss,Rt) 557 ``` ``` M mask Rd=mask(#u5, #U5) 537 Rdd=mask(Pt) 524 max Rd=max(Rs,Rt) 316 Rdd=max(Rss,Rtt) 317 maxu Rd=maxu(Rs,Rt) 316 Rdd=maxu(Rss,Rtt) 317 memb if ([!]Pt[.new]) Rd=memb(\#u6) 204 if ([!]Pt[.new]) Rd=memb(Rs+\#u6:0) 204 if ([!]Pt[.new]) Rd=memb(Rx++\#s4:0) 204 if ([!]Pv[.new]) memb(#u6)=Nt.new 252 if ([!]Pv[.new]) memb(\#u6)=Rt 269 if ([!]Pv[.new]) memb(Rs+\#u6:0)=\#S6 269 if ([!]Pv[.new]) memb(Rs+\#u6:0)=Nt.new 252 if ([!]Pv[.new]) memb(Rs+\#u6:0)=Rt 269 if ([!]Pv[.new]) memb(Rs+Ru<<\#u2)=Nt.new 252 if ([!]Pv[.new]) memb(Rs+Ru<<#u2)=Rt 269 if ([!]Pv[.new]) memb(Rx++\#s4:0)=Nt.new 252 if ([!]Pv[.new]) memb(Rx++\#s4:0)=Rt 269 if ([!]Pv[.new]) Rd=memb(Rs+Rt<<\#u2) 204 memb (qp+\#u16:0) =Nt.new 250 memb (qp+\#u16:0) =Rt 267 memb (Re=\#U6)=Nt.new 250 memb (Re=\#U6) =Rt 267 memb (Rs+\#s11:0) =Nt.new 250 memb (Rs+\#s11:0) =Rt 267 memb (Rs+\#u6:0) [+-]=\#U5 242 memb (Rs+\#u6:0) [+-|&]=Rt 242 memb (Rs+\#u6:0) =\#S8 267 memb (Rs+Ru<<\#u2) =Nt.new 250 memb (Rs+Ru<<#u2) =Rt 267 memb (Ru<<$\mu2+$\mu06) =Nt.new 250 memb (Ru << \#u2 + \#U6) = Rt 267 memb (Rx++#s4:0:circ(Mu)) =Nt.new 250 memb (Rx++#s4:0:circ(Mu)) =Rt 267 memb (Rx++#s4:0) = Nt.new 250 memb (Rx++#s4:0) = Rt 267 memb (Rx++1:circ(Mu)) =Nt.new 250 memb (Rx++I:circ(Mu)) = Rt 267 memb(Rx++Mu:brev)=Nt.new 250 memb(Rx++Mu:brev)=Rt 267 memb (Rx++Mu) = Nt. new 250 memb (Rx++Mu) = Rt 267 Rd=memb(qp+#u16:0) 202 Rd=memb(Re=\#U6) 202 Rd=memb (Rs+\#s11:0) 202 Rd=memb(Rs+Rt<<\#u2) 202 Rd=memb (Rt<<\mu2+\mu06) 202 Rd=memb(Rx++\#s4:0:circ(Mu)) 202 ``` ``` Rd=memb(Rx++#s4:0) 202 Rd=memb(Rx++I:circ(Mu)) 202 Rd=memb(Rx++Mu:brev) 202 Rd=memb(Rx++Mu) 202 memb fifo Ryy=memb fifo(Re=\#U6) 206 Ryy=memb fifo(Rs) 206 Ryy=memb fifo(Rs+\#s11:0) 206 Ryy=memb fifo(Rt<<\mu2+\mu06) Ryy=memb fifo(Rx++#s4:0:circ(Mu)) 206 Ryy=memb fifo(Rx++#s4:0) 206 Ryy=memb fifo(Rx++I:circ(Mu)) 207 Ryy=memb fifo(Rx++Mu:brev) 207 Ryy=memb fifo(Rx++Mu) 207 membh Rd=membh (Re=\#U6) 234 Rd=membh (Rs) 234 Rd=membh (Rs+#s11:1) 235 Rd=membh (Rt<<\#u2+\#U6) 235 Rd=membh(Rx++#s4:1:circ(Mu)) 235 Rd=membh (Rx++#s4:1) 235 Rd=membh(Rx++I:circ(Mu)) 235 Rd=membh(Rx++Mu:brev) 236 Rd=membh(Rx++Mu) 235 Rdd=membh (Re=\#U6) 237 Rdd=membh(Rs) 237 Rdd=membh (Rs+#s11:2) 237 Rdd=membh (Rt<<#u2+#U6) 237 Rdd=membh(Rx++\#s4:2:circ(Mu)) 238 Rdd=membh (Rx++#s4:2) 238 Rdd=membh(Rx++I:circ(Mu)) 238 Rdd=membh(Rx++Mu:brev) 238 Rdd=membh (Rx++Mu) 238 memd if ([!]Pt[.new]) Rdd=memd(\#u6) 200 if ([!]Pt[.new]) Rdd=memd(Rs+\#u6:3) 200 if ([!]Pt[.new]) Rdd=memd(Rx++\#s4:3) 200 if ([!]Pv[.new]) memd(\#u6)=Rtt 265 if ([!]Pv[.new]) memd(Rs+\#u6:3)=Rtt 265 if ([!]Pv[.new]) memd(Rs+Ru<<#u2)=Rtt 265 if ([!]Pv[.new]) memd(Rx++\#s4:3)=Rtt 265 if ([!]Pv[.new]) Rdd=memd(Rs+Rt<<\#u2) 200 memd (gp+#u16:3) =Rtt 262 memd (Re=\#U6) =Rtt 262 memd (Rs+\#s11:3) =Rtt 262 memd(Rs+Ru<<\#u2)=Rtt 262 memd (Ru << \#u2+\#U6) = Rtt 262 memd(Rx++#s4:3:circ(Mu))=Rtt 262 memd (Rx++#s4:3) = Rtt 262 memd(Rx++I:circ(Mu))=Rtt 262 memd (Rx++Mu:brev) =Rtt 262 memd(Rx++Mu)=Rtt 262 Rdd=memd(gp+#u16:3) 197 ``` ``` Rdd=memd(Re=\#U6) 197 Rdd=memd(Rs+#s11:3) 197 Rdd=memd(Rs+Rt<<\#u2) 197 Rdd=memd(Rt<<#u2+#U6) 197 Rdd=memd(Rx++#s4:3:circ(Mu)) 197 Rdd=memd (Rx++#s4:3) 197 Rdd=memd(Rx++I:circ(Mu)) 197 Rdd=memd(Rx++Mu:brev) 197 Rdd=memd(Rx++Mu) 197 memd aq Rdd=memd aq(Rs) 199 memd locked memd locked (Rs, Pd) = Rtt 286 Rdd=memd locked(Rs) 285 memd rl memd rl(Rs):at=Rtt 264 memd rl(Rs):st=Rtt 264 memh if ([!]Pt[.new]) Rd=memh(\#u6) 214 if ([!]Pt[.new]) Rd=memh(Rs+#u6:1) 214 if ([!]Pt[.new]) Rd=memh(Rx++\#s4:1) 214 if ([!]Pv[.new]) memh(\#u6)=Nt.new 256 if ([!]Pv[.new]) memh(\#u6)=Rt 274 if ([!]Pv[.new]) memh(#u6)=Rt.H 274 if ([!]Pv[.new]) memh(Rs+\#u6:1)=\#S6 274 if ([!]Pv[.new]) memh(Rs+#u6:1)=Nt.new 256 if ([!]Pv[.new]) memh(Rs+\#u6:1)=Rt 274 if ([!]Pv[.new]) memh(Rs+\#u6:1)=Rt.H 274 if ([!]Pv[.new]) memh(Rs+Ru<<#u2)=Nt.new 256 if ([!]Pv[.new]) memh(Rs+Ru<<#u2)=Rt 275 if ([!]Pv[.new]) memh(Rs+Ru<<#u2)=Rt.H 274 if ([!]Pv[.new]) memh(Rx++#s4:1)=Nt.new 256 if ([!]Pv[.new]) memh(Rx++#s4:1)=Rt 275 if ([!]Pv[.new]) memh (Rx++\#s4:1)=Rt.H 275 if ([!]Pv[.new]) Rd=memh(Rs+Rt<<#u2) 214 memh (gp+#u16:1) =Nt.new 254 memh (gp+#u16:1) =Rt 272 memh (gp+#u16:1) =Rt.H 272 memh (Re=\#U6)=Nt.new 254 memh (Re=\#U6) =Rt 271 memh (Re=\#U6) =Rt.H 271 memh (Rs+\#s11:1) =Nt.new 254 memh (Rs+\#s11:1) =Rt 271 memh (Rs+\#s11:1) =Rt.H 271 memh (Rs+\#u6:1) [+-]=\#U5 244 memh (Rs+\#u6:1) [+-|&]=Rt 244 memh (Rs+\#u6:1) =\#S8 271 memh (Rs+Ru<<#u2)=Nt.new 254 memh (Rs+Ru<<#u2)=Rt 271 memh (Rs+Ru<<#u2) =Rt.H 271 memh (Ru << \#u2 + \#u6) = Nt. new 254 memh (Ru << #u2 + #U6) = Rt 271 memh (Ru<<#u2+#U6) =Rt.H 271 ``` ``` memh (Rx++\#s4:1:circ(Mu))=Nt.new 254 memh (Rx++\#s4:1:circ(Mu))=Rt 271 memh (Rx++#s4:1:circ(Mu)) =Rt.H 271 memh (Rx++\#s4:1) = Nt.new 254 memh (Rx++#s4:1) = Rt 271 memh (Rx++#s4:1) = Rt.H 271 memh(Rx++I:circ(Mu))=Nt.new 254 memh(Rx++I:circ(Mu))=Rt 272 memh (Rx++1:circ(Mu))=Rt.H 271 memh (Rx++Mu:brev) =Nt.new 254 memh (Rx++Mu:brev) =Rt 272 memh (Rx++Mu:brev) =Rt.H 272 memh(Rx++Mu)=Nt.new 254 memh(Rx++Mu)=Rt 272 memh (Rx++Mu)=Rt.H 272 Rd=memh (qp+\#u16:1) 212 Rd=memh(Re=\#U6) 212 Rd=memh (Rs+#s11:1) 212 Rd=memh(Rs+Rt<<\#u2) 212 Rd=memh (Rt<<\#u2+\#U6) 212 Rd=memh(Rx++#s4:1:circ(Mu)) 212 Rd=memh(Rx++#s4:1) 212 Rd=memh(Rx++I:circ(Mu)) 212 Rd=memh (Rx++Mu:brev) 212 Rd=memh (Rx++Mu) 212 memh fifo Ryy=memh fifo(Re=\#U6) 209 Ryy=memh fifo(Rs) 209 Ryy=memh fifo(Rs+\#s11:1) 209 Ryy=memh fifo(Rt<<\mu2+\mu06) 209 Ryy=memh fifo(Rx++#s4:1:circ(Mu)) 209 Ryy=memh fifo(Rx++#s4:1) 209 Ryy=memh fifo(Rx++I:circ(Mu)) 210 Ryy=memh fifo(Rx++Mu:brev) 210 Ryy=memh fifo(Rx++Mu) 210 memub if ([!]Pt[.new]) Rd=memub(#u6) 219 if ([!]Pt[.new]) Rd=memub(Rs+#u6:0) 219 if ([!]Pt[.new]) Rd=memub(Rx++#s4:0) 219 if ([!]Pv[.new]) Rd=memub(Rs+Rt<<\#u2) 219 Rd=memub(gp+#u16:0) 217 Rd=memub (Re=\#U6) 217 Rd=memub(Rs+#s11:0) 217 Rd=memub (Rs+Rt<<\#u2) 217 Rd=memub (Rt<<#u2+#U6) 217 Rd=memub(Rx++\#s4:0:circ(Mu)) 217 Rd=memub (Rx++#s4:0) 217 Rd=memub(Rx++I:circ(Mu)) 217 Rd=memub(Rx++Mu:brev) 217 Rd=memub(Rx++Mu) 217 memubh Rd=memubh (Re=\#U6) 236 Rd=memubh (Rs+#s11:1) 236 ``` ``` Rd=memubh (Rt<<\#u2+\#U6) 236 Rd=memubh (Rx++#s4:1:circ(Mu)) 236 Rd=memubh (Rx++#s4:1) 236 Rd=memubh (Rx++I:circ(Mu)) 237 Rd=memubh (Rx++Mu:brev) 237 Rd=memubh(Rx++Mu) 237 Rdd=memubh (Re=#U6) 238 Rdd=memubh (Rs+\#s11:2) 239 Rdd=memubh (Rt<<\#u2+\#U6) 239 Rdd=memubh (Rx++#s4:2:circ(Mu)) 239 Rdd=memubh (Rx++#s4:2) 239 Rdd=memubh (Rx++1:circ(Mu)) 239 Rdd=memubh(Rx++Mu:brev) 240 Rdd=memubh (Rx++Mu) 239 memuh if ([!]Pt[.new]) Rd=memuh(#u6) 223 if ([!]Pt[.new]) Rd=memuh(Rs+#u6:1) 223 if ([!]Pt[.new]) Rd=memuh(Rx++#s4:1) 223 if ([!]Pv[.new]) Rd=memuh(Rs+Rt<<#u2) 223 Rd=memuh(gp+#u16:1) 221 Rd=memuh (Re=\#U6) 221 Rd=memuh (Rs+#s11:1) 221 Rd=memuh (Rs+Rt<<\#u2) 221 Rd=memuh (Rt<<#u2+#U6) 221 Rd=memuh(Rx++#s4:1:circ(Mu)) 221 Rd=memuh(Rx++#s4:1) 221 Rd=memuh (Rx++I:circ(Mu)) 221 Rd=memuh(Rx++Mu:brev) 221 Rd=memuh(Rx++Mu) 221 memw if ([!]Pt[.new]) Rd=memw(#u6) 228 if ([!]Pt[.new]) Rd=memw(Rs+#u6:2) 228 if ([!]Pt[.new]) Rd=memw(Rx++\#s4:2) 228 if ([!]Pv[.new]) memw(\#u6)=Nt.new 260 if ([!]Pv[.new]) memw(\#u6)=Rt 281 if ([!]Pv[.new]) memw(Rs+\#u6:2)=\#S6 281 if ([!]Pv[.new]) memw(Rs+\#u6:2)=Nt.new 260 if ([!]Pv[.new]) memw(Rs+\#u6:2)=Rt 281 if ([!]Pv[.new]) memw(Rs+Ru<<#u2)=Nt.new 260 if ([!]Pv[.new]) memw(Rs+Ru<<#u2)=Rt 281 if ([!]Pv[.new]) memw(Rx++\#s4:2)=Nt.new 260 if ([!]Pv[.new]) memw(Rx++\#s4:2)=Rt 281 if ([!]Pv[.new]) Rd=memw(Rs+Rt<<#u2) 228 memw (qp+\#u16:2) =Nt.new 258 memw (gp+\#u16:2) =Rt 278 memw (Re=\#U6) =Nt.new 258 memw (Re=\#U6) =Rt 278 memw (Rs+\#s11:2) =Nt.new 258 memw (Rs+\#s11:2) =Rt 278 memw (Rs+\#u6:2) [+-]=\#U5 245 memw (Rs+\#u6:2) [+-|&]=Rt 245 memw (Rs+\#u6:2) =\#S8 278 memw (Rs+Ru<<\#u2) =Nt.new 258 memw (Rs+Ru<<\#u2) =Rt 278 ``` ``` memw (Ru << \#u2 + \#U6) = Nt. new 258 memw (Ru << \#u2 + \#U6) = Rt 278 memw(Rx++#s4:2:circ(Mu))=Nt.new 258 memw(Rx++#s4:2:circ(Mu))=Rt 278 memw(Rx++#s4:2)=Nt.new 258 memw (Rx++#s4:2) = Rt 278 memw(Rx++I:circ(Mu))=Nt.new 258 memw(Rx++I:circ(Mu))=Rt 278 memw (Rx++Mu:brev) =Nt.new 258 memw (Rx++Mu:brev) =Rt 278 memw (Rx++Mu) = Nt. new 258 memw(Rx++Mu)=Rt 278 Rd=memw(gp+#u16:2) 225 Rd=memw (Re=#U6) 225 Rd=memw(Rs+#s11:2) 225 Rd=memw(Rs+Rt<<\#u2) 225 Rd=memw(Rt<<\#u2+\#U6) 225 Rd=memw(Rx++#s4:2:circ(Mu)) 225 Rd=memw(Rx++#s4:2) 225 Rd=memw(Rx++I:circ(Mu)) 225 Rd=memw(Rx++Mu:brev) 225 Rd=memw(Rx++Mu) 225 memw aq Rd=memw aq(Rs) 227 memw locked memw locked (Rs, Pd) = Rt 286 Rd=memw locked(Rs) 285 memw rl memw rl(Rs):at=Rt 280 memw rl(Rs):st=Rt 280 min Rd=min(Rt,Rs) 318 Rdd=min(Rtt,Rss) 319 minu Rd=minu(Rt,Rs) 318 Rdd=minu(Rtt,Rss) 319 modwrap Rd=modwrap(Rs,Rt) 320 mpy Rd=mpy(Rs,Rt.H):<<1:rnd:sat 469 Rd = mpy(Rs, Rt.H) : <<1: sat 469 Rd=mpy(Rs,Rt.L):<<1:rnd:sat 469 Rd=mpy(Rs,Rt.L):<<1:sat 469 Rd=mpy(Rs,Rt) 469 Rd=mpy(Rs,Rt):<<1469 Rd=mpy(Rs,Rt):<<1:sat 469 Rd=mpy(Rs,Rt):rnd469 Rd=mpy(Rs.[HL],Rt.[HL])[:<<1][:rnd][:sat] 454 Rdd=mpy(Rs,Rt) 471 Rdd=mpy(Rs.[HL],Rt.[HL])[:<<1][:rnd] 454 Rx + = mpy(Rs, Rt) : <<1 : sat 469 Rx+=mpy(Rs.[HL],Rt.[HL])[:<<1][:sat] 454 Rx-=mpy(Rs,Rt):<<1:sat 469 ``` ``` Rx-=mpy(Rs.[HL],Rt.[HL])[:<<1][:sat] 454 Rxx[+-]=mpy(Rs,Rt) 471 Rxx+=mpy(Rs.[HL],Rt.[HL])[:<<1] 454 Rxx-=mpy(Rs.[HL],Rt.[HL])[:<<1] 454 mpyi Rd=+mpyi(Rs, #u8) 443 Rd=mpyi(Rs, #m9) 444 Rd=-mpyi(Rs, #u8) 443 Rd=mpyi(Rs,Rt) 444 Rx+=mpyi(Rs, #u8) 444 Rx + = mpyi(Rs, Rt) 444 Rx-=mpyi(Rs, #u8) 444 Rx-=mpyi(Rs,Rt) 444 mpysu Rd=mpysu(Rs,Rt) 469 mpyu Rd=mpyu(Rs,Rt) 469 Rd=mpyu(Rs.[HL],Rt.[HL])[:<<1] 461 Rdd=mpyu(Rs,Rt) 471 Rdd=mpyu(Rs.[HL],Rt.[HL])[:<<1] 461 Rx+=mpyu(Rs.[HL],Rt.[HL])[:<<1] 461 Rx-=mpyu(Rs.[HL],Rt.[HL])[:<<1] 461 Rxx[+-]=mpyu(Rs,Rt) 471 Rxx+=mpyu(Rs.[HL],Rt.[HL])[:<<1] 461 Rxx-=mpyu(Rs.[HL],Rt.[HL])[:<<1] 461 mpyui Rd=mpyui(Rs,Rt) 444 mux Rd=mux(Pu, #s8, #S8) 154 Rd=mux(Pu, #s8, Rs) 154 Rd=mux(Pu,Rs,#s8) 154 Rd=mux(Pu,Rs,Rt) 154 Ν neq Rd=neg(Rs) 141 Rd=neg(Rs):sat 321 Rdd=neg(Rss) 321 no mnemonic Cd=Rs 182 Cdd=Rss 182 if ([!]Pu[.new]) Rd=#s12 167 if ([!]Pu[.new]) Rd=Rs 167 if ([!]Pu[.new]) Rdd=Rss 167 Pd=Ps 180 Pd=Rs 526 Rd=#s16 145 Rd=Cs 182 Rd=Ps 526 Rd=Rs 147 Rdd=#s8 145 Rdd=Css 182 Rdd=Rss 147 Rx.[HL]=#u16 145 ``` ``` nop nop 142 normamt Rd=normamt(Rs) 371 Rd=normamt(Rss) 371 not Pd=not(Ps) 180 Rd=not (Rs) 139 Rdd=not(Rss) 312 0 or if ([!]Pu[.new]) Rd=or(Rs,Rt) 163 Pd=and(Ps,or(Pt,[!]Pu)) 180 Pd=or(Ps, and(Pt,[!]Pu)) 180 Pd=or(Ps,or(Pt,[!]Pu)) 180 Pd=or(Pt,[!]Ps) 180 Rd=or(Rs, #s10) 139 Rd=or(Rs,Rt) 139 Rd=or(Rt, \sim Rs) 139 Rdd=or(Rss,Rtt) 312 Rdd=or(Rtt,~Rss) 312 Rx[\&|^]=or(Rs,Rt) 314 Rx=or(Ru, and(Rx, #s10)) 314 Rx = or(Rs, #s10) 314 Р packhl Rdd=packhl(Rs,Rt) 157 parity Rd=parity(Rs,Rt) 382 Rd=parity(Rss,Rtt) 382 pause pause(#u8) 298 Rd=add(pc, #u6) 177 pmpyw Rdd=pmpyw(Rs,Rt) 465 Rxx^=pmpyw(Rs,Rt) 465 popcount Rd=popcount (Rss) 373 R release release (Rs): at 277 release (Rs):st 277 rol Rd=rol(Rs, #u5) 538 Rdd=rol(Rss, #u6) 538 Rx[\&|] = rol(Rs, #u5) 544 Rx[+-]=rol(Rs, #u5) 540 Rx^-=rol(Rs, #u5) 544 Rxx[\&|]=rol(Rss, #u6) 545 Rxx[+-]=rol(Rss, #u6) 540 Rxx^=rol(Rss, #u6) 545 ``` ``` round Rd=round(Rs, #u5)[:sat] 322 Rd=round(Rs,Rt)[:sat] 322 Rd=round(Rss):sat 322 S sat Rd=sat(Rss) 496 satb Rd=satb(Rs) 496 sath Rd=sath(Rs) 496 satub Rd=satub(Rs) 496 satuh Rd=satuh(Rs) 496 setbit memb (Rs+\#u6:0) = setbit (\#U5) 242 memh (Rs+\#u6:1) = setbit (\#U5) 244 memw(Rs+#u6:2)=setbit(#U5) 245 Rd=setbit(Rs, #u5) 384 Rd=setbit(Rs,Rt) 384 sfadd Rd=sfadd(Rs,Rt) 422 sfclass Pd=sfclass(Rs, #u5) 423 sfcmp.eq Pd=sfcmp.eq(Rs,Rt) 424 sfcmp.ge Pd=sfcmp.ge(Rs,Rt) 424 sfcmp.gt Pd=sfcmp.gt(Rs,Rt) 424 sfcmp.uo Pd=sfcmp.uo(Rs,Rt) 424 sffixupd Rd=sffixupd(Rs,Rt) 431 sffixupn Rd=sffixupn(Rs,Rt) 431 sffixupr Rd=sffixupr(Rs) 431 sfinvsqrta Rd, Pe=sfinvsqrta(Rs) 434 sfmake Rd=sfmake(#u10):neg 437 Rd=sfmake(#u10):pos 437 sfmax Rd=sfmax(Rs,Rt) 438 sfmin Rd=sfmin(Rs,Rt) 439 sfmpy {\tt Rd=sfmpy(Rs,Rt)}~440 Rx+=sfmpy(Rs,Rt,Pu):scale 433 Rx + = sfmpy(Rs, Rt) 432 ``` ``` Rx + = sfmpy(Rs, Rt): lib 435 Rx-=sfmpy(Rs,Rt) 432 Rx-=sfmpy(Rs,Rt):lib 435 sfrecipa Rd, Pe=sfrecipa(Rs, Rt) 441 sfsub Rd=sfsub(Rs,Rt) 442 shuffeb Rdd=shuffeb(Rss,Rtt) 506 shuffeh Rdd=shuffeh(Rss,Rtt) 506 shuffob Rdd=shuffob(Rtt,Rss) 506 shuffoh Rdd=shuffoh(Rtt,Rss) 506 splloop0 p3=sp1loop0(#r7:2, #U10) 178 p3=sp1loop0(#r7:2,Rs) 178 sp2loop0 p3=sp2loop0(#r7:2, #U10) 178 p3=sp2loop0(#r7:2,Rs) 178 sp3loop0 p3=sp3loop0(#r7:2, #U10) 178 p3=sp3loop0(#r7:2,Rs) 178 sub if ([!]Pu[.new]) Rd=sub(Rt,Rs) 165 Rd=add(Rs, sub(\#s6, Ru)) 304 Rd=sub(#s10,Rs) 143 Rd=sub(Rt,Rs) 143 Rd=sub(Rt,Rs):sat 143 Rd=sub(Rt,Rs):sat:deprecated 325 Rd=sub(Rt.[HL],Rs.[HL])[:sat]:<<16 327 Rd=sub(Rt.L,Rs.[HL])[:sat] 327 Rdd=sub(Rss,Rtt,Px):carry 310 Rdd=sub(Rtt,Rss) 325 Rx += sub(Rt, Rs) 326 swiz Rd=swiz(Rs) 498 sxtb if ([!]Pu[.new]) Rd=sxtb(Rs) 166 Rd=sxtb(Rs) 144 sxth if ([!]Pu[.new]) Rd=sxth(Rs) 166 Rd=sxth(Rs) 144 sxtw Rdd=sxtw(Rs) 329 syncht syncht 299 Т tableidxb Rx=tableidxb(Rs, #u4, #S6):raw 388 Rx=tableidxb(Rs, #u4, #U5) 388 ``` ``` tableidxd Rx=tableidxd(Rs, #u4, #S6):raw 388 Rx=tableidxd(Rs, #u4, #U5) 388 tableidxh Rx=tableidxh(Rs, #u4, #S6):raw 389 Rx=tableidxh(Rs, #u4, #U5) 389 tableidxw Rx=tableidxw(Rs, #u4, #S6):raw 389 Rx=tableidxw(Rs, #u4, #U5) 389 tlbmatch Pd=tlbmatch(Rss,Rt) 525 togglebit Rd=togglebit(Rs, #u5) 384 Rd=togglebit(Rs,Rt) 384 trace trace(Rs) 300 trap0 trap0(#u8) 301 trap1 trap1(#u8) 301 trap1(Rx, #u8) 301 tstbit if ([!]tstbit(Ns.new, #0)) jump:<hint> #r9:2 247 p[01] = tstbit(Rs, #0) 189 Pd=[!]tstbit(Rs, #u5) 527 Pd=[!]tstbit(Rs,Rt) 527 vabsdiffb Rdd=vabsdiffb(Rtt,Rss) 332 vabsdiffh Rdd=vabsdiffh(Rtt,Rss) 333 vabsdiffub Rdd=vabsdiffub(Rtt,Rss) 332 vabsdiffw Rdd=vabsdiffw(Rtt,Rss) 334 vabsh Rdd=vabsh(Rss) 330 Rdd=vabsh(Rss):sat 330 vabsw Rdd=vabsw(Rss) 331 Rdd=vabsw(Rss):sat 331 vacsh Rxx, Pe=vacsh(Rss,Rtt) 336 vaddb Rdd=vaddb(Rss,Rtt) 344 vaddh Rd=vaddh(Rs,Rt)[:sat] 148 Rdd=vaddh(Rss,Rtt)[:sat] 338 vaddhub Rd=vaddhub(Rss,Rtt):sat 339 vaddub Rdd=vaddub(Rss,Rtt)[:sat] 344 ``` ``` vadduh Rd=vadduh(Rs,Rt):sat 148 Rdd=vadduh(Rss,Rtt):sat 338 vaddw Rdd=vaddw(Rss,Rtt)[:sat] 345 valignb Rdd=valignb(Rtt,Rss,#u3) 499 Rdd=valignb(Rtt,Rss,Pu) 499 vaslh Rdd=vaslh(Rss, \#u4) 560 Rdd=vaslh(Rss,Rt) 564 vaslw Rdd=vaslw(Rss, #u5) 566 Rdd=vaslw(Rss,Rt) 567 vasrh Rdd=vasrh(Rss, #u4) 560 Rdd=vasrh(Rss, #u4):raw 561 Rdd=vasrh(Rss, #u4):rnd 561 Rdd=vasrh(Rss,Rt) 564 vasrhub Rd=vasrhub(Rss, #u4):raw 562 Rd=vasrhub(Rss, #u4):rnd:sat 562 Rd=vasrhub(Rss, #u4):sat 562 vasrw Rd=vasrw(Rss, #u5) 569 Rd=vasrw(Rss,Rt) 569 Rdd=vasrw(Rss, #u5) 566 Rdd=vasrw(Rss,Rt) 567 vavqh Rd=vavgh(Rs,Rt) 149 Rd=vavgh(Rs,Rt):rnd 149 Rdd=vavgh(Rss,Rtt) 346 Rdd=vavgh(Rss,Rtt):crnd 346 Rdd=vavgh(Rss,Rtt):rnd 346 vavqub Rdd=vavgub(Rss,Rtt) 348 Rdd=vavgub(Rss,Rtt):rnd348 vavquh Rdd=vavguh (Rss, Rtt) 346 Rdd=vavguh (Rss, Rtt):rnd 346 vavquw Rdd=vavguw(Rss,Rtt)[:rnd] 349 vavqw Rdd=vavgw(Rss,Rtt):crnd 349 Rdd=vavgw(Rss,Rtt)[:rnd] 349 vclip Rdd=vclip(Rss, #u5) 351 vcmpb.eq Pd=!any8(vcmpb.eq(Rss,Rtt)) 530 Pd=any8 (vcmpb.eq(Rss,Rtt)) 530 Pd=vcmpb.eq(Rss, #u8) 531 Pd=vcmpb.eq(Rss,Rtt) 531 ``` ``` vcmpb.qt Pd=vcmpb.gt(Rss, #s8) 531 Pd=vcmpb.qt(Rss,Rtt) 531 vcmpb.qtu Pd=vcmpb.gtu(Rss, #u7) 531 Pd=vcmpb.gtu(Rss,Rtt) 531 vcmph.eq Pd=vcmph.eq(Rss, #s8) 528 Pd=vcmph.eq(Rss,Rtt) 528 vcmph.qt Pd=vcmph.gt(Rss, #s8) 528 Pd=vcmph.gt(Rss,Rtt) 528 vcmph.qtu Pd=vcmph.gtu(Rss, #u7) 528 Pd=vcmph.gtu(Rss,Rtt) 528 vcmpw.ea Pd=vcmpw.eq(Rss, #s8) 533 Pd=vcmpw.eq(Rss,Rtt) 533 vcmpw.qt Pd=vcmpw.gt(Rss, #s8) 533 Pd=vcmpw.qt(Rss,Rtt) 533 vcmpw.qtu Pd=vcmpw.gtu(Rss, #u7) 533 Pd=vcmpw.gtu(Rss,Rtt) 533 Rdd=vcmpyi(Rss,Rtt)[:<<1]:sat 408 Rxx+=vcmpyi(Rss,Rtt):sat 408 vcmpyr Rdd=vcmpyr(Rss,Rtt)[:<<1]:sat 408 Rxx+=vcmpyr(Rss,Rtt):sat 409 vcnegh Rdd=vcnegh(Rss,Rt) 352 vconj Rdd=vconj(Rss):sat 411 vcrotate Rdd=vcrotate(Rss,Rt) 412 vdmpy Rd=vdmpy(Rss,Rtt)[:<<1]:rnd:sat 475 Rdd=vdmpy(Rss,Rtt):<<1:sat 473 Rdd=vdmpy(Rss,Rtt):sat 473 Rxx+=vdmpy(Rss,Rtt):<<1:sat 473 Rxx+=vdmpy(Rss,Rtt):sat 474 vdmpybsu Rdd=vdmpybsu(Rss,Rtt):sat 479 Rxx+=vdmpybsu(Rss,Rtt):sat 479 vitpack Rd=vitpack(Ps,Pt) 535 vlslh Rdd=vlslh(Rss,Rt) 564 vlslw Rdd=vlslw(Rss,Rt) 567 vlsrh ``` ``` Rdd=vlsrh(Rss, \#u4) 560 Rdd=vlsrh(Rss,Rt) 564 vlsrw Rdd=vlsrw(Rss, #u5) 566 Rdd=vlsrw(Rss,Rt) 567 vmaxb Rdd=vmaxb(Rtt,Rss) 353 vmaxh Rdd=vmaxh(Rtt,Rss) 354 vmaxub Rdd=vmaxub(Rtt,Rss) 353 vmaxuh Rdd=vmaxuh(Rtt,Rss) 354 Rdd=vmaxuw(Rtt,Rss) 358 vmaxw Rdd=vmaxw(Rtt,Rss) 358 vminb Rdd=vminb(Rtt,Rss) 359 vminh Rdd=vminh(Rtt,Rss) 360 vminub Rdd, Pe=vminub(Rtt, Rss) 359 Rdd=vminub(Rtt,Rss) 359 vminuh Rdd=vminuh(Rtt,Rss) 360 vminuw Rdd=vminuw(Rtt,Rss) 365 vminw Rdd=vminw(Rtt,Rss) 365 vmpybsu Rdd=vmpybsu(Rs,Rt) 490 Rxx+=vmpybsu(Rs,Rt) 490 vmpybu Rdd=vmpybu(Rs,Rt) 490 Rxx+=vmpybu(Rs,Rt) 490 vmpyeh Rdd=vmpyeh(Rss,Rtt):<<1:sat 481 Rdd=vmpyeh(Rss,Rtt):sat 481 Rxx+=vmpyeh(Rss,Rtt) 481 Rxx+=vmpyeh(Rss,Rtt):<<1:sat 481 Rxx+=vmpyeh(Rss,Rtt):sat 481 vmpyh Rd=vmpyh(Rs,Rt)[:<<1]:rnd:sat 485 Rdd=vmpyh (Rs, Rt) [:<<1]:sat 483 Rxx+=vmpyh(Rs,Rt) 483 Rxx+=vmpyh(Rs,Rt)[:<<1]:sat 483 vmpyhsu Rdd=vmpyhsu(Rs,Rt)[:<<1]:sat 487 Rxx+=vmpyhsu(Rs,Rt)[:<<1]:sat 487 vmpyweh Rdd=vmpyweh (Rss, Rtt) [:<<1]:rnd:sat 446 ``` ``` Rdd=vmpyweh(Rss,Rtt)[:<<1]:sat 447 Rxx+=vmpyweh (Rss,Rtt) [:<<1]:rnd:sat 447</pre> Rxx+=vmpyweh(Rss,Rtt)[:<<1]:sat 447 vmpyweuh Rdd=vmpyweuh(Rss,Rtt)[:<<1]:rnd:sat 450 Rdd=vmpyweuh(Rss,Rtt)[:<<1]:sat 451 Rxx+=vmpyweuh(Rss,Rtt)[:<<1]:rnd:sat 451 Rxx+=vmpyweuh(Rss,Rtt)[:<<1]:sat 451 vmpywoh Rdd=vmpywoh(Rss,Rtt)[:<<1]:rnd:sat 447 Rdd=vmpywoh(Rss,Rtt)[:<<1]:sat 447 Rxx+=vmpywoh(Rss,Rtt)[:<<1]:rnd:sat 447</pre> Rxx+=vmpvwoh(Rss,Rtt)[:<<1]:sat 447 vmpywouh Rdd=vmpywouh(Rss,Rtt)[:<<1]:rnd:sat 451 Rdd=vmpywouh (Rss, Rtt) [:<<1]:sat 451 Rxx+=vmpywouh(Rss,Rtt)[:<<1]:rnd:sat 451</pre> Rxx+=vmpywouh(Rss,Rtt)[:<<1]:sat 451 vmux Rdd=vmux(Pu,Rss,Rtt) 536 vnavqh Rd=vnavgh(Rt,Rs) 149 Rdd=vnavgh(Rtt,Rss) 346 Rdd=vnavgh(Rtt,Rss):crnd:sat 346 Rdd=vnavgh(Rtt,Rss):rnd:sat 346 vnavqw Rdd=vnavgw(Rtt,Rss) 349 Rdd=vnavgw(Rtt,Rss):crnd:sat 349 Rdd=vnavgw(Rtt,Rss):rnd:sat 349 vpmpyh Rdd=vpmpyh (Rs,Rt) 492 Rxx^=vpmpyh(Rs,Rt) 493 vraddh Rd=vraddh(Rss,Rtt) 342 vraddub Rdd=vraddub(Rss,Rtt) 340 Rxx+=vraddub(Rss,Rtt) 340 vradduh Rd=vradduh(Rss,Rtt) 342 vrcmpys Rd=vrcmpys(Rss,Rt):<<1:rnd:sat 417 Rd=vrcmpys(Rss,Rtt):<<1:rnd:sat:raw:hi 417 Rd=vrcmpys(Rss,Rtt):<<1:rnd:sat:raw:lo418 Rdd=vrcmpys(Rss,Rt):<<1:sat 414 Rdd=vrcmpys(Rss,Rtt):<<1:sat:raw:hi 414 Rdd=vrcmpys(Rss,Rtt):<<1:sat:raw:lo415 Rxx+=vrcmpys(Rss,Rt):<<1:sat 415 Rxx+=vrcmpys(Rss,Rtt):<<1:sat:raw:hi 415 Rxx+=vrcmpys(Rss,Rtt):<<1:sat:raw:lo415 vrcnegh Rxx+=vrcnegh(Rss,Rt) 352 vrcrotate Rdd=vrcrotate(Rss,Rt,#u2) 420 ``` ``` Rxx+=vrcrotate(Rss,Rt,#u2) 420 vrmaxh Rxx=vrmaxh(Rss,Ru) 355 vrmaxuh Rxx=vrmaxuh(Rss,Ru) 355 vrmaxuw Rxx=vrmaxuw(Rss,Ru) 357 vrmaxw Rxx=vrmaxw(Rss,Ru) 357 vrminh Rxx=vrminh(Rss,Ru) 361 vrminuh Rxx=vrminuh(Rss,Ru) 361 vrminuw Rxx=vrminuw(Rss,Ru) 363 vrminw Rxx=vrminw(Rss,Ru) 363 vrmpybsu Rdd=vrmpybsu(Rss,Rtt) 477 Rxx+=vrmpybsu(Rss,Rtt) 477 vrmpybu Rdd=vrmpybu(Rss,Rtt) 477 Rxx+=vrmpybu(Rss,Rtt) 478 vrmpyh Rdd=vrmpyh (Rss, Rtt) 488 Rxx+=vrmpyh(Rss,Rtt) 488 vrmpyweh Rdd=vrmpyweh(Rss,Rtt)[:<<1] 467 Rxx+=vrmpyweh(Rss,Rtt)[:<<1] 467 vrmpywoh Rdd=vrmpywoh(Rss,Rtt)[:<<1] 467 Rxx+=vrmpywoh(Rss,Rtt)[:<<1] 467 vrndwh Rd=vrndwh(Rss) 500 Rd=vrndwh(Rss):sat 500 vrsadub Rdd=vrsadub(Rss,Rtt) 366 Rxx+=vrsadub(Rss,Rtt) 366 vsathb Rd=vsathb(Rs) 502 Rd=vsathb(Rss) 502 Rdd=vsathb(Rss) 504 vsathub Rd=vsathub(Rs) 502 Rd=vsathub(Rss) 503 Rdd=vsathub(Rss) 504 vsatwh Rd=vsatwh(Rss) 503 {\tt Rdd=vsatwh\,(Rss)}\ 504 vsatwuh Rd=vsatwuh(Rss) 503 Rdd=vsatwuh(Rss) 504 ``` ``` vsplatb Rd=vsplatb(Rs) 508 Rdd=vsplatb(Rs) 508 vsplath Rdd=vsplath(Rs) 509 vspliceb Rdd=vspliceb(Rss,Rtt,#u3) 510 Rdd=vspliceb(Rss,Rtt,Pu) 510 vsubb Rdd=vsubb(Rss,Rtt) 369 vsubh Rd=vsubh(Rt,Rs)[:sat] 150 Rdd=vsubh(Rtt,Rss)[:sat] 368 Rdd=vsubub(Rtt,Rss)[:sat] 369 vsubuh Rd=vsubuh(Rt,Rs):sat 150 Rdd=vsubuh(Rtt,Rss):sat 368 vsubw Rdd=vsubw(Rtt,Rss)[:sat] 370 vsxtbh Rdd=vsxtbh(Rs) 511 vsxthw Rdd=vsxthw(Rs) 511 vtrunehb Rd=vtrunehb(Rss) 513 Rdd=vtrunehb(Rss,Rtt) 513 vtrunewh Rdd=vtrunewh (Rss, Rtt) 513 vtrunohb Rd=vtrunohb(Rss) 513 Rdd=vtrunohb(Rss,Rtt) 513 vtrunowh Rdd=vtrunowh (Rss, Rtt) 514 vxaddsubh Rdd=vxaddsubh(Rss,Rtt):rnd:>>1:sat 390 Rdd=vxaddsubh (Rss, Rtt):sat 391 vxaddsubw Rdd=vxaddsubw(Rss,Rtt):sat 393 vxsubaddh Rdd=vxsubaddh(Rss,Rtt):rnd:>>1:sat 391 Rdd=vxsubaddh (Rss, Rtt):sat 391 vxsubaddw Rdd=vxsubaddw(Rss,Rtt):sat 393 vzxtbh Rdd=vzxtbh(Rs) 515 vzxthw Rdd=vzxthw(Rs) 515 Χ xor if ([!]Pu[.new]) Rd=xor(Rs,Rt) 163 Pd=xor(Ps,Pt) 180 ``` ``` Rd=xor(Rs,Rt) 139 Rdd=xor(Rss,Rtt) 312 Rx[&|^]=xor(Rs,Rt) 314 Rxx^=xor(Rss,Rtt) 313 Z zxtb if ([!]Pu[.new]) Rd=zxtb(Rs) 168 Rd=zxtb(Rs) 151 zxth if ([!]Pu[.new]) Rd=zxth(Rs) 168 Rd=zxth(Rs) 151 ``` ``` Α abs Rd=abs(Rs) Word32 Q6_R_abs_R(Word32 Rs) 303 Rd=abs(Rs):sat Word32 Q6 R abs R sat(Word32 Rs) 303 Rdd=abs(Rss) Word64 Q6 P abs P(Word64 Rss) 302 add Rd=add(#u6,mpyi(Rs,#U6)) Word32 Q6 R add mpyi IRI(Word32 lu6, Word32 Rs, Word32 lU6) 444 Rd=add(#u6.mpvi(Rs.Rt)) Word32 Q6 R add mpyi IRR(Word32 lu6, Word32 Rs, Word32 Rt) 444 Rd=add(Rs,#s16) Word32 Q6_R_add_RI(Word32 Rs, Word32 Is16) 137 Rd=add(Rs,add(Ru,#s6)) Word32 Q6_R_add_add_RRI(Word32 Rs, Word32 Ru, Word32 Is6) 304 Rd=add(Rs,Rt) Word32 Q6 R add RR(Word32 Rs, Word32 Rt) 137 Rd=add(Rs,Rt):sat Word32 Q6_R_add_RR_sat(Word32 Rs, Word32 Rt) 137 Rd=add(Rt.H,Rs.H):<<16 Word32 Q6 R add RhRh s16(Word32 Rt, Word32 Rs) 308 Rd=add(Rt.H,Rs.H):sat:<<16 Word32 Q6_R_add_RhRh_sat_s16(Word32 Rt, Word32 Rs) 308 Rd=add(Rt.H,Rs.L):<<16 Word32 Q6_R_add_RhRl_s16(Word32 Rt, Word32 Rs) 308 Rd=add(Rt.H,Rs.L):sat:<<16 Word32 Q6 R add RhRl sat s16(Word32 Rt, Word32 Rs) 308 Rd=add(Rt.L,Rs.H) Word32 Q6_R_add_RIRh(Word32 Rt, Word32 Rs) 308 Rd=add(Rt.L,Rs.H):<<16 Word32 Q6 R add RIRh s16(Word32 Rt, Word32 Rs) 309 Rd=add(Rt.L,Rs.H):sat Word32 Q6 R add RIRh sat(Word32 Rt, Word32 Rs) 309 Rd=add(Rt.L,Rs.H):sat:<<16 Word32 Q6_R_add_RIRh_sat_s16(Word32 Rt, Word32 Rs) 309 Rd=add(Rt.L,Rs.L) Word32 Q6 R add RIRI(Word32 Rt, Word32 Rs) 309 Rd=add(Rt.L,Rs.L):<<16 Word32 Q6 R add RIRI s16(Word32 Rt, Word32 Rs) 309 Rd=add(Rt.L,Rs.L):sat Word32 Q6_R_add_RIRI_sat(Word32 Rt, Word32 Rs) 309 Rd=add(Rt.L,Rs.L):sat:<<16 Word32 Q6 R add RIRI sat s16(Word32 Rt, Word32 Rs) 309 Rd=add(Ru,mpyi(#u6:2,Rs)) Word32 Q6_R_add_mpyi_RIR(Word32 Ru, Word32 Iu6_2, Word32 Rs) 444 Rd=add(Ru,mpyi(Rs,#u6)) Word32 Q6_R_add_mpyi_RRI(Word32 Ru, Word32 Rs, Word32 Iu6) 444 Rdd=add(Rs.Rtt) Word64 Q6 P add RP(Word32 Rs, Word64 Rtt) 306 Rdd=add(Rss,Rtt) Word64 Q6_P_add_PP(Word64 Rss, Word64 Rtt) 306 ``` ``` Rdd=add(Rss,Rtt):sat Word64 Q6_P_add_PP_sat(Word64 Rss, Word64 Rtt) 306 Rx += add(Rs, #s8) Word32 Q6 R addacc RI(Word32 Rx, Word32 Rs, Word32 Is8) 304 Rx += add(Rs,Rt) Word32 Q6_R_addacc_RR(Word32 Rx, Word32 Rs, Word32 Rt) 304 Rx-=add(Rs,#s8) Word32 Q6_R_addnac_RI(Word32 Rx, Word32 Rs, Word32 Is8) 304 Rx-=add(Rs,Rt) Word32 Q6 R addnac RR(Word32 Rx, Word32 Rs, Word32 Rt) 304 Ry=add(Ru,mpyi(Ry,Rs)) Word32 Q6_R_add_mpyi_RRR(Word32 Ru, Word32 Ry, Word32 Rs) 444 addasl Rd=addasl(Rt,Rs,#u3) Word32 Q6 R addasl RRI(Word32 Rt, Word32 Rs, Word32 lu3) 543 all8 Pd=all8(Ps) Byte Q6 p all8 p(Byte Ps) 174 and Pd=and(Ps,and(Pt,!Pu)) Byte Q6_p_and_and_ppnp(Byte Ps, Byte Pt, Byte Pu) 180 Pd=and(Ps,and(Pt,Pu)) Byte Q6 p and and ppp(Byte Ps, Byte Pt, Byte Pu) 180 Pd=and(Pt,!Ps) Byte Q6 p and pnp(Byte Pt, Byte Ps) 180 Pd=and(Pt,Ps) Byte Q6_p_and_pp(Byte Pt, Byte Ps) 180 Rd=and(Rs,#s10) Word32 Q6 R and RI(Word32 Rs, Word32 Is10) 139 Rd=and(Rs,Rt) Word32 Q6 R and RR(Word32 Rs, Word32 Rt) 139 Rd=and(Rt,~Rs) Word32 Q6_R_and_RnR(Word32 Rt, Word32 Rs) 139 Rdd=and(Rss,Rtt) Word64 Q6 P and PP(Word64 Rss, Word64 Rtt) 312 Rdd=and(Rtt,~Rss) Word64 Q6_P_and_PnP(Word64 Rtt, Word64 Rss) 312 Rx&=and(Rs,~Rt) Word32 Q6 R and RnR(Word32 Rx, Word32 Rs, Word32 Rt) 314 Rx&=and(Rs.Rt) Word32 Q6 R and RR(Word32 Rx, Word32 Rs, Word32 Rt) 314 Rx^=and(Rs,~Rt) Word32 Q6_R_andxacc_RnR(Word32 Rx, Word32 Rs, Word32 Rt) 314 Rx^=and(Rs,Rt) Word32 Q6 R andxacc RR(Word32 Rx, Word32 Rs, Word32 Rt) 314 Rx = and(Rs, #s10) Word32 Q6 R andor RI(Word32 Rx, Word32 Rs, Word32 Is10) 314 Rx = and(Rs, Rt) Word32 Q6_R_andor_RnR(Word32 Rx, Word32 Rs, Word32 Rt) 314 Rx = and(Rs,Rt) Word32 Q6 R andor RR(Word32 Rx, Word32 Rs, Word32 Rt) 314 any8 Pd=any8(Ps) ``` ``` Byte Q6_p_any8_p(Byte Ps) 174 asl Rd=asl(Rs,#u5) Word32 Q6 R asl RI(Word32 Rs, Word32 lu5) 538 Rd=asl(Rs,#u5):sat Word32 Q6_R_asl_RI_sat(Word32 Rs, Word32 lu5) 550 Rd=asl(Rs,Rt) Word32 Q6_R_asl_RR(Word32 Rs, Word32 Rt) 552 Rd=asl(Rs,Rt):sat Word32 Q6 R asl RR sat(Word32 Rs, Word32 Rt) 559 Rdd=asl(Rss,#u6) Word64 Q6 P asl PI(Word64 Rss, Word32 lu6) 538 Rdd=asl(Rss,Rt) Word64 Q6 P asl PR(Word64 Rss, Word32 Rt) 552 Rx\&=asl(Rs,#u5) Word32 Q6 R asland RI(Word32 Rx, Word32 Rs, Word32 lu5) 545 Rx&=asl(Rs,Rt) Word32 Q6_R_asland_RR(Word32 Rx, Word32 Rs, Word32 Rt) 557 Rx^=asl(Rs, #u5) Word32 Q6_R_aslxacc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 545 Rx+=asl(Rs,#u5) Word32 Q6_R_aslacc_RI(Word32 Rx, Word32 Rs, Word32 lu5) 541 Rx+=asl(Rs,Rt) Word32 Q6_R_aslacc_RR(Word32 Rx, Word32 Rs, Word32 Rt) 554 Rx=add(#u8,asl(Rx,#U5)) Word32 Q6 R add asl IRI(Word32 lu8, Word32 Rx, Word32 lU5) 541 Rx=and(\#u8,asl(Rx,\#U5)) Word32 Q6_R_and_asl_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) 545 Rx-=asl(Rs,#u5) Word32 Q6_R_aslnac_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 541 Rx-=asl(Rs.Rt) Word32 Q6 R aslnac RR(Word32 Rx, Word32 Rs, Word32 Rt) 554 Rx=or(#u8,asl(Rx,#U5)) Word32 Q6 R or asl IRI(Word32 lu8, Word32 Rx, Word32 lU5) 545 Rx=sub(\#u8,asl(Rx,\#U5)) Word32 Q6 R sub asl IRI(Word32 lu8, Word32 Rx, Word32 lU5) 541 Rx = asl(Rs, #u5) Word32 Q6 R aslor RI(Word32 Rx, Word32 Rs, Word32 lu5) 545 Rx = asl(Rs,Rt) Word32 Q6_R_aslor_RR(Word32 Rx, Word32 Rs, Word32 Rt) 557 Rxx&=asl(Rss,#u6) Word64 Q6 P asland PI(Word64 Rxx, Word64 Rss, Word32 lu6) 545 Rxx&=asl(Rss,Rt) Word64 Q6_P_asland_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 557 Rxx^=asl(Rss,#u6) Word64 Q6_P_aslxacc_PI(Word64 Rxx, Word64 Rss, Word32 lu6) 546 Rxx^=asl(Rss,Rt) Word64 Q6 P aslxacc PR(Word64 Rxx, Word64 Rss, Word32 Rt) 557 Rxx+=asl(Rss,#u6) Word64 Q6_P_aslacc_PI(Word64 Rxx, Word64 Rss, Word32 lu6) 541 Rxx+=asl(Rss,Rt) Word64 Q6_P_aslacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 554 Rxx-=asl(Rss,#u6) Word64 Q6 P aslnac PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 541 ``` ``` Rxx-=asl(Rss,Rt) Word64 Q6 P aslnac PR(Word64 Rxx, Word64 Rss, Word32 Rt) 554 Rxx = asl(Rss, #u6) Word64 Q6_P_aslor_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 546 Rxxl=asl(Rss.Rt) Word64 Q6_P_aslor_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 558 aslh Rd=aslh(Rs) Word32 Q6 R aslh R(Word32 Rs) 156 asr Rd=asr(Rs,#u5) Word32 Q6 R asr RI(Word32 Rs, Word32 lu5) 538 Rd=asr(Rs,#u5):rnd Word32 Q6 R asr RI rnd(Word32 Rs, Word32 lu5) 549 Rd=asr(Rs,Rt) Word32 Q6 R asr RR(Word32 Rs, Word32 Rt) 552 Rd=asr(Rs,Rt):sat Word32 Q6 R asr RR sat(Word32 Rs, Word32 Rt) 559 Rdd=asr(Rss,#u6) Word64 Q6 P asr PI(Word64 Rss, Word32 lu6) 538 Rdd=asr(Rss,#u6):rnd Word64 Q6 P asr PI rnd(Word64 Rss, Word32 lu6) 549 Rdd=asr(Rss,Rt) Word64 Q6 P asr PR(Word64 Rss, Word32 Rt) 552 Rx&=asr(Rs,#u5) Word32 Q6 R asrand RI(Word32 Rx, Word32 Rs, Word32 lu5) 545 Rx&=asr(Rs,Rt) Word32 Q6 R asrand RR(Word32 Rx, Word32 Rs, Word32 Rt) 557 Rx+=asr(Rs,#u5) Word32 Q6 R asracc RI(Word32 Rx, Word32 Rs, Word32 lu5) 541 Rx+=asr(Rs,Rt) Word32 Q6 R asracc RR(Word32 Rx, Word32 Rs, Word32 Rt) 554 Rx-=asr(Rs,#u5) Word32 Q6 R asrnac RI(Word32 Rx, Word32 Rs, Word32 lu5) 541 Rx-=asr(Rs,Rt) Word32 Q6 R asrnac RR(Word32 Rx, Word32 Rs, Word32 Rt) 554 Rx = asr(Rs, #u5) Word32 Q6 R asror RI(Word32 Rx, Word32 Rs, Word32 lu5) 545 Rx = asr(Rs,Rt) Word32 Q6 R asror RR(Word32 Rx, Word32 Rs, Word32 Rt) 557 Rxx&=asr(Rss,#u6) Word64 Q6 P asrand PI(Word64 Rxx, Word64 Rss, Word32 lu6) 545 Rxx&=asr(Rss,Rt) Word64 Q6 P asrand PR(Word64 Rxx, Word64 Rss, Word32 Rt) 557 Rxx^=asr(Rss,Rt) Word64 Q6 P asrxacc PR(Word64 Rxx, Word64 Rss, Word32 Rt) 557 Rxx+=asr(Rss,#u6) Word64 Q6 P asracc PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 541 Rxx+=asr(Rss,Rt) Word64 Q6 P asracc PR(Word64 Rxx, Word64 Rss, Word32 Rt) 554 Rxx-=asr(Rss,#u6) Word64 Q6 P asrnac PI(Word64 Rxx, Word64 Rss, Word32 lu6) 541 Rxx-=asr(Rss,Rt) Word64 Q6 P asrnac PR(Word64 Rxx, Word64 Rss, Word32 Rt) 554 ``` ``` Rxx | =asr(Rss, #u6) Word64 Q6_P_asror_PI(Word64 Rxx, Word64 Rss, Word32 lu6) 546 Rxx | =asr(Rss,Rt) Word64 Q6 P asror PR(Word64 Rxx, Word64 Rss, Word32 Rt) 558 asrh Rd=asrh(Rs) Word32 Q6_R_asrh_R(Word32 Rs) 156 asrrnd Rd=asrrnd(Rs,#u5) Word32 Q6 R asrrnd RI(Word32 Rs, Word32 lu5) 549 Rdd=asrrnd(Rss,#u6) Word64 Q6_P_asrrnd_PI(Word64 Rss, Word32 lu6) 549 В bitsclr Pd=!bitsclr(Rs,#u6) Byte Q6 p not bitsclr RI(Word32 Rs, Word32 lu6) 523 Pd=!bitsclr(Rs,Rt) Byte Q6 p not bitsclr RR(Word32 Rs, Word32 Rt) 523 Pd=bitsclr(Rs,#u6) Byte Q6 p bitsclr RI(Word32 Rs, Word32 lu6) 523 Pd=bitsclr(Rs,Rt) Byte Q6 p bitsclr RR(Word32 Rs, Word32 Rt) 523 bitsplit Rdd=bitsplit(Rs,#u5) Word64 Q6_P_bitsplit_RI(Word32 Rs, Word32 Iu5) 386 Rdd=bitsplit(Rs,Rt) Word64 Q6 P bitsplit RR(Word32 Rs, Word32 Rt) 386 bitsset Pd=!bitsset(Rs,Rt) Byte Q6 p not bitsset RR(Word32 Rs, Word32 Rt) 523 Pd=bitsset(Rs,Rt) Byte Q6_p_bitsset_RR(Word32 Rs, Word32 Rt) 523 boundscheck Pd=boundscheck(Rs,Rtt) Byte Q6 p boundscheck RP(Word32 Rs, Word64 Rtt) 517 brev Rd=brev(Rs) Word32 Q6_R_brev_R(Word32 Rs) 383 Rdd=brev(Rss) Word64 Q6_P_brev_P(Word64 Rss) 383 C cl0 Rd=cl0(Rs) Word32 Q6_R_cl0_R(Word32 Rs) 372 Rd=cl0(Rss) Word32 Q6_R_cl0_P(Word64 Rss) 372 cl1 Rd=cl1(Rs) Word32 Q6_R_cl1_R(Word32 Rs) 372 Rd=cl1(Rss) Word32 Q6 R cl1 P(Word64 Rss) 372 clb Rd=add(clb(Rs),#s6) ``` ``` Word32 Q6_R_add_clb_RI(Word32 Rs, Word32 Is6) 372 Rd=add(clb(Rss),#s6) Word32 Q6_R_add_clb_PI(Word64 Rss, Word32 Is6) 372 Rd=clb(Rs) Word32 Q6 R clb R(Word32 Rs) 372 Rd=clb(Rss) Word32 Q6_R_clb_P(Word64 Rss) 372 clip Rd=clip(Rs,#u5) Word32 Q6 R clip RI(Word32 Rs, Word32 lu5) 311 clrbit Rd=clrbit(Rs,#u5) Word32 Q6_R_clrbit_RI(Word32 Rs, Word32 lu5) 384 Rd=clrbit(Rs,Rt) Word32 Q6 R clrbit RR(Word32 Rs, Word32 Rt) 384 cmp.eq Pd=!cmp.eq(Rs,#s10) Byte Q6 p not cmp eq RI(Word32 Rs, Word32 Is10) 169 Pd=!cmp.eq(Rs,Rt) Byte Q6_p_not_cmp_eq_RR(Word32 Rs, Word32 Rt) 169 Pd=cmp.eq(Rs,#s10) Byte Q6_p_cmp_eq_RI(Word32 Rs, Word32 Is10) 169 Pd=cmp.eq(Rs,Rt) Byte Q6_p_cmp_eq_RR(Word32 Rs, Word32 Rt) 169 Pd=cmp.eq(Rss,Rtt) Byte Q6_p_cmp_eq_PP(Word64 Rss, Word64 Rtt) 522 Rd=!cmp.eq(Rs,#s8) Word32 Q6 R not cmp eq RI(Word32 Rs, Word32 Is8) 171 Rd=!cmp.eq(Rs,Rt) Word32 Q6_R_not_cmp_eq_RR(Word32 Rs, Word32 Rt) 171 Rd=cmp.eq(Rs,#s8) Word32 Q6_R_cmp_eq_RI(Word32 Rs, Word32 Is8) 171 Rd=cmp.eq(Rs,Rt) Word32 Q6 R cmp eq RR(Word32 Rs, Word32 Rt) 171 cmp.ge Pd=cmp.ge(Rs,#s8) Byte Q6_p_cmp_ge_RI(Word32 Rs, Word32 Is8) 169 cmp.geu Pd=cmp.geu(Rs,#u8) Byte Q6_p_cmp_geu_RI(Word32 Rs, Word32 lu8) 169 cmp.gt Pd=!cmp.gt(Rs,#s10) Byte Q6 p not cmp gt RI(Word32 Rs, Word32 Is10) 169 Pd=!cmp.gt(Rs,Rt) Byte Q6_p_not_cmp_gt_RR(Word32 Rs, Word32 Rt) 169 Pd=cmp.gt(Rs,#s10) Byte Q6 p cmp gt RI(Word32 Rs, Word32 Is10) 169 Pd=cmp.gt(Rs,Rt) Byte Q6_p_cmp_gt_RR(Word32 Rs, Word32 Rt) 170 Pd=cmp.gt(Rss,Rtt) Byte Q6_p_cmp_gt_PP(Word64 Rss, Word64 Rtt) 522 cmp.gtu Pd=!cmp.gtu(Rs,#u9) ``` ``` Byte Q6_p_not_cmp_gtu_RI(Word32 Rs, Word32 lu9) 169 Pd=!cmp.gtu(Rs,Rt) Byte Q6_p_not_cmp_gtu_RR(Word32 Rs, Word32 Rt) 169 Pd=cmp.gtu(Rs,#u9) Byte Q6 p cmp gtu RI(Word32 Rs, Word32 lu9) 170 Pd=cmp.gtu(Rs,Rt) Byte Q6_p_cmp_gtu_RR(Word32 Rs, Word32 Rt) 170 Pd=cmp.gtu(Rss,Rtt) Byte Q6 p cmp gtu PP(Word64 Rss, Word64 Rtt) 522 cmp.lt Pd=cmp.lt(Rs,Rt) Byte Q6_p_cmp_lt_RR(Word32 Rs, Word32 Rt) 170 cmp.ltu Pd=cmp.ltu(Rs,Rt) Byte Q6 p cmp ltu RR(Word32 Rs, Word32 Rt) 170 cmpb.eq Pd=cmpb.eq(Rs,#u8) Byte Q6_p_cmpb_eq_RI(Word32 Rs, Word32 lu8) 518 Pd=cmpb.eq(Rs,Rt) Byte Q6_p_cmpb_eq_RR(Word32 Rs, Word32 Rt) 518 cmpb.gt Pd=cmpb.gt(Rs,#s8) Byte Q6_p_cmpb_gt_RI(Word32 Rs, Word32 Is8) 518 Pd=cmpb.gt(Rs,Rt) Byte Q6_p_cmpb_gt_RR(Word32 Rs, Word32 Rt) 518 cmpb.gtu Pd=cmpb.gtu(Rs,#u7) Byte Q6_p_cmpb_gtu_RI(Word32 Rs, Word32 Iu7) 518 Pd=cmpb.gtu(Rs,Rt) Byte Q6_p_cmpb_gtu_RR(Word32 Rs, Word32 Rt) 518 cmph.eq Pd=cmph.eq(Rs,#s8) Byte Q6_p_cmph_eq_RI(Word32 Rs, Word32 Is8) 520 Pd=cmph.eq(Rs,Rt) Byte Q6_p_cmph_eq_RR(Word32 Rs, Word32 Rt) 520 cmph.gt Pd=cmph.gt(Rs,#s8) Byte Q6_p_cmph_gt_RI(Word32 Rs, Word32 Is8) 520 Pd=cmph.gt(Rs,Rt) Byte Q6_p_cmph_gt_RR(Word32 Rs, Word32 Rt) 520 cmph.gtu Pd=cmph.gtu(Rs,#u7) Byte Q6_p_cmph_gtu_RI(Word32 Rs, Word32 Iu7) 520 Pd=cmph.gtu(Rs,Rt) Byte Q6 p cmph gtu RR(Word32 Rs, Word32 Rt) 520 cmpy Rd=cmpy(Rs,Rt):<<1:rnd:sat Word32 Q6_R_cmpy_RR_s1_rnd_sat(Word32 Rs, Word32 Rt) 401 Rd=cmpy(Rs,Rt):rnd:sat Word32 Q6 R cmpy RR rnd sat(Word32 Rs, Word32 Rt) 401 Rd=cmpy(Rs,Rt*):<<1:rnd:sat Word32 Q6 R cmpy RR conj s1 rnd sat(Word32 Rs, Word32 Rt) 401 Rd=cmpy(Rs,Rt*):rnd:sat ``` ``` Word32 Q6_R_cmpy_RR_conj_rnd_sat(Word32 Rs, Word32 Rt) 401 Rdd=cmpy(Rs,Rt):<<1:sat Word64 Q6_P_cmpy_RR_s1_sat(Word32 Rs, Word32 Rt) 396 Rdd=cmpy(Rs,Rt):sat Word64 Q6 P cmpy RR sat(Word32 Rs, Word32 Rt) 396 Rdd=cmpy(Rs,Rt*):<<1:sat Word64 Q6 P cmpy RR conj s1 sat(Word32 Rs, Word32 Rt) 396 Rdd=cmpy(Rs,Rt*):sat Word64 Q6 P cmpy RR conj sat(Word32 Rs, Word32 Rt) 396 Rxx+=cmpy(Rs,Rt):<<1:sat Word64 Q6 P cmpyacc RR s1 sat(Word64 Rxx, Word32 Rs, Word32 Rt) 396 Rxx+=cmpy(Rs,Rt):sat Word64 Q6_P_cmpyacc_RR_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 396 Rxx+=cmpy(Rs,Rt*):<<1:sat Word64 Q6_P_cmpyacc_RR_conj_s1_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 396 Rxx+=cmpy(Rs,Rt*):sat Word64 Q6_P_cmpyacc_RR_conj_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 396 Rxx-=cmpy(Rs,Rt):<<1:sat Word64 Q6_P_cmpynac_RR_s1_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 396 Rxx-=cmpy(Rs,Rt):sat Word64 Q6_P_cmpynac_RR_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 397 Rxx-=cmpy(Rs,Rt*):<<1:sat Word64 Q6_P_cmpynac_RR_conj_s1_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 397 Rxx-=cmpy(Rs,Rt*):sat Word64 Q6_P_cmpynac_RR_conj_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 397 cmpyi Rdd=cmpyi(Rs,Rt) Word64 Q6_P_cmpyi_RR(Word32 Rs, Word32 Rt) 398 Rxx+=cmpyi(Rs,Rt) Word64 Q6_P_cmpyiacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) 398 cmpyiw Rd=cmpyiw(Rss,Rtt):<<1:rnd:sat Word32 Q6 R cmpyiw PP s1 rnd sat(Word64 Rss, Word64 Rtt) 406 Rd=cmpyiw(Rss,Rtt):<<1:sat Word32 Q6 R cmpyiw PP s1 sat(Word64 Rss, Word64 Rtt) 406 Rd=cmpyiw(Rss,Rtt*):<<1:rnd:sat Word32 Q6 R cmpyiw PP conj s1 rnd sat(Word64 Rss, Word64 Rtt) 406 Rd=cmpyiw(Rss,Rtt*):<<1:sat Word32 Q6_R_cmpyiw_PP_conj_s1_sat(Word64 Rss, Word64 Rtt) 406 Rdd=cmpyiw(Rss,Rtt) Word64 Q6 P cmpyiw PP(Word64 Rss, Word64 Rtt) 406 Rdd=cmpyiw(Rss,Rtt*) Word64 Q6_P_cmpyiw_PP_conj(Word64 Rss, Word64 Rtt) 406 Rxx+=cmpyiw(Rss,Rtt) Word64 Q6_P_cmpyiwacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 406 Rxx+=cmpviw(Rss,Rtt*) Word64 Q6 P cmpyiwacc PP conj(Word64 Rxx, Word64 Rss, Word64 Rtt) 406 cmpyiwh Rd=cmpyiwh(Rss,Rt):<<1:rnd:sat Word32 Q6 R cmpyiwh PR s1 rnd sat(Word64 Rss, Word32 Rt) 403 Rd=cmpyiwh(Rss,Rt*):<<1:rnd:sat Word32 Q6_R_cmpyiwh_PR_conj_s1_rnd_sat(Word64 Rss, Word32 Rt) 403 cmpyr ``` ``` Rdd=cmpyr(Rs,Rt) Word64 Q6_P_cmpyr_RR(Word32 Rs, Word32 Rt) 398 Rxx+=cmpyr(Rs,Rt) Word64 Q6 P cmpyracc RR(Word64 Rxx, Word32 Rs, Word32 Rt) 398 cmpvrw Rd=cmpyrw(Rss,Rtt):<<1:rnd:sat Word32 Q6_R_cmpyrw_PP_s1_rnd_sat(Word64 Rss, Word64 Rtt) 406 Rd=cmpyrw(Rss,Rtt):<<1:sat Word32 Q6_R_cmpyrw_PP_s1_sat(Word64 Rss, Word64 Rtt) 406 Rd=cmpyrw(Rss,Rtt*):<<1:rnd:sat Word32 Q6_R_cmpyrw_PP_conj_s1_rnd_sat(Word64 Rss, Word64 Rtt) 406 Rd=cmpyrw(Rss,Rtt*):<<1:sat Word32 Q6_R_cmpyrw_PP_conj_s1_sat(Word64 Rss, Word64 Rtt) 406 Rdd=cmpyrw(Rss,Rtt) Word64 Q6 P cmpyrw PP(Word64 Rss, Word64 Rtt) 406 Rdd=cmpyrw(Rss,Rtt*) Word64 Q6_P_cmpyrw_PP_conj(Word64 Rss, Word64 Rtt) 406 Rxx+=cmpyrw(Rss,Rtt) Word64 Q6_P_cmpyrwacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 406 Rxx+=cmpyrw(Rss,Rtt*) Word64 Q6 P cmpyrwacc PP conj(Word64 Rxx, Word64 Rss, Word64 Rtt) 406 cmpyrwh Rd=cmpyrwh(Rss,Rt):<<1:rnd:sat Word32 Q6 R cmpyrwh PR s1 rnd sat(Word64 Rss, Word32 Rt) 403 Rd=cmpyrwh(Rss,Rt*):<<1:rnd:sat Word32 Q6 R cmpyrwh PR conj s1 rnd sat(Word64 Rss, Word32 Rt) 403 combine Rd=combine(Rt.H,Rs.H) Word32 Q6 R combine RhRh(Word32 Rt, Word32 Rs) 153 Rd=combine(Rt.H,Rs.L) Word32 Q6 R combine RhRI(Word32 Rt, Word32 Rs) 153 Rd=combine(Rt.L,Rs.H) Word32 Q6_R_combine_RIRh(Word32 Rt, Word32 Rs) 153 Rd=combine(Rt.L,Rs.L) Word32 Q6 R combine RIRI(Word32 Rt, Word32 Rs) 153 Rdd=combine(#s8,#S8) Word64 Q6_P_combine_II(Word32 Is8, Word32 IS8) 153 Rdd=combine(#s8,Rs) Word64 Q6 P combine IR(Word32 Is8, Word32 Rs) 153 Rdd=combine(Rs.#s8) Word64 Q6 P combine RI(Word32 Rs, Word32 Is8) 153 Rdd=combine(Rs,Rt) Word64 Q6_P_combine_RR(Word32 Rs, Word32 Rt) 153 convert d2df Rdd=convert d2df(Rss) Word64 Q6_P_convert_d2df_P(Word64 Rss) 427 convert_d2sf Rd=convert d2sf(Rss) Word32 Q6 R convert d2sf P(Word64 Rss) 427 convert df2d Rdd=convert df2d(Rss) Word64 Q6_P_convert_df2d_P(Word64 Rss) 430 Rdd=convert df2d(Rss):chop ``` ``` Word64 Q6_P_convert_df2d_P_chop(Word64 Rss) 430 convert_df2sf Rd=convert df2sf(Rss) Word32 Q6 R convert df2sf P(Word64 Rss) 426 convert df2ud Rdd=convert df2ud(Rss) Word64 Q6 P convert df2ud P(Word64 Rss) 430 Rdd=convert df2ud(Rss):chop Word64 Q6 P convert df2ud P chop(Word64 Rss) 430 convert df2uw Rd=convert_df2uw(Rss) Word32 Q6 R convert df2uw P(Word64 Rss) 429 Rd=convert df2uw(Rss):chop Word32 Q6 R convert df2uw P chop(Word64 Rss) 429 convert df2w Rd=convert_df2w(Rss) Word32 Q6 R convert df2w P(Word64 Rss) 429 Rd=convert df2w(Rss):chop Word32 Q6 R convert df2w P chop(Word64 Rss) 429 convert sf2d Rdd=convert sf2d(Rs) Word64 Q6 P convert sf2d R(Word32 Rs) 430 Rdd=convert sf2d(Rs):chop Word64 Q6_P_convert_sf2d_R_chop(Word32 Rs) 430 convert sf2df Rdd=convert sf2df(Rs) Word64 Q6 P convert sf2df R(Word32 Rs) 426 convert sf2ud Rdd=convert_sf2ud(Rs) Word64 Q6 P convert sf2ud R(Word32 Rs) 430 Rdd=convert sf2ud(Rs):chop Word64 Q6 P convert sf2ud R chop(Word32 Rs) 430 convert sf2uw Rd=convert sf2uw(Rs) Word32 Q6 R convert sf2uw R(Word32 Rs) 429 Rd=convert sf2uw(Rs):chop Word32 Q6_R_convert_sf2uw_R_chop(Word32 Rs) 430 convert sf2w Rd=convert sf2w(Rs) Word32 Q6 R convert sf2w R(Word32 Rs) 430 Rd=convert sf2w(Rs):chop Word32 Q6_R_convert_sf2w_R_chop(Word32 Rs) 430 convert ud2df Rdd=convert ud2df(Rss) Word64 Q6 P convert ud2df P(Word64 Rss) 427 convert ud2sf Rd=convert ud2sf(Rss) Word32 Q6 R convert ud2sf P(Word64 Rss) 427 convert uw2df Rdd=convert uw2df(Rs) Word64 Q6_P_convert_uw2df_R(Word32 Rs) 427 convert uw2sf Rd=convert uw2sf(Rs) ``` ``` Word32 Q6_R_convert_uw2sf_R(Word32 Rs) 427 convert_w2df Rdd=convert w2df(Rs) Word64 Q6_P_convert_w2df_R(Word32 Rs) 427 convert_w2sf Rd=convert w2sf(Rs) Word32 Q6 R convert w2sf R(Word32 Rs) 427 cround Rd=cround(Rs,#u5) Word32 Q6_R_cround_RI(Word32 Rs, Word32 lu5) 323 Rd=cround(Rs,Rt) Word32 Q6 R cround RR(Word32 Rs, Word32 Rt) 323 Rdd=cround(Rss,#u6) Word64 Q6 P cround PI(Word64 Rss, Word32 lu6) 323 Rdd=cround(Rss,Rt) Word64 Q6_P_cround_PR(Word64 Rss, Word32 Rt) 323 ct0 Rd=ctO(Rs) Word32 Q6_R_ct0_R(Word32 Rs) 374 Rd=ctO(Rss) Word32 Q6_R_ct0_P(Word64 Rss) 374 ct1 Rd=ct1(Rs) Word32 Q6_R_ct1_R(Word32 Rs) 374 Rd=ct1(Rss) Word32 Q6 R ct1 P(Word64 Rss) 374 D dccleana dccleana(Rs) void Q6 dccleana A(Address a) 291 dccleaninva dccleaninva(Rs) void Q6_dccleaninva_A(Address a) 291 dcfetch dcfetch(Rs) void Q6_dcfetch_A(Address a) 290 dcinva dcinva(Rs) void Q6 dcinva A(Address a) 291 dczeroa dczeroa(Rs) void Q6_dczeroa_A(Address a) 287 deinterleave Rdd=deinterleave(Rss) Word64 Q6_P_deinterleave_P(Word64 Rss) 380 dfadd Rdd=dfadd(Rss,Rtt) Word64 Q6_P_dfadd_PP(Word64 Rss, Word64 Rtt) 422 dfclass Pd=dfclass(Rss,#u5) Byte Q6_p_dfclass_PI(Word64 Rss, Word32 lu5) 423 dfcmp.eq ``` ``` Pd=dfcmp.eq(Rss,Rtt) Byte Q6_p_dfcmp_eq_PP(Word64 Rss, Word64 Rtt) 424 dfcmp.ge Pd=dfcmp.ge(Rss,Rtt) Byte Q6_p_dfcmp_ge_PP(Word64 Rss, Word64 Rtt) 424 dfcmp.gt Pd=dfcmp.gt(Rss,Rtt) Byte Q6 p dfcmp gt PP(Word64 Rss, Word64 Rtt) 424 dfcmp.uo Pd=dfcmp.uo(Rss,Rtt) Byte Q6_p_dfcmp_uo_PP(Word64 Rss, Word64 Rtt) 424 dfmake Rdd=dfmake(#u10):neg Word64 Q6_P_dfmake_I_neg(Word32 lu10) 437 Rdd=dfmake(#u10):pos Word64 Q6_P_dfmake_I_pos(Word32 Iu10) 437 dfmax Rdd=dfmax(Rss,Rtt) Word64 Q6 P dfmax PP(Word64 Rss, Word64 Rtt) 438 dfmin Rdd=dfmin(Rss,Rtt) Word64 Q6_P_dfmin_PP(Word64 Rss, Word64 Rtt) 439 dfmpyfix Rdd=dfmpyfix(Rss,Rtt) {\tt Word64\ Q6\_P\_dfmpyfix\_PP(Word64\ Rss,\ Word64\ Rtt)\ 440} dfmpyhh Rxx+=dfmpyhh(Rss,Rtt) Word64 Q6_P_dfmpyhhacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 432 dfmpylh Rxx+=dfmpylh(Rss,Rtt) Word64 Q6_P_dfmpylhacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 432 Rdd=dfmpyll(Rss,Rtt) Word64 Q6 P dfmpyll PP(Word64 Rss, Word64 Rtt) 440 dfsub Rdd=dfsub(Rss,Rtt) Word64 Q6 P dfsub PP(Word64 Rss, Word64 Rtt) 442 dmsvncht Rd=dmsyncht Word32 Q6_R_dmsyncht() 299 Ε extract Rd=extract(Rs,#u5,#U5) Word32 Q6_R_extract_RII(Word32 Rs, Word32 Iu5, Word32 IU5) 376 Rd=extract(Rs,Rtt) Word32 Q6 R extract RP(Word32 Rs, Word64 Rtt) 376 Rdd=extract(Rss,#u6,#U6) Word64 Q6_P_extract_PII(Word64 Rss, Word32 lu6, Word32 lU6) 376 Rdd=extract(Rss,Rtt) Word64 Q6 P extract PP(Word64 Rss, Word64 Rtt) 376 extractu Rd=extractu(Rs,#u5,#U5) ``` ``` Word32 Q6_R_extractu_RII(Word32 Rs, Word32 Iu5, Word32 IU5) 376 Rd=extractu(Rs,Rtt) Word32 Q6_R_extractu_RP(Word32 Rs, Word64 Rtt) 376 Rdd=extractu(Rss,#u6,#U6) Word64 Q6_P_extractu_PII(Word64 Rss, Word32 Iu6, Word32 IU6) 376 Rdd=extractu(Rss,Rtt) Word64 Q6 P extractu PP(Word64 Rss, Word64 Rtt) 376 F fastcorner9 Pd=!fastcorner9(Ps,Pt) Byte Q6_p_not_fastcorner9_pp(Byte Ps, Byte Pt) 173 Pd=fastcorner9(Ps,Pt) Byte Q6_p_fastcorner9_pp(Byte Ps, Byte Pt) 173 insert Rx=insert(Rs,#u5,#U5) Word32 Q6_R_insert_RII(Word32 Rx, Word32 Rs, Word32 Iu5, Word32 IU5) 379 Rx=insert(Rs,Rtt) Word32 Q6_R_insert_RP(Word32 Rx, Word32 Rs, Word64 Rtt) 379 Rxx=insert(Rss,#u6,#U6) Word64 Q6_P_insert_PII(Word64 Rxx, Word64 Rss, Word32 Iu6, Word32 IU6) 379 Rxx=insert(Rss,Rtt) Word64 Q6_P_insert_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 379 interleave Rdd=interleave(Rss) Word64 Q6 P interleave P(Word64 Rss) 380 L 12fetch I2fetch(Rs,Rt) void Q6 | 12fetch AR(Address a, Word32 Rt) 297 I2fetch(Rs,Rtt) void Q6 l2fetch AP(Address a, Word64 Rtt) 297 lfs Rdd=lfs(Rss,Rtt) Word64 Q6_P_lfs_PP(Word64 Rss, Word64 Rtt) 381 Isl Rd=IsI(#s6,Rt) Word32 Q6_R_lsl_IR(Word32 ls6, Word32 Rt) 552 Rd=IsI(Rs,Rt) Word32 Q6_R_lsl_RR(Word32 Rs, Word32 Rt) 552 Rdd=IsI(Rss,Rt) Word64 Q6 P Isl PR(Word64 Rss, Word32 Rt) 552 Rx&=IsI(Rs,Rt) Word32 Q6_R_Island_RR(Word32 Rx, Word32 Rs, Word32 Rt) 557 Rx+=IsI(Rs,Rt) Word32 Q6_R_Islacc_RR(Word32 Rx, Word32 Rs, Word32 Rt) 554 Rx-=IsI(Rs,Rt) Word32 Q6 R Islnac RR(Word32 Rx, Word32 Rs, Word32 Rt) 554 Rx = IsI(Rs,Rt) Word32 Q6_R_Islor_RR(Word32 Rx, Word32 Rs, Word32 Rt) 557 Rxx&=IsI(Rss,Rt) Word64 Q6 P Island PR(Word64 Rxx, Word64 Rss, Word32 Rt) 557 Rxx^=lsl(Rss,Rt) ``` ``` Word64 Q6 P Islxacc PR(Word64 Rxx, Word64 Rss, Word32 Rt) 557 Rxx+=IsI(Rss,Rt) Word64 Q6_P_Islacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 554 Rxx-=lsl(Rss,Rt) Word64 Q6 P Islnac PR(Word64 Rxx, Word64 Rss, Word32 Rt) 554 Rxx | = |s|(Rss,Rt) Word64 Q6 P Islor PR(Word64 Rxx, Word64 Rss, Word32 Rt) 558 Isr Rd=lsr(Rs,#u5) Word32 Q6 R lsr RI(Word32 Rs, Word32 lu5) 538 Rd=Isr(Rs,Rt) Word32 Q6 R Isr RR(Word32 Rs, Word32 Rt) 552 Rdd=lsr(Rss,#u6) Word64 Q6 P Isr PI(Word64 Rss, Word32 lu6) 539 Rdd=lsr(Rss,Rt) Word64 Q6 P Isr PR(Word64 Rss, Word32 Rt) 552 Rx&=lsr(Rs,#u5) Word32 Q6_R_lsrand_RI(Word32 Rx, Word32 Rs, Word32 lu5) 545 Rx&=lsr(Rs,Rt) Word32 Q6 R Isrand RR(Word32 Rx, Word32 Rs, Word32 Rt) 557 Rx^=Isr(Rs, #u5) Word32 Q6_R_Isrxacc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 545 Rx+=Isr(Rs,#u5) Word32 Q6_R_Isracc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 541 Rx+=Isr(Rs,Rt) Word32 Q6 R Isracc RR(Word32 Rx, Word32 Rs, Word32 Rt) 554 Rx=add(\#u8,Isr(Rx,\#U5)) Word32 Q6_R_add_lsr_IRI(Word32 lu8, Word32 Rx, Word32 lU5) 541 Rx=and(\#u8,Isr(Rx,\#U5)) Word32 Q6_R_and_lsr_IRI(Word32 lu8, Word32 Rx, Word32 lU5) 545 Rx-=Isr(Rs.#u5) Word32 Q6 R Isrnac RI(Word32 Rx, Word32 Rs, Word32 Iu5) 541 Rx-=Isr(Rs,Rt) Word32 Q6 R Isrnac RR(Word32 Rx, Word32 Rs, Word32 Rt) 554 Rx=or(\#u8,Isr(Rx,\#U5)) Word32 Q6 R or lsr IRI(Word32 lu8, Word32 Rx, Word32 lU5) 545 Rx=sub(\#u8,Isr(Rx,\#U5)) Word32 Q6 R sub lsr IRI(Word32 lu8, Word32 Rx, Word32 lU5) 541 Rx = Isr(Rs, #u5) Word32 Q6_R_lsror_RI(Word32 Rx, Word32 Rs, Word32 lu5) 545 Rx = Isr(Rs,Rt) Word32 Q6 R Isror RR(Word32 Rx, Word32 Rs, Word32 Rt) 557 Rxx&=lsr(Rss,#u6) Word64 Q6_P_Isrand_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 546 Rxx&=Isr(Rss,Rt) Word64 Q6_P_lsrand_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 557 Rxx^=lsr(Rss,#u6) Word64 Q6 P Isrxacc PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 546 Rxx^=lsr(Rss,Rt) Word64 Q6_P_Isrxacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 558 Rxx+=lsr(Rss,#u6) Word64 Q6 P Isracc PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 541 Rxx+=lsr(Rss.Rt) Word64 Q6 P Isracc PR(Word64 Rxx, Word64 Rss, Word32 Rt) 554 ``` ``` Rxx-=lsr(Rss,#u6) Word64 Q6 P Isrnac PI(Word64 Rxx, Word64 Rss, Word32 lu6) 541 Rxx-=lsr(Rss,Rt) Word64 Q6 P Isrnac PR(Word64 Rxx, Word64 Rss, Word32 Rt) 554 Rxx | =|sr(Rss.#u6) Word64 Q6_P_lsror_PI(Word64 Rxx, Word64 Rss, Word32 lu6) 546 Rxx | = lsr(Rss,Rt) Word64 Q6_P_lsror_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 558 M mask Rd=mask(#u5,#U5) Word32 Q6 R mask II(Word32 Iu5, Word32 IU5) 537 Rdd=mask(Pt) Word64 Q6 P mask p(Byte Pt) 524 max Rd=max(Rs,Rt) Word32 Q6 R max RR(Word32 Rs, Word32 Rt) 316 Rdd=max(Rss,Rtt) Word64 Q6 P max PP(Word64 Rss, Word64 Rtt) 317 maxu Rd=maxu(Rs,Rt) UWord32 Q6_R_maxu_RR(Word32 Rs, Word32 Rt) 316 Rdd=maxu(Rss,Rtt) UWord64 Q6 P maxu PP(Word64 Rss, Word64 Rtt) 317 memb memb(Rx++#s4:0:circ(Mu))=Rt void Q6 memb IMR circ(void** StartAddress, Word32 Is4 0, Word32 Mu, Word32 Rt, void* BaseAddress) 267 memb(Rx++I:circ(Mu))=Rt void Q6_memb_MR_circ(void** StartAddress, Word32 Mu, Word32 Rt, void* BaseAddress) 267 Rd=memb(Rx++#s4:0:circ(Mu)) Word32 Q6 R memb IM circ(void** StartAddress, Word32 Is4 0, Word32 Mu, void* BaseAddress) 202 Rd=memb(Rx++I:circ(Mu)) Word32 Q6 R memb M circ(void** StartAddress, Word32 Mu, void* BaseAddress) 202 memd memd(Rx++#s4:3:circ(Mu))=Rtt void Q6 memd IMP circ(void** StartAddress, Word32 Is4 3, Word32 Mu, Word64 Rtt, void* BaseAd- dress) 263 memd(Rx++I:circ(Mu))=Rtt void Q6_memd_MP_circ(void** StartAddress, Word32 Mu, Word64 Rtt, void* BaseAddress) 263 Rdd=memd(Rx++#s4:3:circ(Mu)) Word32 Q6 R memd IM circ(void** StartAddress, Word32 Is4 3, Word32 Mu, void* BaseAddress) 197 Rdd=memd(Rx++I:circ(Mu)) Word32 Q6_R_memd_M_circ(void** StartAddress, Word32 Mu, void* BaseAddress) 198 memh memh(Rx++#s4:1:circ(Mu))=Rt void Q6 memh IMR circ(void** StartAddress, Word32 Is4 1, Word32 Mu, Word32 Rt, void* BaseAddress) 272 memh(Rx++#s4:1:circ(Mu))=Rt.H void Q6 memh IMRh circ(void** StartAddress, Word32 Is4 1, Word32 Mu, Word32 Rt, void* BaseAd- dress) 272 memh(Rx++I:circ(Mu))=Rt void Q6 memh MR circ(void** StartAddress, Word32 Mu, Word32 Rt, void* BaseAddress) 272 ``` ``` memh(Rx++I:circ(Mu))=Rt.H void Q6 memh MRh circ(void** StartAddress, Word32 Mu, Word32 Rt, void* BaseAddress) 272 Rd=memh(Rx++#s4:1:circ(Mu)) Word32 Q6 R memh IM circ(void** StartAddress, Word32 Is4 1, Word32 Mu, void* BaseAddress) 212 Rd=memh(Rx++I:circ(Mu)) Word32 Q6_R_memh_M_circ(void** StartAddress, Word32 Mu, void* BaseAddress) 212 memub Rd=memub(Rx++#s4:0:circ(Mu)) Word32 Q6 R memub IM circ(void** StartAddress, Word32 Is4 0, Word32 Mu, void* BaseAddress) 217 Rd=memub(Rx++I:circ(Mu)) Word32 Q6_R_memub_M_circ(void** StartAddress, Word32 Mu, void* BaseAddress) 217 memuh Rd=memuh(Rx++#s4:1:circ(Mu)) Word32 Q6_R_memuh_IM_circ(void** StartAddress, Word32 Is4_1, Word32 Mu, void* BaseAddress) 221 Rd=memuh(Rx++1:circ(Mu)) Word32 Q6 R memuh M circ(void** StartAddress, Word32 Mu, void* BaseAddress) 221 memw memw(Rx++#s4:2:circ(Mu))=Rt void Q6 memw IMR circ(void** StartAddress, Word32 Is4 2, Word32 Mu, Word32 Rt, void* BaseAd- dress) 278 memw(Rx++I:circ(Mu))=Rt void Q6_memw_MR_circ(void** StartAddress, Word32 Mu, Word32 Rt, void* BaseAddress) 278 Rd=memw(Rx++#s4:2:circ(Mu)) Word32 Q6_R_memw_IM_circ(void** StartAddress, Word32 Is4_2, Word32 Mu, void* BaseAddress) 225 Rd=memw(Rx++I:circ(Mu)) Word32 Q6_R_memw_M_circ(void** StartAddress, Word32 Mu, void* BaseAddress) 225 min Rd=min(Rt,Rs) Word32 Q6 R min RR(Word32 Rt, Word32 Rs) 318 Rdd=min(Rtt,Rss) Word64 Q6 P min PP(Word64 Rtt, Word64 Rss) 319 minu Rd=minu(Rt,Rs) UWord32 Q6 R minu RR(Word32 Rt, Word32 Rs) 318 Rdd=minu(Rtt,Rss) UWord64 Q6 P minu PP(Word64 Rtt, Word64 Rss) 319 modwrap Rd=modwrap(Rs.Rt) \label{lem:word32} \textbf{Word32} \ \textbf{Q6\_R\_modwrap\_RR(Word32} \ \textbf{Rs, Word32} \ \textbf{Rt)} \ 320 mpy Rd=mpy(Rs,Rt.H):<<1:rnd:sat Word32 Q6 R mpy RRh s1 rnd sat(Word32 Rs, Word32 Rt) 470 Rd=mpy(Rs,Rt.H):<<1:sat Word32 Q6_R_mpy_RRh_s1_sat(Word32 Rs, Word32 Rt) 470 Rd=mpy(Rs,Rt.L):<<1:rnd:sat Word32 Q6_R_mpy_RRI_s1_rnd_sat(Word32 Rs, Word32 Rt) 470 Rd=mpy(Rs,Rt.L):<<1:sat Word32 Q6 R mpy RRI s1 sat(Word32 Rs, Word32 Rt) 470 Rd=mpy(Rs,Rt) Word32 Q6_R_mpy_RR(Word32 Rs, Word32 Rt) 470 Rd=mpy(Rs,Rt):<<1 Word32 Q6 R mpy RR s1(Word32 Rs, Word32 Rt) 470 Rd=mpy(Rs,Rt):<<1:sat ``` ``` Word32 Q6_R_mpy_RR_s1_sat(Word32 Rs, Word32 Rt) 470 Rd=mpy(Rs,Rt):rnd Word32 Q6_R_mpy_RR_rnd(Word32 Rs, Word32 Rt) 470 Rd=mpy(Rs.H,Rt.H) Word32 Q6 R mpy RhRh(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.H,Rt.H):<<1 Word32 Q6 R mpy RhRh s1(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.H,Rt.H):<<1:rnd Word32 Q6 R mpy RhRh s1 rnd(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.H,Rt.H):<<1:rnd:sat Word32 Q6 R mpy RhRh s1 rnd sat(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.H,Rt.H):<<1:sat Word32 Q6_R_mpy_RhRh_s1_sat(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.H,Rt.H):rnd Word32 Q6_R_mpy_RhRh_rnd(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.H,Rt.H):rnd:sat Word32 Q6_R_mpy_RhRh_rnd_sat(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.H,Rt.H):sat Word32 Q6_R_mpy_RhRh_sat(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.H,Rt.L) Word32 Q6 R mpy RhRl(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.H,Rt.L):<<1 Word32 Q6_R_mpy_RhRl_s1(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.H,Rt.L):<<1:rnd Word32 Q6_R_mpy_RhRl_s1_rnd(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.H,Rt.L):<<1:rnd:sat Word32 Q6 R mpy RhRl s1 rnd sat(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.H,Rt.L):<<1:sat Word32 Q6 R mpy RhRl s1 sat(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.H,Rt.L):rnd Word32 Q6 R mpy RhRl rnd(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.H,Rt.L):rnd:sat Word32 Q6_R_mpy_RhRl_rnd_sat(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.H,Rt.L):sat Word32 Q6_R_mpy_RhRl_sat(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.L,Rt.H) Word32 Q6_R_mpy_RIRh(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.L,Rt.H):<<1 Word32 Q6_R_mpy_RlRh_s1(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.L,Rt.H):<<1:rnd Word32 Q6_R_mpy_RIRh_s1_rnd(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.L,Rt.H):<<1:rnd:sat Word32 Q6 R mpy RIRh s1 rnd sat(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.L,Rt.H):<<1:sat Word32 Q6_R_mpy_RIRh_s1_sat(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.L,Rt.H):rnd Word32 Q6_R_mpy_RIRh_rnd(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.L,Rt.H):rnd:sat Word32 Q6 R mpy RIRh rnd sat(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.L,Rt.H):sat Word32 Q6_R_mpy_RIRh_sat(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.L,Rt.L) Word32 Q6 R mpy RIRI(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.L,Rt.L):<<1 ``` ``` Word32 Q6_R_mpy_RIRI_s1(Word32 Rs, Word32 Rt) 455 Rd=mpy(Rs.L,Rt.L):<<1:rnd Word32 Q6_R_mpy_RIRI_s1_rnd(Word32 Rs, Word32 Rt) 456 Rd=mpy(Rs.L,Rt.L):<<1:rnd:sat Word32 Q6 R mpy RIRI s1 rnd sat(Word32 Rs, Word32 Rt) 456 Rd=mpy(Rs.L,Rt.L):<<1:sat Word32 Q6 R mpy RIRI s1 sat(Word32 Rs, Word32 Rt) 456 Rd=mpy(Rs.L,Rt.L):rnd Word32 Q6 R mpy RIRI rnd(Word32 Rs, Word32 Rt) 456 Rd=mpy(Rs.L,Rt.L):rnd:sat Word32 Q6_R_mpy_RIRI_rnd_sat(Word32 Rs, Word32 Rt) 456 Rd=mpy(Rs.L,Rt.L):sat Word32 Q6_R_mpy_RIRI_sat(Word32 Rs, Word32 Rt) 456 Rdd=mpy(Rs,Rt) Word64 Q6 P mpy RR(Word32 Rs, Word32 Rt) 471 Rdd=mpy(Rs.H,Rt.H) Word64 Q6_P_mpy_RhRh(Word32 Rs, Word32 Rt) 456 Rdd=mpy(Rs.H,Rt.H):<<1 Word64 Q6_P_mpy_RhRh_s1(Word32 Rs, Word32 Rt) 456 Rdd=mpy(Rs.H,Rt.H):<<1:rnd Word64 Q6 P mpy RhRh s1 rnd(Word32 Rs, Word32 Rt) 456 Rdd=mpy(Rs.H,Rt.H):rnd Word64 Q6_P_mpy_RhRh_rnd(Word32 Rs, Word32 Rt) 456 Rdd=mpy(Rs.H,Rt.L) Word64 Q6_P_mpy_RhRl(Word32 Rs, Word32 Rt) 456 Rdd=mpy(Rs.H,Rt.L):<<1 Word64 Q6 P mpy RhRl s1(Word32 Rs, Word32 Rt) 456 Rdd=mpy(Rs.H,Rt.L):<<1:rnd Word64 Q6_P_mpy_RhRl_s1_rnd(Word32 Rs, Word32 Rt) 456 Rdd=mpy(Rs.H,Rt.L):rnd Word64 Q6 P mpy RhRl rnd(Word32 Rs, Word32 Rt) 456 Rdd=mpy(Rs.L,Rt.H) Word64 Q6 P mpy RIRh(Word32 Rs, Word32 Rt) 456 Rdd=mpy(Rs.L,Rt.H):<<1 Word64 Q6_P_mpy_RIRh_s1(Word32 Rs, Word32 Rt) 456 Rdd=mpy(Rs.L,Rt.H):<<1:rnd Word64 Q6_P_mpy_RIRh_s1_rnd(Word32 Rs, Word32 Rt) 456 Rdd=mpy(Rs.L,Rt.H):rnd Word64 Q6_P_mpy_RIRh_rnd(Word32 Rs, Word32 Rt) 456 Rdd=mpy(Rs.L,Rt.L) Word64 Q6_P_mpy_RIRI(Word32 Rs, Word32 Rt) 456 Rdd=mpy(Rs.L,Rt.L):<<1 Word64 Q6 P mpy RIRI s1(Word32 Rs, Word32 Rt) 456 Rdd=mpy(Rs.L,Rt.L):<<1:rnd Word64 Q6_P_mpy_RIRI_s1_rnd(Word32 Rs, Word32 Rt) 456 Rdd=mpy(Rs.L,Rt.L):rnd Word64 Q6_P_mpy_RIRI_rnd(Word32 Rs, Word32 Rt) 456 Rx+=mpy(Rs,Rt):<<1:sat Word32 Q6 R mpyacc RR s1 sat(Word32 Rx, Word32 Rs, Word32 Rt) 470 Rx+=mpy(Rs.H,Rt.H) Word32 Q6 R mpyacc RhRh(Word32 Rx, Word32 Rs, Word32 Rt) 456 Rx+=mpy(Rs.H,Rt.H):<<1 Word32 Q6 R mpyacc RhRh s1(Word32 Rx, Word32 Rs, Word32 Rt) 456 Rx+=mpy(Rs.H,Rt.H):<<1:sat ``` ``` Rx+=mpy(Rs.H,Rt.H):sat Word32 Q6_R_mpyacc_RhRh_sat(Word32 Rx, Word32 Rs, Word32 Rt) 456 Rx+=mpv(Rs.H.Rt.L) Word32 Q6 R mpyacc RhRI(Word32 Rx, Word32 Rs, Word32 Rt) 456 Rx+=mpy(Rs.H,Rt.L):<<1 Word32 Q6 R mpyacc RhRl s1(Word32 Rx, Word32 Rs, Word32 Rt) 456 Rx+=mpy(Rs.H,Rt.L):<<1:sat Word32 Q6 R mpyacc RhRl s1 sat(Word32 Rx, Word32 Rs, Word32 Rt) 456 Rx+=mpy(Rs.H,Rt.L):sat Word32 Q6 R mpyacc RhRl sat(Word32 Rx, Word32 Rs, Word32 Rt) 456 Rx+=mpy(Rs.L,Rt.H) Word32 Q6_R_mpyacc_RIRh(Word32 Rx, Word32 Rs, Word32 Rt) 456 Rx+=mpy(Rs.L,Rt.H):<<1 Word32 Q6 R mpyacc RIRh s1(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx+=mpy(Rs.L,Rt.H):<<1:sat Word32 Q6_R_mpyacc_RlRh_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx+=mpy(Rs.L,Rt.H):sat Word32 Q6_R_mpyacc_RlRh_sat(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx+=mpy(Rs.L,Rt.L) Word32 Q6 R mpyacc RIRI(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx+=mpy(Rs.L,Rt.L):<<1 Word32 Q6_R_mpyacc_RIRI_s1(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx+=mpy(Rs.L,Rt.L):<<1:sat Word32 Q6_R_mpyacc_RIRI_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx+=mpy(Rs.L,Rt.L):sat Word32 Q6 R mpyacc RIRI sat(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx-=mpy(Rs,Rt):<<1:sat Word32 Q6 R mpynac RR s1 sat(Word32 Rx, Word32 Rs, Word32 Rt) 470 Rx-=mpy(Rs.H,Rt.H) Word32 Q6 R mpynac RhRh(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx-=mpy(Rs.H,Rt.H):<<1 Word32 Q6 R mpynac RhRh s1(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx-=mpy(Rs.H,Rt.H):<<1:sat Word32 Q6_R_mpynac_RhRh_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx-=mpy(Rs.H,Rt.H):sat Word32 Q6 R mpynac RhRh sat(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx=mpy(Rs.H,Rt.L) Word32 Q6_R_mpynac_RhRl(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx-=mpy(Rs.H,Rt.L):<<1 Word32 Q6_R_mpynac_RhRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx-=mpy(Rs.H,Rt.L):<<1:sat Word32 Q6 R mpynac RhRl s1 sat(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx-=mpy(Rs.H,Rt.L):sat Word32 Q6_R_mpynac_RhRl_sat(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx-=mpy(Rs.L,Rt.H) Word32 Q6_R_mpynac_RIRh(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx-=mpy(Rs.L,Rt.H):<<1 Word32 Q6 R mpynac RlRh s1(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx-=mpy(Rs.L,Rt.H):<<1:sat Word32 Q6_R_mpynac_RlRh_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx-=mpy(Rs.L,Rt.H):sat Word32 Q6 R mpynac RIRh sat(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx-=mpy(Rs.L,Rt.L) ``` Word32 Q6\_R\_mpyacc\_RhRh\_s1\_sat(Word32 Rx, Word32 Rs, Word32 Rt) 456 ``` Word32 Q6 R mpynac RIRI(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx-=mpy(Rs.L,Rt.L):<<1 Word32 Q6_R_mpynac_RIRI_s1(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx-=mpv(Rs.L.Rt.L):<<1:sat Word32 Q6 R mpynac RIRI s1 sat(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rx-=mpy(Rs.L,Rt.L):sat Word32 Q6 R mpynac RIRI sat(Word32 Rx, Word32 Rs, Word32 Rt) 457 Rxx+=mpy(Rs,Rt) Word64 Q6 P mpyacc RR(Word64 Rxx, Word32 Rs, Word32 Rt) 471 Rxx+=mpy(Rs.H,Rt.H) Word64 Q6 P mpyacc RhRh(Word64 Rxx, Word32 Rs, Word32 Rt) 458 Rxx+=mpy(Rs.H,Rt.H):<<1 Word64 Q6_P_mpyacc_RhRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 458 Rxx+=mpy(Rs.H,Rt.L) Word64 Q6 P mpyacc RhRI(Word64 Rxx, Word32 Rs, Word32 Rt) 458 Rxx+=mpy(Rs.H,Rt.L):<<1 Word64 Q6_P_mpyacc_RhRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 458 Rxx+=mpy(Rs.L,Rt.H) Word64 Q6_P_mpyacc_RIRh(Word64 Rxx, Word32 Rs, Word32 Rt) 458 Rxx+=mpy(Rs.L,Rt.H):<<1 Word64 Q6 P mpyacc RIRh s1(Word64 Rxx, Word32 Rs, Word32 Rt) 458 Rxx+=mpv(Rs.L.Rt.L) Word64 Q6 P mpyacc RIRI(Word64 Rxx, Word32 Rs, Word32 Rt) 458 Rxx+=mpy(Rs.L,Rt.L):<<1 Word64 Q6_P_mpyacc_RIRI_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 458 Rxx-=mpy(Rs,Rt) Word64 Q6 P mpynac RR(Word64 Rxx, Word32 Rs, Word32 Rt) 471 Rxx-=mpy(Rs.H,Rt.H) Word64 Q6 P mpynac RhRh(Word64 Rxx, Word32 Rs, Word32 Rt) 458 Rxx-=mpy(Rs.H,Rt.H):<<1 Word64 Q6 P mpynac RhRh s1(Word64 Rxx, Word32 Rs, Word32 Rt) 458 Rxx-=mpy(Rs.H,Rt.L) Word64 Q6 P mpynac RhRI(Word64 Rxx, Word32 Rs, Word32 Rt) 458 Rxx-=mpy(Rs.H,Rt.L):<<1 Word64 Q6_P_mpynac_RhRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 458 Rxx-=mpy(Rs.L,Rt.H) Word64 Q6 P mpynac RIRh(Word64 Rxx, Word32 Rs, Word32 Rt) 458 Rxx-=mpy(Rs.L,Rt.H):<<1 Word64 Q6_P_mpynac_RIRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 458 Rxx-=mpy(Rs.L,Rt.L) Word64 Q6_P_mpynac_RIRI(Word64 Rxx, Word32 Rs, Word32 Rt) 458 Rxx-=mpy(Rs.L,Rt.L):<<1 Word64 Q6 P mpynac RIRI s1(Word64 Rxx, Word32 Rs, Word32 Rt) 458 mpyi Rd=mpyi(Rs,#m9) Word32 Q6_R_mpyi_RI(Word32 Rs, Word32 Im9) 444 Rd=mpyi(Rs,Rt) Word32 Q6 R mpyi RR(Word32 Rs, Word32 Rt) 444 Rx+=mpyi(Rs,#u8) Word32 Q6_R_mpyiacc_RI(Word32 Rx, Word32 Rs, Word32 lu8) 444 Rx+=mpyi(Rs,Rt) Word32 Q6_R_mpyiacc_RR(Word32 Rx, Word32 Rs, Word32 Rt) 444 Rx-=mpyi(Rs,#u8) Word32 Q6 R mpyinac RI(Word32 Rx, Word32 Rs, Word32 lu8) 444 ``` ``` Rx-=mpyi(Rs,Rt) Word32 Q6 R mpyinac RR(Word32 Rx, Word32 Rs, Word32 Rt) 444 mpysu Rd=mpysu(Rs,Rt) Word32 Q6 R mpysu RR(Word32 Rs, Word32 Rt) 470 mpyu Rd=mpyu(Rs,Rt) UWord32 Q6 R mpyu RR(Word32 Rs, Word32 Rt) 470 Rd=mpyu(Rs.H,Rt.H) UWord32 Q6_R_mpyu_RhRh(Word32 Rs, Word32 Rt) 462 Rd=mpyu(Rs.H,Rt.H):<<1 UWord32 Q6_R_mpyu_RhRh_s1(Word32 Rs, Word32 Rt) 462 Rd=mpyu(Rs.H,Rt.L) UWord32 Q6 R mpyu RhRI(Word32 Rs, Word32 Rt) 462 Rd=mpvu(Rs.H,Rt.L):<<1 UWord32 Q6 R mpyu RhRl s1(Word32 Rs, Word32 Rt) 462 Rd=mpyu(Rs.L,Rt.H) UWord32 Q6 R mpyu RlRh(Word32 Rs, Word32 Rt) 462 Rd=mpyu(Rs.L,Rt.H):<<1 UWord32 Q6 R mpyu RIRh s1(Word32 Rs, Word32 Rt) 462 Rd=mpyu(Rs.L,Rt.L) UWord32 Q6 R mpyu RIRI(Word32 Rs, Word32 Rt) 462 Rd=mpyu(Rs.L,Rt.L):<<1 UWord32 Q6_R_mpyu_RIRI_s1(Word32 Rs, Word32 Rt) 462 Rdd=mpyu(Rs,Rt) UWord64 Q6 P mpyu RR(Word32 Rs, Word32 Rt) 471 Rdd=mpyu(Rs.H,Rt.H) UWord64 Q6_P_mpyu_RhRh(Word32 Rs, Word32 Rt) 462 Rdd=mpyu(Rs.H,Rt.H):<<1 UWord64 Q6 P mpyu RhRh s1(Word32 Rs, Word32 Rt) 462 Rdd=mpyu(Rs.H,Rt.L) UWord64 Q6 P mpyu RhRI(Word32 Rs, Word32 Rt) 462 Rdd=mpyu(Rs.H,Rt.L):<<1 UWord64 Q6 P mpyu RhRl s1(Word32 Rs, Word32 Rt) 462 Rdd=mpyu(Rs.L,Rt.H) UWord64 Q6 P mpyu RIRh(Word32 Rs, Word32 Rt) 462 Rdd=mpvu(Rs.L,Rt.H):<<1 UWord64 Q6 P mpyu RIRh s1(Word32 Rs, Word32 Rt) 462 Rdd=mpyu(Rs.L,Rt.L) UWord64 Q6 P mpyu RIRI(Word32 Rs, Word32 Rt) 462 Rdd=mpyu(Rs.L,Rt.L):<<1 UWord64 Q6 P mpyu RIRI s1(Word32 Rs, Word32 Rt) 462 Rx+=mpyu(Rs.H,Rt.H) Word32 Q6 R mpyuacc RhRh(Word32 Rx, Word32 Rs, Word32 Rt) 462 Rx+=mpyu(Rs.H,Rt.H):<<1 Word32 Q6 R mpyuacc RhRh s1(Word32 Rx, Word32 Rs, Word32 Rt) 462 Rx+=mpyu(Rs.H,Rt.L) Word32 Q6 R mpyuacc RhRI(Word32 Rx, Word32 Rs, Word32 Rt) 462 Rx+=mpyu(Rs.H,Rt.L):<<1 Word32 Q6 R mpyuacc RhRl s1(Word32 Rx, Word32 Rs, Word32 Rt) 462 Rx+=mpyu(Rs.L,Rt.H) Word32 Q6 R mpyuacc RIRh(Word32 Rx, Word32 Rs, Word32 Rt) 462 Rx+=mpyu(Rs.L,Rt.H):<<1 Word32 Q6 R mpyuacc RIRh s1(Word32 Rx, Word32 Rs, Word32 Rt) 462 ``` Rx+=mpyu(Rs.L,Rt.L) ``` Word32 Q6_R_mpyuacc_RIRI(Word32 Rx, Word32 Rs, Word32 Rt) 462 Rx+=mpyu(Rs.L,Rt.L):<<1 Word32 Q6 R mpyuacc RIRI s1(Word32 Rx, Word32 Rs, Word32 Rt) 462 Rx-=mpyu(Rs.H,Rt.H) Word32 Q6_R_mpyunac_RhRh(Word32 Rx, Word32 Rs, Word32 Rt) 462 Rx-=mpyu(Rs.H,Rt.H):<<1 Word32 Q6_R_mpyunac_RhRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) 462 Rx-=mpyu(Rs.H,Rt.L) Word32 Q6 R mpyunac RhRI(Word32 Rx, Word32 Rs, Word32 Rt) 462 Rx-=mpyu(Rs.H,Rt.L):<<1 Word32 Q6_R_mpyunac_RhRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) 462 Rx-=mpyu(Rs.L,Rt.H) Word32 Q6_R_mpyunac_RIRh(Word32 Rx, Word32 Rs, Word32 Rt) 462 Rx-=mpvu(Rs.L.Rt.H):<<1 Word32 Q6 R mpyunac RIRh s1(Word32 Rx, Word32 Rs, Word32 Rt) 463 Rx-=mpyu(Rs.L,Rt.L) Word32 Q6 R mpyunac RIRI(Word32 Rx, Word32 Rs, Word32 Rt) 463 Rx-=mpyu(Rs.L,Rt.L):<<1 Word32 Q6 R mpyunac RIRI s1(Word32 Rx, Word32 Rs, Word32 Rt) 463 Rxx+=mpyu(Rs,Rt) Word64 Q6 P mpyuacc RR(Word64 Rxx, Word32 Rs, Word32 Rt) 471 Rxx+=mpyu(Rs.H,Rt.H) Word64 Q6_P_mpyuacc_RhRh(Word64 Rxx, Word32 Rs, Word32 Rt) 463 Rxx+=mpyu(Rs.H,Rt.H):<<1 Word64 Q6 P mpyuacc RhRh s1(Word64 Rxx, Word32 Rs, Word32 Rt) 463 Rxx+=mpyu(Rs.H,Rt.L) Word64 Q6_P_mpyuacc_RhRI(Word64 Rxx, Word32 Rs, Word32 Rt) 463 Rxx+=mpyu(Rs.H,Rt.L):<<1 Word64 Q6_P_mpyuacc_RhRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 463 Rxx+=mpyu(Rs.L,Rt.H) Word64 Q6 P mpyuacc RIRh(Word64 Rxx, Word32 Rs, Word32 Rt) 463 Rxx+=mpyu(Rs.L,Rt.H):<<1 Word64 Q6_P_mpyuacc_RlRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 463 Rxx+=mpyu(Rs.L,Rt.L) Word64 Q6_P_mpyuacc_RIRI(Word64 Rxx, Word32 Rs, Word32 Rt) 463 Rxx+=mpyu(Rs.L,Rt.L):<<1 Word64 Q6 P mpyuacc RIRI s1(Word64 Rxx, Word32 Rs, Word32 Rt) 463 Rxx-=mpyu(Rs,Rt) Word64 Q6 P mpyunac RR(Word64 Rxx, Word32 Rs, Word32 Rt) 471 Rxx-=mpyu(Rs.H,Rt.H) Word64 Q6 P mpyunac RhRh(Word64 Rxx, Word32 Rs, Word32 Rt) 463 Rxx-=mpyu(Rs.H,Rt.H):<<1 Word64 Q6 P mpyunac RhRh s1(Word64 Rxx, Word32 Rs, Word32 Rt) 463 Rxx-=mpyu(Rs.H,Rt.L) Word64 Q6_P_mpyunac_RhRl(Word64 Rxx, Word32 Rs, Word32 Rt) 463 Rxx-=mpyu(Rs.H,Rt.L):<<1 Word64 Q6 P mpyunac RhRl s1(Word64 Rxx, Word32 Rs, Word32 Rt) 463 Rxx-=mpyu(Rs.L,Rt.H) Word64 Q6_P_mpyunac_RlRh(Word64 Rxx, Word32 Rs, Word32 Rt) 463 Rxx-=mpyu(Rs.L,Rt.H):<<1 Word64 Q6_P_mpyunac_RIRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 463 Rxx-=mpyu(Rs.L,Rt.L) Word64 Q6 P mpyunac RIRI(Word64 Rxx, Word32 Rs, Word32 Rt) 463 ``` ``` Rxx-=mpyu(Rs.L,Rt.L):<<1 Word64 Q6_P_mpyunac_RIRI_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 463 mpyui Rd=mpyui(Rs,Rt) Word32 Q6 R mpyui RR(Word32 Rs, Word32 Rt) 444 mux Rd=mux(Pu,#s8,#S8) Word32 Q6 R mux pll(Byte Pu, Word32 Is8, Word32 IS8) 154 Rd=mux(Pu,#s8,Rs) Word32 Q6_R_mux_pIR(Byte Pu, Word32 Is8, Word32 Rs) 154 Rd=mux(Pu,Rs,#s8) Word32 Q6_R_mux_pRI(Byte Pu, Word32 Rs, Word32 Is8) 154 Rd=mux(Pu,Rs,Rt) Word32 Q6 R mux pRR(Byte Pu, Word32 Rs, Word32 Rt) 154 Ν neg Rd=neg(Rs) Word32 Q6 R neg R(Word32 Rs) 141 Rd=neg(Rs):sat Word32 Q6_R_neg_R_sat(Word32 Rs) 321 Rdd=neg(Rss) Word64 Q6 P neg P(Word64 Rss) 321 no mnemonic Pd=Ps Byte Q6 p equals p(Byte Ps) 180 Pd=Rs Byte Q6_p_equals_R(Word32 Rs) 526 Rd=#s16 Word32 Q6_R_equals_I(Word32 Is16) 145 Rd=Ps Word32 Q6_R_equals_p(Byte Ps) 526 Rd=Rs Word32 Q6_R_equals_R(Word32 Rs) 147 Rdd=#s8 Word64 Q6_P_equals_I(Word32 Is8) 145 Rdd=Rss Word64 Q6_P_equals_P(Word64 Rss) 147 Rx.H=#u16 Word32 Q6_Rh_equals_I(Word32 Rx, Word32 Iu16) 145 Rx.L=#u16 Word32 Q6_Rl_equals_I(Word32 Rx, Word32 Iu16) 145 normamt Rd=normamt(Rs) Word32 Q6_R_normamt_R(Word32 Rs) 372 Rd=normamt(Rss) Word32 Q6_R_normamt_P(Word64 Rss) 372 not Pd=not(Ps) Byte Q6_p_not_p(Byte Ps) 180 Rd=not(Rs) Word32 Q6 R not R(Word32 Rs) 139 Rdd=not(Rss) Word64 Q6 P not P(Word64 Rss) 312 ``` ``` 0 or Pd=and(Ps,or(Pt,!Pu)) Byte Q6_p_and_or_ppnp(Byte Ps, Byte Pt, Byte Pu) 180 Pd=and(Ps,or(Pt,Pu)) Byte Q6 p and or ppp(Byte Ps, Byte Pt, Byte Pu) 180 Pd=or(Ps,and(Pt,!Pu)) Byte Q6_p_or_and_ppnp(Byte Ps, Byte Pt, Byte Pu) 180 Pd=or(Ps,and(Pt,Pu)) Byte Q6_p_or_and_ppp(Byte Ps, Byte Pt, Byte Pu) 180 Pd=or(Ps,or(Pt,!Pu)) Byte Q6 p or or ppnp(Byte Ps, Byte Pt, Byte Pu) 181 Pd=or(Ps,or(Pt,Pu)) Byte Q6_p_or_or_ppp(Byte Ps, Byte Pt, Byte Pu) 181 Pd=or(Pt,!Ps) Byte Q6_p_or_pnp(Byte Pt, Byte Ps) 181 Pd=or(Pt,Ps) Byte Q6_p_or_pp(Byte Pt, Byte Ps) 181 Rd=or(Rs,#s10) Word32 Q6 R or RI(Word32 Rs, Word32 Is10) 139 Rd=or(Rs,Rt) Word32 Q6_R_or_RR(Word32 Rs, Word32 Rt) 139 Rd=or(Rt,~Rs) Word32 Q6_R_or_RnR(Word32 Rt, Word32 Rs) 139 Rdd=or(Rss,Rtt) Word64 Q6_P_or_PP(Word64 Rss, Word64 Rtt) 312 Rdd=or(Rtt,~Rss) Word64 Q6 P or PnP(Word64 Rtt, Word64 Rss) 312 Rx&=or(Rs.Rt) Word32 Q6 R orand RR(Word32 Rx, Word32 Rs, Word32 Rt) 314 Rx^=or(Rs,Rt) Word32 Q6_R_orxacc_RR(Word32 Rx, Word32 Rs, Word32 Rt) 314 Rx=or(Ru,and(Rx,#s10)) Word32 Q6 R or and RRI(Word32 Ru, Word32 Rx, Word32 Is10) 314 Rx = or(Rs, #s10) Word32 Q6 R oror RI(Word32 Rx, Word32 Rs, Word32 Is10) 314 Rx | = or(Rs,Rt) Word32 Q6_R_oror_RR(Word32 Rx, Word32 Rs, Word32 Rt) 314 Р packhl Rdd=packhl(Rs,Rt) Word64 Q6 P packhl RR(Word32 Rs, Word32 Rt) 157 parity Rd=parity(Rs,Rt) Word32 Q6_R_parity_RR(Word32 Rs, Word32 Rt) 382 Rd=parity(Rss,Rtt) Word32 Q6_R_parity_PP(Word64 Rss, Word64 Rtt) 382 pmpyw Rdd=pmpyw(Rs,Rt) Word64 Q6_P_pmpyw_RR(Word32 Rs, Word32 Rt) 465 Rxx^=pmpvw(Rs,Rt) Word64 Q6 P pmpywxacc RR(Word64 Rxx, Word32 Rs, Word32 Rt) 465 popcount ``` ``` Rd=popcount(Rss) Word32 Q6_R_popcount_P(Word64 Rss) 373 R rol Rd=rol(Rs,#u5) Word32 Q6 R rol RI(Word32 Rs, Word32 lu5) 538 Rdd=rol(Rss,#u6) Word64 Q6_P_rol_PI(Word64 Rss, Word32 lu6) 539 Rx&=rol(Rs,#u5) Word32 Q6_R_roland_RI(Word32 Rx, Word32 Rs, Word32 lu5) 545 Rx^=rol(Rs, #u5) Word32 Q6 R rolxacc RI(Word32 Rx, Word32 Rs, Word32 lu5) 545 Rx+=rol(Rs,#u5) Word32 Q6 R rolacc RI(Word32 Rx, Word32 Rs, Word32 lu5) 541 Rx-=rol(Rs,#u5) Word32 Q6_R_rolnac_RI(Word32 Rx, Word32 Rs, Word32 lu5) 541 Rx = rol(Rs, #u5) Word32 Q6 R rolor RI(Word32 Rx, Word32 Rs, Word32 Iu5) 545 Rxx&=rol(Rss,#u6) Word64 Q6_P_roland_PI(Word64 Rxx, Word64 Rss, Word32 lu6) 546 Rxx^=rol(Rss,#u6) Word64 Q6_P_rolxacc_PI(Word64 Rxx, Word64 Rss, Word32 lu6) 546 Rxx+=rol(Rss,#u6) Word64 Q6_P_rolacc_PI(Word64 Rxx, Word64 Rss, Word32 lu6) 541 Rxx-=rol(Rss,#u6) Word64 Q6_P_rolnac_PI(Word64 Rxx, Word64 Rss, Word32 lu6) 541 Rxx|=rol(Rss,#u6) Word64 Q6 P rolor PI(Word64 Rxx, Word64 Rss, Word32 lu6) 546 round Rd=round(Rs,#u5) Word32 Q6_R_round_RI(Word32 Rs, Word32 lu5) 323 Rd=round(Rs,#u5):sat Word32 Q6 R round RI sat(Word32 Rs, Word32 lu5) 323 Rd=round(Rs,Rt) Word32 Q6 R round RR(Word32 Rs, Word32 Rt) 323 Rd=round(Rs,Rt):sat Word32 Q6_R_round_RR_sat(Word32 Rs, Word32 Rt) 323 Rd=round(Rss):sat Word32 Q6 R round P sat(Word64 Rss) 323 S sat Rd=sat(Rss) Word32 Q6 R sat P(Word64 Rss) 496 satb Rd=satb(Rs) Word32 Q6 R satb R(Word32 Rs) 496 sath Rd=sath(Rs) Word32 Q6_R_sath_R(Word32 Rs) 496 satub Rd=satub(Rs) Word32 Q6_R_satub_R(Word32 Rs) 496 satuh ``` ``` Rd=satuh(Rs) Word32 Q6_R_satuh_R(Word32 Rs) 496 setbit Rd=setbit(Rs,#u5) Word32 Q6_R_setbit_RI(Word32 Rs, Word32 lu5) 384 Rd=setbit(Rs,Rt) Word32 Q6_R_setbit_RR(Word32 Rs, Word32 Rt) 384 sfadd Rd=sfadd(Rs,Rt) Word32 Q6_R_sfadd_RR(Word32 Rs, Word32 Rt) 422 sfclass Pd=sfclass(Rs,#u5) Byte Q6_p_sfclass_RI(Word32 Rs, Word32 Iu5) 423 sfcmp.eq Pd=sfcmp.eq(Rs,Rt) Byte Q6_p_sfcmp_eq_RR(Word32 Rs, Word32 Rt) 424 sfcmp.ge Pd=sfcmp.ge(Rs,Rt) Byte Q6_p_sfcmp_ge_RR(Word32 Rs, Word32 Rt) 424 sfcmp.gt Pd=sfcmp.gt(Rs,Rt) Byte Q6_p_sfcmp_gt_RR(Word32 Rs, Word32 Rt) 424 sfcmp.uo Pd=sfcmp.uo(Rs,Rt) Byte Q6_p_sfcmp_uo_RR(Word32 Rs, Word32 Rt) 424 sffixupd Rd=sffixupd(Rs,Rt) Word32 Q6 R sffixupd RR(Word32 Rs, Word32 Rt) 431 sffixupn Rd=sffixupn(Rs,Rt) Word32 Q6_R_sffixupn_RR(Word32 Rs, Word32 Rt) 431 sffixupr Rd=sffixupr(Rs) Word32 Q6_R_sffixupr_R(Word32 Rs) 431 sfmake Rd=sfmake(#u10):neg Word32 Q6 R sfmake I neg(Word32 lu10) 437 Rd=sfmake(#u10):pos Word32 Q6_R_sfmake_I_pos(Word32 Iu10) 437 sfmax Rd=sfmax(Rs,Rt) Word32 Q6 R sfmax RR(Word32 Rs, Word32 Rt) 438 sfmin Rd=sfmin(Rs,Rt) Word32 Q6_R_sfmin_RR(Word32 Rs, Word32 Rt) 439 sfmpy Rd=sfmpy(Rs,Rt) Word32 Q6_R_sfmpy_RR(Word32 Rs, Word32 Rt) 440 Rx+=sfmpy(Rs,Rt,Pu):scale Word32\ Q6\_R\_sfmpyacc\_RRp\_scale(Word32\ Rx,\ Word32\ Rs,\ Word32\ Rt,\ Byte\ Pu)\ 433 Rx += sfmpy(Rs,Rt) Word32 Q6 R sfmpyacc RR(Word32 Rx, Word32 Rs, Word32 Rt) 432 Rx+=sfmpy(Rs,Rt):lib ``` ``` Word32 Q6_R_sfmpyacc_RR_lib(Word32 Rx, Word32 Rs, Word32 Rt) 435 Rx-=sfmpy(Rs,Rt) Word32 Q6_R_sfmpynac_RR(Word32 Rx, Word32 Rs, Word32 Rt) 432 Rx-=sfmpy(Rs,Rt):lib Word32 Q6 R sfmpynac RR lib(Word32 Rx, Word32 Rs, Word32 Rt) 435 sfsub Rd=sfsub(Rs,Rt) Word32 Q6_R_sfsub_RR(Word32 Rs, Word32 Rt) 442 shuffeb Rdd=shuffeb(Rss,Rtt) Word64 Q6 P shuffeb PP(Word64 Rss, Word64 Rtt) 507 shuffeh Rdd=shuffeh(Rss.Rtt) Word64 Q6 P shuffeh PP(Word64 Rss, Word64 Rtt) 507 shuffob Rdd=shuffob(Rtt,Rss) Word64 Q6 P shuffob PP(Word64 Rtt, Word64 Rss) 507 shuffoh Rdd=shuffoh(Rtt,Rss) Word64 Q6 P shuffoh PP(Word64 Rtt, Word64 Rss) 507 sub Rd=add(Rs,sub(#s6,Ru)) Word32 Q6 R add sub RIR(Word32 Rs, Word32 Is6, Word32 Ru) 304 Rd=sub(#s10,Rs) Word32 Q6 R sub IR(Word32 Is10, Word32 Rs) 143 Rd=sub(Rt,Rs) Word32 Q6 R sub RR(Word32 Rt, Word32 Rs) 143 Rd=sub(Rt,Rs):sat Word32 Q6_R_sub_RR_sat(Word32 Rt, Word32 Rs) 143 Rd=sub(Rt.H,Rs.H):<<16 Word32 Q6_R_sub_RhRh_s16(Word32 Rt, Word32 Rs) 327 Rd=sub(Rt.H,Rs.H):sat:<<16 Word32 Q6_R_sub_RhRh_sat_s16(Word32 Rt, Word32 Rs) 327 Rd=sub(Rt.H,Rs.L):<<16 Word32 Q6_R_sub_RhRl_s16(Word32 Rt, Word32 Rs) 328 Rd=sub(Rt.H,Rs.L):sat:<<16 Word32 Q6 R sub RhRl sat s16(Word32 Rt, Word32 Rs) 328 Rd=sub(Rt.L,Rs.H) Word32 Q6_R_sub_RIRh(Word32 Rt, Word32 Rs) 328 Rd=sub(Rt.L,Rs.H):<<16 Word32 Q6_R_sub_RlRh_s16(Word32 Rt, Word32 Rs) 328 Rd=sub(Rt.L,Rs.H):sat Word32 Q6 R sub RIRh sat(Word32 Rt, Word32 Rs) 328 Rd=sub(Rt.L,Rs.H):sat:<<16 Word32 Q6_R_sub_RIRh_sat_s16(Word32 Rt, Word32 Rs) 328 Rd=sub(Rt.L,Rs.L) Word32 Q6 R_sub_RIRI(Word32 Rt, Word32 Rs) 328 Rd=sub(Rt.L,Rs.L):<<16 Word32 Q6 R sub RIRI s16(Word32 Rt, Word32 Rs) 328 Rd=sub(Rt.L,Rs.L):sat Word32 Q6_R_sub_RIRI_sat(Word32 Rt, Word32 Rs) 328 Rd=sub(Rt.L,Rs.L):sat:<<16 Word32 Q6 R sub RIRI sat s16(Word32 Rt, Word32 Rs) 328 ``` ``` Rdd=sub(Rtt,Rss) Word64 Q6 P sub PP(Word64 Rtt, Word64 Rss) 325 Rx+=sub(Rt,Rs) Word32 Q6 R subacc RR(Word32 Rx, Word32 Rt, Word32 Rs) 326 swiz Rd=swiz(Rs) Word32 Q6_R_swiz_R(Word32 Rs) 498 sxtb Rd=sxtb(Rs) Word32 Q6_R_sxtb_R(Word32 Rs) 144 sxth Rd=sxth(Rs) Word32 Q6_R_sxth_R(Word32 Rs) 144 sxtw Rdd=sxtw(Rs) Word64 Q6_P_sxtw_R(Word32 Rs) 329 Т tableidxb Rx=tableidxb(Rs,#u4,#U5) Word32 Q6 R tableidxb RII(Word32 Rx, Word32 Rs, Word32 Iu4, Word32 IU5) 389 tableidxd Rx=tableidxd(Rs,#u4,#U5) Word32\ Q6\_R\_tableidxd\_RII(Word32\ Rx,\ Word32\ Rs,\ Word32\ Iu4,\ Word32\ IU5)\ 389 tableidxh Rx=tableidxh(Rs,#u4,#U5) Word32 Q6_R_tableidxh_RII(Word32 Rx, Word32 Rs, Word32 Iu4, Word32 IU5) 389 tableidxw Rx=tableidxw(Rs,#u4,#U5) Word32 Q6_R_tableidxw_RII(Word32 Rx, Word32 Rs, Word32 Iu4, Word32 IU5) 389 tlbmatch Pd=tlbmatch(Rss,Rt) Byte Q6 p tlbmatch PR(Word64 Rss, Word32 Rt) 525 togglebit Rd=togglebit(Rs,#u5) Word32 Q6_R_togglebit_RI(Word32 Rs, Word32 lu5) 384 Rd=togglebit(Rs,Rt) Word32 Q6 R togglebit RR(Word32 Rs, Word32 Rt) 384 tstbit Pd=!tstbit(Rs,#u5) Byte Q6_p_not_tstbit_RI(Word32 Rs, Word32 Iu5) 527 Pd=!tstbit(Rs,Rt) Byte Q6 p not tstbit RR(Word32 Rs, Word32 Rt) 527 Pd=tstbit(Rs,#u5) Byte Q6_p_tstbit_RI(Word32 Rs, Word32 lu5) 527 Pd=tstbit(Rs,Rt) Byte Q6 p tstbit RR(Word32 Rs, Word32 Rt) 527 V vabsdiffb Rdd=vabsdiffb(Rtt.Rss) Word64 Q6_P_vabsdiffb_PP(Word64 Rtt, Word64 Rss) 332 vabsdiffh Rdd=vabsdiffh(Rtt,Rss) ``` ``` Word64 Q6_P_vabsdiffh_PP(Word64 Rtt, Word64 Rss) 333 vabsdiffub Rdd=vabsdiffub(Rtt,Rss) Word64 Q6_P_vabsdiffub_PP(Word64 Rtt, Word64 Rss) 332 vabsdiffw Rdd=vabsdiffw(Rtt,Rss) Word64 Q6_P_vabsdiffw_PP(Word64 Rtt, Word64 Rss) 334 vabsh Rdd=vabsh(Rss) Word64 Q6_P_vabsh_P(Word64 Rss) 330 Rdd=vabsh(Rss):sat Word64 Q6 P vabsh P sat(Word64 Rss) 330 vabsw Rdd=vabsw(Rss) Word64 Q6_P_vabsw_P(Word64 Rss) 331 Rdd=vabsw(Rss):sat Word64 Q6 P vabsw P sat(Word64 Rss) 331 vaddb Rdd=vaddb(Rss,Rtt) Word64 Q6 P vaddb PP(Word64 Rss, Word64 Rtt) 344 vaddh Rd=vaddh(Rs,Rt) Word32 Q6 R vaddh RR(Word32 Rs, Word32 Rt) 148 Rd=vaddh(Rs,Rt):sat Word32 Q6 R vaddh RR sat(Word32 Rs, Word32 Rt) 148 Rdd=vaddh(Rss,Rtt) Word64 Q6 P vaddh PP(Word64 Rss, Word64 Rtt) 338 Rdd=vaddh(Rss,Rtt):sat Word64 Q6_P_vaddh_PP_sat(Word64 Rss, Word64 Rtt) 338 vaddhub Rd=vaddhub(Rss,Rtt):sat Word32 Q6_R_vaddhub_PP_sat(Word64 Rss, Word64 Rtt) 339 vaddub Rdd=vaddub(Rss,Rtt) Word64 Q6 P vaddub PP(Word64 Rss, Word64 Rtt) 344 Rdd=vaddub(Rss,Rtt):sat Word64 Q6_P_vaddub_PP_sat(Word64 Rss, Word64 Rtt) 344 vadduh Rd=vadduh(Rs,Rt):sat Word32\ Q6\_R\_vadduh\_RR\_sat(Word32\ Rs,\ Word32\ Rt)\ 148 Rdd=vadduh(Rss,Rtt):sat Word64 Q6_P_vadduh_PP_sat(Word64 Rss, Word64 Rtt) 338 vaddw Rdd=vaddw(Rss,Rtt) Word64 Q6 P vaddw PP(Word64 Rss, Word64 Rtt) 345 Rdd=vaddw(Rss,Rtt):sat Word64 Q6_P_vaddw_PP_sat(Word64 Rss, Word64 Rtt) 345 valignb Rdd=valignb(Rtt,Rss,#u3) Word64 Q6 P valignb PPI(Word64 Rtt, Word64 Rss, Word32 lu3) 499 Rdd=valignb(Rtt,Rss,Pu) Word64 Q6_P_valignb_PPp(Word64 Rtt, Word64 Rss, Byte Pu) 499 vaslh ``` ``` Rdd=vaslh(Rss,#u4) Word64 Q6_P_vaslh_PI(Word64 Rss, Word32 Iu4) 560 Rdd=vaslh(Rss,Rt) Word64 Q6 P vaslh PR(Word64 Rss, Word32 Rt) 565 vaslw Rdd=vaslw(Rss,#u5) Word64 Q6_P_vaslw_PI(Word64 Rss, Word32 lu5) 566 Rdd=vaslw(Rss,Rt) Word64 Q6 P vaslw PR(Word64 Rss, Word32 Rt) 568 vasrh Rdd=vasrh(Rss,#u4) Word64 Q6_P_vasrh_PI(Word64 Rss, Word32 Iu4) 560 Rdd=vasrh(Rss,#u4):rnd Word64 Q6 P vasrh PI rnd(Word64 Rss, Word32 lu4) 561 Rdd=vasrh(Rss,Rt) Word64 Q6_P_vasrh_PR(Word64 Rss, Word32 Rt) 565 vasrhub Rd=vasrhub(Rss,#u4):rnd:sat Word32 Q6_R_vasrhub_PI_rnd_sat(Word64 Rss, Word32 lu4) 563 Rd=vasrhub(Rss,#u4):sat Word32 Q6_R_vasrhub_PI_sat(Word64 Rss, Word32 Iu4) 563 vasrw Rd=vasrw(Rss,#u5) Word32 Q6_R_vasrw_PI(Word64 Rss, Word32 lu5) 569 Rd=vasrw(Rss,Rt) Word32 Q6_R_vasrw_PR(Word64 Rss, Word32 Rt) 569 Rdd=vasrw(Rss,#u5) Word64 Q6 P vasrw PI(Word64 Rss, Word32 lu5) 566 Rdd=vasrw(Rss,Rt) Word64 Q6 P vasrw PR(Word64 Rss, Word32 Rt) 568 vavgh Rd=vavgh(Rs,Rt) Word32 Q6 R vavgh RR(Word32 Rs, Word32 Rt) 149 Rd=vavgh(Rs,Rt):rnd Word32 Q6_R_vavgh_RR_rnd(Word32 Rs, Word32 Rt) 149 Rdd=vavgh(Rss,Rtt) Word64 Q6 P vavgh PP(Word64 Rss, Word64 Rtt) 347 Rdd=vavgh(Rss,Rtt):crnd Word64 Q6_P_vavgh_PP_crnd(Word64 Rss, Word64 Rtt) 347 Rdd=vavgh(Rss,Rtt):rnd Word64 Q6_P_vavgh_PP_rnd(Word64 Rss, Word64 Rtt) 347 vavgub Rdd=vavgub(Rss,Rtt) Word64 Q6_P_vavgub_PP(Word64 Rss, Word64 Rtt) 348 Rdd=vavgub(Rss,Rtt):rnd Word64 Q6_P_vavgub_PP_rnd(Word64 Rss, Word64 Rtt) 348 vavguh Rdd=vavguh(Rss,Rtt) Word64 Q6_P_vavguh_PP(Word64 Rss, Word64 Rtt) 347 Rdd=vavguh(Rss,Rtt):rnd Word64 Q6_P_vavguh_PP_rnd(Word64 Rss, Word64 Rtt) 347 vavguw Rdd=vavguw(Rss,Rtt) ``` ``` Word64 Q6 P vavguw PP(Word64 Rss, Word64 Rtt) 350 Rdd=vavguw(Rss,Rtt):rnd Word64 Q6_P_vavguw_PP_rnd(Word64 Rss, Word64 Rtt) 350 vavgw Rdd=vavgw(Rss,Rtt) Word64 Q6_P_vavgw_PP(Word64 Rss, Word64 Rtt) 350 Rdd=vavgw(Rss,Rtt):crnd Word64 Q6_P_vavgw_PP_crnd(Word64 Rss, Word64 Rtt) 350 Rdd=vavgw(Rss,Rtt):rnd Word64 Q6 P vavgw PP rnd(Word64 Rss, Word64 Rtt) 350 vclip Rdd=vclip(Rss,#u5) Word64 Q6 P vclip PI(Word64 Rss, Word32 lu5) 351 vcmpb.eq Pd=!any8(vcmpb.eq(Rss,Rtt)) Byte Q6_p_not_any8_vcmpb_eq_PP(Word64 Rss, Word64 Rtt) 530 Pd=any8(vcmpb.eq(Rss,Rtt)) Byte Q6 p any8 vcmpb eq PP(Word64 Rss, Word64 Rtt) 530 Pd=vcmpb.eq(Rss,#u8) Byte Q6_p_vcmpb_eq_PI(Word64 Rss, Word32 lu8) 532 Pd=vcmpb.eq(Rss,Rtt) Byte Q6_p_vcmpb_eq_PP(Word64 Rss, Word64 Rtt) 532 vcmpb.gt Pd=vcmpb.gt(Rss,#s8) Byte Q6_p_vcmpb_gt_PI(Word64 Rss, Word32 Is8) 532 Pd=vcmpb.gt(Rss,Rtt) Byte Q6_p_vcmpb_gt_PP(Word64 Rss, Word64 Rtt) 532 vcmpb.gtu Pd=vcmpb.gtu(Rss,#u7) Byte Q6_p_vcmpb_gtu_PI(Word64 Rss, Word32 Iu7) 532 Pd=vcmpb.gtu(Rss,Rtt) Byte Q6 p vcmpb gtu PP(Word64 Rss, Word64 Rtt) 532 vcmph.eq Pd=vcmph.eq(Rss,#s8) Byte Q6_p_vcmph_eq_PI(Word64 Rss, Word32 Is8) 529 Pd=vcmph.eq(Rss,Rtt) Byte Q6_p_vcmph_eq_PP(Word64 Rss, Word64 Rtt) 529 vcmph.gt Pd=vcmph.gt(Rss,#s8) Byte Q6_p_vcmph_gt_PI(Word64 Rss, Word32 Is8) 529 Pd=vcmph.gt(Rss,Rtt) Byte Q6_p_vcmph_gt_PP(Word64 Rss, Word64 Rtt) 529 vcmph.gtu Pd=vcmph.gtu(Rss,#u7) Byte Q6 p vcmph gtu PI(Word64 Rss, Word32 lu7) 529 Pd=vcmph.gtu(Rss,Rtt) Byte Q6 p vcmph gtu PP(Word64 Rss, Word64 Rtt) 529 vcmpw.eq Pd=vcmpw.eq(Rss,#s8) Byte Q6 p vcmpw eq PI(Word64 Rss, Word32 Is8) 534 Pd=vcmpw.eq(Rss,Rtt) Byte Q6 p vcmpw eq PP(Word64 Rss, Word64 Rtt) 534 vcmpw.gt ``` ``` Pd=vcmpw.gt(Rss,#s8) Byte Q6_p_vcmpw_gt_PI(Word64 Rss, Word32 Is8) 534 Pd=vcmpw.gt(Rss,Rtt) Byte Q6_p_vcmpw_gt_PP(Word64 Rss, Word64 Rtt) 534 vcmpw.gtu Pd=vcmpw.gtu(Rss,#u7) Byte Q6_p_vcmpw_gtu_PI(Word64 Rss, Word32 Iu7) 534 Pd=vcmpw.gtu(Rss,Rtt) Byte Q6 p vcmpw gtu PP(Word64 Rss, Word64 Rtt) 534 vcmpyi Rdd=vcmpyi(Rss,Rtt):<<1:sat Word64 Q6 P vcmpyi PP s1 sat(Word64 Rss, Word64 Rtt) 409 Rdd=vcmpvi(Rss,Rtt):sat Word64 Q6 P vcmpyi PP sat(Word64 Rss, Word64 Rtt) 409 Rxx+=vcmpyi(Rss,Rtt):sat Word64 Q6_P_vcmpyiacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 409 vcmpyr Rdd=vcmpyr(Rss,Rtt):<<1:sat Word64 Q6_P_vcmpyr_PP_s1_sat(Word64 Rss, Word64 Rtt) 409 Rdd=vcmpyr(Rss,Rtt):sat Word64 Q6_P_vcmpyr_PP_sat(Word64 Rss, Word64 Rtt) 409 Rxx+=vcmpyr(Rss,Rtt):sat Word64 Q6 P vcmpyracc PP sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 409 vcnegh Rdd=vcnegh(Rss,Rt) Word64 Q6_P_vcnegh_PR(Word64 Rss, Word32 Rt) 352 vconi Rdd=vconj(Rss):sat Word64 Q6 P vconj P sat(Word64 Rss) 411 vcrotate Rdd=vcrotate(Rss,Rt) Word64 Q6 P vcrotate PR(Word64 Rss, Word32 Rt) 413 vdmpy Rd=vdmpy(Rss,Rtt):<<1:rnd:sat Word32 Q6_R_vdmpy_PP_s1_rnd_sat(Word64 Rss, Word64 Rtt) 476 Rd=vdmpy(Rss,Rtt):rnd:sat Word32 Q6 R vdmpy PP rnd sat(Word64 Rss, Word64 Rtt) 476 Rdd=vdmpy(Rss,Rtt):<<1:sat Word64 Q6_P_vdmpy_PP_s1_sat(Word64 Rss, Word64 Rtt) 474 Rdd=vdmpy(Rss,Rtt):sat Word64 Q6 P vdmpy PP sat(Word64 Rss, Word64 Rtt) 474 Rxx+=vdmpy(Rss,Rtt):<<1:sat Word64 Q6_P_vdmpyacc_PP_s1_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 474 Rxx+=vdmpy(Rss,Rtt):sat Word64 Q6_P_vdmpyacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 474 vdmpybsu Rdd=vdmpybsu(Rss,Rtt):sat Word64 Q6_P_vdmpybsu_PP_sat(Word64 Rss, Word64 Rtt) 480 Rxx+=vdmpybsu(Rss,Rtt):sat Word64 Q6_P_vdmpybsuacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 480 vitpack Rd=vitpack(Ps,Pt) Word32 Q6_R_vitpack_pp(Byte Ps, Byte Pt) 535 ``` ``` vIsIh Rdd=vlslh(Rss,Rt) Word64 Q6 P vlslh PR(Word64 Rss, Word32 Rt) 565 vlslw Rdd=vlslw(Rss,Rt) Word64 Q6_P_vlslw_PR(Word64 Rss, Word32 Rt) 568 vlsrh Rdd=vlsrh(Rss,#u4) Word64 Q6_P_vlsrh_PI(Word64 Rss, Word32 lu4) 560 Rdd=vlsrh(Rss,Rt) Word64 Q6_P_vlsrh_PR(Word64 Rss, Word32 Rt) 565 vlsrw Rdd=vlsrw(Rss,#u5) Word64 Q6_P_vlsrw_PI(Word64 Rss, Word32 lu5) 566 Rdd=vlsrw(Rss,Rt) Word64 Q6_P_vlsrw_PR(Word64 Rss, Word32 Rt) 568 vmaxb Rdd=vmaxb(Rtt,Rss) Word64 Q6_P_vmaxb_PP(Word64 Rtt, Word64 Rss) 353 vmaxh Rdd=vmaxh(Rtt,Rss) Word64 Q6_P_vmaxh_PP(Word64 Rtt, Word64 Rss) 354 vmaxub Rdd=vmaxub(Rtt,Rss) Word64 Q6_P_vmaxub_PP(Word64 Rtt, Word64 Rss) 353 vmaxuh Rdd=vmaxuh(Rtt,Rss) Word64 Q6_P_vmaxuh_PP(Word64 Rtt, Word64 Rss) 354 vmaxuw Rdd=vmaxuw(Rtt,Rss) Word64 Q6_P_vmaxuw_PP(Word64 Rtt, Word64 Rss) 358 vmaxw Rdd=vmaxw(Rtt,Rss) Word64 Q6_P_vmaxw_PP(Word64 Rtt, Word64 Rss) 358 vminb Rdd=vminb(Rtt,Rss) Word64 Q6_P_vminb_PP(Word64 Rtt, Word64 Rss) 359 vminh Rdd=vminh(Rtt,Rss) Word64 Q6_P_vminh_PP(Word64 Rtt, Word64 Rss) 360 vminub Rdd=vminub(Rtt,Rss) Word64 Q6_P_vminub_PP(Word64 Rtt, Word64 Rss) 359 vminuh Rdd=vminuh(Rtt,Rss) Word64 Q6 P vminuh PP(Word64 Rtt, Word64 Rss) 360 vminuw Rdd=vminuw(Rtt,Rss) Word64 Q6_P_vminuw_PP(Word64 Rtt, Word64 Rss) 365 vminw Rdd=vminw(Rtt,Rss) Word64 Q6_P_vminw_PP(Word64 Rtt, Word64 Rss) 365 ``` ``` vmpybsu Rdd=vmpybsu(Rs,Rt) Word64 Q6 P vmpybsu RR(Word32 Rs, Word32 Rt) 491 Rxx+=vmpybsu(Rs,Rt) Word64 Q6 P vmpybsuacc RR(Word64 Rxx, Word32 Rs, Word32 Rt) 491 vmpybu Rdd=vmpybu(Rs,Rt) Word64 Q6 P vmpybu RR(Word32 Rs, Word32 Rt) 491 Rxx+=vmpybu(Rs,Rt) Word64 Q6_P_vmpybuacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) 491 vmpyeh Rdd=vmpyeh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpyeh_PP_s1_sat(Word64 Rss, Word64 Rtt) 482 Rdd=vmpyeh(Rss,Rtt):sat Word64 Q6_P_vmpyeh_PP_sat(Word64 Rss, Word64 Rtt) 482 Rxx+=vmpyeh(Rss,Rtt) Word64 Q6 P vmpyehacc PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 482 Rxx+=vmpyeh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpyehacc_PP_s1_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 482 Rxx+=vmpyeh(Rss,Rtt):sat Word64 Q6_P_vmpyehacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 482 vmpyh Rd=vmpyh(Rs,Rt):<<1:rnd:sat Word32 Q6_R_vmpyh_RR_s1_rnd_sat(Word32 Rs, Word32 Rt) 486 Rd=vmpyh(Rs,Rt):rnd:sat Word32 Q6_R_vmpyh_RR_rnd_sat(Word32 Rs, Word32 Rt) 486 Rdd=vmpyh(Rs,Rt):<<1:sat Word64 Q6 P vmpyh RR s1 sat(Word32 Rs, Word32 Rt) 484 Rdd=vmpyh(Rs,Rt):sat Word64 Q6 P vmpyh RR sat(Word32 Rs, Word32 Rt) 484 Rxx+=vmpyh(Rs,Rt) Word64 Q6_P_vmpyhacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) 484 Rxx+=vmpyh(Rs,Rt):<<1:sat Word64 Q6 P vmpyhacc RR s1 sat(Word64 Rxx, Word32 Rs, Word32 Rt) 484 Rxx+=vmpyh(Rs,Rt):sat Word64 Q6_P_vmpyhacc_RR_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 484 vmpyhsu Rdd=vmpyhsu(Rs,Rt):<<1:sat Word64 Q6 P vmpyhsu RR s1 sat(Word32 Rs, Word32 Rt) 487 Rdd=vmpyhsu(Rs,Rt):sat Word64 Q6_P_vmpyhsu_RR_sat(Word32 Rs, Word32 Rt) 487 Rxx+=vmpyhsu(Rs,Rt):<<1:sat Word64 Q6 P vmpyhsuacc RR s1 sat(Word64 Rxx, Word32 Rs, Word32 Rt) 487 Rxx+=vmpyhsu(Rs,Rt):sat Word64 Q6 P vmpyhsuacc RR sat(Word64 Rxx, Word32 Rs, Word32 Rt) 487 vmpyweh Rdd=vmpyweh(Rss,Rtt):<<1:rnd:sat Word64 Q6_P_vmpyweh_PP_s1_rnd_sat(Word64 Rss, Word64 Rtt) 447 Rdd=vmpyweh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpyweh_PP_s1_sat(Word64 Rss, Word64 Rtt) 447 Rdd=vmpyweh(Rss,Rtt):rnd:sat Word64 Q6_P_vmpyweh_PP_rnd_sat(Word64 Rss, Word64 Rtt) 447 Rdd=vmpyweh(Rss,Rtt):sat ``` ``` Word64 Q6 P vmpyweh PP sat(Word64 Rss, Word64 Rtt) 448 Rxx+=vmpyweh(Rss,Rtt):<<1:rnd:sat Word64 Q6_P_vmpywehacc_PP_s1_rnd_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 448 Rxx+=vmpyweh(Rss,Rtt):<<1:sat Word64 Q6 P vmpywehacc PP s1 sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 448 Rxx+=vmpyweh(Rss,Rtt):rnd:sat Word64 Q6 P vmpywehacc PP rnd sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 448 Rxx+=vmpyweh(Rss,Rtt):sat Word64 Q6 P vmpywehacc PP sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 448 vmpvweuh Rdd=vmpyweuh(Rss,Rtt):<<1:rnd:sat Word64 Q6_P_vmpyweuh_PP_s1_rnd_sat(Word64 Rss, Word64 Rtt) 451 Rdd=vmpyweuh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpyweuh_PP_s1_sat(Word64 Rss, Word64 Rtt) 451 Rdd=vmpyweuh(Rss,Rtt):rnd:sat Word64 Q6_P_vmpyweuh_PP_rnd_sat(Word64 Rss, Word64 Rtt) 451 Rdd=vmpyweuh(Rss,Rtt):sat Word64 Q6_P_vmpyweuh_PP_sat(Word64 Rss, Word64 Rtt) 452 Rxx+=vmpyweuh(Rss,Rtt):<<1:rnd:sat Word64 Q6_P_vmpyweuhacc_PP_s1_rnd_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 452 Rxx+=vmpyweuh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpyweuhacc_PP_s1_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 452 Rxx+=vmpyweuh(Rss,Rtt):rnd:sat Word64 Q6_P_vmpyweuhacc_PP_rnd_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 452 Rxx+=vmpyweuh(Rss,Rtt):sat Word64 Q6_P_vmpyweuhacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 452 vmpywoh Rdd=vmpywoh(Rss,Rtt):<<1:rnd:sat Word64 Q6 P vmpywoh PP s1 rnd sat(Word64 Rss, Word64 Rtt) 448 Rdd=vmpywoh(Rss,Rtt):<<1:sat Word64 Q6 P vmpywoh PP s1 sat(Word64 Rss, Word64 Rtt) 448 Rdd=vmpywoh(Rss,Rtt):rnd:sat Word64 Q6 P vmpywoh PP rnd sat(Word64 Rss, Word64 Rtt) 448 Rdd=vmpywoh(Rss,Rtt):sat Word64 Q6 P vmpywoh PP sat(Word64 Rss, Word64 Rtt) 448 Rxx+=vmpywoh(Rss,Rtt):<<1:rnd:sat Word64 Q6 P vmpywohacc PP s1 rnd sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 448 Rxx+=vmpywoh(Rss,Rtt):<<1:sat Word64 Q6 P vmpywohacc PP s1 sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 448 Rxx+=vmpywoh(Rss,Rtt):rnd:sat Word64 Q6 P vmpywohacc PP rnd sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 448 Rxx+=vmpywoh(Rss,Rtt):sat Word64 Q6 P vmpywohacc PP sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 448 vmpywouh Rdd=vmpywouh(Rss,Rtt):<<1:rnd:sat Word64 Q6_P_vmpywouh_PP_s1_rnd_sat(Word64 Rss, Word64 Rtt) 452 Rdd=vmpywouh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpywouh_PP_s1_sat(Word64 Rss, Word64 Rtt) 452 Rdd=vmpywouh(Rss,Rtt):rnd:sat Word64 Q6 P vmpywouh PP rnd sat(Word64 Rss, Word64 Rtt) 452 Rdd=vmpywouh(Rss,Rtt):sat Word64 Q6 P vmpywouh PP sat(Word64 Rss, Word64 Rtt) 452 Rxx+=vmpywouh(Rss,Rtt):<<1:rnd:sat ``` ``` Word64 Q6_P_vmpywouhacc_PP_s1_rnd_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 452 Rxx+=vmpywouh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpywouhacc_PP_s1_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 452 Rxx+=vmpywouh(Rss,Rtt):rnd:sat Word64 Q6 P vmpywouhacc PP rnd sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 452 Rxx+=vmpywouh(Rss,Rtt):sat Word64 Q6 P vmpywouhacc PP sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 452 vmux Rdd=vmux(Pu,Rss,Rtt) Word64 Q6 P vmux pPP(Byte Pu, Word64 Rss, Word64 Rtt) 536 vnavgh Rd=vnavgh(Rt,Rs) Word32 Q6 R vnavgh RR(Word32 Rt, Word32 Rs) 149 Rdd=vnavgh(Rtt,Rss) Word64 Q6 P vnavgh PP(Word64 Rtt, Word64 Rss) 347 Rdd=vnavgh(Rtt,Rss):crnd:sat Word64 Q6_P_vnavgh_PP_crnd_sat(Word64 Rtt, Word64 Rss) 347 Rdd=vnavgh(Rtt,Rss):rnd:sat Word64 Q6_P_vnavgh_PP_rnd_sat(Word64 Rtt, Word64 Rss) 347 vnavgw Rdd=vnavgw(Rtt,Rss) Word64 Q6_P_vnavgw_PP(Word64 Rtt, Word64 Rss) 350 Rdd=vnavgw(Rtt,Rss):crnd:sat Word64 Q6_P_vnavgw_PP_crnd_sat(Word64 Rtt, Word64 Rss) 350 Rdd=vnavgw(Rtt,Rss):rnd:sat Word64 Q6_P_vnavgw_PP_rnd_sat(Word64 Rtt, Word64 Rss) 350 vpmpyh Rdd=vpmpyh(Rs,Rt) Word64 Q6_P_vpmpyh_RR(Word32 Rs, Word32 Rt) 493 Rxx^=vpmpyh(Rs,Rt) Word64 Q6_P_vpmpyhxacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) 493 vraddh Rd=vraddh(Rss,Rtt) Word32 Q6 R vraddh PP(Word64 Rss, Word64 Rtt) 342 vraddub Rdd=vraddub(Rss,Rtt) Word64 Q6 P vraddub PP(Word64 Rss, Word64 Rtt) 340 Rxx+=vraddub(Rss.Rtt) Word64 Q6_P_vraddubacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 340 vradduh Rd=vradduh(Rss,Rtt) Word32 Q6 R vradduh PP(Word64 Rss, Word64 Rtt) 342 vrcmpys Rd=vrcmpys(Rss,Rt):<<1:rnd:sat Word32 Q6 R vrcmpys PR s1 rnd sat(Word64 Rss, Word32 Rt) 418 Rdd=vrcmpys(Rss,Rt):<<1:sat Word64 Q6 P vrcmpys PR s1 sat(Word64 Rss, Word32 Rt) 415 Rxx+=vrcmpys(Rss,Rt):<<1:sat Word64 Q6 P vrcmpysacc PR s1 sat(Word64 Rxx, Word64 Rss, Word32 Rt) 415 vrcnegh Rxx+=vrcnegh(Rss,Rt) Word64 Q6 P vrcneghacc PR(Word64 Rxx, Word64 Rss, Word32 Rt) 352 vrcrotate ``` ``` Rdd=vrcrotate(Rss,Rt,#u2) Word64 Q6 P vrcrotate PRI(Word64 Rss, Word32 Rt, Word32 Iu2) 421 Rxx+=vrcrotate(Rss,Rt,#u2) Word64 Q6 P vrcrotateacc PRI(Word64 Rxx, Word64 Rss, Word32 Rt, Word32 Iu2) 421 vrmaxh Rxx=vrmaxh(Rss,Ru) Word64 Q6_P_vrmaxh_PR(Word64 Rxx, Word64 Rss, Word32 Ru) 355 Rxx=vrmaxuh(Rss,Ru) Word64 Q6 P vrmaxuh PR(Word64 Rxx, Word64 Rss, Word32 Ru) 355 vrmaxuw Rxx=vrmaxuw(Rss,Ru) Word64 Q6_P_vrmaxuw_PR(Word64 Rxx, Word64 Rss, Word32 Ru) 357 vrmaxw Rxx=vrmaxw(Rss,Ru) Word64 Q6_P_vrmaxw_PR(Word64 Rxx, Word64 Rss, Word32 Ru) 357 vrminh Rxx=vrminh(Rss,Ru) Word64 Q6 P vrminh PR(Word64 Rxx, Word64 Rss, Word32 Ru) 361 Rxx=vrminuh(Rss,Ru) Word64 Q6 P vrminuh PR(Word64 Rxx, Word64 Rss, Word32 Ru) 361 vrminuw Rxx=vrminuw(Rss,Ru) Word64 Q6 P vrminuw PR(Word64 Rxx, Word64 Rss, Word32 Ru) 363 vrminw Rxx=vrminw(Rss,Ru) Word64 Q6_P_vrminw_PR(Word64 Rxx, Word64 Rss, Word32 Ru) 363 vrmpybsu Rdd=vrmpybsu(Rss,Rtt) Word64 Q6_P_vrmpybsu_PP(Word64 Rss, Word64 Rtt) 478 Rxx+=vrmpybsu(Rss,Rtt) Word64 Q6 P vrmpybsuacc PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 478 vrmpybu Rdd=vrmpybu(Rss,Rtt) Word64 Q6_P_vrmpybu_PP(Word64 Rss, Word64 Rtt) 478 Rxx+=vrmpybu(Rss,Rtt) Word64 Q6 P vrmpybuacc PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 478 vrmpyh Rdd=vrmpyh(Rss,Rtt) Word64 Q6 P vrmpyh PP(Word64 Rss, Word64 Rtt) 488 Rxx+=vrmpyh(Rss,Rtt) Word64 Q6 P vrmpyhacc PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 488 vrmpyweh Rdd=vrmpyweh(Rss,Rtt) Word64 Q6_P_vrmpyweh_PP(Word64 Rss, Word64 Rtt) 468 Rdd=vrmpyweh(Rss,Rtt):<<1 Word64 Q6 P vrmpyweh PP s1(Word64 Rss, Word64 Rtt) 468 Rxx+=vrmpyweh(Rss,Rtt) Word64 Q6 P vrmpywehacc PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 468 Rxx+=vrmpyweh(Rss,Rtt):<<1 Word64 Q6_P_vrmpywehacc_PP_s1(Word64 Rxx, Word64 Rss, Word64 Rtt) 468 vrmpywoh ``` ``` Rdd=vrmpywoh(Rss,Rtt) Word64 Q6_P_vrmpywoh_PP(Word64 Rss, Word64 Rtt) 468 Rdd=vrmpywoh(Rss,Rtt):<<1 Word64 Q6_P_vrmpywoh_PP_s1(Word64 Rss, Word64 Rtt) 468 Rxx+=vrmpywoh(Rss,Rtt) Word64 Q6_P_vrmpywohacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 468 Rxx+=vrmpywoh(Rss,Rtt):<<1 Word64 Q6_P_vrmpywohacc_PP_s1(Word64 Rxx, Word64 Rss, Word64 Rtt) 468 vrndwh Rd=vrndwh(Rss) Word32 Q6_R_vrndwh_P(Word64 Rss) 500 Rd=vrndwh(Rss):sat Word32 Q6_R_vrndwh_P_sat(Word64 Rss) 500 vrsadub Rdd=vrsadub(Rss,Rtt) Word64 Q6_P_vrsadub_PP(Word64 Rss, Word64 Rtt) 366 Rxx+=vrsadub(Rss,Rtt) Word64 Q6 P vrsadubacc PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 366 vsathb Rd=vsathb(Rs) Word32 Q6_R_vsathb_R(Word32 Rs) 503 Rd=vsathb(Rss) Word32 Q6 R vsathb P(Word64 Rss) 503 Rdd=vsathb(Rss) Word64 Q6 P vsathb P(Word64 Rss) 505 vsathub Rd=vsathub(Rs) Word32 Q6 R vsathub R(Word32 Rs) 503 Rd=vsathub(Rss) Word32 Q6 R vsathub P(Word64 Rss) 503 Rdd=vsathub(Rss) Word64 Q6_P_vsathub_P(Word64 Rss) 505 vsatwh Rd=vsatwh(Rss) Word32 Q6_R_vsatwh_P(Word64 Rss) 503 Rdd=vsatwh(Rss) Word64 Q6 P vsatwh P(Word64 Rss) 505 vsatwuh Rd=vsatwuh(Rss) Word32 Q6_R_vsatwuh_P(Word64 Rss) 503 Rdd=vsatwuh(Rss) Word64 Q6_P_vsatwuh_P(Word64 Rss) 505 vsplatb Rd=vsplatb(Rs) Word32 Q6_R_vsplatb_R(Word32 Rs) 508 Rdd=vsplatb(Rs) Word64 Q6_P_vsplatb_R(Word32 Rs) 508 vsplath Rdd=vsplath(Rs) Word64 Q6_P_vsplath_R(Word32 Rs) 509 vspliceb Rdd=vspliceb(Rss,Rtt,#u3) Word64 Q6_P_vspliceb_PPI(Word64 Rss, Word64 Rtt, Word32 lu3) 510 ``` ``` Rdd=vspliceb(Rss,Rtt,Pu) Word64 Q6_P_vspliceb_PPp(Word64 Rss, Word64 Rtt, Byte Pu) 510 vsubb Rdd=vsubb(Rss,Rtt) Word64 Q6 P vsubb PP(Word64 Rss, Word64 Rtt) 369 vsubh Rd=vsubh(Rt,Rs) Word32 Q6 R vsubh RR(Word32 Rt, Word32 Rs) 150 Rd=vsubh(Rt,Rs):sat Word32 Q6_R_vsubh_RR_sat(Word32 Rt, Word32 Rs) 150 Rdd=vsubh(Rtt,Rss) Word64 Q6_P_vsubh_PP(Word64 Rtt, Word64 Rss) 368 Rdd=vsubh(Rtt,Rss):sat Word64 Q6 P vsubh PP sat(Word64 Rtt, Word64 Rss) 368 vsubub Rdd=vsubub(Rtt,Rss) Word64 Q6 P vsubub PP(Word64 Rtt, Word64 Rss) 369 Rdd=vsubub(Rtt,Rss):sat Word64 Q6_P_vsubub_PP_sat(Word64 Rtt, Word64 Rss) 369 vsubuh Rd=vsubuh(Rt,Rs):sat Word32 Q6 R vsubuh RR sat(Word32 Rt, Word32 Rs) 150 Rdd=vsubuh(Rtt,Rss):sat Word64 Q6_P_vsubuh_PP_sat(Word64 Rtt, Word64 Rss) 368 vsubw Rdd=vsubw(Rtt,Rss) Word64 Q6 P vsubw PP(Word64 Rtt, Word64 Rss) 370 Rdd=vsubw(Rtt,Rss):sat Word64 Q6 P vsubw PP sat(Word64 Rtt, Word64 Rss) 370 vsxtbh Rdd=vsxtbh(Rs) Word64 Q6 P vsxtbh R(Word32 Rs) 511 vsxthw Rdd=vsxthw(Rs) Word64 Q6 P vsxthw R(Word32 Rs) 511 vtrunehb Rd=vtrunehb(Rss) Word32 Q6 R vtrunehb P(Word64 Rss) 514 Rdd=vtrunehb(Rss,Rtt) Word64 Q6 P vtrunehb PP(Word64 Rss, Word64 Rtt) 514 vtrunewh Rdd=vtrunewh(Rss,Rtt) Word64 Q6_P_vtrunewh_PP(Word64 Rss, Word64 Rtt) 514 vtrunohb Rd=vtrunohb(Rss) Word32 Q6 R vtrunohb P(Word64 Rss) 514 Rdd=vtrunohb(Rss,Rtt) Word64 Q6_P_vtrunohb_PP(Word64 Rss, Word64 Rtt) 514 vtrunowh Rdd=vtrunowh(Rss,Rtt) Word64 Q6_P_vtrunowh_PP(Word64 Rss, Word64 Rtt) 514 vxaddsubh Rdd=vxaddsubh(Rss,Rtt):rnd:>>1:sat ``` ``` Word64 Q6_P_vxaddsubh_PP_rnd_rs1_sat(Word64 Rss, Word64 Rtt) 391 Rdd=vxaddsubh(Rss,Rtt):sat Word64 Q6_P_vxaddsubh_PP_sat(Word64 Rss, Word64 Rtt) 391 vxaddsubw Rdd=vxaddsubw(Rss.Rtt):sat Word64 Q6_P_vxaddsubw_PP_sat(Word64 Rss, Word64 Rtt) 393 vxsubaddh Rdd=vxsubaddh(Rss,Rtt):rnd:>>1:sat Word64 Q6 P vxsubaddh PP rnd rs1 sat(Word64 Rss, Word64 Rtt) 391 Rdd=vxsubaddh(Rss,Rtt):sat Word64 Q6 P vxsubaddh PP sat(Word64 Rss, Word64 Rtt) 391 vxsubaddw Rdd=vxsubaddw(Rss,Rtt):sat Word64 Q6 P vxsubaddw PP sat(Word64 Rss, Word64 Rtt) 393 vzxtbh Rdd=vzxtbh(Rs) Word64 Q6 P vzxtbh R(Word32 Rs) 515 vzxthw Rdd=vzxthw(Rs) Word64 Q6 P vzxthw R(Word32 Rs) 515 Χ xor Pd=xor(Ps,Pt) Byte Q6 p xor pp(Byte Ps, Byte Pt) 181 Rd=xor(Rs,Rt) Word32 Q6 R xor RR(Word32 Rs, Word32 Rt) 139 Rdd=xor(Rss,Rtt) Word64 Q6_P_xor_PP(Word64 Rss, Word64 Rtt) 312 Rx\&=xor(Rs,Rt) Word32 Q6 R xorand RR(Word32 Rx, Word32 Rs, Word32 Rt) 314 Rx^=xor(Rs,Rt) Word32 Q6 R xorxacc RR(Word32 Rx, Word32 Rs, Word32 Rt) 314 Rx | =xor(Rs,Rt) Word32 Q6_R_xoror_RR(Word32 Rx, Word32 Rs, Word32 Rt) 314 Rxx^=xor(Rss,Rtt) Word64 Q6 P xorxacc PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 313 Ζ zxtb Rd=zxtb(Rs) Word32 Q6_R_zxtb_R(Word32 Rs) 151 zxth Rd=zxth(Rs) Word32 Q6_R_zxth_R(Word32 Rs) 151 ```