All the architectures below have instruction subsets and groups in which bit manipulation is provided in hardware. As the list shows, DSPs and embedded microcontrollers provide at least test/set/clear bit instructions. Many architectures go much further, with instructions such as count leading zeros, popcount, Galois field arithmetic, binary-coded decimal, bit-matrix multiply and transpose, byte permute, bit permute (including bit reversal), specialised cryptographic instructions and many more.
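Three of the operations named above are count leading zeros, count trailing zeros and popcount. The minimal Python model below computes them on a fixed-width unsigned word. The function names and the 64-bit default width are illustrative assumptions. Returning the full width for an input of 0 is a common convention for dedicated instructions, but it is not universal: x86 Bit Scan Reverse/Forward, for instance, leave the result undefined for 0.

```python
def clz(x: int, width: int = 64) -> int:
    """Count leading zeros in a width-bit unsigned value (returns width for 0)."""
    x &= (1 << width) - 1
    return width - x.bit_length()


def ctz(x: int, width: int = 64) -> int:
    """Count trailing zeros in a width-bit unsigned value (returns width for 0)."""
    x &= (1 << width) - 1
    if x == 0:
        return width
    # x & -x isolates the lowest set bit; its bit_length is (index + 1).
    return (x & -x).bit_length() - 1


def popcount(x: int, width: int = 64) -> int:
    """Number of set bits among the low width bits of x."""
    x &= (1 << width) - 1
    count = 0
    while x:
        x &= x - 1  # clear the lowest set bit
        count += 1
    return count
```

For a nonzero x, the index of the highest set bit (the value a bit-scan-reverse style instruction returns) is `width - 1 - clz(x)`.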
Intel and AMD (x86)
• The x86 core instruction set contains:
  • Bit Scan Reverse: returns the bit index of the highest set bit in the input, effectively a backwards count leading zeros; the result is undefined for an input of 0.
  • Bit Scan Forward: returns the bit index of the lowest set bit in the input, effectively count trailing zeros; the result is likewise undefined for 0.
• The SSE4 and BMI instruction set extensions contain instructions for:
  • Count leading zeros
  • Count trailing zeros
  • Population count
  • Bit extract and bit deposit
  • Bit test: given two inputs, perform both an AND and an ANDN operation between them, and set the ZF and CF EFLAGS bits according to whether the results of the AND and the ANDN, respectively, are 0. This can be used to test whether all masked bits are zero, all masked bits are set, or there is a mix.
• AVX-512 includes a bitwise ternary logic instruction. Also noteworthy is a conflict detection instruction, VPCONFLICTD.
• The AVX/AVX-512 GFNI subset provides a bit-matrix affine transformation and an inverse-affine variant. The affine transformation is effectively an 8x8 bit-matrix multiply over GF(2) applied to each byte; the inverse variant first replaces each byte by its multiplicative inverse in the Galois field GF(2^8).
• AVX-512 BITALG, besides AVX-512 versions of existing bit manipulation instructions, also added a bit-level shuffle instruction that picks bits from one source using indexes in the second source.
• An Intel technology guide on the AVX/AVX-512 GFNI extension lists numerous further uses, including parallel byte-wise set/clear/invert bit manipulation and 5-bit sign extension, and points out that the potential is much greater.
• Intel BCD opcodes

Power ISA
Power ISA has a large range of bit manipulation instructions, largely due to its history and relationship with IBM mainframes and the z/Architecture:
• Count leading zeros and count trailing zeros, and masked versions of the same.
• A mixture of popcount, parity and SWAR-style instructions, but not a full set of each. For example, the byte-level popcount is SWAR 8x8-bit, and there are 2x32-bit and 64-bit scalar popcounts, but there is no 4x16-bit version. Likewise, another of these instructions operates on SWAR half-words (4x16-bit) without a corresponding variant at another width.
• Masked bit extract and bit deposit: these gather and scatter bits according to a mask instead of the more usual offset and length. Most ISAs would have an operand giving the starting position of the sequential bits to extract, plus the length; the masked form combines these into one general-purpose bitmask.
• An unusual centrifuge instruction, which moves masked bits to the left and unmasked bits to the right, preserving their relative order in both groups.
• An 8x8-bit transpose, which treats a 64-bit quantity as an 8x8 2D bit matrix and transposes it: bit 0 of every byte together forms the first byte, bit 1 of every byte forms the second byte, and so on.
• An unusual but very useful indexing instruction, which selects up to eight individual bits from a 64-bit source by treating each byte of a second 64-bit register as a bit index into the first.
• An 8-bit ternary bitwise logic instruction similar to the one in AVX-512.
• Instructions for accelerating packed BCD arithmetic.
• Power v3.1 also introduced a number of additional bit manipulation instructions, including swapping the order of bytes within half-words, within words, and across the whole 64-bit register.
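The ternary logic and centrifuge operations above can each be modelled in a few lines of Python. This is a sketch, not the Power ISA or AVX-512 definition. The function names are hypothetical. The truth-table index is assumed to be formed as (a << 2) | (b << 1) | c, which we believe is the AVX-512 convention for its immediate; operand order may differ for the Power instruction. The centrifuge follows the description above (masked bits to the high end) and has not been checked bit-for-bit against the ISA manual.

```python
def ternary_logic(a: int, b: int, c: int, imm8: int, width: int = 64) -> int:
    """Apply the 3-input Boolean function whose 8-entry truth table is imm8, bitwise."""
    result = 0
    for index in range(8):
        if (imm8 >> index) & 1:
            # Minterm for this truth-table row: bit positions where (a, b, c)
            # equal the bits of index, with a as the most significant index bit.
            term = a if index & 4 else ~a
            term &= b if index & 2 else ~b
            term &= c if index & 1 else ~c
            result |= term
    return result & ((1 << width) - 1)


def centrifuge(x: int, mask: int, width: int = 64) -> int:
    """Move bits of x selected by mask to the high end and the rest to the low end,
    keeping the relative order within each group."""
    selected = unselected = 0
    n_sel = n_unsel = 0
    for i in range(width):  # walk from the least significant bit upward
        bit = (x >> i) & 1
        if (mask >> i) & 1:
            selected |= bit << n_sel
            n_sel += 1
        else:
            unselected |= bit << n_unsel
            n_unsel += 1
    return (selected << n_unsel) | unselected
```

With `imm8 = 0xCA`, `ternary_logic` computes a bitwise select (a ? b : c). With `0x96` it computes a three-way XOR, and with `0xE8` a bitwise majority.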
Cray supercomputers
Cray patented BMM (bit matrix multiply) in 1990; it could handle operands of up to 64x64 bits. The closest equivalent today is the 8x8 GF(2) affine transform instruction of AVX-512.
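A bit-matrix multiply replaces the multiply and add of an ordinary matrix product with AND and XOR, that is, arithmetic over GF(2). The sketch below models the 8x8 case, together with an 8x8 bit transpose. It uses a hypothetical layout in which byte i of a 64-bit integer holds row i and bit j of that byte holds column j. Real instructions such as the AVX-512 affine transform use their own bit and byte orderings, so this illustrates the arithmetic rather than emulating any instruction. The same loops generalise to larger matrices such as Cray's 64x64 case.

```python
IDENTITY8 = 0x8040201008040201  # row i has only bit i set


def row(m: int, i: int) -> int:
    """Row i of an 8x8 bit matrix stored as a 64-bit integer."""
    return (m >> (8 * i)) & 0xFF


def bmm8(a: int, b: int) -> int:
    """8x8 bit-matrix product over GF(2): C[i][j] = XOR over k of A[i][k] AND B[k][j]."""
    result = 0
    for i in range(8):
        acc = 0
        a_row = row(a, i)
        for k in range(8):
            if (a_row >> k) & 1:
                acc ^= row(b, k)  # "addition" of rows is XOR
        result |= acc << (8 * i)
    return result


def transpose8(m: int) -> int:
    """Transpose an 8x8 bit matrix: bit j of byte i moves to bit i of byte j."""
    result = 0
    for i in range(8):
        for j in range(8):
            if (m >> (8 * i + j)) & 1:
                result |= 1 << (8 * j + i)
    return result
```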
IBM System/360 through z/Architecture
IBM System/360
The IBM System/360 has RR, RX and SI instructions for bitwise AND, exclusive OR and OR, RS arithmetic and logical shift instructions, an SI test under mask instruction and an atomic RX test and set instruction. These instructions and their extensions remain available through z/Architecture.
IBM System/370
Toward the end of the S/370 life cycle, IBM made move characters inverse, previously an RPQ, a standard instruction.
IBM S/370, S/370-XA, ESA/370, and ESA/390 vector operations
The IBM 3090 introduced an optional vector facility to the System/370-XA and Enterprise Systems Architecture/370 instruction sets. In addition to vector arithmetic and logical operations on multiple integer and floating-point values, it introduced the vector bit manipulation operations count leading zeros and population count.
ESA/390
Towards the end of the ESA/390 life cycle, IBM introduced some z/Architecture instructions in ESA/390. These included the rotate left single logical, load reversed and store reversed instructions.
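Without dedicated instructions, software must build rotate and byte reversal from ordinary shifts and masks. Below is a minimal Python sketch for 32-bit values. The function names are hypothetical. Reducing the rotate count modulo 32 is an assumption made here for simplicity, not a statement of the ESA/390 operand rules.

```python
MASK32 = 0xFFFFFFFF


def rotate_left32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits; bits shifted out re-enter at the right."""
    x &= MASK32
    n &= 31  # assumption: count taken modulo 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def byte_reverse32(x: int) -> int:
    """Reverse the byte order of a 32-bit value, as a byte-reversed load or store would."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")
```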
z/Architecture scalar
z/Architecture inherited all of the bit manipulation instructions of its predecessors, and added 64-bit (grande) and long (20-bit) displacement versions of some.
• The general-instructions-extension facility adds ROTATE THEN AND SELECTED BITS, ROTATE THEN EXCLUSIVE OR SELECTED BITS, ROTATE THEN INSERT SELECTED BITS and ROTATE THEN OR SELECTED BITS.
• The high-word facility adds ROTATE THEN INSERT SELECTED BITS HIGH and ROTATE THEN INSERT SELECTED BITS LOW.
• Interlocked-access facility 1 adds LOAD AND AND (LAN, LANG), LOAD AND EXCLUSIVE OR (LAX, LAXG) and LOAD AND OR (LAO, LAOG).
• Miscellaneous-instruction-extensions facility 1 adds ROTATE THEN INSERT SELECTED BITS (RISBGN).
• Miscellaneous-instruction-extensions facility 3 adds AND WITH COMPLEMENT (NCRK, NCGRK), MOVE RIGHT TO LEFT, NAND (NNRK, NNGRK), NOT EXCLUSIVE OR (NXRK, NXGRK), NOR (NORK, NOGRK), OR WITH COMPLEMENT (OCRK, OCGRK), SELECT (SELR, SELGR) and SELECT HIGH (SELFHR).
• Miscellaneous-instruction-extensions facility 4 adds BIT DEPOSIT (BDEPG), BIT EXTRACT (BEXTG), COUNT LEADING ZEROS (CLZG) and COUNT TRAILING ZEROS (CTZG).
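BIT EXTRACT and BIT DEPOSIT belong to the same family as x86 BMI2's PEXT and PDEP and Power ISA's masked extract/deposit. Extract gathers the bits selected by a mask into the low end of the result; deposit scatters low-order bits back into the masked positions. The Python sketch below is modelled on PEXT/PDEP behaviour. The function names are hypothetical, and we have not checked the z/Architecture and Power definitions bit-for-bit against it.

```python
def bit_extract(x: int, mask: int, width: int = 64) -> int:
    """Gather the bits of x selected by mask, packed into the low end (like PEXT)."""
    result = 0
    out = 0  # next output bit position
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((x >> i) & 1) << out
            out += 1
    return result


def bit_deposit(x: int, mask: int, width: int = 64) -> int:
    """Scatter the low bits of x, in order, into the positions selected by mask (like PDEP)."""
    result = 0
    src = 0  # next source bit of x to consume
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((x >> src) & 1) << i
            src += 1
    return result
```

For any mask m, `bit_deposit(bit_extract(x, m), m)` equals `x & m`, which is a handy sanity check.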
z/Architecture vector operations
z/Architecture does not support the previous vector facility. However, starting with the 11th edition of the z/Architecture Principles of Operation, it supports the following instructions:
• Vector count leading zeros, vector count trailing zeros and vector population count.
• Vector test under mask: sets the condition code by applying a second vector as a mask to all elements of one register, distinguishing whether the selected bits are all zeros, all ones, or a mix of both.
• Vector GF(2) multiply and multiply-accumulate, known as carryless multiply.
• Comprehensive packed BCD support.
• Memory-based test-and-set and various masked-test set/clear bit operations, which move or copy a single bit into the condition code.
DEC PDP-10
The DEC PDP-6 and PDP-10 had logical operations covering the full suite of 16 two-operand Boolean functions (equivalent to a hardware 2-input lookup table, LUT2), rather than the 3-operand functions that AVX-512 and Power ISA have. Later models of the PDP-10 had instructions to convert between packed BCD and binary. Also present are unusual variable-bit-length byte load and store instructions that use byte pointers for memory operands; in modern terminology these are bit-field extract and insert. In addition to a word address, the bit length (S) and the bit offset (P) of the byte to load or store are specified. These instructions can specify a byte size of 0 to 36 bits, but a byte may not straddle a word boundary. The string manipulation, BCD/binary conversion and string editing instructions in later models use byte pointers and have the same restrictions.
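Byte-pointer access amounts to loading or depositing an S-bit byte at bit position P within a 36-bit word. This is a hedged Python model. The function names are hypothetical. P is taken to count bits from the least significant (rightmost) end of the word, which is our understanding of the PDP-10 convention. The auto-increment and indirect-addressing features of real byte pointers are omitted.

```python
WORD_BITS = 36
WORD_MASK = (1 << WORD_BITS) - 1


def _check(p: int, s: int) -> None:
    # The byte must fit entirely inside one 36-bit word.
    if s < 0 or p < 0 or p + s > WORD_BITS:
        raise ValueError("byte must not straddle a word boundary")


def load_byte(word: int, p: int, s: int) -> int:
    """Extract the s-bit byte whose lowest bit lies p bits from the right of the word."""
    _check(p, s)
    return ((word & WORD_MASK) >> p) & ((1 << s) - 1)


def deposit_byte(word: int, p: int, s: int, value: int) -> int:
    """Return word with its s-bit field at position p replaced by value's low s bits."""
    _check(p, s)
    field = ((1 << s) - 1) << p
    return ((word & ~field) | ((value << p) & field)) & WORD_MASK
```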
GE-600 series
The GE-600 series and its successors had Gray-to-binary conversion; without such an instruction, converting from Gray code requires multiple steps. Binary-to-Gray is simply an XOR of the value with itself shifted right by one bit, and does not justify a dedicated instruction. Gray coding has significant practical applications.
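Code makes the asymmetry easy to see. Encoding is one shift and one XOR. Decoding needs a prefix XOR over all higher bits, computed either bit by bit or, for a fixed width, in logarithmically many shift-XOR steps. Below is a minimal Python sketch; the function names are illustrative.

```python
def binary_to_gray(n: int) -> int:
    """Encode: a single shift and XOR."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Decode by XOR-ing together g, g >> 1, g >> 2, ... (one step per bit)."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def gray_to_binary32(g: int) -> int:
    """Decode a 32-bit Gray code in five shift-XOR steps (prefix XOR by doubling)."""
    g ^= g >> 16
    g ^= g >> 8
    g ^= g >> 4
    g ^= g >> 2
    g ^= g >> 1
    return g
```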
ARM
• ARM11 has bitwise test-AND (a bitmasked test) and test-XOR, standard logical bitwise operations including OR-complement, byte, halfword and bit reversal, and conditional byte selection/merging. Shift and rotate are available on Operand2.
• ARM Cortex-A has bit-field set, clear, extract and reverse.
• ARM A64 has SWAR-style half-word byte swapping, bit-field insert and extract, and bit reversal.
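Without a dedicated instruction, bit reversal and byte swapping within half-words take several shift-and-mask steps. The following Python sketch shows the usual divide-and-conquer swap for a 32-bit value. The function names are hypothetical, and the code models the results, not the ARM encodings.

```python
def reverse_bits32(x: int) -> int:
    """Reverse the bit order of a 32-bit value by swapping ever larger groups."""
    x &= 0xFFFFFFFF
    x = ((x >> 1) & 0x55555555) | ((x & 0x55555555) << 1)    # adjacent bits
    x = ((x >> 2) & 0x33333333) | ((x & 0x33333333) << 2)    # bit pairs
    x = ((x >> 4) & 0x0F0F0F0F) | ((x & 0x0F0F0F0F) << 4)    # nibbles
    x = ((x >> 8) & 0x00FF00FF) | ((x & 0x00FF00FF) << 8)    # bytes
    x = ((x >> 16) & 0x0000FFFF) | ((x & 0x0000FFFF) << 16)  # half-words
    return x


def byte_swap_halfwords32(x: int) -> int:
    """Swap the two bytes inside each 16-bit half of a 32-bit value (SWAR style)."""
    x &= 0xFFFFFFFF
    return ((x >> 8) & 0x00FF00FF) | ((x & 0x00FF00FF) << 8)
```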
RISC-V
In the standard extensions, RISC-V has scalar bitwise operations including logical and arithmetic shifts, but no rotate. This omission is compensated for by additional extensions.
• The RISC-V Zb* extensions contain a significant number of bit manipulation instructions. They are organised into four groups by category (the integer subset has min/max, rotate and popcount, for example), and come with well-researched justifications for their inclusion and the improvements they bring.
• The RISC-V Vector Extension (RVV) has instructions that qualify as hardware-level bit manipulation, but they operate on vector masks rather than on scalar registers as is normally the case. For example, a vector-mask popcount is available. RVV also has per-element bitwise operations.
Embedded microcontrollers
Intel
• The 8086 has bitwise operations.
• The 8051 has set, clear and invert bit instructions, and a considerable percentage of its instructions are bit manipulation. It also includes OR-complement and AND-complement, which are also present in RISC-V Zb*.
Zilog Z80
• The Zilog Z80 instruction set includes instructions that test, reset and set individual bits in registers or in memory pointed to by HL, IX or IY.
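MOS 6502
• The WDC 65C02 added bit-manipulation instructions: test and set bits (TSB) and test and reset bits (TRB), which use the accumulator as a bit mask applied to a memory operand.
• Rockwell added similar extensions (RMB, SMB, BBR and BBS) to the R65C00 series, which reset or set individual memory bits, or branch on their state.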
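TSB and TRB combine a test with a read-modify-write. The Z flag reports whether the accumulator and the memory byte share any set bits; the memory byte then has the accumulator's bits set (TSB) or cleared (TRB). Below is a Python sketch of that behaviour, returning the new memory value and the Z flag. The function names mirror the mnemonics but are only a model; cycle timing and other flags are ignored.

```python
def tsb(a: int, m: int):
    """Model of 65C02 TSB: returns (new memory byte, Z flag)."""
    z = (a & m & 0xFF) == 0      # Z set when A and M have no set bits in common
    return (m | a) & 0xFF, z     # then set in memory every bit that is set in A


def trb(a: int, m: int):
    """Model of 65C02 TRB: returns (new memory byte, Z flag)."""
    z = (a & m & 0xFF) == 0      # same test as TSB
    return m & ~a & 0xFF, z      # then clear in memory every bit that is set in A
```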
Microchip PICs
• The Microchip Technology PIC range also has bitwise operations and set, clear and test bit instructions.
Others
• Texas Instruments DSPs such as the TMS320C6000 series have set, clear, invert, test, extract and insert bit (or bit-field) instructions.
• The TX-2 from 1958 had "skip on bit" predication, as well as set, clear, invert and permute bits, and shift and other bitwise operations.
• SuperH has comprehensive memory-based bit manipulation including AND-complement and OR-complement. It also has standard register-based test/set/clear and an unusual instruction that replaces bit N (in the range 0 to 7) and copies the replaced bit into the Test register.
• The Signetics 8X300 is a microprocessor introduced in 1976. The processor normally manipulates 8-bit data bytes, but its mask and rotate units make it possible to manipulate single or multiple bits, making it a variable-data-length processor.
• The DEC PDP-11 architecture from 1970 supports bit testing, setting and clearing on both words (BIT, BIS, BIC) and bytes (BITB, BISB, BICB). The very similar WD16 supports only the word forms of these instructions plus BISB. The WD16 additionally supports faster byte-addressed flags with its TSTB, SETB, CLRB and COMB (complement) instructions. The PDP-11 lacks the SETB instruction.
• The Motorola 68000 supports bit test and manipulation of memory or data registers. The bit number may be either an immediate or a value in a data register. The instructions are BSET (set to 1), BCLR (clear to 0), BCHG (invert) and BTST (no change). All of these instructions first test the destination bit and set the CCR Z bit if the destination bit is 0.