A program running on a computer that uses word addressing can still work with smaller units of memory by emulating an access to the smaller unit. For a load, this requires loading the enclosing word and then extracting the desired bits. For a store, this requires loading the enclosing word, shifting the new value into place, overwriting the desired bits, and then storing the enclosing word. Suppose that four consecutive code points from a UTF-8 string need to be packed into a 32-bit word. The first code point might occupy bits 0–7, the second 8–15, the third 16–23, and the fourth 24–31. (If the memory were byte-addressable, this would be a
little endian byte order.) In order to clearly elucidate the code necessary for sub-word accesses without tying the example too closely to any particular word-addressed architecture, the following examples use
MIPS assembly. In reality, MIPS is a byte-addressed architecture with direct support for loading and storing 8-bit and 16-bit values, but the example will pretend that it only provides 32-bit loads and stores and that offsets within a 32-bit word must be stored separately from an address. MIPS has been chosen because it is a simple
assembly language with no specialized facilities that would make these operations more convenient. Suppose that a program wishes to read the third code point into register r1 from the word at an address in register r2. In the absence of any other support from the instruction set, the program must load the full word, right-shift by 16 to drop the first two code points, and then mask off the fourth code point: ldw $r1, 0($r2) # Load the full word srl $r1, $r1, 16 # Shift right by 16 andi $r1, $r1, 0xFF # Mask off other code points If the offset is not known statically, but instead a bit-offset is stored in the register r3, a slightly more complex approach is required: ldw $r1, 0($r2) # Load the full word srlv $r1, $r1, $r3 # Shift right by the bit offset andi $r1, $r1, 0xFF # Mask off other code points Suppose instead that the program wishes to assign the code point in register r1 to the third code point in the word at the address in r2. In the absence of any other support from the instruction set, the program must load the full word, mask off the old value of that code point, shift the new value into place, merge the values, and store the full word back: sll $r1, $r1, 16 # Shift the new value left by 16 lhi $r5, 0x00FF # Construct a constant mask to select the third byte nor $r5, $r5, $zero # Flip the mask so that it clears the third byte ldw $r4, 0($r2) # Load the full word and $r4, $r5, $r4 # Clear the third byte from the word or $r4, $r4, $r1 # Merge the new value into the word stw $r4, 0($r2) # Store the result as the full word Again, if the offset is instead stored in r3, a more complex approach is required: sllv $r1, $r1, $r3 # Shift the new value left by the bit offset llo $r5, 0x00FF # Construct a constant mask to select a byte sllv $r5, $r5, $r3 # Shift the mask left by the bit offset nor $r5, $r5, $zero # Flip the mask so that it clears the selected byte ldw $r4, 0($r2) # Load the full word and $r4, $r5, $r4 # Clear the selected byte from the word or $r4, $r4, $r1 # Merge the new value into the word stw $r4, 0($r2) # Store the result as the full word This code sequence assumes that another thread cannot modify other bytes in the word concurrently. If concurrent modification is possible, then one of the modifications might be lost. To solve this problem, the last few instructions must be turned into an atomic compare-exchange loop so that a concurrent modification will simply cause it to repeat the operation with the new value. No memory barriers are required in this case. A pair of a word address and an offset within the word is called a
wide address (also known as a
fat address or
fat pointer). (This should not be confused with other uses of wide addresses for storing other kinds of supplemental data, such as the bounds of an array.) The stored offset may be either a bit offset or a byte offset. The code sequences above benefit from the offset being denominated in bits because they use it as a shift count; an architecture with direct support for selecting bytes might prefer to just store a byte offset. In these code sequences, the additional offset would have to be stored alongside the
base address, effectively doubling the overall storage requirements of an address. This is not always true on word machines, primarily because addresses themselves are often not packed with other data to make accesses more efficient. For example, the
Cray X1 uses 64-bit words, but addresses are only 32 bits; when an address is stored in memory, it is stored in its own word, and so the byte offset can be placed in the upper 32 bits of the word. The inefficiency of using wide addresses on that system is just all the extra logic to manipulate this offset and extract and insert bytes within words; it has no memory-use impact. ==Related concepts==