Chipkill

Chipkill is IBM's trademark for a form of advanced error checking and correcting (ECC) computer memory technology that protects memory systems from single memory chip failures and multi-bit errors from any portion of a single memory chip.
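The core idea is to arrange the bits so that a failed chip damages at most one bit of any ECC word. The short Python sketch below illustrates this by counting, for every possible single-chip failure, how many bits of one word are lost. The geometry (36 x4 chips, four 36-bit words per access) and both layout functions are hypothetical assumptions chosen for illustration. They are not IBM's actual design.

```python
# Hypothetical geometry (assumption): one DIMM built from 36 x4 DRAM chips,
# with each 144-bit access split into four 36-bit ECC words.
CHIPS = 36
WIDTH = 4                            # bits each chip drives per access
WORDS = 4
WORD_BITS = CHIPS * WIDTH // WORDS   # 36 bits per word


def interleaved(chip: int, dq: int) -> int:
    """Chipkill-style layout: bit dq of every chip goes to word dq,
    so each word holds exactly one bit from each chip."""
    return dq


def contiguous(chip: int, dq: int) -> int:
    """Naive layout: each word takes all four bits of 9 adjacent chips."""
    return chip // (CHIPS // WORDS)


def worst_case_errors(layout) -> int:
    """Largest number of bits any single word loses when one whole chip fails."""
    worst = 0
    for failed in range(CHIPS):
        hits = [0] * WORDS
        for dq in range(WIDTH):
            hits[layout(failed, dq)] += 1
        worst = max(worst, max(hits))
    return worst


print(worst_case_errors(interleaved))  # 1: within reach of a single-bit-correcting code
print(worst_case_errors(contiguous))   # 4: beyond what SECDED can correct
```

With the interleaved mapping, a dead chip appears as one bad bit in each of the four words, and an ordinary single-error-correcting code can repair each one. With the contiguous mapping, the same failure puts four errors into a single word.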

Equivalent, derived, and similar systems
An equivalent system from Sun Microsystems is called Extended ECC, while equivalent systems from HP are called Advanced ECC and Chipspare. Intel has two similar systems:
• Single-device data correction (SxEC-DxED, where x is 4 or 8, the width of a single DRAM chip). In S4EC-D4ED, 36-bit SECDED (single-error-correcting, double-error-detecting) words are used, with one bit per chip on a single DIMM with 36 memory chips. A failed chip therefore corrupts at most one bit of each word, and SECDED can correct that error.
• Lockstep memory provides double-device data correction (DDDC) functionality. The chips across two memory modules (sticks) are pooled together to scatter the bits. The downside is that the channels then operate in lockstep, which increases latency.

Similar systems from Micron, called redundant array of independent NAND (RAIN), and from SandForce, called RAISE level 2, protect data stored on SSDs from any single NAND flash chip failure.
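The one-bit-per-chip arrangement used by S4EC-D4ED can be sketched in Python. The source does not say how each 36-bit SECDED word divides into data and check bits. This sketch therefore assumes a shortened extended Hamming code with 29 data bits, 6 Hamming check bits and 1 overall parity bit, stored on a DIMM of 36 x4 chips. All helper names are hypothetical, and Intel's real encoding may differ. Because chip c supplies bit c of each of the four words, the failure of any one chip flips at most one bit per word, and the decoder repairs it.

```python
import random

# Assumptions (not from the source): 36 x4 chips, 36-bit extended Hamming words.
CHIPS, WIDTH = 36, 4
N = CHIPS                                  # codeword length: one bit from each chip
PARITY_POS = [1, 2, 4, 8, 16, 32]          # Hamming check-bit positions
DATA_POS = [p for p in range(1, N) if p not in PARITY_POS]  # 29 data positions


def encode(data):
    """Encode 29 data bits into a 36-bit SECDED codeword (position 0 = overall parity)."""
    assert len(data) == len(DATA_POS)
    code = [0] * N
    for pos, bit in zip(DATA_POS, data):
        code[pos] = bit
    for p in PARITY_POS:
        # Make the parity over every position whose index has bit p set even.
        code[p] = sum(code[i] for i in range(1, N) if i & p) % 2
    code[0] = sum(code) % 2                # overall parity over positions 1..35
    return code


def decode(code):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for i in range(1, N):
        if code[i]:
            syndrome ^= i                  # XOR of the indices of all set bits
    odd = sum(code) % 2
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome < N:
        code[syndrome] ^= 1                # single-bit error (syndrome 0 = parity bit)
        status = "corrected"
    else:
        return "uncorrectable", None       # double-bit error detected
    return status, [code[p] for p in DATA_POS]


def write_dimm(words):
    """Scatter WIDTH codewords so chip c stores bit c of every word."""
    return [[words[w][c] for w in range(WIDTH)] for c in range(CHIPS)]


def read_dimm(chips):
    """Gather the WIDTH codewords back from the per-chip bits."""
    return [[chips[c][w] for c in range(CHIPS)] for w in range(WIDTH)]


rng = random.Random(0)
data = [[rng.randint(0, 1) for _ in DATA_POS] for _ in range(WIDTH)]
dimm = write_dimm([encode(d) for d in data])

dead = 17                                  # hypothetical failed chip
dimm[dead] = [b ^ 1 for b in dimm[dead]]   # every bit it returns is wrong
results = [decode(word) for word in read_dimm(dimm)]
print([status for status, _ in results])   # ['corrected', 'corrected', 'corrected', 'corrected']
assert [d for _, d in results] == data
```

If a second chip failed as well, each word would carry two flipped bits. The decoder would then report those words as uncorrectable instead of silently miscorrecting them. This is the double-error-detection half of SECDED, and it is the case that schemes such as DDDC are designed to handle.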
Evaluation
A 2009 paper using data from Google's data centers found that, in the Google systems observed, DRAM errors recurred at the same location and that 8% of DIMMs were affected each year. Specifically, "In more than 85% of the cases a correctable error is followed by at least one more correctable error in the same month." A smaller fraction of DIMMs with Chipkill error correction reported uncorrectable errors than DIMMs whose error-correcting codes can correct only single-bit errors. A 2010 paper from the University of Rochester, using both real-world memory traces and simulations, also showed that Chipkill memory resulted in substantially fewer memory errors.