Data Matrix symbols are made up of modules arranged within a perimeter finder and timing pattern. The largest (144 × 144) symbol can encode up to 3,116 numeric digits, with lower capacities for text and arbitrary bytes; by default, the data is interpreted as characters from the ASCII character set. The symbol consists of data regions, which contain modules set out in a regular array; large symbols contain several regions. Each data region is delimited by a finder pattern, and the symbol is surrounded on all four sides by a quiet zone border (margin). (Note: the modules may be round or square; no specific shape is defined in the standard. For example, dot-peened cells are generally round.) Data Matrix has an error rate of less than 1 in 10 million characters scanned, and is designed to be readable in inverted color.
=== Data Matrix ECC 000–140 ===
Older versions of Data Matrix include ECC 000, ECC 050, ECC 080, ECC 100 and ECC 140. Instead of using Reed–Solomon codes like ECC 200, ECC 000–140 use convolution-based error correction. The versions differ in the amount of error correction they offer: ECC 000 offers none, and ECC 140 offers the most. For error detection at decode time, every one of these versions, including ECC 000, also encodes a cyclic redundancy check (CRC) over the bit pattern. As an added measure, the placement of each bit in the code is determined by bit-placement tables included in the specification. These older versions always have an odd number of rows and columns, and can be made in sizes ranging from 9 × 9 to 49 × 49. All symbols using ECC 000 through 140 error correction can be recognized by the upper-right corner module being the inverse of the background color (binary 1). According to ISO/IEC 16022, "ECC 000–140 should only be used in closed applications where a single party controls both the production and reading of the symbols and is responsible for overall system performance."
=== Data Matrix ECC 200 ===
ECC 200, the newer version of Data Matrix, uses Reed–Solomon codes for error and erasure recovery. ECC 200 allows routine reconstruction of the entire encoded data string when the symbol has sustained up to 30% damage, assuming the matrix can still be accurately located. Symbols have an even number of rows and an even number of columns. Most symbols are square, with sizes from 10 × 10 to 144 × 144. Some symbols, however, are rectangular, with sizes from 8 × 18 to 16 × 48 (even values only). All symbols using ECC 200 error correction can be recognized by the upper-right corner module being the same as the background color (binary 0). Additional capabilities that differentiate ECC 200 symbols from the earlier standards include:
• Specification of the character set (via Extended Channel Interpretations)
• Rectangular symbols
• Structured append (linking of up to 16 symbols to encode larger amounts of data)
=== Error correction ===
ECC 200 codes use Reed–Solomon error correction over the finite field $\mathbb{F}_{256}$ (or $\mathrm{GF}(2^8)$), whose elements are encoded as bytes of 8 bits: the byte $b_7b_6b_5b_4b_3b_2b_1b_0$, with standard numerical value $\textstyle\sum_{i=0}^7 b_i 2^i$, encodes the field element $\textstyle\sum_{i=0}^7 b_i \alpha^i$, where $\alpha \in \mathbb{F}_{256}$ is taken to be a primitive element satisfying $\alpha^8 + \alpha^5 + \alpha^3 + \alpha^2 + 1 = 0$. The primitive polynomial is $x^8 + x^5 + x^3 + x^2 + 1$, corresponding to the polynomial number 301 (hexadecimal 12D), and the generator polynomials are built with initial root $\alpha^1$. The Reed–Solomon code uses different generator polynomials over $\mathbb{F}_{256}$, depending on how many error correction bytes the code adds; the number of bytes added is equal to the degree of the generator polynomial. For example, the 10 × 10 symbol has 3 data bytes and 5 error correction bytes. Its generator polynomial is obtained as
$g(x)=(x+\alpha)(x+\alpha^2)(x+\alpha^3)(x+\alpha^4)(x+\alpha^5)$,
which gives
$g(x)=x^5+\alpha^{235}x^4+\alpha^{207}x^3+\alpha^{210}x^2+\alpha^{244}x+\alpha^{15}$,
or, with decimal coefficients,
$g(x)=x^5+62x^4+111x^3+15x^2+48x+228$.
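The generator polynomials can be reproduced with a few lines of Python. The sketch below is illustrative rather than taken from the standard or from any library: it builds exponent and logarithm tables from the primitive polynomial 301 (0x12D) stated above and multiplies out $(x+\alpha)(x+\alpha^2)\cdots(x+\alpha^n)$. The names `EXP`, `LOG`, `gf_mul` and `generator_poly` are hypothetical.

```python
# Hypothetical sketch of ECC 200 field arithmetic, assuming only what the text
# states: primitive polynomial x^8 + x^5 + x^3 + x^2 + 1 (0x12D = 301) and
# generator roots alpha^1 .. alpha^n.
PRIM = 0x12D

# EXP[i] = alpha^i, LOG[alpha^i] = i; EXP is doubled so EXP[a + b] needs no "mod 255".
EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1              # multiply by alpha
    if x & 0x100:        # reduce modulo the primitive polynomial
        x ^= PRIM
for i in range(255, 512):
    EXP[i] = EXP[i - 255]


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes as GF(256) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(n: int) -> list[int]:
    """Coefficients of (x + alpha)(x + alpha^2)...(x + alpha^n), highest degree first."""
    g = [1]
    for k in range(1, n + 1):
        # Multiply g(x) by (x + alpha^k); addition in GF(2^8) is XOR.
        nxt = g + [0]
        for j, c in enumerate(g):
            nxt[j + 1] ^= gf_mul(c, EXP[k])
        g = nxt
    return g
```

Under these assumptions, `generator_poly(5)` returns `[1, 62, 111, 15, 48, 228]`, matching the decimal coefficients above, and `EXP[235]` is 62, matching $\alpha^{235}$.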
=== Byte encoding ===
The encoding process is described in ISO/IEC standard 16022:2006. Open-source software for encoding and decoding the ECC 200 variant of Data Matrix has been published. The diagrams below illustrate the placement of the message data within a Data Matrix symbol. The message is "Wikipedia", and it is arranged in a somewhat complicated diagonal pattern starting near the upper-left corner. Some characters are split into two pieces, such as the initial W, and the third 'i' is in "corner pattern 2" rather than the usual L-shaped arrangement. Also shown are the end-of-message code (marked End), the padding (P) and error correction (E) bytes, and four modules of unused space (X). The symbol is of size 16 × 16 (14 × 14 data area), with 12 data bytes (including 'End' and padding) and 12 error correction bytes. A (255,243,6) Reed–Solomon code shortened to (24,12,6) is used, where the third parameter is the number of correctable byte errors: with 12 error correction bytes, the code can correct up to 6 byte errors, or up to 12 erasures (damaged bytes at known positions), or any mix in which twice the number of errors plus the number of erasures does not exceed 12.
To obtain the error correction bytes, the following procedure may be carried out. The generator polynomial specified for the (24,12,6) code is
$g(x)=x^{12}+242x^{11}+100x^{10}+178x^9+97x^8+213x^7+142x^6+42x^5+61x^4+91x^3+158x^2+153x+41$,
which may also be written as a vector of decimal coefficients:
[1 242 100 178 97 213 142 42 61 91 158 153 41]
The 12-byte message "Wikipedia", including 'End', P1 and P2, in decimal coefficients (see the diagrams below for the computation method using ASCII values), is:
[88 106 108 106 113 102 101 106 98 129 251 147]
Using the procedure for Reed–Solomon systematic encoding, the 12 error correction bytes obtained (E1 through E12 in decimal), in the form of the remainder after polynomial division, are:
[104 216 88 39 233 202 71 217 26 92 25 232]
These error correction bytes are then appended to the original message. The resulting coded message has 24 bytes and is in the form:
[W i k i p e d i a 'End' P1 P2 E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12]
or, in decimal coefficients:
[88 106 108 106 113 102 101 106 98 129 251 147 104 216 88 39 233 202 71 217 26 92 25 232]
and in hexadecimal coefficients:
[58 6A 6C 6A 71 66 65 6A 62 81 FB 93 68 D8 58 27 E9 CA 47 D9 1A 5C 19 E8]
(Note that 58 6A 6C 6A 71 66 65 6A 62 is exactly one more, byte for byte, than the ASCII encoding of "Wikipedia", 57 69 6B 69 70 65 64 69 61. This is due to the data encoding described in the next section: in the default ASCII mode, each character is stored as its ASCII value plus 1.)
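The systematic encoding of the 16 × 16 example can be sketched as polynomial division by $g(x)$ using a shift register. This is a hypothetical illustration, assuming a single Reed–Solomon block (as in this symbol) and the field and generator conventions from the error correction section; helper names such as `rs_encode` and `poly_eval` are not from any library. The final assertion checks the defining property of the result: the 24-byte codeword vanishes at $\alpha^1$ through $\alpha^{12}$.

```python
# Hypothetical sketch: systematic Reed-Solomon encoding of ONE ECC 200 block.
# Assumes GF(256) from x^8 + x^5 + x^3 + x^2 + 1 (0x12D) and generator roots
# alpha^1 .. alpha^n; interleaving of several blocks in larger symbols is not handled.
PRIM = 0x12D
EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= PRIM
for i in range(255, 512):
    EXP[i] = EXP[i - 255]


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes as GF(256) elements via the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(n: int) -> list[int]:
    """(x + alpha)(x + alpha^2)...(x + alpha^n), coefficients highest degree first."""
    g = [1]
    for k in range(1, n + 1):
        nxt = g + [0]
        for j, c in enumerate(g):
            nxt[j + 1] ^= gf_mul(c, EXP[k])
        g = nxt
    return g


def rs_encode(data: list[int], n_ecc: int) -> list[int]:
    """Remainder of data(x) * x^n_ecc divided by g(x): the n_ecc check bytes."""
    g = generator_poly(n_ecc)
    rem = [0] * n_ecc                  # rem[0] is the highest-degree term
    for byte in data:
        feedback = byte ^ rem[0]
        rem = rem[1:] + [0]            # multiply the remainder by x
        for j in range(n_ecc):
            rem[j] ^= gf_mul(g[j + 1], feedback)
    return rem


def poly_eval(coeffs: list[int], point: int) -> int:
    """Horner evaluation in GF(256), coefficients highest degree first."""
    acc = 0
    for c in coeffs:
        acc = gf_mul(acc, point) ^ c
    return acc


message = [88, 106, 108, 106, 113, 102, 101, 106, 98, 129, 251, 147]
ecc = rs_encode(message, 12)
codeword = message + ecc
# A valid codeword is a multiple of g(x), so it is zero at every root of g(x).
assert all(poly_eval(codeword, EXP[k]) == 0 for k in range(1, 13))
print(ecc)
```

Run on the 12 message bytes, it is expected to print E1 through E12 as listed above.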
=== Data encoding ===
Data Matrix operates in units of 8-bit codewords ("bytes"). The default mode stores one ASCII character (or a pair of digits) per 8-bit codeword and uses special codewords to switch to other modes. Other modes return to the default mode when they are done encoding the data. By default, encoded values are interpreted as text in ASCII and ISO-8859-1. For example, to encode the string Wikipédia, one first converts it to ISO-8859-1 (57 69 6B 69 70 E9 64 69 61), then uses a combination of ASCII and Base 256 encodings to arrive at 58 6A 6C 6A 71 E7 19 96 65 6A 62. Here the letters encoded in ASCII mode are stored as their values plus 1, E7 (231) switches to Base 256 mode, and 19 and 96 are the segment length (1) and the byte E9 after the obscuring step described under Base 256 mode. This is then fed to the byte-to-image encoding routine. The Extended Channel Interpretation (ECI) code may be used to change the interpretation: for example, prepending F1 01 (ECI 000000) would cause the bytestring to instead be decoded as code page 437, in which E9 is Θ rather than é.
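A minimal Python sketch of these encoding steps follows. It is not a complete encoder: it handles ASCII mode (7-bit bytes stored as value plus 1, digit pairs stored as 130 plus their value), one Base 256 segment using the obscuring step from the Base 256 mode section, and end-of-message padding. The codeword values 129 (pad) and 231 (Base 256 latch) and the 253-state pad randomizer are taken from our reading of ISO/IEC 16022 and should be treated as assumptions; all function names are hypothetical.

```python
# Hypothetical sketch, not a full encoder: ASCII mode, one Base 256 segment,
# and padding. Codeword values and the pad randomizer are ASSUMED from ISO/IEC 16022.
PAD = 129            # first pad codeword, which also marks the end of the message
LATCH_BASE256 = 231  # switches from ASCII mode to Base 256 mode


def is_digit(b: int) -> bool:
    return 0x30 <= b <= 0x39


def ascii_codewords(data: bytes) -> list[int]:
    """ASCII mode: a pair of digits -> 130 + value; any other byte 0..127 -> byte + 1."""
    out, i = [], 0
    while i < len(data):
        if i + 1 < len(data) and is_digit(data[i]) and is_digit(data[i + 1]):
            out.append(130 + 10 * (data[i] - 0x30) + (data[i + 1] - 0x30))
            i += 2
        elif data[i] < 128:
            out.append(data[i] + 1)
            i += 1
        else:
            raise ValueError("bytes >= 128 need Upper Shift or Base 256 (not handled here)")
    return out


def randomize_255(value: int, position: int) -> int:
    """Obscure a Base 256 byte: add R(n) = ((149 * n) mod 255) + 1, reduce mod 256."""
    return (value + (149 * position) % 255 + 1) % 256


def base256_segment(data: bytes, latch_position: int) -> list[int]:
    """Latch, then length and data bytes obscured by their 1-based positions."""
    n = len(data)
    length = [n] if n < 250 else [n // 250 + 249, n % 250]
    fields = length + list(data)
    first = latch_position + 1
    return [LATCH_BASE256] + [randomize_255(v, first + k) for k, v in enumerate(fields)]


def pad_to(codewords: list[int], capacity: int) -> list[int]:
    """First pad is 129; later pads use a 253-state randomizer on their position."""
    out = list(codewords)
    if len(out) < capacity:
        out.append(PAD)
    while len(out) < capacity:
        v = PAD + (149 * (len(out) + 1)) % 253 + 1
        out.append(v if v <= 254 else v - 254)
    return out


# "Wikipedia" in a 16 x 16 symbol (12 data codewords).
wiki = pad_to(ascii_codewords(b"Wikipedia"), 12)

# "Wikipédia": ASCII for "Wikip", a Base 256 segment for é (0xE9), ASCII for "dia".
head = ascii_codewords(b"Wikip")
mixed = head + base256_segment(b"\xe9", len(head) + 1) + ascii_codewords(b"dia")
```

With these assumptions, `wiki` equals the 12 data codewords of the 16 × 16 example (88 106 108 106 113 102 101 106 98 129 251 147), and `mixed` equals 58 6A 6C 6A 71 E7 19 96 65 6A 62 in hexadecimal.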
==== Text modes ====
The C40, Text and X12 modes are potentially more compact for storing text messages. They are similar to DEC Radix-50, using character codes in the range 0–39; three of these codes are combined to make a number up to $40^3 = 64000$, which is packed into two bytes (maximum value 65535) as follows:
$V = 1600\,C_1 + 40\,C_2 + C_3 + 1$
$B_1 = \lfloor V/256 \rfloor$
$B_2 = V \bmod 256$
The resulting value of $B_1$ is in the range 0–250, so the special value 254 can be used to return to ASCII encoding mode. Character code interpretations are shown in the table below. The C40 and Text modes have four separate sets. Set 0 is the default, and contains codes that temporarily select a different set for the next character. The only difference between the two modes is that they swap upper- and lower-case letters: C40 is primarily upper-case, with lower-case letters in set 3; Text is the other way around. Set 1, containing ASCII control codes, and set 2, containing punctuation symbols, are identical in C40 and Text mode.
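The packing formulas translate directly into code. The Python sketch below assumes the C40 set 0 assignments from ISO/IEC 16022 (3 for space, 4 to 13 for the digits, 14 to 39 for A to Z); it ignores the shift codes 0 to 2, as well as the latch codeword that enters C40 mode and the 254 that leaves it. The function names are hypothetical.

```python
# Hypothetical sketch of C40 triplet packing. The set 0 table below is an
# ASSUMPTION from ISO/IEC 16022: 3 = space, 4..13 = '0'..'9', 14..39 = 'A'..'Z'.
C40_BASIC = {" ": 3}
C40_BASIC.update({str(d): 4 + d for d in range(10)})
C40_BASIC.update({chr(ord("A") + k): 14 + k for k in range(26)})


def pack_triplet(c1: int, c2: int, c3: int) -> tuple[int, int]:
    """V = 1600*C1 + 40*C2 + C3 + 1 (1..64000), split into B1 = V // 256, B2 = V % 256."""
    v = 1600 * c1 + 40 * c2 + c3 + 1
    return v // 256, v % 256


def unpack_pair(b1: int, b2: int) -> tuple[int, int, int]:
    """Inverse of pack_triplet."""
    v = 256 * b1 + b2 - 1
    return v // 1600, (v // 40) % 40, v % 40


def c40_encode(text: str) -> list[int]:
    """Upper-case letters, digits and spaces only; length must be a multiple of 3."""
    values = [C40_BASIC[ch] for ch in text]
    if len(values) % 3:
        raise ValueError("this sketch only handles complete triplets")
    out = []
    for i in range(0, len(values), 3):
        out.extend(pack_triplet(*values[i:i + 3]))
    return out
```

For example, `c40_encode("AIM")` gives `[91, 11]`, since V = 14 × 1600 + 22 × 40 + 26 + 1 = 23307 = 91 × 256 + 11.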
==== EDIFACT mode ====
EDIFACT mode uses six bits per character, with four characters packed into three bytes. It can store digits, upper-case letters, and many punctuation marks, but has no support for lower-case letters.
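EDIFACT packing can be sketched the same way. The Python below assumes that each character in the EDIFACT range (ASCII 32 to 94) is reduced to the low six bits of its ASCII code and that the four 6-bit values are packed most-significant first; it omits the latch and unlatch codewords and the handling of a final partial group. The function names are hypothetical.

```python
# Hypothetical sketch of EDIFACT packing. ASSUMED: valid characters are ASCII
# 32..94, each stored as its low six bits, four values packed MSB-first into 3 bytes.
def edifact_pack(text: str) -> list[int]:
    """Pack complete groups of four EDIFACT characters into three bytes each."""
    values = []
    for ch in text:
        code = ord(ch)
        if not 32 <= code <= 94:        # space .. '^': no lower-case letters
            raise ValueError(f"{ch!r} cannot be encoded in EDIFACT mode")
        values.append(code & 0x3F)      # keep the low six bits
    if len(values) % 4:
        raise ValueError("this sketch only handles complete groups of four")
    out = []
    for i in range(0, len(values), 4):
        a, b, c, d = values[i:i + 4]
        word = (a << 18) | (b << 12) | (c << 6) | d   # 24 bits
        out += [word >> 16, (word >> 8) & 0xFF, word & 0xFF]
    return out


def edifact_unpack(data: list[int]) -> str:
    """Inverse of edifact_pack: values 0..30 map back to ASCII 64..94 (31, unlatch, not handled)."""
    chars = []
    for i in range(0, len(data), 3):
        word = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2]
        for shift in (18, 12, 6, 0):
            v = (word >> shift) & 0x3F
            chars.append(chr(v + 64 if v < 32 else v))
    return "".join(chars)
```

`edifact_pack("ABCD")` yields `[4, 32, 196]`, and lower-case input raises `ValueError`, reflecting the absence of lower-case support.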
==== Base 256 mode ====
Base 256 mode data starts with a length indicator, followed by a number of data bytes. A length of 1 to 249 is encoded as a single byte, and longer lengths are stored as two bytes:
$L_1 = \lfloor \text{length}/250 \rfloor + 249$, $L_2 = \text{length} \bmod 250$
It is desirable to avoid long strings of zeros in the coded message, because they become large blank areas in the Data Matrix symbol, which may cause a scanner to lose synchronization. (The default ASCII encoding does not use zero for this reason.) To make such runs less likely, the length and data bytes are obscured by adding a pseudorandom value $R(n)$, where $n$ is the position in the byte stream, and reducing the sum modulo 256:
$R(n) = ((149 \times n) \bmod 255) + 1$
== History ==