The aim of JPEG 2000 is not only to improve compression performance over JPEG but also to add (or improve) features such as scalability and editability. JPEG 2000's improvement in compression performance relative to the original JPEG standard is actually rather modest and should not ordinarily be the primary consideration for evaluating the design. Very low and very high compression rates are supported in JPEG 2000; the ability to handle a very large range of effective bit rates is one of the strengths of the standard. For example, to reduce the number of bits for a picture below a certain amount, the advisable approach with the first JPEG standard is to reduce the resolution of the input image before encoding it. That is unnecessary with JPEG 2000, because its multi-resolution decomposition structure already does this automatically. The following sections describe the algorithm of JPEG 2000. According to the
Royal Library of the Netherlands, "the current JP2 format specification leaves room for multiple interpretations when it comes to the support of ICC profiles, and the handling of grid resolution information".
Color components transformation Initially images have to be transformed from the RGB
color space to another color space, leading to three
components that are handled separately. There are two possible choices:
• Irreversible Color Transform (ICT) uses the well-known BT.601 YCC color space. It is called "irreversible" because it has to be implemented in floating- or fixed-point arithmetic and causes round-off errors. The ICT shall be used only with the 9/7 wavelet transform.
• Reversible Color Transform (RCT) uses a modified YUV color space (almost the same as YCC) that does not introduce quantization errors, so it is fully reversible. Proper implementation of the RCT requires that numbers be rounded as specified; the transform therefore cannot be expressed exactly in matrix form. The RCT shall be used only with the 5/3 wavelet transform. The transformations are: :: \begin{array}{rcl} Y &=& \left\lfloor \frac{R+2G+B}{4} \right\rfloor ; \\ C_B &=& B - G ; \\ C_R &=& R - G ; \end{array} \qquad \begin{array}{rcl} G &=& Y - \left\lfloor \frac{C_B + C_R}{4} \right\rfloor ; \\ R &=& C_R + G ; \\ B &=& C_B + G. \end{array} If R, G, and B are normalized to the same precision, then the numeric precision of C_B and C_R is one bit greater than the precision of the original components. This increase in precision is necessary to ensure reversibility. The
chrominance components can be, but do not necessarily have to be, downscaled in resolution; in fact, since the wavelet transformation already separates images into scales, downsampling is more effectively handled by dropping the finest wavelet scale. This step is called
multiple component transformation in the JPEG 2000 language since its usage is not restricted to the
RGB color model.
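The RCT above can be sketched in a few lines of integer-only Python; the function names are illustrative. Floor division (arithmetic right shifts here) is what makes the transform exactly invertible, which a plain matrix multiplication could not guarantee:

```python
def rct_forward(r, g, b):
    """RGB -> (Y, Cb, Cr) for the reversible 5/3 path."""
    y = (r + 2 * g + b) >> 2   # floor((R + 2G + B) / 4)
    cb = b - g                 # needs one extra bit of precision
    cr = r - g                 # may be negative
    return y, cb, cr

def rct_inverse(y, cb, cr):
    """(Y, Cb, Cr) -> RGB; exactly undoes rct_forward."""
    g = y - ((cb + cr) >> 2)   # >> floors correctly for negative sums too
    r = cr + g
    b = cb + g
    return r, g, b
```

Because every step is integer addition plus a floored shift, the round trip is lossless for any input precision.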
Tiling After color transformation, the image is split into so-called
tiles, rectangular regions of the image that are transformed and encoded separately. Tiles can be any size, and it is also possible to consider the whole image as one single tile. Once the size is chosen, all the tiles will have the same size (except optionally those on the right and bottom borders). Dividing the image into tiles is advantageous in that the decoder will need less memory to decode the image and it can opt to decode only selected tiles to achieve a partial decoding of the image. The disadvantage of this approach is that the quality of the picture decreases due to a lower
peak signal-to-noise ratio. Using many tiles can create a blocking effect similar to the older
JPEG 1992 standard.
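The tiling rule above (every tile has the chosen size except, optionally, those on the right and bottom borders) can be illustrated with a small hypothetical helper:

```python
def tile_grid(width, height, tw, th):
    """Yield (x, y, w, h) for each tile in raster order.

    Interior tiles are exactly tw x th; tiles on the right and bottom
    borders are clipped to the image boundary.
    """
    for y in range(0, height, th):
        for x in range(0, width, tw):
            yield (x, y, min(tw, width - x), min(th, height - y))
```

For a 100x60 image with 32x32 tiles this yields a 4x2 grid whose rightmost column is 4 pixels wide and whose bottom row is 28 pixels tall; a decoder can decode any subset of these rectangles independently.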
Wavelet transform
[Figure: 5/3 wavelet used for lossless compression]
These tiles are then
wavelet-transformed to an arbitrary depth, in contrast to JPEG 1992 which uses an 8×8 block-size
discrete cosine transform. JPEG 2000 uses two different
wavelet transforms: •
irreversible: the
CDF 9/7 wavelet transform (developed by
Ingrid Daubechies). It is said to be "irreversible" because it introduces quantization noise that depends on the precision of the decoder. •
reversible: a rounded version of the biorthogonal Le Gall–Tabatabai (LGT) 5/3 wavelet transform (developed by Didier Le Gall and Ali J. Tabatabai). It uses only integer coefficients, so the output does not require rounding (quantization) and so it does not introduce any quantization noise. It is used in lossless coding. The wavelet transforms are implemented by the
lifting scheme or by
convolution.
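One level of the reversible 5/3 transform can be sketched with two lifting steps (a predict step producing the high-pass coefficients, then an update step producing the low-pass coefficients). This is a minimal sketch assuming an even-length 1-D signal and symmetric extension at the borders; the function names are illustrative:

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 DWT on an even-length 1-D signal."""
    n = len(x)
    d = [0] * (n // 2)   # high-pass (detail) coefficients
    s = [0] * (n // 2)   # low-pass (approximation) coefficients
    for i in range(n // 2):   # predict: d[i] = odd - floor((left+right)/2)
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]  # symmetric edge
        d[i] = x[2 * i + 1] - ((left + right) >> 1)
    for i in range(n // 2):   # update: s[i] = even + floor((d[i-1]+d[i]+2)/4)
        dl = d[i - 1] if i > 0 else d[i]                     # symmetric edge
        s[i] = x[2 * i] + ((dl + d[i] + 2) >> 2)
    return s, d

def dwt53_inverse(s, d):
    """Exactly undoes dwt53_forward (integer lifting is reversible)."""
    n = 2 * len(s)
    x = [0] * n
    for i in range(len(s)):   # undo update first
        dl = d[i - 1] if i > 0 else d[i]
        x[2 * i] = s[i] - ((dl + d[i] + 2) >> 2)
    for i in range(len(d)):   # then undo predict using recovered evens
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        x[2 * i + 1] = d[i] + ((left + right) >> 1)
    return x
```

Reversibility is structural: each lifting step adds a quantity computed only from the other half of the samples, so the inverse subtracts exactly the same quantity. Recursing on the low-pass output `s` gives the arbitrary-depth decomposition mentioned above.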
Quantization After the wavelet transform, the coefficients are scalar-
quantized to reduce the number of bits to represent them, at the expense of quality. The output is a set of integer numbers which have to be encoded bit-by-bit. The parameter that can be changed to set the final quality is the quantization step: the greater the step, the greater is the compression and the loss of quality. With a quantization step that equals 1, no quantization is performed (it is used in lossless compression).
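The scalar quantizer can be sketched as follows; this is a minimal dead-zone quantizer of the kind used in JPEG 2000 Part 1, with illustrative names and a mid-point reconstruction offset chosen here for simplicity:

```python
def quantize(y, step):
    """Dead-zone scalar quantizer: q = sign(y) * floor(|y| / step)."""
    q = int(abs(y) // step)
    return -q if y < 0 else q

def dequantize(q, step, r=0.5):
    """Reconstruct at offset r inside the bin (r = 0.5 is the mid-point)."""
    if q == 0:
        return 0.0
    s = -1.0 if q < 0 else 1.0
    return s * (abs(q) + r) * step
```

The larger the step, the more magnitude bits are discarded; with integer coefficients and a step of 1 the quantizer is the identity, matching the lossless case described above.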
Coding The result of the previous process is a collection of
sub-bands which represent several approximation scales. A sub-band is a set of
coefficients—
real numbers which represent aspects of the image associated with a certain frequency range as well as a spatial area of the image. The quantized sub-bands are split further into
precincts, rectangular regions in the wavelet domain. They are typically sized so that they provide an efficient way to access only part of the (reconstructed) image, though this is not a requirement. Precincts are split further into
code blocks. Code blocks are in a single sub-band and have equal sizes—except those located at the edges of the image. The encoder has to encode the bits of all quantized coefficients of a code block, starting with the most significant bits and progressing to less significant bits by a process called the
EBCOT scheme.
EBCOT here stands for
Embedded Block Coding with Optimal Truncation. In this encoding process, each
bit plane of the code block gets encoded in three so-called
coding passes, first encoding bits (and signs) of insignificant coefficients with significant neighbors (i.e., with 1-bits in higher bit planes), then refinement bits of significant coefficients and finally coefficients without significant neighbors. The three passes are called
Significance Propagation,
Magnitude Refinement and
Cleanup pass, respectively. In lossless mode all bit planes have to be encoded by the EBCOT coder, and no bit planes can be dropped. The bits selected by these coding passes then get encoded by a context-driven binary
arithmetic coder, namely the binary MQ-coder (as also employed by
JBIG2). The context of a coefficient is formed by the state of its eight neighbors in the code block. The result is a bit-stream that is split into
packets where a
packet groups selected passes of all code blocks from a precinct into one indivisible unit. Packets are the key to quality scalability (i.e., packets containing less significant bits can be discarded to achieve lower bit rates and higher distortion). Packets from all sub-bands are then collected in so-called
layers. The way the packets are built up from the code-block coding passes, and thus which packets a layer will contain, is not defined by the JPEG 2000 standard, but in general a codec will try to build layers in such a way that the image quality will increase monotonically with each layer, and the image distortion will shrink from layer to layer. Thus, layers define the progression by image quality within the codestream. The problem is now to find the optimal packet length for all code blocks which minimizes the overall distortion in a way that the generated target bitrate equals the demanded bit rate. While the standard does not define a procedure as to how to perform this form of
rate–distortion optimization, the general outline is given in one of its many appendices: for each bit encoded by the EBCOT coder, the improvement in image quality, measured as reduction in mean square error, is recorded; this can be implemented by a simple table-lookup algorithm. The length of the resulting codestream is measured as well. For each code block this yields a curve in the rate–distortion plane, giving image quality over bitstream length. The optimal selection of the truncation points, and thus of the packet build-up points, is then obtained by defining critical
slopes of these curves, and picking all those coding passes whose curve in the rate–distortion graph is steeper than the given critical slope. This method can be seen as a special application of the method of
Lagrange multipliers, which is used for optimization problems under constraints. The
Lagrange multiplier, typically denoted by λ, turns out to be the critical slope, the constraint is the demanded target bitrate, and the value to optimize is the overall distortion. Packets can be reordered almost arbitrarily in the JPEG 2000 bit-stream; this gives the encoder as well as image servers a high degree of freedom. Already encoded images can be sent over networks with arbitrary bit rates by using a layer-progressive encoding order. On the other hand, color components can be moved back in the bit-stream; lower resolutions (corresponding to low-frequency sub-bands) could be sent first for image previewing. Finally, spatial browsing of large images is possible through appropriate tile or partition selection. All these operations do not require any re-encoding but only byte-wise copy operations.
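The slope-threshold selection described above can be sketched as follows. The data layout and names are illustrative: each code block is represented by a list of candidate truncation points (cumulative rate, remaining distortion), assumed here to start at (0, D_max) and to have convex, strictly decreasing slopes (real encoders first prune each block's points to their convex hull):

```python
def truncation_point(points, lam):
    """Return the index of the last candidate truncation point whose
    incoming rate-distortion slope -(dD/dR) exceeds the critical slope lam.
    points: [(R, D), ...] with R increasing, D decreasing, slopes convex."""
    best = 0
    for i in range(1, len(points)):
        r0, d0 = points[i - 1]
        r1, d1 = points[i]
        slope = (d0 - d1) / (r1 - r0)   # distortion drop per extra byte
        if slope > lam:
            best = i
        else:
            break                        # convexity: later slopes only shrink
    return best

def pick_lambda(blocks, budget, iters=60):
    """Bisect the critical slope lam so the total selected rate fits budget.
    A larger lam keeps fewer coding passes, hence a lower rate."""
    lo, hi = 0.0, 1e12                   # hi is always feasible (rate 0)
    for _ in range(iters):
        lam = (lo + hi) / 2
        rate = sum(b[truncation_point(b, lam)][0] for b in blocks)
        if rate > budget:
            lo = lam                     # over budget: raise the threshold
        else:
            hi = lam
    return hi
```

All blocks are truncated at the same critical slope, which is exactly the Lagrange-multiplier condition: no byte kept in one block buys less distortion reduction than a byte discarded in another.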
Compression ratio
[Figure: comparison with HEIF at similar file sizes]
Compared to the previous JPEG standard, JPEG 2000 delivers a typical compression gain in the range of 20%, depending on the image characteristics. Higher-resolution images tend to benefit more, where JPEG 2000's spatial-redundancy prediction can contribute more to the compression process. In very low-bitrate applications, studies have shown JPEG 2000 to be outperformed by the intra-frame coding mode of H.264.
Computational complexity and performance JPEG 2000 is much more complex computationally than the original JPEG standard. Tiling, the color component transform, the discrete wavelet transform, and quantization can all be performed quickly, but the entropy codec is time-consuming and quite complicated: EBCOT context modelling and the arithmetic MQ-coder take most of the execution time of a JPEG 2000 codec. Fast JPEG 2000 encoding on the CPU makes use of SIMD instruction sets such as AVX/SSE and uses multithreading to process each tile in a separate thread. The fastest solutions utilize both the CPU and GPU to achieve high performance. ==File format and codestream==