MPEG-1

MPEG-1 is a standard for lossy compression of video and audio. It is designed to compress VHS-quality raw digital video and CD audio down to about 1.5 Mbit/s without excessive quality loss, making video CDs, digital cable/satellite TV and digital audio broadcasting (DAB) practical.

History

The predecessor of MPEG-1 for video coding was the H.261 standard produced by the CCITT (now known as the ITU-T). The basic architecture established in H.261 was the motion-compensated DCT hybrid video coding structure. It uses macroblocks of size 16×16 with block-based motion estimation in the encoder and motion compensation using encoder-selected motion vectors in the decoder, with residual difference coding using a discrete cosine transform (DCT) of size 8×8, scalar quantization, and variable-length codes (like Huffman codes) for entropy coding. H.261 was the first practical video coding standard, and all of its described design elements were also used in MPEG-1. Modeled on the successful collaborative approach and the compression technologies developed by the Joint Photographic Experts Group and CCITT's Experts Group on Telephony (creators of the JPEG image compression standard and the H.261 standard for video conferencing respectively), the Moving Picture Experts Group (MPEG) working group was established in January 1988, by the initiative of Hiroshi Yasuda (Nippon Telegraph and Telephone) and Leonardo Chiariglione (CSELT). MPEG was formed to address the need for standard video and audio formats, and to build on H.261 to get better quality through the use of somewhat more complex encoding methods (e.g., supporting higher precision for motion vectors). Development of the MPEG-1 standard began in May 1988. Fourteen video and fourteen audio codec proposals were submitted by individual companies and institutions for evaluation. The codecs were extensively tested for computational complexity and subjective (human perceived) quality, at data rates of 1.5 Mbit/s. This specific bitrate was chosen for transmission over T-1/E-1 lines and as the approximate data rate of audio CDs. The codecs that excelled in this testing were utilized as the basis for the standard and refined further, with additional features and other improvements being incorporated in the process. 
After 20 meetings of the full group in various cities around the world, and 4½ years of development and testing, the final standard (for parts 1–3) was approved in early November 1992 and published a few months later. The reported completion date of the MPEG-1 standard varies greatly: a largely complete draft standard was produced in September 1990, and from that point on, only minor changes were introduced. The standard was finished with the 6 November 1992 meeting. The Berkeley Plateau Multimedia Research Group developed an MPEG-1 decoder in November 1992.

In July 1990, before the first draft of the MPEG-1 standard had even been written, work began on a second standard, MPEG-2, intended to extend MPEG-1 technology to provide full broadcast-quality video (as per CCIR 601) at high bitrates (3–15 Mbit/s) and support for interlaced video. Due in part to the similarity between the two codecs, the MPEG-2 standard includes full backwards compatibility with MPEG-1 video, so any MPEG-2 decoder can play MPEG-1 videos.

Notably, the MPEG-1 standard very strictly defines the bitstream and decoder function, but does not define how MPEG-1 encoding is to be performed, although a reference implementation is provided in ISO/IEC-11172-5.

Patents

Due to its age, MPEG-1 is no longer covered by any essential patents and can thus be used without obtaining a licence or paying any fees. The ISO patent database lists one patent for ISO 11172, US 4,472,747, which expired in 2003. The near-complete draft of the MPEG-1 standard was publicly available as ISO CD 11172. Neither the July 2008 Kuro5hin article "Patent Status of MPEG-1, H.261 and MPEG-2" nor an August 2008 thread on the gstreamer-devel mailing list was able to list a single unexpired MPEG-1 Video or MPEG-1 Audio Layer I/II patent. A May 2009 discussion on the whatwg mailing list mentioned patent US 5,214,678 as possibly covering MPEG-1 Audio Layer II. Filed in 1990 and published in 1993, this patent has now expired.

A full MPEG-1 decoder and encoder with Layer III audio could not be implemented royalty-free, since some companies required patent fees for implementations of MPEG-1 Audio Layer III, as discussed in the MP3 article. All patents in the world connected to MP3 expired on 30 December 2017, which makes this format totally free for use. On 23 April 2017, Fraunhofer IIS stopped charging for Technicolor's MP3 licensing program for certain MP3-related patents and software.

Former patent holders

The following corporations filed declarations with ISO saying they held patents for the MPEG-1 Video (ISO/IEC-11172-2) format, although all such patents have since expired.

• BBC
• Daimler Benz AG
• Fujitsu
• IBM
• Matsushita Electric Industrial Co., Ltd.
• Mitsubishi Electric
• NEC
• NHK
• Philips
• Pioneer Corporation
• Qualcomm
• Ricoh
• Sony
• Texas Instruments
• Thomson Multimedia
• Toppan Printing
• Toshiba
• Victor Company of Japan

Applications

• Most popular software for video playback includes MPEG-1 decoding, in addition to any other supported formats.
• The popularity of MP3 audio has established a massive installed base of hardware that can play back MPEG-1 Audio (all three layers).
• "Virtually all digital audio devices" can play back MPEG-1 Audio.
• The widespread popularity of MPEG-2 with broadcasters means MPEG-1 is playable by most digital cable and satellite set-top boxes, and digital disc and tape players, due to backwards compatibility.
• MPEG-1 was used for full-screen video on Green Book CD-i, and on Video CD (VCD).
• The Super Video CD standard, based on VCD, uses MPEG-1 audio exclusively, as well as MPEG-2 video.
• The DVD-Video format uses MPEG-2 video primarily, but MPEG-1 support is explicitly defined in the standard.
• The DVD-Video standard originally required MPEG-1 Audio Layer II for PAL countries, but was changed to allow AC-3/Dolby Digital-only discs. MPEG-1 Audio Layer II is still allowed on DVDs, although newer extensions to the format, like MPEG Multichannel, are rarely supported.
• Most DVD players also support Video CD and MP3 CD playback, which use MPEG-1.
• The international Digital Video Broadcasting (DVB) standard primarily uses MPEG-1 Audio Layer II, and MPEG-2 video.
• The international Digital Audio Broadcasting (DAB) standard uses MPEG-1 Audio Layer II exclusively, due to its especially high quality, modest decoder performance requirements, and tolerance of errors.
• The Digital Compact Cassette uses PASC (Precision Adaptive Sub-band Coding) to encode its audio. PASC is an early version of MPEG-1 Audio Layer I with a fixed bit rate of 384 kilobits per second.

Part 1: Systems

Part 1 of the MPEG-1 standard covers systems, and is defined in ISO/IEC-11172-1. MPEG-1 Systems specifies the logical layout and methods used to store the encoded audio, video, and other data into a standard bitstream, and to maintain synchronization between the different contents. This file format is specifically designed for storage on media, and transmission over communication channels, that are considered relatively reliable. Only limited error protection is defined by the standard, and small errors in the bitstream may cause noticeable defects.

This structure was later named an MPEG program stream: "The MPEG-1 Systems design is essentially identical to the MPEG-2 Program Stream structure." This terminology is more popular and more precise (it differentiates the structure from an MPEG transport stream), and will be used here.

Elementary streams, packets, and clock references

• Elementary Streams (ES) are the raw bitstreams of MPEG-1 audio and video encoded data (output from an encoder). These files can be distributed on their own, as is the case with MP3 files.
• Packetized Elementary Streams (PES) are elementary streams packetized into packets of variable lengths, i.e., the ES divided into independent chunks, with a cyclic redundancy check (CRC) checksum added to each packet for error detection.
• System Clock Reference (SCR) is a timing value stored in a 33-bit header of each PES, at a frequency/precision of 90 kHz, with an extra 9-bit extension that stores additional timing data with a precision of 27 MHz. These values are inserted by the encoder, derived from the system time clock (STC). Simultaneously encoded audio and video streams will not have identical SCR values, however, due to buffering, encoding, jitter, and other delay.

Program streams

Program Streams (PS) are concerned with combining multiple packetized elementary streams (usually just one audio and one video PES) into a single stream, ensuring simultaneous delivery, and maintaining synchronization.
The PS structure is known as a multiplex, or a container format.

Presentation time stamps (PTS) exist in PS to correct the inevitable disparity between audio and video SCR values (time-base correction). 90 kHz PTS values in the PS header tell the decoder which video SCR values match which audio SCR values. Either video or audio will be delayed by the decoder until the corresponding segment of the other arrives and can be decoded.

PTS handling can be problematic. Decoders must accept multiple program streams that have been concatenated (joined sequentially). This causes PTS values in the middle of the video to reset to zero, which then begin incrementing again. Such PTS wraparound disparities can cause timing issues that must be specially handled by the decoder.

Decoding Time Stamps (DTS), additionally, are required because of B-frames. With B-frames in the video stream, adjacent frames have to be encoded and decoded out-of-order (re-ordered frames). DTS is quite similar to PTS, but instead of just handling sequential frames, it contains the proper time-stamps to tell the decoder when to decode and display the next B-frame (types of frames explained below), ahead of its anchor (P- or I-) frame. Without B-frames in the video, PTS and DTS values are identical.

Multiplexing

To generate the PS, the multiplexer will interleave the (two or more) packetized elementary streams. This is done so the packets of the simultaneous streams can be transferred over the same channel and are guaranteed to both arrive at the decoder at precisely the same time. This is a case of time-division multiplexing.

Determining how much data from each stream should be in each interleaved segment (the size of the interleave) is complicated, yet important. Improper interleaving will result in buffer underflows or overflows, as the receiver gets more of one stream than it can store (e.g. audio), before it gets enough data to decode the other simultaneous stream (e.g. video).
The MPEG Video Buffering Verifier (VBV) assists in determining if a multiplexed PS can be decoded by a device with a specified data throughput rate and buffer size. This offers feedback to the multiplexer and the encoder, so that they can change the multiplex size or adjust bitrates as needed for compliance.
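The relationship between frame re-ordering, the 90 kHz clock, and the 33-bit counter described in this part can be illustrated with a small Python sketch. It is a toy model, not the procedure from ISO/IEC 11172-1: the function names, the 25 frames-per-second rate, the start time of zero, and the one-frame display delay used when B-frames are present are all assumptions made for illustration.

```python
CLOCK_HZ = 90_000   # time stamp clock frequency named in the text
WRAP = 1 << 33      # a 33-bit counter wraps modulo 2**33


def decode_order(display_types):
    """Return display indices in the order a decoder needs the frames.

    Each anchor (I or P) is moved ahead of the B-frames that precede it
    in display order, because those B-frames depend on it.
    """
    order, pending_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            order.append(index)
            order.extend(pending_b)
            pending_b.clear()
    order.extend(pending_b)  # trailing B-frames with no later anchor
    return order


def assign_timestamps(display_types, fps=25):
    """Toy PTS/DTS assignment: (type, display index, DTS, PTS) per frame."""
    period = CLOCK_HZ // fps                   # ticks per frame (assumed rate)
    delay = 1 if "B" in display_types else 0   # assumed re-ordering delay
    stamps = []
    for position, index in enumerate(decode_order(display_types)):
        dts = (position * period) % WRAP
        pts = ((index + delay) * period) % WRAP
        stamps.append((display_types[index], index, dts, pts))
    return stamps


def tick_delta(later, earlier):
    """Elapsed ticks between two counter values, tolerating one wraparound."""
    return (later - earlier) % WRAP


if __name__ == "__main__":
    for row in assign_timestamps("IBBPBBP"):
        print(row)
    print("hours until the counter wraps:", WRAP / CLOCK_HZ / 3600)
```

In this model the B-frames end up with equal PTS and DTS, while each anchor frame is decoded earlier than it is displayed; a sequence with no B-frames gets identical PTS and DTS throughout, matching the statement above.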

Part 2: Video

Part 2 of the MPEG-1 standard covers video and is defined in ISO/IEC-11172-2. The design was heavily influenced by H.261.

MPEG-1 Video exploits perceptual compression methods to significantly reduce the data rate required by a video stream. It reduces or completely discards information in certain frequencies and areas of the picture that the human eye has limited ability to fully perceive. It also exploits temporal (over time) and spatial (across a picture) redundancy common in video to achieve better data compression than would be possible otherwise. (See: Video compression)

Color space

Before encoding video to MPEG-1, the color-space is transformed to Y′CbCr (Y′=Luma, Cb=Chroma Blue, Cr=Chroma Red). Luma (brightness, resolution) is stored separately from chroma (color, hue, phase) and even further separated into red and blue components. The chroma is also subsampled to 4:2:0, meaning it is reduced to half resolution vertically and half resolution horizontally, i.e., to just one quarter the number of samples used for the luma component of the video.

The length between I-frames is known as the group of pictures (GOP) size. MPEG-1 most commonly uses a GOP size of 15–18, i.e. 1 I-frame for every 14–17 non-I-frames (some combination of P- and B-frames). With more intelligent encoders, GOP size is dynamically chosen, up to some pre-selected maximum limit.

Partial macroblocks, and black borders/bars encoded into the video that do not fall exactly on a macroblock boundary, cause havoc with motion prediction. The block padding/border information prevents the macroblock from closely matching with any other area of the video, and so significantly larger prediction error information must be encoded for every one of the several dozen partial macroblocks along the screen border. DCT encoding and quantization (see below) are also not nearly as effective when there is large/sharp picture contrast in a block.
An even more serious problem exists with macroblocks that contain significant, random, edge noise, where the picture transitions to (typically) black. All the above problems also apply to edge noise. In addition, the added randomness is simply impossible to compress significantly. All of these effects will lower the quality (or increase the bitrate) of the video substantially.

DCT

Each 8×8 block is encoded by first applying a forward discrete cosine transform (FDCT) and then a quantization process. The FDCT process (by itself) is theoretically lossless, and can be reversed by applying an Inverse DCT (IDCT) to reproduce the original values (in the absence of any quantization and rounding errors). In reality, there are some (sometimes large) rounding errors introduced both by quantization in the encoder (as described in the next section) and by IDCT approximation error in the decoder. The minimum allowed accuracy of a decoder IDCT approximation is defined by ISO/IEC 23002-1. (Prior to 2006, it was specified by IEEE 1180-1990.)

The FDCT process converts the 8×8 block of uncompressed pixel values (brightness or color difference values) into an 8×8 indexed array of frequency coefficient values. One of these is the (statistically high in variance) "DC coefficient", which represents the average value of the entire 8×8 block. The other 63 coefficients are the statistically smaller "AC coefficients", which have positive or negative values each representing sinusoidal deviations from the flat block value represented by the DC coefficient.
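The transform pair described above can be sketched in a few lines of Python. This is a direct, slow evaluation of the textbook 8×8 DCT-II and its inverse, assuming the orthonormal scaling (under which the DC coefficient equals 8 times the block average); the function names are hypothetical, and practical codecs use fast approximations rather than this form.

```python
import math

N = 8  # MPEG-1 Video transforms 8x8 blocks


def _scale(k):
    """Orthonormal DCT-II scale factor (an assumed normalisation)."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(sample, freq):
    """Cosine basis value for one sample position and one frequency index."""
    return math.cos((2 * sample + 1) * freq * math.pi / (2 * N))


def fdct_8x8(block):
    """Forward DCT: 8x8 sample values -> 8x8 frequency coefficients."""
    return [[_scale(u) * _scale(v)
             * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                   for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]


def idct_8x8(coeffs):
    """Inverse DCT: 8x8 frequency coefficients -> 8x8 sample values."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v]
                 * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]


if __name__ == "__main__":
    flat = [[100] * N for _ in range(N)]
    coeffs = fdct_8x8(flat)
    # A flat block has only a DC term; every AC coefficient is ~0.
    print(round(coeffs[0][0], 6), round(coeffs[0][1], 6))
```

Without quantization, applying `idct_8x8` to the output of `fdct_8x8` reproduces the input up to floating-point error, which is the sense in which the transform by itself is lossless.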
An example of an encoded 8×8 FDCT block:

$$
\begin{bmatrix}
-415 & -30 & -61 & 27 & 56 & -20 & -2 & 0 \\
4 & -22 & -61 & 10 & 13 & -7 & -9 & 5 \\
-47 & 7 & 77 & -25 & -29 & 10 & 5 & -6 \\
-49 & 12 & 34 & -15 & -10 & 6 & 2 & 2 \\
12 & -7 & -13 & -4 & -2 & 2 & -3 & 3 \\
-8 & 3 & 2 & -6 & -2 & 1 & 4 & 2 \\
-1 & 0 & 0 & -2 & -1 & -3 & 4 & -1 \\
0 & 0 & -1 & -4 & -1 & 0 & 1 & 2
\end{bmatrix}
$$

Since the DC coefficient value is statistically correlated from one block to the next, it is compressed using DPCM encoding. Only the (smaller) amount of difference between each DC value and the value of the DC coefficient in the block to its left needs to be represented in the final bitstream.

Additionally, the frequency conversion performed by applying the DCT provides a statistical decorrelation function to efficiently concentrate the signal into fewer high-amplitude values prior to applying quantization (see below).

Quantization

Quantization is, essentially, the process of reducing the accuracy of a signal, by dividing it by some larger step size and rounding to an integer value (i.e. finding the nearest multiple, and discarding the remainder).

The frame-level quantizer is a number from 0 to 31 (although encoders will usually omit/disable some of the extreme values) which determines how much information will be removed from a given frame. The frame-level quantizer is typically either dynamically selected by the encoder to maintain a certain user-specified bitrate, or (much less commonly) directly specified by the user.

A "quantization matrix" is a string of 64 numbers (ranging from 0 to 255) which tells the encoder how relatively important or unimportant each piece of visual information is. Each number in the matrix corresponds to a certain frequency component of the video image.
An example quantization matrix:

$$
\begin{bmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\
14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\
14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{bmatrix}
$$

Quantization is performed by taking each of the 64 frequency values of the DCT block, dividing it by the frame-level quantizer, then dividing it by its corresponding value in the quantization matrix. Finally, the result is rounded down. This significantly reduces, or completely eliminates, the information in some frequency components of the picture. Typically, high frequency information is less visually important, and so high frequencies are much more strongly quantized (drastically reduced). MPEG-1 actually uses two separate quantization matrices, one for intra-blocks (I-blocks) and one for inter-blocks (P- and B-blocks), so quantization of different block types can be done independently, and so more effectively.
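The quantization step as described above can be sketched in Python using the two example matrices from this section. The sketch follows the simplified description in the text (divide by the frame-level quantizer, divide by the matrix entry, discard the fraction) and is not the exact arithmetic of ISO/IEC 11172-2; the function name is hypothetical, and reading "rounded down" as truncation toward zero is an assumption.

```python
# Example DCT coefficients and quantization matrix copied from the text.
DCT_BLOCK = [
    [-415, -30, -61, 27, 56, -20, -2, 0],
    [4, -22, -61, 10, 13, -7, -9, 5],
    [-47, 7, 77, -25, -29, 10, 5, -6],
    [-49, 12, 34, -15, -10, 6, 2, 2],
    [12, -7, -13, -4, -2, 2, -3, 3],
    [-8, 3, 2, -6, -2, 1, 4, 2],
    [-1, 0, 0, -2, -1, -3, 4, -1],
    [0, 0, -1, -4, -1, 0, 1, 2],
]

QUANT_MATRIX = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantize(coeffs, matrix, quantizer_scale):
    """Divide each coefficient by the scale and its matrix entry, then truncate.

    int() truncates toward zero; that reading of "rounded down" is an
    assumption of this sketch.
    """
    if quantizer_scale <= 0:
        raise ValueError("quantizer_scale must be positive")
    return [[int(c / quantizer_scale / w) for c, w in zip(c_row, w_row)]
            for c_row, w_row in zip(coeffs, matrix)]


if __name__ == "__main__":
    for row in quantize(DCT_BLOCK, QUANT_MATRIX, quantizer_scale=1):
        print(row)
```

Because the matrix entries grow toward the bottom right, the high-frequency coefficients there collapse to zero first, which is the behaviour the paragraph above describes; raising `quantizer_scale` zeroes more of the block.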

Part 3: Audio

Part 3 of the MPEG-1 standard covers audio and is defined in ISO/IEC-11172-3.

MPEG-1 Audio utilizes psychoacoustics to significantly reduce the data rate required by an audio stream. It reduces or completely discards certain parts of the audio that it deduces that the human ear can't hear, either because they are in frequencies where the ear has limited sensitivity, or are masked by other (typically louder) sounds.

• Layer II: 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320 and 384 kbit/s
• Layer III: 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s

MPEG-1 Audio is divided into 3 layers. Each higher layer is more computationally complex, and generally more efficient at lower bitrates than the previous.

Decoding MP2 audio is computationally simple relative to MP3, AAC, etc.

History/MUSICAM

MPEG-1 Audio Layer II was derived from the MUSICAM (Masking pattern adapted Universal Subband Integrated Coding And Multiplexing) audio codec, developed by Centre commun d'études de télévision et télécommunications (CCETT), Philips, and Institut für Rundfunktechnik (IRT/CNET) as part of the EUREKA 147 pan-European inter-governmental research and development initiative for the development of digital audio broadcasting.

Most key features of MPEG-1 Audio were directly inherited from MUSICAM, including the filter bank, time-domain processing, audio frame sizes, etc. However, improvements were made, and the actual MUSICAM algorithm was not used in the final MPEG-1 Audio Layer II standard. The widespread usage of the term MUSICAM to refer to Layer II is entirely incorrect and discouraged for both technical and legal reasons.

Technical details

MP2 is a time-domain encoder. It uses a low-delay 32 sub-band polyphased filter bank for time-frequency mapping, with overlapping ranges (i.e. polyphased) to prevent aliasing.

Layer II can also optionally use intensity stereo coding, a form of joint stereo.
This means that the frequencies above 6 kHz of both channels are combined/down-mixed into one single (mono) channel, but the "side channel" information on the relative intensity (volume, amplitude) of each channel is preserved and encoded into the bitstream separately. On playback, the single channel is played through left and right speakers, with the intensity information applied to each channel to give the illusion of stereo sound.

That (approximately) 1:6 compression ratio for CD audio is particularly impressive because it is quite close to the estimated upper limit of perceptual entropy, at just over 1:8. Achieving much higher compression is simply not possible without discarding some perceptible information.

MP2 remains a favoured lossy audio coding standard due to its particularly high audio coding performance on important audio material such as castanets, symphonic orchestra, male and female voices, and particularly complex and high-energy transients (impulses) like percussive sounds: triangle, glockenspiel and audience applause. This is one reason that MP2 audio continues to be used extensively. The MPEG-2 AAC Stereo verification tests reached a vastly different conclusion, however, showing AAC to provide superior performance to MP2 at half the bitrate. The reason for this disparity with both earlier and later tests is not clear, but strangely, a sample of applause is notably absent from the latter test.

Layer II audio files typically use the extension ".mp2" or sometimes ".m2a".

Layer III

MPEG-1 Audio Layer III (the first version of MP3) is a lossy audio format designed to provide acceptable quality at about 64 kbit/s for monaural audio over single-channel (BRI) ISDN links, and 128 kbit/s for stereo sound.
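To put the bitrates and compression ratios quoted in this part in perspective, the following Python sketch works out the uncompressed bitrate of CD audio and compares coded bitrates against it. It assumes the standard CD parameters (44.1 kHz sampling, 16 bits per sample, two channels); the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 44_100   # CD audio sampling rate
BITS_PER_SAMPLE = 16      # CD audio sample depth
CHANNELS = 2              # stereo


def cd_bitrate():
    """Uncompressed CD audio bitrate in bit/s."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def compression_ratio(coded_bitrate):
    """How many times smaller the coded stream is than raw CD audio."""
    return cd_bitrate() / coded_bitrate


def bitrate_for_ratio(ratio):
    """Coded bitrate in bit/s that corresponds to a 1:ratio compression."""
    return cd_bitrate() / ratio


if __name__ == "__main__":
    print(cd_bitrate())                 # raw CD bitrate in bit/s
    print(compression_ratio(128_000))   # 128 kbit/s stereo versus CD
    print(bitrate_for_ratio(6))         # bitrate implied by a 1:6 ratio
    print(bitrate_for_ratio(8))         # bitrate implied by a 1:8 ratio
```

Under these assumptions, a 1:6 ratio corresponds to roughly 235 kbit/s and a 1:8 ratio to roughly 176 kbit/s, while 128 kbit/s stereo is about an 11:1 reduction.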
History/ASPEC

MPEG-1 Audio Layer III was derived from the Adaptive Spectral Perceptual Entropy Coding (ASPEC) codec developed by Fraunhofer as part of the EUREKA 147 pan-European inter-governmental research and development initiative for the development of digital audio broadcasting. ASPEC was adapted to fit in with the Layer II model (frame size, filter bank, FFT, etc.), to become Layer III.

Technical details

MP3 is a frequency-domain audio transform encoder. Even though it utilizes some of the lower layer functions, MP3 is quite different from MP2.

MP3 works on 1152 samples like MP2, but needs to take multiple frames for analysis before frequency-domain (MDCT) processing and quantization can be effective. It outputs a variable number of samples, using a bit buffer to enable this variable bitrate (VBR) encoding while maintaining 1152 sample size output frames. This causes a significantly longer delay before output, which has caused MP3 to be considered unsuitable for studio applications where editing or other processing needs to take place.

MP3 uses pre-echo detection routines and VBR encoding, which allows it to temporarily increase the bitrate during difficult passages, in an attempt to reduce pre-echo. It is also able to switch from the normal 36-sample quantization window to three short 12-sample windows, to reduce the temporal (time) length of quantization artifacts.

Unlike Layers I and II, MP3 uses variable-length Huffman coding (applied after perceptual coding) to further reduce the bitrate, without any further quality loss.

MPEG-2 Audio is defined in ISO/IEC 13818-3.

• MPEG Multichannel – Backward compatible 5.1-channel surround sound.
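The 1152-sample frame mentioned in the Layer III technical details fixes how much audio one frame represents at a given sampling rate. The Python sketch below works this out for the three MPEG-1 sampling rates (32, 44.1 and 48 kHz); the function names are hypothetical, and the figures describe frame duration only, not the additional encoder delay discussed above.

```python
SAMPLES_PER_FRAME = 1152                       # frame size named in the text
MPEG1_SAMPLE_RATES_HZ = (32_000, 44_100, 48_000)


def frame_duration_ms(sample_rate_hz):
    """Duration of one 1152-sample frame in milliseconds."""
    return SAMPLES_PER_FRAME / sample_rate_hz * 1000


def frames_per_second(sample_rate_hz):
    """Number of 1152-sample frames needed for one second of audio."""
    return sample_rate_hz / SAMPLES_PER_FRAME


if __name__ == "__main__":
    for rate in MPEG1_SAMPLE_RATES_HZ:
        print(rate, frame_duration_ms(rate), frames_per_second(rate))
```

At 48 kHz a frame lasts exactly 24 ms, and at 44.1 kHz a little over 26 ms, so any analysis that spans several frames adds tens of milliseconds of latency.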

Part 4: Conformance testing

Part 4 of the MPEG-1 standard covers conformance testing, and is defined in ISO/IEC-11172-4. Conformance: Procedures for testing conformance. Provides two sets of guidelines and reference bitstreams for testing the conformance of MPEG-1 audio and video decoders, as well as the bitstreams produced by an encoder.

Part 5: Reference software

Part 5 of the MPEG-1 standard includes reference software, and is defined in ISO/IEC TR 11172-5. Simulation: Reference software. C reference code for encoding and decoding of audio and video, as well as multiplexing and demultiplexing. This includes the ISO Dist10 audio encoder code, which LAME and TooLAME were originally based upon.

File extension

.mpg is one of a number of file extensions for MPEG-1 or MPEG-2 audio and video compression. MPEG-1 Part 2 video is rare nowadays, and this extension typically refers to an MPEG program stream (defined in MPEG-1 and MPEG-2) or MPEG transport stream (defined in MPEG-2). Other suffixes such as .m2ts also exist specifying the precise container, in this case MPEG-2 TS, but this has little relevance to MPEG-1 media.

.mp3 is the most common extension for files containing MP3 audio (typically MPEG-1 Audio, sometimes MPEG-2 Audio). An MP3 file is typically an uncontained stream of raw audio; the conventional way to tag MP3 files is by writing data to "garbage" segments of each frame, which preserve the media information but are discarded by the player. This is similar in many respects to how raw .AAC files are tagged (but this is less supported nowadays, e.g. iTunes).

Note that although it would apply, the .mpg extension is not normally used for raw AAC or AAC in MPEG-2 Part 7 containers. The .aac extension normally denotes these audio files.

Source: Wikipedia ↗
