Part 3 of the MPEG-1 standard covers audio and is defined in ISO/IEC 11172-3. MPEG-1 Audio uses psychoacoustics to significantly reduce the data rate required by an audio stream. It reduces or completely discards parts of the audio that it deduces the human ear cannot hear, either because they lie at frequencies where the ear has limited sensitivity or because they are masked by other (typically louder) sounds.

• Layer II: 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320 and 384 kbit/s
• Layer III: 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s

MPEG-1 Audio is divided into three layers. Each higher layer is more computationally complex, and generally more efficient at lower bitrates, than the previous one.

Decoding MP2 audio is computationally simple relative to MP3, AAC, etc.
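To put the listed bitrates in perspective, the following minimal sketch computes how each one compares with uncompressed CD audio. It assumes CD audio is 44.1 kHz, 16-bit, two-channel PCM (1411.2 kbit/s); the names `CD_KBITS` and `compression_ratio` are illustrative only and do not come from the standard.

```python
# Assumption: uncompressed CD audio is 44.1 kHz, 16-bit, two-channel PCM.
CD_KBITS = 44_100 * 16 * 2 / 1000  # 1411.2 kbit/s

# Bitrates (kbit/s) as listed in the text for each layer.
LAYER_II_KBITS = [32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384]
LAYER_III_KBITS = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]


def compression_ratio(kbits: float) -> float:
    """Return how many times smaller the coded stream is than CD PCM."""
    return CD_KBITS / kbits


if __name__ == "__main__":
    for rate in LAYER_II_KBITS:
        print(f"Layer II  {rate:>3} kbit/s -> about 1:{compression_ratio(rate):.1f}")
    for rate in LAYER_III_KBITS:
        print(f"Layer III {rate:>3} kbit/s -> about 1:{compression_ratio(rate):.1f}")
```

Under these assumptions, the higher Layer II bitrates correspond to single-digit compression ratios, while the lowest bitrates in both lists compress CD audio by a factor of more than 40.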
====History/MUSICAM====
MPEG-1 Audio Layer II was derived from the MUSICAM (Masking pattern adapted Universal Subband Integrated Coding And Multiplexing) audio codec, developed by Centre commun d'études de télévision et télécommunications (CCETT), Philips, and Institut für Rundfunktechnik (IRT/CNET) as part of the EUREKA 147 pan-European inter-governmental research and development initiative for the development of digital audio broadcasting.

Most key features of MPEG-1 Audio were directly inherited from MUSICAM, including the filter bank, time-domain processing, audio frame sizes, etc. However, improvements were made, and the actual MUSICAM algorithm was not used in the final MPEG-1 Audio Layer II standard. The widespread usage of the term MUSICAM to refer to Layer II is entirely incorrect and discouraged for both technical and legal reasons.
====Technical details====
MP2 is a time-domain encoder. It uses a low-delay 32 sub-band polyphase filter bank for time-frequency mapping, with overlapping ranges (i.e. polyphase) to prevent aliasing.

Layer II can also optionally use intensity stereo coding, a form of joint stereo. This means that the frequencies above 6 kHz of both channels are combined (down-mixed) into a single mono channel, while the "side channel" information on the relative intensity (volume, amplitude) of each channel is preserved and encoded into the bitstream separately. On playback, the single channel is played through the left and right speakers, with the intensity information applied to each channel to give the illusion of stereo sound.

The approximately 1:6 compression ratio for CD audio is particularly impressive because it is quite close to the estimated upper limit of perceptual entropy, at just over 1:8. Achieving much higher compression is simply not possible without discarding some perceptible information.

MP2 remains a favoured lossy audio coding standard due to its particularly high audio coding performance on important audio material such as castanets, symphonic orchestra, male and female voices, and particularly complex and high-energy transients (impulses) like percussive sounds: triangle, glockenspiel and audience applause. This is one reason that MP2 audio continues to be used extensively. The MPEG-2 AAC Stereo verification tests reached a vastly different conclusion, however, showing AAC to provide superior performance to MP2 at half the bitrate. The reason for this disparity with both earlier and later tests is not clear, but strangely, a sample of applause is notably absent from the latter test.

Layer II audio files typically use the extension ".mp2" or sometimes ".m2a".
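The intensity stereo idea described above (one shared mono signal plus per-channel intensity information) can be illustrated with a simplified sketch. This is not the Layer II bitstream format or the standard's exact procedure: the functions `encode_intensity` and `decode_intensity`, and the use of energy-matched gains, are assumptions made purely for illustration on a single block of high-frequency samples.

```python
import math


def encode_intensity(left, right):
    """Down-mix one high-frequency band to mono plus one gain per channel.

    Hypothetical helper: gains are chosen so that each decoded channel
    has the same energy as the corresponding original channel.
    """
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    e_mono = sum(m * m for m in mono)
    if e_mono == 0.0:
        # Silent (or perfectly anti-phase) band: nothing to scale.
        return mono, 0.0, 0.0
    gain_l = math.sqrt(sum(l * l for l in left) / e_mono)
    gain_r = math.sqrt(sum(r * r for r in right) / e_mono)
    return mono, gain_l, gain_r


def decode_intensity(mono, gain_l, gain_r):
    """Rebuild left/right by applying each channel's gain to the mono band."""
    return [m * gain_l for m in mono], [m * gain_r for m in mono]


if __name__ == "__main__":
    # Right channel is a quieter copy of the left (level difference only).
    left = [0.8, -0.4, 0.6]
    right = [0.2, -0.1, 0.15]
    mono, g_l, g_r = encode_intensity(left, right)
    print(decode_intensity(mono, g_l, g_r))
```

In this sketch the reconstruction is exact when the two channels differ only in level. Any phase or waveform difference between the channels is lost, which is why the technique is applied only to higher frequencies, where the text notes it still gives the illusion of stereo.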
===Layer III===
MPEG-1 Audio Layer III (the first version of MP3) is a lossy audio format designed to provide acceptable quality at about 64 kbit/s for monaural audio over single-channel (BRI) ISDN links, and 128 kbit/s for stereo sound.
====History/ASPEC====
MPEG-1 Audio Layer III was derived from the Adaptive Spectral Perceptual Entropy Coding (ASPEC) codec developed by Fraunhofer as part of the EUREKA 147 pan-European inter-governmental research and development initiative for the development of digital audio broadcasting. ASPEC was adapted to fit in with the Layer II model (frame size, filter bank, FFT, etc.), to become Layer III.
====Technical details====
MP3 is a frequency-domain audio transform encoder. Even though it uses some of the lower-layer functions, MP3 is quite different from MP2.

MP3 works on 1152 samples like MP2, but needs to take multiple frames for analysis before frequency-domain (MDCT) processing and quantization can be effective. It outputs a variable number of samples, using a bit buffer to enable this variable bitrate (VBR) encoding while maintaining output frames of 1152 samples. This causes a significantly longer delay before output, which has led to MP3 being considered unsuitable for studio applications where editing or other processing needs to take place.

MP3 uses pre-echo detection routines and VBR encoding, which allows it to temporarily increase the bitrate during difficult passages in an attempt to reduce pre-echo. It can also switch from the normal 36-sample quantization window to three short 12-sample windows, to reduce the temporal (time) length of quantization artifacts.

Unlike Layers I and II, MP3 uses variable-length Huffman coding (after perceptual coding) to further reduce the bitrate, without any further quality loss.

MPEG-2 Audio is defined in ISO/IEC 13818-3.

• MPEG Multichannel – Backward compatible 5.1-channel surround sound.

==Part 4: Conformance testing==