=== File structure ===
An MP3 file is made up of MP3 frames, each consisting of a header and a data block. This sequence of frames is called an
elementary stream. Because of the "bit reservoir", frames are not independent items and usually cannot be extracted on arbitrary frame boundaries. The MP3 data blocks contain the (compressed) audio information in terms of frequencies and amplitudes. The diagram shows that the MP3 header consists of a
sync word, which is used to identify the beginning of a valid frame. This is followed by a bit indicating that this is the
MPEG standard and two bits that indicate that layer 3 is used; hence MPEG-1 Audio Layer 3 or MP3. After this, the values differ depending on the MP3 file.
ISO/IEC 11172-3 specifies the header and defines the range of values for each of its fields. Most MP3 files today contain
ID3 metadata, which precedes or follows the MP3 frames, as noted in the diagram. The data stream can contain an optional
checksum.
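
The header layout described above can be illustrated with a short Python sketch. It assumes the ISO/IEC 11172-3 field order (12-bit sync word, 1-bit ID, 2-bit layer, 1-bit protection flag, then bit rate and sampling frequency indices) and handles only MPEG-1 Layer III; the function name and the returned dictionary keys are hypothetical, and free-format streams and the lower sampling frequencies of MPEG-2 are not covered.

```python
# Bit rates (kbit/s) for MPEG-1 Layer III, indexed by the 4-bit bit rate field.
# Index 0 means "free format" and index 15 is forbidden; both map to None here.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                 112, 128, 160, 192, 224, 256, 320, None]
# Sampling frequencies (Hz) for MPEG-1, indexed by the 2-bit field; 3 is reserved.
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def parse_mpeg1_layer3_header(header: bytes) -> dict:
    """Decode the leading fields of a 4-byte MPEG-1 Layer III frame header."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes")
    word = int.from_bytes(header[:4], "big")

    sync = word >> 20                       # 12-bit sync word, all ones
    mpeg_id = (word >> 19) & 0x1            # 1 = MPEG-1
    layer = (word >> 17) & 0x3              # 0b01 = Layer III
    protection_bit = (word >> 16) & 0x1     # 0 = a 16-bit CRC follows the header
    bitrate_index = (word >> 12) & 0xF
    sample_rate_index = (word >> 10) & 0x3
    padding = (word >> 9) & 0x1

    if sync != 0xFFF:
        raise ValueError("sync word not found")
    if mpeg_id != 1 or layer != 0b01:
        raise ValueError("not an MPEG-1 Layer III header")

    bitrate = BITRATES_KBPS[bitrate_index]
    sample_rate = SAMPLE_RATES_HZ[sample_rate_index]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved value; not handled in this sketch")

    # An MPEG-1 Layer III frame carries 1152 samples, i.e. 1152 / 8 = 144 bytes
    # per unit of (bit rate / sampling frequency), plus one optional padding byte.
    frame_length = 144 * bitrate * 1000 // sample_rate + padding

    return {
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        "has_crc": protection_bit == 0,
        "padding": bool(padding),
        "frame_length_bytes": frame_length,
    }
```

For example, the header bytes `FF FB 90 00` describe a 128 kbit/s, 44.1 kHz frame without a checksum. Note that the computed frame length is only the distance to the next header; because of the bit reservoir, the audio data belonging to a frame may begin in an earlier frame.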
Joint stereo is done only on a frame-to-frame basis. MP3 also allows the use of shorter blocks in a granule, down to a size of 192 samples; this feature is used when a
transient is detected. Doing so limits the temporal spread of quantization noise accompanying the transient (see
psychoacoustics). Frequency resolution is limited by the small long block window size, which decreases coding efficiency.

Decoding, on the other hand, is carefully defined in the standard. Most
decoders are "bitstream compliant", which means that the decompressed output they produce from a given MP3 file will be the same, within a specified degree of
rounding tolerance, as the output specified mathematically in the ISO/IEC standard document (ISO/IEC 11172-3). Therefore, the comparison of decoders is usually based on how computationally efficient they are (i.e., how much
memory or
CPU time they use in the decoding process). Over time this concern has become less of an issue as
CPU clock rates transitioned from MHz to GHz. The overall encoder/decoder delay is not defined, which means there is no official provision for
gapless playback. However, some encoders such as LAME can attach additional metadata that allows players that support it to deliver seamless playback.
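
Because the overall delay is not defined, a player needs extra information to know how many decoded samples at the start and end of a stream are not part of the original audio. The following is a minimal sketch of the trimming step only, assuming the player has already obtained the two counts from encoder-supplied metadata. The function name and the parameters `leading` and `trailing` are hypothetical, and how the counts are stored or derived is not shown.

```python
def trim_gapless(samples, leading, trailing):
    """Return `samples` without the first `leading` and last `trailing` items.

    `samples` is any sliceable sequence of decoded PCM samples (or frames of
    samples, one entry per time step). Both counts are assumed to come from
    metadata written by the encoder; they are not defined by the MP3 standard.
    """
    if leading < 0 or trailing < 0:
        raise ValueError("sample counts must be non-negative")
    end = len(samples) - trailing
    if end <= leading:
        # The stream is shorter than the amount to remove: nothing remains.
        return samples[:0]
    return samples[leading:end]
```

A player that applies this to every track of an album can then concatenate the trimmed outputs so that consecutive tracks play without an audible gap.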
=== Quality ===
When performing lossy audio encoding, such as creating an MP3 data stream, there is a trade-off between the amount of data generated and the sound quality of the results. The person generating an MP3 selects a bit rate, which specifies how many
kilobits per second of audio is desired. The higher the bit rate, the larger the MP3 data stream will be, and, generally, the closer it will sound to the original recording. With too low a bit rate,
compression artifacts (i.e., sounds that were not present in the original recording) may be audible in the reproduction. Some audio is hard to compress because of its randomness and sharp attacks. When this type of audio is compressed, artifacts such as ringing or
pre-echo are usually heard. A sample of applause or a
triangle instrument encoded at a relatively low bit rate provides a good example of compression artifacts. Most subjective tests of perceptual codecs tend to avoid these types of sound material; however, the artifacts generated by percussive sounds are barely perceptible due to the specific temporal masking feature of the 32 sub-band filterbank of Layer II on which the format is based.

The
MPEG-1 standard does not include a precise specification for an MP3 encoder but does provide examples of psychoacoustic models, rate loops, and the like in the non-normative part of the original standard. MPEG-1 frames contain the most detail in 320 kbit/s mode, the highest allowable bit rate setting, with silence and simple tones still requiring 32 kbit/s. MPEG-2 frames can capture sound up to 12 kHz and use bit rates up to 160 kbit/s. A sample rate of 44.1 kHz is commonly used for music reproduction because this is also used for
CD audio, the main source used for creating MP3 files. A great variety of bit rates are used on the Internet. A bit rate of 128 kbit/s is commonly used. Early MPEG Layer III encoders used what is now called
constant bit rate (CBR). The software was only able to use a uniform bit rate on all frames in an MP3 file. Later, more sophisticated MP3 encoders were able to use the bit reservoir to target an
average bit rate, selecting the encoding rate for each frame based on the complexity of the sound in that portion of the recording.

A more sophisticated MP3 encoder can produce variable bit rate (VBR) audio. MPEG audio may use bit rate switching on a per-frame basis, but only Layer III decoders must support it. VBR is used when the goal is to achieve a fixed level of quality. The final file size of a VBR encoding is less predictable than with constant bit rate. Average bit rate (ABR) is a type of VBR implemented as a compromise between the two: the bit rate is allowed to vary for more consistent quality, but is controlled to remain near an average value chosen by the user, for predictable file sizes. Although an MP3 decoder must support VBR to be standards compliant, some decoders historically had bugs with VBR decoding, particularly before VBR encoders became widespread.

Layer III audio can also use a "bit reservoir", a partially full frame's ability to hold part of the next frame's audio data, allowing temporary changes in effective bit rate, even in a constant bit rate stream. Internal handling of the bit reservoir increases encoding delay.

There is no scale factor band 21 (sfb21) for frequencies above approximately 16 kHz, forcing the encoder to choose between less accurate representation in band 21 or less efficient storage in all bands below band 21, the latter resulting in wasted bit rate in VBR encoding.
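
The relationship between bit rate and the amount of data generated, discussed at the start of this section, is simple arithmetic for a constant bit rate stream. The sketch below is illustrative only: the function names are hypothetical, 128 kbit/s is used purely as an example value, and the result ignores tags and other non-audio data.

```python
def cbr_stream_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate size of a constant bit rate stream, in bytes."""
    return bitrate_kbps * 1000 * duration_s / 8


def pcm_bitrate_kbps(sample_rate_hz: int = 44100,
                     bits_per_sample: int = 16,
                     channels: int = 2) -> float:
    """Bit rate of uncompressed PCM audio; the defaults describe CD audio."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(mp3_bitrate_kbps: float) -> float:
    """Ratio of the CD audio bit rate to the chosen MP3 bit rate."""
    return pcm_bitrate_kbps() / mp3_bitrate_kbps


if __name__ == "__main__":
    # A 4-minute track at an example bit rate of 128 kbit/s.
    print(cbr_stream_size_bytes(128, 240))   # size in bytes
    print(pcm_bitrate_kbps())                # CD audio bit rate in kbit/s
    print(round(compression_ratio(128), 1))  # ratio relative to CD audio
```

With these example numbers, a four-minute track occupies about 3.84 million bytes, and the stream is roughly eleven times smaller than the 1411.2 kbit/s CD source. For VBR encodings no such closed form applies, since the bit rate of each frame depends on the audio content.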
=== Ancillary data ===
The ancillary data field can be used to store user-defined data. The ancillary data is optional, and the number of bits available is not explicitly given. The ancillary data is located after the Huffman code bits and extends to where the next frame's main_data_begin points. The encoder
mp3PRO used ancillary data to encode extra information that could improve audio quality when decoded with its own algorithm.
=== Metadata ===
A "tag" in an audio file is a section of the file that contains
metadata such as the title, artist, album, track number, or other information about the file's contents. The MP3 standards do not define tag formats for MP3 files, nor is there a standard
container format that would support metadata and obviate the need for tags. However, several
de facto standards for tag formats exist. As of 2010, the most widespread are
ID3v1 and ID3v2, and the more recently introduced
APEv2. These tags are normally embedded at the beginning or end of MP3 files, separate from the actual MP3 frame data. MP3 decoders either extract information from the tags or just treat them as ignorable, non-MP3 junk data. Playing and editing software often contains tag editing functionality, but there are also
tag editor applications dedicated to the purpose. Aside from metadata about the audio content, tags may also be used for
DRM.
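
As an illustration of how a decoder can separate tags from frame data, the sketch below recognises the two ID3 layouts. It assumes the commonly documented formats: an ID3v1 tag is the last 128 bytes of the file and begins with `TAG`, while an ID3v2 tag begins the file with `ID3` followed by a version, a flags byte, and a four-byte "syncsafe" size (7 bits per byte). The function names are hypothetical, text is decoded as Latin-1 as an assumption, and ID3v2 footers, frame parsing, and APEv2 are not handled.

```python
import struct


def parse_id3v1(data: bytes):
    """Return the ID3v1 fields found at the end of `data`, or None."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None
    # Fixed-width fields: title, artist, album (30 bytes each), year (4),
    # comment (30), and a one-byte genre index.
    title, artist, album, year, comment, genre = struct.unpack(
        "<30s30s30s4s30sB", tag[3:]
    )

    def text(field: bytes) -> str:
        # Fields are padded with NUL bytes or spaces.
        return field.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "genre": genre,
    }


def id3v2_tag_size(data: bytes) -> int:
    """Return the bytes to skip for a leading ID3v2 tag, or 0 if none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    size_bytes = data[6:10]
    if any(b & 0x80 for b in size_bytes):
        return 0  # not a valid syncsafe integer
    size = 0
    for b in size_bytes:
        size = (size << 7) | b
    return 10 + size  # the stored size excludes the 10-byte header
```

A decoder could call `id3v2_tag_size` to find where to start searching for the first frame's sync word, and `parse_id3v1` to decide whether the final 128 bytes should be excluded from decoding.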
ReplayGain is a standard for measuring and storing the loudness of an MP3 file (
audio normalization) in its metadata tag, enabling a ReplayGain-compliant player to automatically adjust the overall playback volume for each file.
MP3Gain may be used to reversibly modify files based on ReplayGain measurements so that adjusted playback can be achieved on players without ReplayGain capability.

== Licensing, ownership, and legislation ==