Internally, parametric audio coding algorithms operate on 10 ms PCM frames using a model of the human voice. Each of these audio segments is classified as
voiced (vowel-like) or unvoiced (consonant-like). Codec 2 uses
sinusoidal coding to model speech, a method closely related to that of
multi-band excitation codecs. Sinusoidal coding exploits regularities (periodicity) in the pattern of overtone frequencies: speech is recreated as a sum of harmonically related sine waves with independent amplitudes, layered
on top of the estimated
fundamental frequency of the speaker's voice (pitch). The quantised pitch and the amplitudes (energy) of the
harmonics are encoded and, together with the line spectral pairs (LSPs) that describe the spectral envelope, are exchanged across a channel in a digital format. The LSP coefficients represent the
Linear Predictive Coding (LPC) model in the frequency domain, and lend themselves to robust and efficient quantisation of the LPC parameters. The encoded parameters are stored as bit fields packed together into bytes. These bit fields can optionally be
Gray coded before being packed. Gray coding may be useful when the raw bits are sent over an error-prone channel, but normally an application simply transmits the packed bit fields as-is. The bit fields hold the various parameters that are stored or exchanged (pitch, energy, voicing Booleans, LSPs, etc.). For example, Mode 3200 converts 20 ms of audio into 64 bits, so 64 bits are output every 20 ms (50 times a second), for a minimum data rate of 3200 bit/s. These 64 bits are sent as 8 bytes to the application, which has to unpack the bit fields, or sent over a data channel. Another example is Mode 1300, which takes 40 ms of audio and outputs 52 bits every 40 ms (25 times a second), for a minimum rate of 1300 bit/s. These 52 bits are sent as 7 bytes to the application or data channel.

== Adoption ==
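The frame and bit-rate arithmetic described in the section above can be sketched as a short calculation. This is a minimal illustrative sketch in Python, not the actual Codec 2 API: the mode names and per-frame figures come from the text, while the variable names are hypothetical.

```python
# Minimal sketch of the frame/bit-rate arithmetic described above.
# The mode parameters (bits per frame, frame duration in seconds)
# come from the text; this is an illustration, not the Codec 2 API.

MODES = {
    "3200": (64, 0.020),  # Mode 3200: 64 bits per 20 ms frame
    "1300": (52, 0.040),  # Mode 1300: 52 bits per 40 ms frame
}

for name, (bits, frame_s) in MODES.items():
    frames_per_second = round(1 / frame_s)
    bit_rate = bits * frames_per_second
    packed_bytes = (bits + 7) // 8  # bit fields padded up to whole bytes
    print(f"Mode {name}: {frames_per_second} frames/s, "
          f"{bit_rate} bit/s, {packed_bytes} bytes per frame")
```

Note that Mode 1300 packs 52 bits into 7 bytes, so 4 bits per frame are padding; the raw byte rate over a channel (7 bytes every 40 ms, i.e. 1400 bit/s) is therefore slightly above the nominal 1300 bit/s.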