The human voice consists of sounds generated by the periodic opening and closing of the glottis by the vocal cords, which produces an acoustic waveform with many harmonics. This initial sound is then filtered by movements in the nose, mouth and throat (a complicated resonant piping system known as the vocal tract) to produce fluctuations in harmonic content (formants) in a controlled way, creating the wide variety of sounds used in speech. There is another set of sounds, known as the unvoiced and plosive sounds, which are created or modified by a variety of sound-generating disruptions of airflow occurring in the vocal tract.

The vocoder analyzes speech by measuring how its
spectral energy distribution characteristics fluctuate across time. This analysis results in a set of temporally parallel envelope signals, each representing the amplitude of one frequency band of the user's speech. Put another way, the voice signal is divided into a number of frequency bands (the larger this number, the more accurate the analysis), and the level of signal present in each band is measured simultaneously by an envelope follower, giving the spectral energy distribution across time. This set of envelope amplitude signals is called the "modulator".

To recreate speech, the vocoder reverses the analysis process, variably filtering an initial broadband noise (referred to alternately as the "source" or "carrier") by passing it through a set of
band-pass filters, whose individual output levels are controlled, in real time, by the set of envelope amplitude signals from the modulator.

The digital encoding process involves a periodic analysis of each of the modulator's multiband set of envelope amplitudes. This analysis results in a set of digital pulse-code modulation stream readings. The pulse-code modulation stream for each band is then transmitted to a decoder, which applies the streams as control signals to the corresponding amplifiers of the output filter channels.

Information about the
fundamental frequency of the initial voice signal (as distinct from its spectral characteristic) is discarded; it was not important to preserve this for the vocoder's original use as an encryption aid. It is this dehumanizing aspect of the vocoding process that has made it useful in creating special voice effects in popular music and audio entertainment.

Instead of a point-by-point recreation of the waveform, the vocoder process sends only the parameters of the vocal model over the communication link. Since the parameters change slowly compared to the original speech waveform, the bandwidth required to transmit speech can be reduced. This allows more speech channels to utilize a given communication channel, such as a radio channel or a submarine cable.

Analog vocoders typically analyze an incoming signal by splitting the signal into multiple tuned frequency bands or ranges. To reconstruct the signal, a
carrier signal is sent through a series of these tuned band-pass filters. In the example of a typical robot voice, the carrier is noise or a sawtooth waveform. There are usually between 8 and 20 bands.

The amplitude of the modulator in each analysis band generates a voltage that controls the amplifier for the corresponding carrier band. The result is that frequency components of the modulating signal are mapped onto the carrier signal as discrete amplitude changes in each of the frequency bands.

Often there is an unvoiced band or sibilance channel. This is for frequencies that are outside the analysis bands for typical speech but are still important in speech. Examples are words that start with the letters s, f, ch or any other sibilant sound. Using this band produces recognizable speech, although somewhat mechanical sounding. Vocoders often include a second system for generating unvoiced sounds, using a
noise generator instead of the fundamental frequency. This is mixed with the carrier output to increase clarity.

In the channel vocoder algorithm, only the amplitude component of the analytic signal is kept and the phase component is simply ignored, which tends to result in an unclear voice; for methods of rectifying this, see phase vocoder.

==History==