To varying degrees, different phonemes can be distinguished by the properties of their source(s) and their spectral shape. Voiced sounds (e.g., vowels) have at least one source due to mostly periodic glottal excitation, which can be approximated by an impulse train in the time domain and by harmonics in the frequency domain, and a filter that depends on, for example, tongue position and lip protrusion. Fricatives, on the other hand, have at least one source due to turbulent noise produced at a constriction in the oral cavity or pharynx. So-called voiced fricatives have two sources: one at the glottis and one at the supra-glottal constriction.
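The impulse-train approximation of the glottal source can be written out as a short worked relation. This is an idealised sketch only: the symbols (e, E, H, S, the pitch period T_0 and the fundamental frequency F_0) are notation assumed here for illustration, and real glottal excitation is only approximately periodic.

```latex
\begin{align*}
e(t) &= \sum_{k=-\infty}^{\infty} \delta(t - kT_0)
  && \text{(impulse train with period } T_0\text{)} \\
E(f) &= F_0 \sum_{m=-\infty}^{\infty} \delta(f - mF_0), \qquad F_0 = \frac{1}{T_0}
  && \text{(harmonics at multiples of } F_0\text{)} \\
S(f) &= E(f)\,H(f)
  && \text{(source spectrum shaped by the filter } H\text{)}
\end{align*}
```

Under these assumptions, the harmonics are spaced F_0 apart, and the filter H(f) determines how strongly each one appears in the output.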
==Speech synthesis==
In implementations of the source–filter model of speech production, the sound source, or excitation signal, is often modelled as a periodic impulse train for voiced speech, or as white noise for unvoiced speech. The vocal tract filter is, in the simplest case, approximated by an all-pole filter whose coefficients are obtained by linear prediction, which minimizes the mean-squared prediction error over the speech signal to be reproduced. Convolving the excitation signal with the filter's impulse response then produces the synthesised speech.
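The synthesis procedure described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the function names, the second-order test filter `a_true`, the 80-sample pitch period, and the use of the autocorrelation method with the Levinson-Durbin recursion (one common way to carry out linear prediction) are all assumptions made for the example.

```python
import math
import random

def impulse_train(n_samples, period):
    """Voiced excitation: a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def white_noise(n_samples, seed=0):
    """Unvoiced excitation: zero-mean Gaussian noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def lpc(x, order):
    """Linear prediction via the Levinson-Durbin recursion.

    Returns (a, err), where a[0] == 1 and A(z) = sum_k a[k] z^-k is the
    prediction-error filter, and err is the residual prediction-error energy.
    """
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def all_pole_filter(excitation, a, gain=1.0):
    """All-pole filter 1/A(z): y[n] = gain*x[n] - sum_{k>=1} a[k]*y[n-k]."""
    y = []
    for n, x in enumerate(excitation):
        acc = gain * x
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

# Stand-in for a recorded speech frame: noise shaped by a known stable
# all-pole filter (hypothetical coefficients chosen for the example).
a_true = [1.0, -1.3, 0.8]
recorded = all_pole_filter(white_noise(4000), a_true)

# Estimate the filter from the "recording", then resynthesise a voiced sound
# by driving the estimated filter with an impulse train.
a_est, residual_energy = lpc(recorded, order=2)
voiced = all_pole_filter(impulse_train(400, period=80), a_est)
```

Swapping `impulse_train(...)` for `white_noise(...)` in the last line gives the unvoiced case with the same filter. At an assumed sampling rate of 8 kHz, a period of 80 samples corresponds to a 100 Hz fundamental.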
==Modeling human speech production==
In human speech production, the sound source is the vocal folds, which can produce a periodic sound when constricted or an aperiodic (white noise) sound when relaxed. The filter is the rest of the vocal tract, which can change shape through manipulation of the pharynx, mouth, and nasal cavity.
Fant roughly compares the source and filter to phonation and articulation, respectively. The source produces a number of harmonics of varying amplitudes, which travel through the vocal tract and are either amplified or attenuated to produce a speech sound.

==See also==