Psychoacoustics

Psychoacoustics is the branch of psychophysics involving the scientific study of the perception of sound by the human auditory system. It is the branch of science studying the psychological responses associated with sound, including noise, speech, and music. Psychoacoustics is an interdisciplinary field including psychology, acoustics, electronic engineering, physics, biology, physiology, and computer science.

Background

Hearing is not a purely mechanical phenomenon of wave propagation, but is also a sensory and perceptual event. When a person hears something, that something arrives at the ear as a mechanical sound wave traveling through the air, but within the ear it is transformed into neural action potentials. These nerve pulses then travel to the brain, where they are perceived. Hence, in many problems in acoustics, such as audio processing, it is advantageous to take into account not just the mechanics of the environment, but also the fact that both the ear and the brain are involved in a person's listening experience.

The inner ear, for example, does significant signal processing in converting sound waveforms into neural stimuli, and this processing renders certain differences between waveforms imperceptible. Data compression techniques, such as MP3, make use of this fact. In addition, the ear has a nonlinear response to sounds of different intensity levels; this nonlinear response is called loudness. Telephone networks and audio noise reduction systems make use of this fact by nonlinearly compressing data samples before transmission and then expanding them for playback. Another effect of the ear's nonlinear response is that sounds that are close in frequency produce phantom beat notes, or intermodulation distortion products.

At least five features are used to characterize sound in psychoacoustic practice: loudness (perceived volume), roughness (sensory dissonance), sharpness (spectral distribution), tonalness (the ratio of tonal spectral peaks), and spaciousness (the perceived spatial extent of a sound). Another procedure for recognizing music genres or recommending music is to discard a wide range of objective features that are not closely related to human perception. However, some low-level features that are not directly related to human perception can still support psychoacoustic analysis.

The first one, root mean square (RMS), is a way to measure the level of a sound and serves as a rough indicator of loudness; it is widely used to monitor volume. Spectral rolloff marks the frequency below which most of a signal's spectral energy lies. Spectral flatness describes how noise-like, as opposed to tone-like, a sound is. Lastly, inter-channel cross-correlation estimates the relationship between the sound perceived by one ear and the sound perceived by the other.
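The four low-level features named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the 85% rolloff fraction is a common convention rather than something the text specifies, flatness is taken as the ratio of the geometric to the arithmetic mean of the power spectrum, and the inter-channel measure is simplified to a zero-lag normalized correlation.

```python
import cmath
import math


def rms(samples):
    """Root mean square level of a block of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def power_spectrum(samples):
    """Power in DFT bins 0..n/2, using a naive O(n^2) transform (short blocks only)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        bin_value = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(bin_value) ** 2)
    return spectrum


def spectral_rolloff(power, sample_rate, n, fraction=0.85):
    """Frequency below which `fraction` of the spectral energy lies (0.85 is assumed)."""
    total = sum(power)
    cumulative = 0.0
    for k, p in enumerate(power):
        cumulative += p
        if cumulative >= fraction * total:
            return k * sample_rate / n
    return (len(power) - 1) * sample_rate / n


def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean: near 0 for a tone, near 1 for white noise."""
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power) + eps)


def inter_channel_correlation(left, right):
    """Zero-lag normalized correlation between two channels (a simplification)."""
    mean_l = sum(left) / len(left)
    mean_r = sum(right) / len(right)
    num = sum((a - mean_l) * (b - mean_r) for a, b in zip(left, right))
    den = math.sqrt(
        sum((a - mean_l) ** 2 for a in left) * sum((b - mean_r) ** 2 for b in right)
    )
    return num / den if den else 0.0


# Example input (assumed values): a 100 Hz sine sampled at 800 Hz for 80 samples.
SAMPLE_RATE = 800
N = 80
tone = [math.sin(2 * math.pi * 100 * t / SAMPLE_RATE) for t in range(N)]
power = power_spectrum(tone)

print("RMS:", rms(tone))
print("Rolloff (Hz):", spectral_rolloff(power, SAMPLE_RATE, N))
print("Flatness:", spectral_flatness(power))
print("Correlation with inverted copy:",
      inter_channel_correlation(tone, [-v for v in tone]))
```

For a pure tone, all of the energy sits in one bin, so the rolloff lands on the tone's frequency and the flatness is close to zero; a noise-like signal would push the flatness toward one.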

Limits of perception

Sensitivity peaks in the middle of the voice frequency band. The human ear can nominally hear sounds only within a limited frequency range. The upper limit of that range tends to decrease with age, so most adults cannot hear its highest frequencies. Under ideal laboratory conditions, the lowest frequency that has been identified as a musical tone is 12 Hz. Tones between 4 and 16 Hz can be perceived via the body's sense of touch.

Human perception of the time separation between audio signals has been measured to be very fine. This does not mean that correspondingly high frequencies are audible, but that time discrimination is not directly coupled with frequency range.

The frequency resolution of the ear is about 3.6 Hz within the octave tested; that is, changes in pitch larger than 3.6 Hz can be perceived in a clinical setting.

The absolute threshold of hearing (ATH) is the lowest of the equal-loudness contours. Equal-loudness contours indicate the sound pressure level (dB SPL), over the range of audible frequencies, at which tones are perceived as being equally loud. Equal-loudness contours were first measured by Fletcher and Munson at Bell Labs in 1933 using pure tones reproduced via headphones, and the data they collected are called Fletcher–Munson curves. Because subjective loudness was difficult to measure, the Fletcher–Munson curves were averaged over many subjects. Robinson and Dadson refined the process in 1956 to obtain a new set of equal-loudness curves for a frontal sound source measured in an anechoic chamber. The Robinson–Dadson curves were standardized in 1986. In 2003, the standard was revised using data collected from 12 international studies.
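To give a feel for the shape of the threshold curve, the sketch below evaluates a closed-form approximation of the absolute threshold of hearing that is often attributed to Terhardt. The formula is not taken from the text above and is only an approximation for a young listener with healthy hearing; the function name is hypothetical.

```python
import math


def absolute_threshold_db(frequency_hz):
    """Approximate absolute threshold of hearing in dB SPL.

    Uses a commonly cited approximation (attributed to Terhardt), which is an
    assumption here and not part of the source text.
    """
    f = frequency_hz / 1000.0  # the formula works in kHz
    return (
        3.64 * f ** -0.8
        - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
        + 1e-3 * f ** 4
    )


# Print the approximate threshold at a few frequencies.
for hz in (100, 500, 1000, 3300, 10000, 16000):
    print(f"{hz:>6} Hz: {absolute_threshold_db(hz):7.2f} dB SPL")
```

Under this approximation the threshold is lowest at a few kilohertz and rises steeply toward both ends of the audible range, which matches the qualitative description of the lowest equal-loudness contour.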

Sound localization

Sound localization is the process of determining the location of a sound source. The brain utilizes subtle differences in loudness, tone and timing between the two ears to allow us to localize sound sources. Localization can be described in terms of three-dimensional position: the azimuth or horizontal angle, the zenith or vertical angle, and the distance (for static sounds) or velocity (for moving sounds). Humans, like most four-legged animals, are adept at detecting direction in the horizontal plane, but less so in the vertical, because the ears are placed symmetrically. Some species of owls have their ears placed asymmetrically and can detect sound in all three planes, an adaptation to hunt small mammals in the dark.
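The timing cue mentioned above can be illustrated with a simple geometric model. The sketch below uses the spherical-head (Woodworth) approximation of the interaural time difference, which is an assumption brought in for illustration and not something the text specifies; the head radius and speed of sound are typical textbook values, and the function name is hypothetical.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0  # assumed, air at roughly 20 degrees Celsius


def interaural_time_difference(azimuth_rad):
    """Arrival-time difference between the ears for a distant source.

    Spherical-head approximation, valid for azimuths from 0 (straight ahead)
    to pi/2 (directly to one side).
    """
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (
        azimuth_rad + math.sin(azimuth_rad)
    )


# Print the modelled time difference for a few source directions.
for degrees in (0, 15, 45, 90):
    itd = interaural_time_difference(math.radians(degrees))
    print(f"{degrees:>3} degrees: {itd * 1e6:6.1f} microseconds")
```

In this model the difference is zero for a source straight ahead and grows to well under a millisecond for a source directly to one side, which shows how small the timing differences are that the brain has to resolve.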

Masking effects

Suppose a listener can hear a given acoustical signal under silent conditions. When another sound is played at the same time, the signal has to be stronger for the listener to hear it. The interfering sound is known as the masker, and the impeded listening is known as masking. The masker does not need to have the frequency components of the original signal for masking to happen. A masked signal can be heard even though it is weaker than the masker. Masking happens when a signal and a masker are played together (for instance, when one person whispers while another person shouts) and the listener doesn't hear the weaker signal because it has been masked by the louder masker. Masking can also happen to a signal before a masker starts or after a masker stops. For example, a sudden loud clap can make sounds that immediately precede or follow it inaudible. The effect of backward masking is weaker than that of forward masking. The masking effect has been widely studied in psychoacoustical research and is exploited in lossy audio encoding, such as MP3.

Missing fundamental

When presented with a harmonic series of frequencies in the relationship 2f, 3f, 4f, 5f, etc. (where f is a specific frequency), humans tend to perceive that the pitch is f. An audible example can be found on YouTube.
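A small numerical experiment makes the effect concrete. The sketch below builds a tone from the harmonics 2f to 5f only, confirms that the spectrum has no energy at f, and then shows that the waveform still repeats with period 1/f by picking the strongest autocorrelation lag. The sample rate, the value of f, and the function names are assumptions chosen for illustration, and autocorrelation is used here only as a simple periodicity measure, not as a claim about how the auditory system works.

```python
import math

SAMPLE_RATE = 8000   # Hz (assumed)
FUNDAMENTAL = 200.0  # Hz, the "missing" f (assumed)
N = 800              # 0.1 s of audio


def harmonic_complex(f0, harmonics, n, sample_rate):
    """Sum of equal-amplitude sine waves at the given multiples of f0."""
    return [
        sum(math.sin(2 * math.pi * h * f0 * t / sample_rate) for h in harmonics)
        for t in range(n)
    ]


def bin_magnitude(samples, frequency, sample_rate):
    """Magnitude of a single DFT component at `frequency`."""
    re = sum(v * math.cos(2 * math.pi * frequency * t / sample_rate)
             for t, v in enumerate(samples))
    im = sum(v * math.sin(2 * math.pi * frequency * t / sample_rate)
             for t, v in enumerate(samples))
    return math.hypot(re, im)


def autocorrelation_pitch(samples, sample_rate, min_hz=50.0, max_hz=500.0):
    """Pitch estimate from the lag with the largest autocorrelation."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)

    def correlation(lag):
        return sum(samples[t] * samples[t + lag] for t in range(len(samples) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=correlation)
    return sample_rate / best_lag


# Harmonics 2f..5f only: the component at f itself is absent.
tone = harmonic_complex(FUNDAMENTAL, [2, 3, 4, 5], N, SAMPLE_RATE)

print("Energy at f:  ", bin_magnitude(tone, FUNDAMENTAL, SAMPLE_RATE))
print("Energy at 2f: ", bin_magnitude(tone, 2 * FUNDAMENTAL, SAMPLE_RATE))
print("Estimated pitch (Hz):", autocorrelation_pitch(tone, SAMPLE_RATE))
```

The periodicity estimate comes out at f even though the spectrum contains nothing there, because 1/f is the shortest interval after which all of the harmonics line up again.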

Music

Psychoacoustics includes topics and studies that are relevant to music psychology and music therapy. Theorists such as Benjamin Boretz consider some of the results of psychoacoustics to be meaningful only in a musical context. Irv Teibel's Environments series LPs (1969–79) are an early example of commercially available sounds released expressly for enhancing psychological abilities.

Applied psychoacoustics

Psychoacoustics has long enjoyed a symbiotic relationship with computer science. Internet pioneers J. C. R. Licklider and Bob Taylor both completed graduate-level work in psychoacoustics, while BBN Technologies originally specialized in consulting on acoustics issues before it began building the first packet-switched network. Licklider wrote a paper entitled "A duplex theory of pitch perception".

Psychoacoustics is applied within many fields of software development, where developers map proven and experimental mathematical patterns in digital signal processing. Many audio compression codecs, such as MP3 and Opus, use a psychoacoustic model to increase compression ratios. The success of conventional audio systems for the reproduction of music in theatres and homes can be attributed to psychoacoustics, and psychoacoustic considerations gave rise to novel audio systems, such as psychoacoustic sound field synthesis. Furthermore, scientists have experimented, with limited success, in creating new acoustic weapons, which emit frequencies that may impair, harm, or kill.

Psychoacoustics is also leveraged in sonification to make multiple independent data dimensions audible and easily interpretable. This enables auditory guidance without the need for spatial audio, and it is used in sonification computer games and in other applications, such as drone flying and image-guided surgery. It is also applied today within music, where musicians and artists continue to create new auditory experiences by masking unwanted frequencies of instruments, causing other frequencies to be enhanced. Yet another application is in the design of small or lower-quality loudspeakers, which can use the phenomenon of the missing fundamental to give the effect of bass notes at lower frequencies than the loudspeakers are physically able to produce. Automobile manufacturers engineer their engines and even doors to have a certain sound.

Perceptual audio coding

The psychoacoustic model provides for high-quality lossy signal compression by describing which parts of a given digital audio signal can be removed or reproduced with reduced quality without significant loss in the perceived quality of the sound. This provides great benefit to the overall compression ratio, and psychoacoustic analysis routinely leads to compressed music files that are one-tenth to one-twelfth the size of high-quality masters, but with discernibly less proportional quality loss. Such compression is a feature of nearly all modern lossy audio compression formats. Some of these formats include Dolby Digital (AC-3), MP3, Opus, Ogg Vorbis, AAC, WMA, MPEG-1 Layer II (used for digital audio broadcasting in several countries), and ATRAC, the compression used in MiniDisc and some Walkman models.

Psychoacoustics is based heavily on human anatomy, especially the ear's limitations in perceiving sound as outlined previously. To summarize, the main limitations are:

• High-frequency limit
• Absolute threshold of hearing
• Temporal masking (forward masking, backward masking)
• Simultaneous masking (also known as spectral masking)

A compression algorithm can assign a lower priority to sounds outside the range of human hearing and reduce the precision of different frequencies according to the predicted masking level. By carefully shifting bits away from the unimportant components and toward the important ones, the algorithm ensures that the sounds a listener is most likely to perceive are most accurately represented.

Audio encoders analyse audio using a perceptual model (psychoacoustic model) in order to compute the required precision per frequency band or temporal section. Results of this computation are then used to adjust coding precision as a function of frequency and time through a set of coding tools that are dependent on the audio encoding format, as different formats support different coding tools. Examples of such coding tools are:

• Frequency filtering (lowpass, highpass)
• Transform window selection (size and model)
• Joint stereo coding
• Parametric stereo
• Sample requantization
• Non-linear quantization
• Vector quantization
• Temporal noise shaping (TNS)
• Perceptual noise substitution (PNS)
• Spectral band replication (SBR)

In many encoders, a rate control algorithm ensures that the resulting bitrate of the coded audio is within defined limits. If transparent coding can't be achieved at the target bitrate, then the rate control algorithms will adjust coding precision (and thus introduce distortion) in various parts of the sound spectrum, using guidance from data computed by the psychoacoustic model, until the target bitrate can be matched.
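The interplay between the psychoacoustic model and rate control can be sketched with a toy bit allocator. The example below is a simplified illustration, not the algorithm of any particular codec: the band levels and masking thresholds are made-up numbers, the function names are hypothetical, and the only outside fact it relies on is the rule of thumb that each quantizer bit lowers quantization noise by roughly 6 dB.

```python
import math

DB_PER_BIT = 6.02  # approximate noise reduction per quantizer bit


def bits_needed(signal_db, mask_db):
    """Bits required to push quantization noise below the masking threshold."""
    signal_to_mask = signal_db - mask_db
    if signal_to_mask <= 0:
        return 0  # the band is fully masked, so it needs no bits
    return math.ceil(signal_to_mask / DB_PER_BIT)


def allocate_bits(signal_db, mask_db, budget):
    """Perceptual allocation followed by a simple rate-control loop."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    bits = [bits_needed(s, m) for s, m in zip(signal_db, mask_db)]

    def noise_to_mask_after_cut(i):
        # Noise level relative to the mask if band i loses one more bit.
        return (signal_db[i] - DB_PER_BIT * (bits[i] - 1)) - mask_db[i]

    # While over budget, take a bit from the band where doing so leaves the
    # noise least far above its masking threshold.
    while sum(bits) > budget:
        candidates = [i for i, b in enumerate(bits) if b > 0]
        bits[min(candidates, key=noise_to_mask_after_cut)] -= 1
    return bits


# Hypothetical per-band signal levels and masking thresholds, in dB.
band_signal_db = [60.0, 50.0, 30.0, 20.0]
band_mask_db = [20.0, 35.0, 32.0, 25.0]

print("Transparent allocation:", allocate_bits(band_signal_db, band_mask_db, 100))
print("Constrained to 8 bits: ", allocate_bits(band_signal_db, band_mask_db, 8))
```

With a generous budget, each band receives just enough bits to keep its noise under the mask, and masked bands receive none. When the budget is tight, the loop introduces distortion where the model predicts it will be least audible, which mirrors the rate-control behaviour described above.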

Source: Wikipedia ↗
