“Psychoacoustics explains the subjective response to everything we hear. It is the ultimate arbitrator in acoustical concerns because it is only our response to sound that fundamentally matters. Psychoacoustics seeks to reconcile acoustical stimuli and all the scientific, objective, and physical properties that surround them, with the physiological and psychological responses evoked by them.” Psychoacoustics studies the relationship between acoustic sound signals, the physiology of the auditory system, and the psychological perception of sound, in order to explain the auditory behavioral responses of human listeners, the abilities and limitations of the human ear, and the complex auditory processes that occur inside the brain. Hearing involves a behavioral response to the physical attributes of sound, including intensity, frequency, and time-based characteristics, that permits the auditory system to extract cues that determine the distance, direction, loudness, pitch, and tone of many individual sounds simultaneously.
The human ear has three main subdivisions: the outer ear, which captures and amplifies incoming air vibrations; the middle ear, which converts these vibrations into mechanical vibrations; and the inner ear, which filters these mechanical vibrations and transduces them into hydrodynamic and then electrochemical signals, with the result that electrochemical signals are transmitted through nerves to the brain. These three subdivisions are collectively classified as the peripheral auditory system. Figure 2.1 shows a simplified view of the human ear.
The outer and middle ear structures enhance the sensitivity of hearing, acting as a preamplifier of the sound energy spread out from its sources. Inside the outer ear, the pinna captures more of the wave, and hence more sound energy, than the ear canal would receive without it. The auditory canal acts as a half-closed tube resonator, enhancing sounds in the range of 2–5 kHz.
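The resonance region quoted above can be sketched with the standard quarter-wave formula for a tube closed at one end; the canal length used here is an assumed typical value, not a figure from this text:

```python
# Sketch: the ear canal modeled as a half-closed (quarter-wave) tube.
# The canal length (~2.5 cm) and speed of sound are illustrative
# assumptions, not measurements from this text.
SPEED_OF_SOUND = 343.0   # m/s, in air at about 20 degrees C
CANAL_LENGTH = 0.025     # m, assumed typical adult ear canal length

# A tube closed at one end resonates at f = v / (4 * L)
f_resonance = SPEED_OF_SOUND / (4 * CANAL_LENGTH)
print(f"Fundamental resonance: {f_resonance:.0f} Hz")  # about 3430 Hz
```

The result falls squarely inside the 2–5 kHz region where the canal boosts incoming sound.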
Inside the middle ear, the tympanic membrane, or eardrum, receives vibrations traveling up the auditory canal and transfers them through the ossicles to the oval window, the port into the inner ear. The ossicles (the hammer, anvil, and stirrup, or malleus, incus, and stapes) achieve a multiplication of force by lever action, amplifying soft sounds, but they can also be adjusted by muscle action to attenuate the sound signal for protection against loud sounds.
The inner ear consists of the semicircular canals, which serve as the balance organ of the body, and the cochlea, which contains the basilar membrane and the organ of Corti; together these form the complicated mechanisms that transduce vibrations into neural signal codes. The organ of Corti is the sensitive element in the inner ear, and it is located on the basilar membrane in one of the three compartments of the cochlea. It contains four rows of hair cells. Above them is the tectorial membrane, which can move in response to pressure variations in the fluid-filled tympanic and vestibular canals. There are some 16,000–20,000 hair cells distributed along the basilar membrane, which follows the spiral of the cochlea, and they can resolve about 1,500 separate pitches.
According to the place theory, pitch is determined by the place along this collection of hair cells at which the maximum excitation occurs on the basilar membrane. The temporal (frequency) theory, on the other hand, assumes that the basilar membrane moves up and down in synchrony with the pressure variation of the sound wave, driven by the movement of the stapes at the oval window. Each up-and-down movement results in one neural firing, so that frequency is coded directly by the rate of firing; for example, a 400 Hz tone results in hair cells firing 400 times per second. Above about 1,000 Hz the frequency cannot be represented by individual cells, and the firings of many cells are integrated to produce the correct overall firing rate.
Further complex auditory processing occurs in the brain, using information contained in the neural signals passed on to the brain via the auditory nerve. The auditory nerve, by taking electrical impulses from the cochlea and the semicircular canals, makes connections with both auditory areas of the brain.
In addition, physiologically the left and right ears do not differ in their capacity for detecting sound, but the left and right brain halves do. The left cerebral hemisphere processes most speech (verbal) information; thus, the right ear, which is wired to this side of the brain, may be perceptually superior for spoken words. Conversely, the left ear may be better at perceiving melodies because it is connected to the right brain half, which processes melodic (non-verbal) information.
Psychoacoustics demonstrates how remarkable the human auditory system is in terms of absolute sensitivity and in terms of the range of intensities to which it can respond. The ratio between the powers of the faintest sound we can detect and the loudest sound we can hear without damaging our ears is 1,000,000,000,000:1. This shows that the ear can accommodate a very wide dynamic range of sound intensity, and it responds to increasing sound intensity in a logarithmic relationship.
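The logarithmic relationship mentioned above is exactly what the decibel scale expresses; converting the 10¹²:1 intensity ratio gives the familiar dynamic range figure:

```python
import math

# Sketch: the 10^12 : 1 intensity ratio from the text, expressed on the
# logarithmic decibel scale used for sound intensity level.
ratio = 1_000_000_000_000  # loudest / faintest intensity

level_db = 10 * math.log10(ratio)
print(f"Dynamic range of hearing: {level_db:.0f} dB")  # 120 dB
```

Because each factor of 10 in intensity adds only 10 dB, a trillion-fold span of physical power collapses into a manageable 0–120 dB perceptual scale.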
For very soft sounds, near the threshold of hearing, the ear strongly discriminates against low frequencies. For mid-range sounds around 60 phons the discrimination is not so pronounced, and for very loud sounds in the neighborhood of 120 phons the hearing response is nearly flat. This aspect of human hearing implies that the ear will perceive a progressive loss of bass frequencies as a given sound becomes softer and softer. Figure 2.2 shows the frequency–intensity regions for auditory experience.
Binaural hearing is related to the fact that the ears are some distance apart, allowing the localization of sound by registering the slight differences in time, phase, and intensity of the sound striking each ear. The ear can detect an interaural time difference as slight as 30 µs. Both the comparison of left and right ear receptions and the evaluation of the sound’s intensity are done automatically, without any conscious thought, allowing us to identify the approximate location of the origin of a sound.
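The size of these interaural time differences can be sketched with the simple path-difference model ITD = d·sin(θ)/c; the head width used is an assumed typical value, not a figure from this text:

```python
import math

# Sketch: interaural time difference (ITD) for a source at angle theta
# from straight ahead, using the simple path-difference model
# ITD = d * sin(theta) / c. Head width d is an assumed typical value.
SPEED_OF_SOUND = 343.0  # m/s
HEAD_WIDTH = 0.18       # m, assumed ear-to-ear distance

def itd_seconds(theta_degrees):
    """Extra time the wavefront needs to reach the far ear."""
    return HEAD_WIDTH * math.sin(math.radians(theta_degrees)) / SPEED_OF_SOUND

# A source only 5 degrees off-centre already shifts arrival times by
# tens of microseconds.
print(f"{itd_seconds(5) * 1e6:.0f} us")
```

A source directly ahead gives zero ITD, and the difference grows toward a maximum of roughly half a millisecond for a source directly to one side, which is why such fine temporal resolution is useful for localization.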
In psychoacoustics, the term pitch is considered to be the psychological perception of frequency. Much research has been done to find correlations between pitch and frequency, in which pitch is understood as a response pattern to the frequency of a sound. In music, pitch is defined as the position of a single sound in the complete range of sound. It is the feature of a sound by which listeners can arrange sounds on a scale from "lowest" to "highest." Sounds are higher or lower in pitch according to the frequency of vibration of the sound waves producing them.
Musical notation uses a logarithmic measuring scale because of the ear’s logarithmic response to frequency. For example, two different octaves are heard as the same musical interval even though one octave spans the frequency range from 100 to 200 Hz and another spans 1,000 to 2,000 Hz. The audible frequency range is roughly 20 to 20,000 Hz, the most sensitive region being from 1,000 to 5,000 Hz.
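The octave example above can be made concrete: on a logarithmic scale an octave is always a doubling, regardless of how many hertz it spans.

```python
import math

# Sketch: why octaves of very different widths in hertz are heard as
# the same musical interval -- on a log scale, an octave is a doubling.
def octaves_between(f_low, f_high):
    """Interval size in octaves between two frequencies."""
    return math.log2(f_high / f_low)

print(octaves_between(100, 200))     # 1.0 -- one octave (100 Hz wide)
print(octaves_between(1000, 2000))   # 1.0 -- one octave (1000 Hz wide)
```

Both intervals measure exactly 1.0 octave even though one covers 100 Hz and the other 1,000 Hz, mirroring the ear’s logarithmic response.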
Loudness is a subjective perception of the intensity of a sound, in terms of which sounds may be ordered on a scale extending from quiet to loud. Intensity is defined as the sound power per unit area. In Figure 2.3 equal loudness curves for the human ear are shown.
Each curve describes a range of frequencies that are perceived to be equally loud. The curves are labeled in phons; a curve’s phon value is its SPL in dB at 1,000 Hz. These curves show that the ear is less sensitive to low frequencies, and also that the region of maximum sensitivity for human hearing is around 1,000 to 5,000 Hz. The dotted curve represents the threshold of hearing.
Figure 2.3 Equal loudness curves for the human ear. 
The standard threshold of hearing at 1,000 Hz is nominally taken to be 0 dB, but the actual curves show the measured threshold at 1,000 Hz to be about 4 dB.
Sounds may be generally characterized by pitch, loudness, and sound quality or timbre. Timbre is that attribute of auditory sensation in terms of which a listener can distinguish two similar sounds that have the same pitch and loudness. Timbre is mainly determined by the harmonic content and the dynamic characteristics of sound such as vibrato and the attack-decay envelope. Timbre is the characteristic that allows us to discriminate sounds produced by different instruments playing at the same time.
Simultaneous masking is a property of the human auditory system where some sounds vanish in the presence of louder sounds. For example, in the presence of very strong white noise, many weaker sounds get masked, or a tone of 500 Hz can mask a softer tone of 600 Hz. The strong sound is called the masker and the softer sound is called the maskee. This aspect of human hearing has important implications for the design of audio perceptual coders.
The goal of audio perceptual coders is to minimize the amount of data to be coded without audible degradation or loss of information for the listener. Data reduction can be achieved with psychoacoustic algorithms based on the concepts of critical bands, the minimum threshold of hearing, and the masking phenomena. The sound signals to be coded are compared to the minimum hearing threshold and the masking curve. Signal components that rise only slightly above these thresholds are coded using a minimal quantity of bits, and components that fall below the thresholds are discarded, because the ear cannot hear them.
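The allocation logic described above can be sketched as a simple decision rule; the threshold margins and bit budgets here are invented placeholders, not values from a real psychoacoustic model:

```python
# Sketch of the perceptual bit-allocation idea. The 20 dB margin and
# the 16/4-bit budgets are illustrative assumptions only.
def allocate_bits(component_level_db, audibility_threshold_db,
                  full_bits=16, reduced_bits=4):
    """Return a bit budget for one spectral component.

    Components below the audibility threshold (hearing threshold plus
    masking) are discarded; components just above it get fewer bits,
    since the resulting quantization noise will itself be masked.
    """
    margin = component_level_db - audibility_threshold_db
    if margin <= 0:
        return 0             # inaudible: discard entirely
    if margin < 20:
        return reduced_bits  # barely audible: coarse quantization
    return full_bits         # clearly audible: full precision

print(allocate_bits(-5, 0))   # 0  -> discarded
print(allocate_bits(10, 0))   # 4  -> coarsely coded
print(allocate_bits(60, 0))   # 16 -> fully coded
```

Real coders compute the threshold per critical band from the signal itself, but the principle is the same: spend bits only where the ear can hear the result.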
A critical band is the smallest band of frequencies that activates the same part of the basilar membrane. The ear can distinguish tones a few hertz apart at low frequencies, but at high frequencies tones must differ by hundreds of hertz to be differentiated. In any case, hair cells respond to the strongest stimulation in their local region, which is termed a critical band. The concept of the critical band was introduced by Fletcher in 1940 and has been widely tested. Experiments show that critical bands are much narrower at low frequencies than at high frequencies; three-fourths of the critical bands are below 5,000 Hz. Critical bands are analogous to a spectrum analyzer with variable center frequencies, and any tone will create a critical band centered on it.
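The growth of critical bandwidth with frequency can be illustrated with one widely used approximation, the Glasberg–Moore equivalent rectangular bandwidth (ERB) formula; this formula is a later refinement and is not taken from Fletcher's original work cited in this text:

```python
# Sketch: critical bandwidth grows with centre frequency. The
# Glasberg & Moore ERB approximation (an assumption here, not from
# this text) gives the equivalent rectangular bandwidth in Hz.
def erb_hz(f_center_hz):
    """ERB(f) = 24.7 * (4.37 * f / 1000 + 1), with f in Hz."""
    return 24.7 * (4.37 * f_center_hz / 1000.0 + 1.0)

# Bands are narrow at low frequencies and wide at high frequencies.
for f in (100, 1000, 10000):
    print(f"{f} Hz -> band about {erb_hz(f):.0f} Hz wide")
```

The output shows bands of roughly 35 Hz at 100 Hz but over 1 kHz wide at 10 kHz, matching the experimental finding that most critical bands lie below 5,000 Hz.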
Critical bands can also be explained in another way: when two sounds of equal loudness are close together in pitch, their combined loudness when sounded together will be only slightly greater than that of one of them alone. They may be said to be in the same critical band, where they compete for the same nerve endings on the basilar membrane of the inner ear. If the two sounds are widely separated in pitch, the perceived loudness of the combined tones will be considerably greater, because they do not overlap on the basilar membrane and do not compete for the same hair cells. If the tones are far apart in frequency (not within one critical band), the combined sound may be perceived as twice as loud as one alone. The critical band is an important auditory concept because it shows that the ear discriminates between energy inside the band, which promotes masking, and energy outside the band.