“Psychoacoustics explains
the subjective response to everything we hear. It is the ultimate arbitrator in
acoustical concerns because it is only our response to sound that fundamentally
matters. Psychoacoustics seeks to
reconcile acoustical stimuli and all the scientific, objective, and physical
properties that surround them, with the physiological and psychological
responses evoked by them.”[25].
Psychoacoustics studies the relationship between acoustic sound signals,
the physiology of the auditory system, and the psychological perception of sound, in
order to explain the auditory behavioral responses of human listeners, the
abilities and limitations of the human ear, and the complex auditory processes
that occur inside the brain. Hearing
involves a behavioral response to the physical attributes of sound including
intensity, frequency, and time-based characteristics that permit the auditory
system to find clues that determine distance, direction, loudness, pitch, and
tone of many individual sounds simultaneously.
The human ear has three main
subdivisions: the outer ear, which amplifies incoming air vibrations; the middle
ear, which transduces these vibrations into mechanical vibrations; and the inner
ear, which filters the mechanical vibrations and transduces them into hydrodynamic
and then electrochemical signals that are transmitted through nerves to the
brain. These three subdivisions are collectively classified as the peripheral
auditory system. Figure 2.1 shows a
simplified view of the human ear.
The outer and middle ear structures
enhance the sensitivity of hearing by acting as a preamplifier of the sound energy
spreading out from its sources. In the
outer ear, the pinna captures more of the sound wave, and hence more sound energy, than
the ear canal would receive without it.
The auditory canal acts as a tube resonator closed at one end, enhancing sounds
in the range of 2 to 5 kHz.
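The resonance of the auditory canal can be estimated with the quarter-wavelength formula for a tube that is open at one end and closed at the other. The sketch below is only a rough illustration: the canal length of 2.5 cm and the speed of sound of 343 m/s are assumed values, not figures taken from the text, and the function name is hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at about 20 degrees Celsius


def quarter_wave_resonance_hz(tube_length_m, speed_of_sound=SPEED_OF_SOUND_M_S):
    """Fundamental resonance of a tube open at one end and closed at the other."""
    return speed_of_sound / (4.0 * tube_length_m)


# Assumed canal length of 2.5 cm (illustrative value, not from the text).
canal_resonance_hz = quarter_wave_resonance_hz(0.025)
print(f"Estimated canal resonance: {canal_resonance_hz:.0f} Hz")
```

With these assumed values the estimate falls inside the enhanced frequency range mentioned above; a real ear canal is neither uniform nor rigidly closed, so the resonance is broad rather than a single sharp peak.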
Inside the middle ear, the tympanic
membrane, or eardrum, receives vibrations traveling up the auditory canal and
transfers them through the ossicles to the oval window, the port into the inner
ear. The ossicles (hammer, anvil, and
stirrup, also called malleus, incus, and stapes) multiply force by lever action and so amplify
soft sounds, but they can also be adjusted by muscle action
to attenuate the sound signal for protection against loud sounds.
The inner ear consists of the
semicircular canals, which serve as the balance organ of the body, and the cochlea,
which contains the basilar membrane and the organ of Corti; together these form the
complicated mechanisms that transduce vibrations into neural signal codes.
The organ of Corti is the sensitive element
in the inner ear, and it is located on the basilar membrane in one of the three
compartments of the cochlea. It
contains four rows of hair cells. Above
them is the tectorial membrane, which can move in response to pressure variations
in the fluid-filled tympanic and vestibular canals.
Some 16,000 to 20,000 hair cells are distributed
along the basilar membrane, which follows the spiral of the cochlea, and they can
resolve about 1,500 separate pitches.
According to the place theory, pitch is
determined by the place along the basilar membrane at which the
maximum excitation of the hair cells occurs. The timing (or frequency) theory, on the other hand, assumes that the
basilar membrane moves up and down in synchrony with the pressure
variation of the sound wave, driven by the movement of the stapes at the oval
window. Each up and down movement
results in one neural firing, so that frequency is coded directly by the rate
of firing. For example, a 400 Hz tone results in hair cells firing 400 times per
second. When the frequency is above 1,000 Hz,
the rate cannot be sustained by individual cells, and the firings of
many cells are integrated to create the correct firing rate.
Further complex auditory processing
occurs in the brain, using information contained in the neural signals passed
on to the brain via the auditory nerve.
The auditory nerve, by taking electrical impulses from the cochlea and
the semicircular canals, makes connections with both auditory areas of the
brain.
In addition, physiologically, the left
and right ears do not differ in their capacity for detecting sound, but the
left and right brain halves do. The
left cerebral hemisphere processes most speech (verbal) information; thus, the
right ear that is wired to this brain side may be perceptually superior for
spoken words. On the other hand, the
left ear may be better at perceiving melodies because it is connected to the
right brain half, which processes melodic (nonverbal) information.
Psychoacoustics demonstrates how
remarkable the human auditory system is in terms of absolute sensitivity and in
terms of the range of intensities to which it can respond. The ratio between the powers of the faintest
sound we can detect and the loudest sound we can hear without damaging our ears
is 1,000,000,000,000:1. This shows that
the ear can accommodate a very wide dynamic range of sound intensity, and it
responds to increasing sound intensity approximately logarithmically.
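Because the response is logarithmic, such a power ratio is usually expressed in decibels. The following sketch applies the standard definition of the decibel for power ratios to the ratio quoted above; the function name is illustrative.

```python
import math


def power_ratio_to_db(power_ratio):
    """Express a ratio of two sound powers (or intensities) in decibels."""
    return 10.0 * math.log10(power_ratio)


# Ratio between the loudest tolerable and the faintest detectable sound power.
dynamic_range_db = power_ratio_to_db(1_000_000_000_000)
print(f"Dynamic range: {dynamic_range_db:.0f} dB")

# Each tenfold increase in power adds the same 10 dB step.
step_db = power_ratio_to_db(10)
print(f"Tenfold power increase: {step_db:.0f} dB")
```

The twelve orders of magnitude in power collapse to twelve equal steps of 10 dB, which is why the decibel scale suits a logarithmic sense like hearing.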
For very soft sounds, near the threshold
of hearing, the ear strongly discriminates against low frequencies. For mid-range sounds around 60 phons, the
discrimination is less pronounced, and for very loud sounds in the
neighborhood of 120 phons, the hearing response is nearly flat. This aspect of human hearing implies that
the ear will perceive a progressive loss of bass frequencies as a given sound
becomes softer and softer. Figure 2.2
shows the frequency–intensity regions for auditory experience.
Figure 2.2 Frequency–intensity regions for auditory experience. [13]
Binaural hearing relies on the fact
that the ears are some distance apart, which allows the localization of sound by
registering the slight differences in time, phase, and intensity of the sound
striking each ear. The ear can detect a
time difference as slight as 30 microseconds.
Both the comparison of left and right ear receptions and the evaluation
of the sound’s intensity are done automatically, without any conscious thought,
allowing us to identify the approximate location of the origin of a sound.
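The size of the interaural time difference (ITD) can be illustrated with a simplified path-difference model, ITD = d · sin(azimuth) / c. This is a sketch under stated assumptions, not a model from the text: the ear spacing of 0.18 m and the speed of sound of 343 m/s are assumed values, and the model ignores diffraction around the head.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at about 20 degrees Celsius
EAR_SPACING_M = 0.18        # assumed ear-to-ear distance (illustrative)


def interaural_time_difference_s(azimuth_deg,
                                 ear_spacing_m=EAR_SPACING_M,
                                 speed_of_sound=SPEED_OF_SOUND_M_S):
    """Simplified path-difference model: ITD = d * sin(azimuth) / c."""
    return ear_spacing_m * math.sin(math.radians(azimuth_deg)) / speed_of_sound


# Azimuth 0 is straight ahead; 90 degrees is directly to one side.
for azimuth in (0, 10, 45, 90):
    itd_us = interaural_time_difference_s(azimuth) * 1e6
    print(f"azimuth {azimuth:>2} deg -> ITD {itd_us:6.1f} microseconds")
```

In this model the largest possible time difference, for a source directly to one side, is roughly half a millisecond, so the differences the auditory system works with for localization are very small.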
In psychoacoustics, the term pitch is
considered to be the psychological perception of frequency. Much research has been done to find
correlations between pitch and frequency, in which pitch is understood as a
response pattern to the frequency of a sound.
In music, pitch is defined as the position of a single sound in the
complete range of sound. It is the
feature of a sound by which listeners can arrange sounds on a scale from
"lowest" to "highest."
Sounds are higher or lower in pitch according to the frequency of
vibration of the sound waves producing them.
Musical notation uses a logarithmic
measuring scale because the ear responds logarithmically to frequency. For example, two different octaves are heard
as the same musical interval even though the frequency range of one octave is
between 100 and 200 Hz and that of another octave is between 1,000 and 2,000 Hz. The audible frequency range is roughly
between 20 and 20,000 Hz, the most sensitive region being from 1,000 to 5,000
Hz.
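The octave example can be checked numerically: an interval measured in octaves is the base-2 logarithm of the frequency ratio, so it depends on the ratio and not on the width of the range in hertz. The function name below is illustrative.

```python
import math


def interval_in_octaves(f_low_hz, f_high_hz):
    """Size of a pitch interval in octaves (base-2 logarithm of the frequency ratio)."""
    return math.log2(f_high_hz / f_low_hz)


low_octave = interval_in_octaves(100.0, 200.0)      # spans 100 Hz
high_octave = interval_in_octaves(1000.0, 2000.0)   # spans 1,000 Hz
audible_range = interval_in_octaves(20.0, 20000.0)

print(f"100-200 Hz: {low_octave:.2f} octave")
print(f"1,000-2,000 Hz: {high_octave:.2f} octave")
print(f"20-20,000 Hz: {audible_range:.2f} octaves")
```

Both ranges come out as exactly one octave, and the whole audible range amounts to just under ten octaves.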
Loudness is a subjective
perception of the intensity of a sound, in terms of which sounds may be ordered
on a scale extending from quiet to loud.
Intensity is defined as the sound power per unit area. In Figure 2.3 equal loudness curves for the
human ear are shown.
Each curve traces, across frequency, the sound pressure levels (SPL)
that are perceived to be equally loud. The curves are rated in phons: the phon value of a curve is its SPL in dB at
1,000 Hz. These curves show that the
ear is less sensitive to low frequencies, and also that the maximum sensitivity
region for human hearing is around 1,000 to 5,000 Hz. The dotted curve represents the threshold of hearing.
Figure 2.3 Equal loudness curves for the human ear. [25]
The standard threshold of
hearing at 1,000 Hz is nominally taken to be 0 dB, but the actual curves show
the measured threshold at 1,000 Hz to be about 4 dB.
Sounds may be generally
characterized by pitch, loudness, and sound quality or timbre. Timbre is that attribute of auditory
sensation in terms of which a listener can distinguish two similar sounds that
have the same pitch and loudness.
Timbre is mainly determined by the harmonic content and the dynamic
characteristics of sound such as vibrato and the attack-decay envelope. Timbre is the characteristic that allows us
to discriminate sounds produced by different instruments playing at the same
time.
Simultaneous masking is a property of the
human auditory system where some sounds vanish in the presence of louder
sounds. For example, in the presence of
very strong white noise, many weaker sounds get masked, or a tone of 500 Hz can
mask a softer tone of 600 Hz. The
strong sound is called the masker and the softer sound is called the
maskee. This aspect of human hearing
has important implications for the design of audio perceptual coders.
The goal of audio perceptual coders is to
minimize the amount of data to be coded without perceptible degradation or loss of
information for the listener. Data
reduction can be achieved with psychoacoustic algorithms based
on the concepts of critical bands, the minimum threshold of hearing, and
masking phenomena. The sound signals to
be coded are compared to the minimum hearing threshold and the masking
curve. When a sound signal falls below
the masking curve, it is coded using the minimal
quantity of bits, and signals that fall below the minimum hearing threshold are discarded,
because the ear cannot hear them.
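The comparison against the minimum hearing threshold can be sketched as follows. This is a simplified illustration, not the algorithm of any particular coder: it uses Terhardt's widely quoted approximation of the threshold in quiet (an assumption, since the text names no formula), it ignores the masking curve entirely, and the function names and the list of spectral components are hypothetical.

```python
import math


def threshold_in_quiet_db_spl(freq_hz):
    """Terhardt's approximation of the absolute hearing threshold, in dB SPL."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def audible_components(components):
    """Keep only (freq_hz, level_db_spl) pairs at or above the threshold in quiet."""
    return [(freq, level) for freq, level in components
            if level >= threshold_in_quiet_db_spl(freq)]


# Hypothetical spectral components: (frequency in Hz, level in dB SPL).
components = [(100.0, 10.0), (1000.0, 10.0), (4000.0, 0.0), (16000.0, 40.0)]
kept = audible_components(components)
print(kept)
```

With this approximation the 100 Hz and 16,000 Hz components are dropped and the 1,000 Hz and 4,000 Hz components are kept, which reflects the higher thresholds at the extremes of the audible range. A practical coder would additionally compare each component with the masking curve produced by its stronger neighbors.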
A critical band is the
smallest band of frequencies that activate the same part of the basilar
membrane. The ear can distinguish tones
a few hertz apart at low frequencies, but tones must differ by hundreds of hertz
at high frequencies to be differentiated.
In any case, hair cells respond to the strongest stimulation in their
local region, which is termed a critical band.
The concept of the critical band was introduced by Fletcher in 1940 and has
been widely tested. Experiments show
that critical bands are much narrower at low frequencies than at high
frequencies; three-fourths of the critical bands are below 5,000 Hz. Critical bands are analogous to a spectrum
analyzer with variable center frequencies, and any tone will create a critical
band centered on it.
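The growth of critical bandwidth with frequency can be illustrated with the analytic approximations published by Zwicker and Terhardt for the critical bandwidth and the critical-band rate (Bark scale). These formulas are an assumption brought in for illustration; the text does not state which model it relies on, and the function names are hypothetical.

```python
import math


def critical_bandwidth_hz(center_hz):
    """Zwicker-Terhardt approximation of the critical bandwidth at a center frequency."""
    khz = center_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * khz ** 2) ** 0.69


def hz_to_bark(freq_hz):
    """Zwicker-Terhardt approximation of the critical-band rate in Bark."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


for center in (100.0, 1000.0, 10000.0):
    print(f"{center:>7.0f} Hz -> bandwidth {critical_bandwidth_hz(center):7.1f} Hz")

# Share of the critical-band scale that lies below 5,000 Hz,
# taking 20,000 Hz as the assumed upper limit of hearing.
fraction_below_5k = hz_to_bark(5000.0) / hz_to_bark(20000.0)
print(f"Fraction of critical bands below 5,000 Hz: {fraction_below_5k:.2f}")
```

Under these approximations the bandwidth grows from roughly 100 Hz at low frequencies to a few kilohertz near 10,000 Hz, and about three quarters of the critical-band scale lies below 5,000 Hz.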
Critical
bands can also be explained in another way. When two sounds of equal loudness
are close together in pitch, their combined loudness when
sounded together will be only slightly greater than that of one of them alone. They may be said to be in the same critical
band, where they are competing for the same nerve endings on the basilar
membrane of the inner ear. If the two
sounds are widely separated in pitch, the perceived loudness of the combined
tones will be considerably greater because they do not overlap on the basilar
membrane and do not compete for the same hair cells.
If the tones are far apart in frequency (not within a critical band),
the combined sound may be perceived as twice as loud as one alone. The theory of critical bands is an important
auditory concept because it shows that the ear discriminates between energy in
the band and energy outside the band; the former promotes masking.
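The difference between the two cases can be put into rough numbers. The sketch below rests on several assumptions that are not in the text: Stevens' rule of thumb that loudness in sones doubles for every 10 phon above 40 phon, tones near 1,000 Hz so that phons equal dB SPL, intensities that add within one critical band, and loudness contributions that add across separate bands. The function names are hypothetical.

```python
import math


def phon_to_sone(phon):
    """Stevens' rule of thumb: loudness doubles for every 10 phon above 40 phon."""
    return 2.0 ** ((phon - 40.0) / 10.0)


def combined_level_db(level_db, n_sources=2):
    """Level of n equal, incoherent sources whose intensities add."""
    return level_db + 10.0 * math.log10(n_sources)


single_phon = 60.0                       # assumed level of each tone
single_sone = phon_to_sone(single_phon)

# Same critical band: intensities add, so the level rises by about 3 dB.
same_band_sone = phon_to_sone(combined_level_db(single_phon))
same_band_ratio = same_band_sone / single_sone

# Separate critical bands: loudness contributions are assumed to add.
separate_bands_ratio = (2.0 * single_sone) / single_sone

print(f"Same band: {same_band_ratio:.2f} times as loud as one tone")
print(f"Separate bands: {separate_bands_ratio:.2f} times as loud as one tone")
```

Under these assumptions two tones sharing a critical band sound only about a quarter louder than one tone, whereas two tones in separate bands sound twice as loud.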