The major problem in visual perception is that what people see is not simply a translation of retinal stimuli (i.e., the image on the retina); the brain alters the basic information taken in. Researchers interested in perception have therefore long struggled to explain what
visual processing does to create what is actually seen.
=== Early studies ===
[Figure: the visual dorsal stream (green) and ventral stream (purple) are shown. Much of the human cerebral cortex is involved in vision.]
There were two major
ancient Greek schools, each providing a primitive explanation of how vision works. The first was the "emission theory" of vision, which maintained that vision occurs when rays emanate from the eyes and are intercepted by visual objects. An object seen directly was seen by 'means of rays' coming out of the eyes and falling on the object. A refracted image was likewise seen by 'means of rays', which came out of the eyes, traversed the air, and, after refraction, fell on the visible object, which was sighted as a result of the movement of the rays from the eye. This theory was championed by scholars who were followers of
Euclid's
Optics and
Ptolemy's
Optics. The second school advocated the so-called 'intromission' approach, which holds that vision comes from something representative of the object entering the eyes. Its main proponents were Aristotle (De Sensu) and his followers. Later work in this tradition also introduced experimental methods that influenced later European scholars such as
Roger Bacon,
Kepler, and eventually
Newton. Both schools of thought relied upon the principle that "like is only known by like", and thus upon the notion that the eye was composed of some "internal fire" that interacted with the "external fire" of visible light and made vision possible.
Plato makes this assertion in his dialogue
Timaeus (45b and 46b), as does
Empedocles (as reported by Aristotle in his
De Sensu,
DK frag. B17). Alhazen (c. 965 – c. 1040) was the first person to explain that vision occurs when light reflects off an object and is then directed into one's eyes.
Leonardo da Vinci (1452–1519) is believed to be the first to recognize the special optical qualities of the eye. He wrote, "The function of the human eye ... was described by a large number of authors in a certain way. But I found it to be completely different." His main experimental finding was that distinct and clear vision exists only at the line of sight, the optical line that ends at the
fovea. Although he did not use these terms, he is regarded as the originator of the modern distinction between foveal and
peripheral vision.
Isaac Newton (1642–1726/27) was the first to discover through experimentation, by isolating individual colors of the spectrum of light passing through a
prism, that the perceived color of objects depends on the character of the light they reflect, and that these separated colors could not be changed into any other color, which was contrary to the scientific expectation of the day.
=== Unconscious inference ===
Hermann von Helmholtz is often credited with the first modern study of visual perception. Helmholtz examined the human eye and concluded that it was incapable of producing a high-quality image. Insufficient information seemed to make vision impossible. He therefore concluded that vision could only be the result of some form of "unconscious inference", coining that term in 1867. He proposed that the brain was making assumptions and drawing conclusions from incomplete data, based on previous experiences. Inference requires prior experience of the world. Examples of well-known assumptions, based on visual experience, are:
• light comes from above;
• objects are normally not viewed from below;
• faces are seen (and recognized) upright;
• closer objects can block the view of more distant objects, but not vice versa; and
• figures (i.e., foreground objects) tend to have convex borders.
The study of
visual illusions (cases when the inference process goes wrong) has yielded much insight into what sort of assumptions the visual system makes. Another type of unconscious inference hypothesis (based on probabilities) has recently been revived in so-called
Bayesian studies of visual perception. Proponents of this approach consider that the visual system performs some form of
Bayesian inference to derive a perception from sensory data. However, it is not clear how proponents of this view derive, in principle, the relevant probabilities required by the Bayesian equation. Models based on this idea have been used to describe various visual perceptual functions, such as the
perception of motion, the
perception of depth, and
figure-ground perception. The "wholly empirical theory of perception" is a related and newer approach that rationalizes visual perception without explicitly invoking Bayesian formalisms.
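As a rough illustration of what Bayesian inference means here, the sketch below applies Bayes' rule to an ambiguous shaded disc that could be a bump lit from above or a dent lit from below. The hypothesis names, the prior favoring light from above, and all probability values are made-up assumptions for illustration; they are not taken from any published model, and the sketch does not address how a visual system would obtain such probabilities.

```python
# Minimal sketch of Bayes' rule applied to an ambiguous percept.
# All numbers below are illustrative assumptions, not measured values.

def posterior(prior, likelihood):
    """Return the normalized posterior P(h | data) for each hypothesis h."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(data), the normalizing constant
    return {h: p / evidence for h, p in unnormalized.items()}

# Assumed prior: light usually comes from above, so a bump is more plausible.
prior = {"convex_lit_from_above": 0.8, "concave_lit_from_below": 0.2}

# The retinal image is equally consistent with both interpretations.
likelihood = {"convex_lit_from_above": 0.5, "concave_lit_from_below": 0.5}

post = posterior(prior, likelihood)
percept = max(post, key=post.get)  # pick the most probable interpretation
print(post, percept)
```

Because the image itself does not favor either interpretation, the prior alone decides the outcome in this toy case, which is one way to read the "light comes from above" assumption listed earlier.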
=== Gestalt theory ===
Gestalt psychologists working primarily in the 1930s and 1940s raised many of the research questions that vision scientists study today. The Gestalt Laws of Organization have guided the study of how people perceive visual components as organized patterns or wholes rather than as many separate parts. "Gestalt" is a German word that partially translates to "configuration or pattern" along with "whole or emergent structure". According to this theory, eight main factors determine how the visual system automatically groups elements into patterns: Proximity, Similarity, Closure, Symmetry, Common Fate (i.e., common motion), Continuity, Good Gestalt (a pattern that is regular, simple, and orderly), and Past Experience.
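The proximity factor can be loosely mimicked in code. The sketch below is only a computational analogy, not a model proposed by the Gestalt psychologists: it chains together dots that lie within a chosen distance of one another (single-linkage grouping). The function name `group_by_proximity` and the `max_gap` threshold are hypothetical choices made for this example.

```python
from math import dist

def group_by_proximity(points, max_gap):
    """Group point indices so that each point is within max_gap of
    at least one other point in its group (single-linkage grouping)."""
    groups = []
    unvisited = set(range(len(points)))
    while unvisited:
        seed = min(unvisited)          # start a new group from the lowest index
        unvisited.remove(seed)
        group, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            # Collect every unvisited point close enough to point i.
            near = [j for j in unvisited if dist(points[i], points[j]) <= max_gap]
            for j in near:
                unvisited.remove(j)
            group.extend(near)
            frontier.extend(near)
        groups.append(sorted(group))
    return groups

# Six dots in a row with a wide gap in the middle.
dots = [(0, 0), (1, 0), (2, 0), (6, 0), (7, 0), (8, 0)]
groups = group_by_proximity(dots, max_gap=1.5)
print(groups)
```

With the assumed threshold of 1.5, the dots fall into two clusters of three, matching the intuition that closely spaced elements are seen as belonging together.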
=== Language model ===
Following in the footsteps of
George Berkeley, the Australian philosopher
Colin Murray Turbayne argued in favor of an alternative to the classical "geometric model" of visual perception, asserting that aspects of it have needlessly clouded our understanding of vision since the time of Euclid. Quoting the sculptor
Naum Gabo, he notes: "Lines, shapes, color and movement have a language of their own, but reading takes time. It is not enough to look. You must see, and 'see' means 'read'." Turbayne argued that a "language model peculiarly illuminates this ancient problem of how we see, shedding a bright light on dark areas dimly lit by its great rival." Specifically, he highlighted the limitations of a purely
mechanistic explanation of vision by arguing that several cases of "visual illusion" can be explained more adequately using the terms of such a language model. With this in mind, he presented a comparative analysis of specific examples of visual distortion, including the "Barrovian Case", the case of the "Horizontal Moon", and the case of the "Inverted Retinal Image".
=== Analysis of eye movement ===
[Figure: eye movements during the first two seconds of visual inspection (1967).]
During the 1960s, technical development permitted the continuous registration of eye movement during reading, in picture viewing, later in visual problem solving, and, once headset cameras became available, during driving. The picture to the right shows what may happen during the first two seconds of visual inspection. While the background is out of focus, representing the
peripheral vision, the first eye movement goes to the boots of the man (just because they are very near the starting fixation and have a reasonable contrast). Eye movements serve the function of
attentional selection, i.e., selecting a fraction of all visual inputs for deeper processing by the brain. The following fixations jump from face to face and might even permit comparisons between faces. It may be concluded that the
face is a very attractive search icon within the peripheral field of vision.
Foveal vision adds detailed information to the peripheral
first impression. There are several types of eye movements:
fixational eye movements (
microsaccades, ocular drift, and tremor), vergence movements, saccadic movements and pursuit movements.
Fixations are comparatively static points where the eye rests. However, the eye is never completely still, and gaze position will drift. These drifts are in turn corrected by microsaccades, very small fixational eye movements.
Vergence movements involve the cooperation of both eyes to allow for an image to fall on the same area of both retinas. This results in a single focused image.
Saccadic movements are jumps from one position to another and are used to rapidly scan a particular scene or image. Lastly,
pursuit movement is smooth eye movement and is used to follow objects in motion.
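The distinction between fixations, pursuit, and saccades is often operationalized by gaze speed. The following is a minimal sketch of a velocity-threshold classifier, assuming a one-dimensional gaze trace in degrees sampled at a fixed rate. The function name `classify_gaze` and both thresholds (5 and 30 degrees per second) are illustrative assumptions rather than standard values, and vergence and the fixational sub-types (microsaccades, drift, tremor) are not modeled.

```python
def classify_gaze(angles_deg, sample_rate_hz, pursuit_min=5.0, saccade_min=30.0):
    """Label each interval between consecutive gaze samples by its speed.

    angles_deg: gaze direction per sample, in degrees (1-D for simplicity).
    pursuit_min, saccade_min: assumed speed thresholds in degrees per second.
    """
    labels = []
    for prev, curr in zip(angles_deg, angles_deg[1:]):
        speed = abs(curr - prev) * sample_rate_hz  # degrees per second
        if speed >= saccade_min:
            labels.append("saccade")    # fast jump to a new position
        elif speed >= pursuit_min:
            labels.append("pursuit")    # slow, smooth tracking
        else:
            labels.append("fixation")   # eye is nearly at rest
    return labels

# Hypothetical trace: rest, a 5-degree jump, a brief rest, then slow tracking.
trace = [0.0, 0.01, 0.02, 5.0, 5.0, 5.1, 5.2, 5.3]
labels = classify_gaze(trace, sample_rate_hz=100)
print(labels)
```

Real eye-tracking pipelines typically add noise filtering and minimum-duration rules; this sketch omits them to keep the speed-based distinction visible.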
=== Face and object recognition ===
There is considerable evidence that face and
object recognition are accomplished by distinct systems. For example,
prosopagnosic patients show deficits in face, but not object processing, while object
agnosic patients (most notably,
patient C.K.) show deficits in object processing with spared face processing. Behaviorally, it has been shown that faces, but not objects, are subject to inversion effects, leading to the claim that faces are "special". Further, face and object processing recruit distinct neural systems. Notably, some have argued that the apparent specialization of the human brain for face processing does not reflect true domain specificity, but rather a more general process of expert-level discrimination within a given class of stimulus, though this latter claim is the subject of
substantial debate. Using fMRI and electrophysiology, Doris Tsao and colleagues described brain regions and a mechanism for
face recognition in macaque monkeys. The
inferotemporal (IT) cortex has a key role in recognizing and differentiating objects. A study at MIT showed that subregions of the IT cortex are responsible for different objects. When neural activity in many small areas of the cortex was selectively shut off, the animal became unable, in turn, to distinguish between particular pairs of objects. This shows that the IT cortex is divided into regions that respond to different, particular visual features. Similarly, certain patches and regions of the cortex are more involved in face recognition than in the recognition of other objects. Some studies suggest that, rather than the uniform global image, particular features and regions of interest are the key elements when the brain needs to recognize an object in an image. Human vision is therefore vulnerable to small, specific changes to the image, such as disrupting the edges of the object, modifying its texture, or making any small change in a crucial region of the image. Studies of people whose sight has been restored after a long blindness reveal that they cannot necessarily recognize objects and faces (as opposed to color, motion, and simple geometric shapes). Some hypothesize that being blind during childhood prevents some part of the visual system necessary for these higher-level tasks from developing properly. The general belief that a
critical period lasts until age 5 or 6 was challenged by a 2007 study that found that older patients could improve these abilities with years of exposure.

== Cognitive and computational approaches ==