Each of the application areas described above employs a range of computer vision tasks: more or less well-defined measurement or processing problems that can be solved using a variety of methods. Some examples of typical computer vision tasks are presented below. Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and for extracting high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions.

===Recognition===
• Object recognition (also called object classification) – one or several pre-specified or learned objects or object classes can be recognized, usually together with their 2D positions in the image or 3D poses in the scene. Blippar, Google Goggles, and LikeThat provide stand-alone programs that illustrate this functionality.
• Identification – an individual instance of an object is recognized. Examples include identification of a specific person's face or fingerprint, identification of handwritten digits, or identification of a specific vehicle.
• Detection – the image data are scanned for specific objects along with their locations. Examples include the detection of an obstacle in a car's field of view, of possible abnormal cells or tissues in medical images, or of a vehicle in an automatic road-toll system. Detection based on relatively simple and fast computations is sometimes used to find smaller regions of interesting image data, which can then be analyzed further by more computationally demanding techniques to produce a correct interpretation. Currently, the best algorithms for such tasks are based on convolutional neural networks. An illustration of their capabilities is given by the ImageNet Large Scale Visual Recognition Challenge, a benchmark in object classification and detection with millions of images and 1000 object classes used in the competition. Performance of convolutional neural networks on the ImageNet tests is now close to that of humans (a minimal classification sketch follows this list).
• Emotion recognition – a subset of facial recognition, emotion recognition refers to the process of classifying human emotions. Psychologists caution, however, that internal emotions cannot be reliably detected from faces.
• Shape Recognition Technology (SRT) – used in people counter systems to differentiate human beings (head-and-shoulder patterns) from objects.
• Human activity recognition – recognizing an activity, such as a person picking up an object or walking, from a series of video frames.
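As a concrete illustration of the classification task, the following minimal sketch runs an ImageNet-trained convolutional network over a single image. It assumes PyTorch and torchvision are installed; the filename "photo.jpg" is a placeholder, not part of any particular system.

<syntaxhighlight lang="python">
import torch
from PIL import Image
from torchvision import models, transforms

# Pre-trained ImageNet classifier (1000 object classes).
weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights)
model.eval()  # inference mode: no dropout/batch-norm updates

# Standard ImageNet preprocessing: resize, center-crop, normalize.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# "photo.jpg" is a placeholder for any input image.
batch = preprocess(Image.open("photo.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    probs = torch.softmax(model(batch), dim=1)[0]

# Report the five most likely of the 1000 ImageNet classes.
top5 = torch.topk(probs, k=5)
for p, idx in zip(top5.values, top5.indices):
    print(weights.meta["categories"][int(idx)], float(p))
</syntaxhighlight>

A detection network would additionally output a location (e.g., a bounding box) for each recognized object; classification, as here, returns only class scores for the image as a whole.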
===Motion analysis===
Several tasks relate to motion estimation, where an image sequence is processed to produce an estimate of the velocity, either at each point in the image or in the 3D scene, or even of the camera that produces the images. Examples of such tasks are:
• Egomotion – determining the 3D rigid motion (rotation and translation) of the camera from an image sequence produced by the camera.
• Tracking – following the movements of a (usually) smaller set of interest points or objects (e.g., vehicles, objects, humans, or other organisms) in the image sequence. This has vast industrial applications, as most continuously running machinery can be monitored in this way.
• Optical flow – determining, for each point in the image, how that point is moving relative to the image plane, i.e., its apparent motion. This motion is a result both of how the corresponding 3D point is moving in the scene and of how the camera is moving relative to the scene (a minimal sketch follows this list).
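The optical-flow task can be illustrated with OpenCV's implementation of Farnebäck's dense flow algorithm. This is a minimal sketch, assuming OpenCV is installed; the frame filenames are placeholders for two consecutive frames of a sequence.

<syntaxhighlight lang="python">
import cv2

# Two consecutive frames of a sequence; the filenames are placeholders.
prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Farnebäck dense optical flow: for every pixel, the apparent (dx, dy)
# displacement between the two frames. The positional arguments are
# pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

# Per-pixel magnitude and direction of the apparent motion.
magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
print("mean apparent motion (pixels):", magnitude.mean())
</syntaxhighlight>

Sparse alternatives such as the Lucas–Kanade method instead follow a small set of interest points from frame to frame, which corresponds to the tracking task above.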
===Scene reconstruction===
Given one or (typically) more images of a scene, or a video, scene reconstruction aims at computing a 3D model of the scene. In the simplest case, the model can be a set of 3D points. More sophisticated methods produce a complete 3D surface model. The advent of 3D imaging not requiring motion or scanning, and the related processing algorithms, is enabling rapid advances in this field. Grid-based 3D sensing can be used to acquire 3D images from multiple angles. Algorithms are now available to stitch multiple 3D images together into point clouds and 3D models.
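For the simplest case, a set of 3D points, the core step can be sketched as linear triangulation of matched points seen from two calibrated views. The projection matrices and point correspondences below are hypothetical values chosen only for illustration; a real pipeline would obtain them from camera calibration and feature matching.

<syntaxhighlight lang="python">
import numpy as np
import cv2

# Hypothetical 3x4 projection matrices for two calibrated views: the
# first camera at the origin, the second translated along x (a stereo
# pair with unit baseline). Real values come from calibration.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))]).astype(np.float32)
P2 = np.hstack([np.eye(3), [[-1.0], [0.0], [0.0]]]).astype(np.float32)

# Corresponding 2D points in the two views (2xN arrays, normalized
# image coordinates); illustrative values only.
pts1 = np.array([[0.10, 0.20], [0.12, 0.21]], dtype=np.float32).T
pts2 = np.array([[0.05, 0.20], [0.02, 0.21]], dtype=np.float32).T

# Linear triangulation: 4xN homogeneous points, dehomogenized to Nx3.
points4d = cv2.triangulatePoints(P1, P2, pts1, pts2)
points3d = (points4d[:3] / points4d[3]).T
print(points3d)  # sparse 3D point set, the simplest scene model
</syntaxhighlight>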
===Image restoration===
Image restoration comes into the picture when the original image is degraded or damaged due to external factors such as incorrect lens positioning, transmission interference, low lighting, or motion blur; such degradation is collectively referred to as noise. When an image is degraded or damaged, the information to be extracted from it is damaged as well, so the image must be recovered or restored to its intended form. The aim of image restoration is thus the removal of noise (sensor noise, motion blur, etc.) from images. The simplest approaches to noise removal are various types of filters, such as low-pass filters or median filters. More sophisticated methods assume a model of how the local image structures look in order to distinguish them from noise. By first analyzing the image data in terms of local image structures, such as lines or edges, and then controlling the filtering based on local information from the analysis step, a better level of noise removal is usually obtained than with the simpler approaches. An example in this field is inpainting.
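The basic filtering approaches, and inpainting, can be sketched with standard OpenCV calls. This is a minimal illustration under the assumption that OpenCV is installed; "degraded.png" and "damage_mask.png" are placeholder filenames, not part of any particular dataset.

<syntaxhighlight lang="python">
import cv2

# Placeholder input: a noisy or damaged image.
img = cv2.imread("degraded.png")

# Low-pass (Gaussian) filtering suppresses high-frequency sensor noise.
smoothed = cv2.GaussianBlur(img, (5, 5), 1.5)

# Median filtering removes impulse ("salt-and-pepper") noise while
# preserving edges better than a plain low-pass filter.
denoised = cv2.medianBlur(img, 5)

# Inpainting reconstructs pixels flagged as damaged by a binary mask
# (non-zero mask pixels are filled in from their surroundings).
mask = cv2.imread("damage_mask.png", cv2.IMREAD_GRAYSCALE)
restored = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)
</syntaxhighlight>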
==System methods==