Publication

FASA: Fast, Accurate, and Size-Aware Salient Object Detection

Related concepts (32)

In and computer vision, image segmentation is the process of partitioning a into multiple image segments, also known as image regions or image objects (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.

Object co-segmentation

In computer vision, object co-segmentation is a special case of , which is defined as jointly segmenting semantically similar objects in multiple images or video frames. It is often challenging to extract segmentation masks of a target/object from a noisy collection of images or video frames, which involves object discovery coupled with . A noisy collection implies that the object/target is present sporadically in a set of images or the object/target disappears intermittently throughout the video of interest.

Computer vision

Computer vision tasks include methods for , , and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g. in the forms of decisions. Understanding in this context means the transformation of visual images (the input to the retina in the human analog) into descriptions of the world that make sense to thought processes and can elicit appropriate action.

Image compression

Image compression is a type of data compression applied to s, to reduce their cost for storage or transmission. Algorithms may take advantage of visual perception and the statistical properties of image data to provide superior results compared with generic data compression methods which are used for other digital data. Image compression may be lossy or lossless. Lossless compression is preferred for archival purposes and often for medical imaging, technical drawings, clip art, or comics.

Feature (computer vision)

In computer vision and , a feature is a piece of information about the content of an image; typically about whether a certain region of the image has certain properties. Features may be specific structures in the image such as points, edges or objects. Features may also be the result of a general neighborhood operation or feature detection applied to the image. Other examples of features are related to motion in image sequences, or to shapes defined in terms of curves or boundaries between different image regions.

Computer stereo vision

Computer stereo vision is the extraction of 3D information from digital images, such as those obtained by a CCD camera. By comparing information about a scene from two vantage points, 3D information can be extracted by examining the relative positions of objects in the two panels. This is similar to the biological process of stereopsis. In traditional stereo vision, two cameras, displaced horizontally from one another, are used to obtain two differing views on a scene, in a manner similar to human binocular vision.

Digital image processing

Digital image processing is the use of a digital computer to process s through an algorithm. As a subcategory or field of digital signal processing, digital image processing has many advantages over . It allows a much wider range of algorithms to be applied to the input data and can avoid problems such as the build-up of noise and distortion during processing. Since images are defined over two dimensions (perhaps more) digital image processing may be modeled in the form of multidimensional systems.

Data compression

In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information.

Machine vision

Machine vision (MV) is the technology and methods used to provide -based automatic inspection and analysis for such applications as automatic inspection, process control, and robot guidance, usually in industry. Machine vision refers to many technologies, software and hardware products, integrated systems, actions, methods and expertise. Machine vision as a systems engineering discipline can be considered distinct from computer vision, a form of computer science.

Pyramid (image processing)

Pyramid, or pyramid representation, is a type of multi-scale signal representation developed by the computer vision, and signal processing communities, in which a signal or an image is subject to repeated smoothing and subsampling. Pyramid representation is a predecessor to scale-space representation and multiresolution analysis. There are two main types of pyramids: lowpass and bandpass. A lowpass pyramid is made by smoothing the image with an appropriate smoothing filter and then subsampling the smoothed image, usually by a factor of 2 along each coordinate direction.

Harris affine region detector

In the fields of computer vision and , the Harris affine region detector belongs to the category of feature detection. Feature detection is a preprocessing step of several algorithms that rely on identifying characteristic points or interest points so to make correspondences between images, recognize textures, categorize objects or build panoramas. The Harris affine detector can identify similar regions between images that are related through affine transformations and have different illuminations.

Lossy compression

In information technology, lossy compression or irreversible compression is the class of data compression methods that uses inexact approximations and partial data discarding to represent the content. These techniques are used to reduce data size for storing, handling, and transmitting content. The different versions of the photo of the cat on this page show how higher degrees of approximation create coarser images as more details are removed. This is opposed to lossless data compression (reversible data compression) which does not degrade the data.

Lossless compression

Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistical redundancy. By contrast, lossy compression permits reconstruction only of an approximation of the original data, though usually with greatly improved compression rates (and therefore reduced media sizes).

Salience (neuroscience)

Salience (also called saliency) is that property by which some thing stands out. Salient events are an attentional mechanism by which organisms learn and survive; those organisms can focus their limited perceptual and cognitive resources on the pertinent (that is, salient) subset of the sensory data available to them. Saliency typically arises from contrasts between items and their neighborhood. They might be represented, for example, by a red dot surrounded by white dots, or by a flickering message indicator of an answering machine, or a loud noise in an otherwise quiet environment.

High-definition video

High-definition video (HD video) is video of higher resolution and quality than standard-definition. While there is no standardized meaning for high-definition, generally any video image with considerably more than 480 vertical scan lines (North America) or 576 vertical lines (Europe) is considered high-definition. 480 scan lines is generally the minimum even though the majority of systems greatly exceed that. Images of standard resolution captured at rates faster than normal (60 frames/second North America, 50 fps Europe), by a high-speed camera may be considered high-definition in some contexts.

High-definition television

High-definition television (HD or HDTV) describes a television system which provides a substantially higher than the previous generation of technologies. The term has been used since 1936; in more recent times, it refers to the generation following standard-definition television (SDTV), often abbreviated to HDTV or HD-TV. It is the current de facto standard video format used in most broadcasts: terrestrial broadcast television, cable television, satellite television and Blu-ray Discs.

Statistical model

A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population). A statistical model represents, often in considerably idealized form, the data-generating process. When referring specifically to probabilities, the corresponding term is probabilistic model. A statistical model is usually specified as a mathematical relationship between one or more random variables and other non-random variables.

Ultra-high-definition television

Ultra-high-definition television (also known as Ultra HD television, Ultra HD, UHDTV, UHD and Super Hi-Vision) today includes 4K UHD and 8K UHD, which are two digital video formats with an of 16:9. These were first proposed by NHK Science & Technology Research Laboratories and later defined and approved by the International Telecommunication Union (ITU). The Consumer Electronics Association announced on October 17, 2012, that "Ultra High Definition", or "Ultra HD", would be used for displays that have an aspect ratio of 16:9 or wider and at least one digital input capable of carrying and presenting native video at a minimum resolution of .

Active contour model

Active contour model, also called snakes, is a framework in computer vision introduced by Michael Kass, Andrew Witkin, and Demetri Terzopoulos for delineating an object outline from a possibly 2D . The snakes model is popular in computer vision, and snakes are widely used in applications like object tracking, shape recognition, , edge detection and stereo matching. A snake is an energy minimizing, deformable spline influenced by constraint and image forces that pull it towards object contours and internal forces that resist deformation.

Web colors

Web colors are colors used in displaying web pages on the World Wide Web, as well as the methods for describing and specifying those colors. Colors may be specified as an RGB triplet or in hexadecimal format (a hex triplet) or according to their common English names in some cases. A color tool or other graphics software is often used to generate color values. In some uses, hexadecimal color codes are specified with notation using a leading number sign (#).