Publication

Real-Time Audio-Visual Analysis for Multiperson Videoconferencing

Related concepts (25)

The Semantic Web, sometimes known as Web 3.0 (not to be confused with Web3), is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable. To enable the encoding of semantics with the data, technologies such as Resource Description Framework (RDF) and Web Ontology Language (OWL) are used. These technologies are used to formally represent metadata. For example, ontology can describe concepts, relationships between entities, and categories of things.

Semantic network

A semantic network, or frame network is a knowledge base that represents semantic relations between concepts in a network. This is often used as a form of knowledge representation. It is a directed or undirected graph consisting of vertices, which represent concepts, and edges, which represent semantic relations between concepts, mapping or connecting semantic fields. A semantic network may be instantiated as, for example, a graph database or a concept map. Typical standardized semantic networks are expressed as semantic triples.

Object detection

Object detection is a computer technology related to computer vision and that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. Well-researched domains of object detection include face detection and pedestrian detection. Object detection has applications in many areas of computer vision, including and video surveillance. It is widely used in computer vision tasks such as , vehicle counting, activity recognition, face detection, face recognition, video object co-segmentation.

Semantic triple

A semantic triple, or RDF triple or simply triple, is the atomic data entity in the Resource Description Framework (RDF) data model. As its name indicates, a triple is a set of three entities that codifies a statement about semantic data in the form of subject–predicate–object expressions (e.g., "Bob is 35", or "Bob knows John"). This format enables knowledge to be represented in a machine-readable way. Particularly, every part of an RDF triple is individually addressable via unique URIs — for example, the statement "Bob knows John" might be represented in RDF as: http://example.

Object co-segmentation

In computer vision, object co-segmentation is a special case of , which is defined as jointly segmenting semantically similar objects in multiple images or video frames. It is often challenging to extract segmentation masks of a target/object from a noisy collection of images or video frames, which involves object discovery coupled with . A noisy collection implies that the object/target is present sporadically in a set of images or the object/target disappears intermittently throughout the video of interest.

Semantic technology

The ultimate goal of semantic technology is to help machines understand data. To enable the encoding of semantics with the data, well-known technologies are RDF (Resource Description Framework) and OWL (Web Ontology Language). These technologies formally represent the meaning involved in information. For example, ontology can describe concepts, relationships between things, and categories of things. These embedded semantics with the data offer significant advantages such as reasoning over data and dealing with heterogeneous data sources.

Image segmentation

In and computer vision, image segmentation is the process of partitioning a into multiple image segments, also known as image regions or image objects (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.

Semantic similarity

Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. These are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts or instances, through a numerical description obtained according to the comparison of information supporting their meaning or describing their nature.

Video game art

Video game art is a specialized form of computer art employing video games as the artistic medium. Video game art often involves the use of patched or modified video games or the repurposing of existing games or game structures, however it relies on a broader range of artistic techniques and outcomes than artistic modification and it may also include painting, sculpture, appropriation, in-game intervention and performance, sampling, etc. It may also include the creation of art games either from scratch or by modifying existing games.

Visual system

The visual system comprises the sensory organ (the eye) and parts of the central nervous system (the retina containing photoreceptor cells, the optic nerve, the optic tract and the visual cortex) which gives organisms the sense of sight (the ability to detect and process visible light) as well as enabling the formation of several non-image photo response functions. It detects and interprets information from the optical spectrum perceptible to that species to "build a representation" of the surrounding environment.

Large language model

A large language model (LLM) is a language model characterized by its large size. Their size is enabled by AI accelerators, which are able to process vast amounts of text data, mostly scraped from the Internet. The artificial neural networks which are built can contain from tens of millions and up to billions of weights and are (pre-)trained using self-supervised learning and semi-supervised learning. Transformer architecture contributed to faster training.

Electronic art

Electronic art is a form of art that makes use of electronic media. More broadly, it refers to technology and/or electronic media. It is related to information art, new media art, video art, digital art, interactive art, internet art, and electronic music. It is considered an outgrowth of conceptual art and systems art. The term electronic art is almost synonymous to computer art and digital art. The latter two terms, and especially the term computer-generated art are mostly used for visual artworks generated by computers.

Video editing

Video editing is the manipulation and arrangement of video shots. Video editing is used to structure and present all video information, including films and television shows, video advertisements and video essays. Video editing has been dramatically democratized in recent years by editing software available for personal computers. Editing video can be difficult and tedious, so several technologies have been produced to aid people in this task. Pen based video editing software was developed in order to give people a more intuitive and fast way to edit video.

Non-linear editing

Non-linear editing is a form of offline editing for audio, video, and . In offline editing, the original content is not modified in the course of editing. In non-linear editing, edits are specified and modified by specialized software. A pointer-based playlist, effectively an edit decision list (EDL), for video and audio, or a directed acyclic graph for still images, is used to keep track of edits. Each time the edited audio, video, or image is rendered, played back, or accessed, it is reconstructed from the original source and the specified editing steps.

Time series

In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average. A time series is very frequently plotted via a run chart (which is a temporal line chart).

Advanced Audio Coding

Advanced Audio Coding (AAC) is an audio coding standard for lossy digital audio compression. Designed to be the successor of the MP3 format, AAC generally achieves higher sound quality than MP3 encoders at the same bit rate. AAC has been standardized by ISO and IEC as part of the MPEG-2 and MPEG-4 specifications. Part of AAC, HE-AAC ("AAC+"), is part of MPEG-4 Audio and is adopted into digital radio standards DAB+ and Digital Radio Mondiale, and mobile television standards DVB-H and ATSC-M/H.

Multisensory integration

Multisensory integration, also known as multimodal integration, is the study of how information from the different sensory modalities (such as sight, sound, touch, smell, self-motion, and taste) may be integrated by the nervous system. A coherent representation of objects combining modalities enables animals to have meaningful perceptual experiences. Indeed, multisensory integration is central to adaptive behavior because it allows animals to perceive a world of coherent perceptual entities.

Digital audio

Digital audio is a representation of sound recorded in, or converted into, digital form. In digital audio, the sound wave of the audio signal is typically encoded as numerical samples in a continuous sequence. For example, in CD audio, samples are taken 44,100 times per second, each with 16-bit sample depth. Digital audio is also the name for the entire technology of sound recording and reproduction using audio signals that have been encoded in digital form.

Digital art

Digital art refers to any artistic work or practice that uses digital technology as part of the creative or presentation process. It can also refer to computational art that uses and engages with digital media. Since the 1960s, various names have been used to describe digital art, including computer art, electronic art, multimedia art and new media art. John Whitney developed the first computer-generated art in the early 1960s by utilizing mathematical operations to create art.

Visual perception

Visual perception is the ability to interpret the surrounding environment through photopic vision (daytime vision), color vision, scotopic vision (night vision), and mesopic vision (twilight vision), using light in the visible spectrum reflected by objects in the environment. This is different from visual acuity, which refers to how clearly a person sees (for example "20/20 vision"). A person can have problems with visual perceptual processing even if they have 20/20 vision.