Publication

Learning bimodal structure in audio-visual data

Pierre Vandergheynst, Gianluca Monaci
2009
Journal Articles

Abstract

A novel model is presented to learn bimodally informative structures from audio-visual signals. The signal is represented as a sparse sum of audio-visual kernels. Each kernel is a bimodal function consisting of synchronous snippets of an audio waveform and a spatio-temporal visual basis function. To represent an audio-visual signal, the kernels can be positioned independently and arbitrarily in space and time. The proposed algorithm uses unsupervised learning to form dictionaries of bimodal kernels from audio-visual material. The basis functions that emerge during learning capture salient audio-visual data structures. In addition it is demonstrated that the learned dictionary can be used to locate sources of sound in the movie frame. Specifically, in sequences containing two speakers the algorithm can robustly localize a speaker even in the presence of severe acoustic and visual distracters.
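
The signal model described in the abstract, a sparse sum of bimodal kernels placed at arbitrary positions in space and time, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `synthesize`, the atom tuple layout, the fixed `samples_per_frame` ratio linking audio samples to video frames, and the use of a single shared coefficient for both modalities are all assumptions made for illustration. The sketch also assumes every kernel fits entirely inside the signal at its chosen position.

```python
import numpy as np

def synthesize(kernels, atoms, n_samples, video_shape, samples_per_frame):
    """Build an audio-visual signal as a sparse sum of shifted bimodal kernels.

    kernels: list of (audio_snippet, video_patch) pairs, where audio_snippet
             is a 1-D array and video_patch has shape (frames, height, width).
    atoms:   list of (kernel_index, coefficient, frame, y, x) tuples, one per
             kernel placement (hypothetical layout, chosen for this sketch).
    """
    audio = np.zeros(n_samples)
    video = np.zeros(video_shape)
    for k, coeff, frame, y, x in atoms:
        snippet, patch = kernels[k]
        # Both halves of the kernel share one time origin, so they stay synchronous.
        start = frame * samples_per_frame
        audio[start:start + len(snippet)] += coeff * snippet
        f, h, w = patch.shape
        video[frame:frame + f, y:y + h, x:x + w] += coeff * patch
    return audio, video

# One kernel, placed once: starting at frame 1, at row 0 and column 1 of the image.
kernels = [(np.ones(4), np.ones((2, 2, 2)))]
atoms = [(0, 2.0, 1, 0, 1)]
audio, video = synthesize(kernels, atoms, n_samples=12,
                          video_shape=(4, 3, 4), samples_per_frame=2)
```

In this toy setting the spatial position of the atom is known by construction. Learning the kernels and finding such positions in real data is the problem the paper addresses, and this sketch does not attempt it.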

Official source

https://infoscience.epfl.ch/entities/publication/9f872e17-2d83-436a-b22a-ec361f2faa11

Related concepts (22)

Audio mixing

Audio mixing is the process by which multiple sounds are combined into one or more channels. In the process, a source's volume level, frequency content, dynamics, and panoramic position are manipulated or enhanced. This practical, aesthetic, or otherwise creative treatment is done in order to produce a finished version that is appealing to listeners. Audio mixing is practiced for music, film, television and live sound. The process is generally carried out by a mixing engineer operating a mixing console or digital audio workstation.

Dictionary

A dictionary is a listing of lexemes from the lexicon of one or more specific languages, often arranged alphabetically (or by consonantal root for Semitic languages or radical and stroke for logographic languages), which may include information on definitions, usage, etymologies, pronunciations, translation, etc. It is a lexicographical reference that shows inter-relationships among the data. A broad distinction is made between general and specialized dictionaries.

Multimodal distribution

In statistics, a multimodal distribution is a probability distribution with more than one mode. These appear as distinct peaks (local maxima) in the probability density function. Categorical, continuous, and discrete data can all form multimodal distributions. Among univariate analyses, multimodal distributions are commonly bimodal. When the two modes are unequal the larger mode is known as the major mode and the other as the minor mode. The least frequent value between the modes is known as the antimode.

Related publications (32)

Topics in statistical physics of high-dimensional machine learning

Hugo Chao Cui

In the past few years, Machine Learning (ML) techniques have ushered in a paradigm shift, allowing the harnessing of ever more abundant sources of data to automate complex tasks. The technical workhorse behind these important breakthroughs arguably lies in ...

EPFL, 2024

The Current State of the OBI DICT Project: A Bilingual e-Dictionary of Oracle-Bone Inscriptions with AI Image Recognition

This article reports on the current state of the OBI DICT project, a bilingual e-dictionary of oracle-bone inscriptions (OBI), incorporating artificial intelligence (AI) image recognition technology. It first provides a brief overview of the development of ...

Buro Van Die Wat, 2024

Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange

Mathieu Salzmann, Sabine Süsstrunk, Tong Zhang, Yi Wu

In the realm of point cloud scene understanding, particularly in indoor scenes, objects are arranged following human habits, resulting in objects of certain semantics being closely positioned and displaying notable inter-object correlations. This can creat ...

2024