Publication

Binary Sparse Coding of Convolutive Mixtures for Sound Localization and Separation via Spatialization

Related concepts (32)

Binaural recording

Binaural recording is a method of recording sound that uses two microphones, arranged to give the listener a 3-D stereo sensation of actually being in the room with the performers or instruments. This effect is often created using a technique known as dummy head recording, wherein a mannequin head is fitted with a microphone in each ear. Binaural recording is intended for replay using headphones and will not translate properly over stereo speakers.

Sound localization

Sound localization is a listener's ability to identify the location or origin of a detected sound in direction and distance. The sound localization mechanisms of the mammalian auditory system have been extensively studied. The auditory system uses several cues for sound source localization, including time difference and level difference (or intensity difference) between the ears, and spectral information.
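The time-difference cue mentioned above can be illustrated with a small cross-correlation search. This is a minimal sketch, not the method of the publication: the function name `estimate_itd`, the 1000 Hz sample rate, and the toy two-sample pulse are all assumptions chosen for illustration.

```python
def estimate_itd(left, right, sample_rate, max_lag):
    """Estimate the delay (in seconds) of `right` relative to `left`.

    A positive result means the sound reached the left channel first.
    The lag with the highest cross-correlation score is taken as the delay.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, left_val in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                score += left_val * right[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


# Toy signals (hypothetical): the same short pulse, 3 samples later on the right.
SAMPLE_RATE = 1000  # Hz, illustrative value only
left = [0.0] * 10 + [1.0, 0.5] + [0.0] * 20
right = [0.0] * 13 + [1.0, 0.5] + [0.0] * 17
itd = estimate_itd(left, right, SAMPLE_RATE, max_lag=8)
```

In practice the search range `max_lag` would be bounded by the largest delay that the spacing between the two ears or microphones can physically produce.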

Simultaneous localization and mapping

Simultaneous localization and mapping (SLAM) is the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent's location within it. While this initially appears to be a chicken-or-the-egg problem, several algorithms are known to solve it, at least approximately, in tractable time for certain environments. Popular approximate solution methods include the particle filter, extended Kalman filter, covariance intersection, and GraphSLAM.

Surround sound

Surround sound is a technique for enriching the fidelity and depth of sound reproduction by using multiple audio channels from speakers that surround the listener (surround channels). Its first application was in movie theaters. Prior to surround sound, theater sound systems commonly had three screen channels of sound that played from three loudspeakers (left, center, and right) located in front of the audience.

Microphone

A microphone, colloquially called mic (maɪk), is a transducer that converts sound into an electrical signal. Microphones are used in many applications such as telephones, hearing aids, public address systems for concert halls and public events, motion picture production, live and recorded audio engineering, sound recording, two-way radios, megaphones, and radio and television broadcasting. They are also used in computers for recording voice, speech recognition, VoIP, and for other purposes such as ultrasonic sensors or knock sensors.

Speech coding

Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream. Common applications of speech coding are mobile telephony and voice over IP (VoIP).

Wire recording

Wire recording, also known as magnetic wire recording, was the first magnetic recording technology, an analog type of audio storage. It recorded sound signals on a thin steel wire using varying levels of magnetization. The first crude magnetic recorder was invented in 1898 by Valdemar Poulsen. The first magnetic recorder to be made commercially available anywhere was the Telegraphone, manufactured by the American Telegraphone Company, Springfield, Massachusetts in 1903.

Neural coding

Neural coding (or neural representation) is a neuroscience field concerned with characterising the hypothetical relationship between the stimulus and the individual or ensemble neuronal responses and the relationship among the electrical activity of the neurons in the ensemble. Based on the theory that sensory and other information is represented in the brain by networks of neurons, it is thought that neurons can encode both digital and analog information.

Acoustic location

Acoustic location is the use of sound to determine the distance and direction of its source or reflector. Location can be done actively or passively, and can take place in gases (such as the atmosphere), liquids (such as water), and in solids (such as in the earth). Active acoustic location involves the creation of sound in order to produce an echo, which is then analyzed to determine the location of the object in question.
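For the active case, the distance to a reflector follows from the echo's round-trip time: the sound travels out and back, so the one-way distance is half the total path. A minimal sketch, assuming a constant speed of sound of roughly 343 m/s (air at about 20 °C); the function name `echo_distance` is hypothetical.

```python
SPEED_OF_SOUND_AIR = 343.0  # m/s, approximate value for air at about 20 °C (assumption)


def echo_distance(round_trip_s, speed=SPEED_OF_SOUND_AIR):
    """Return the one-way distance (metres) to a reflector.

    `round_trip_s` is the time between emitting the sound and hearing its echo.
    The path is covered twice, hence the division by two.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return speed * round_trip_s / 2.0


distance_m = echo_distance(0.02)  # a 20 ms round trip in air
```

For water or solids, a different `speed` would have to be supplied, since sound propagates much faster in those media.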

Image segmentation

In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.
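The idea of assigning a label to every pixel can be shown with the simplest possible rule, a global intensity threshold. This is only an illustrative sketch; the function name `threshold_segment` and the sample values are assumptions, and practical segmentation methods are considerably more elaborate.

```python
def threshold_segment(image, threshold):
    """Label each pixel 1 if its intensity is >= threshold, else 0.

    `image` is a list of rows, each row a list of grayscale intensities.
    Pixels sharing a label share one characteristic: which side of the
    threshold their intensity falls on.
    """
    return [[1 if px >= threshold else 0 for px in row] for row in image]


# Hypothetical 2x2 grayscale image with intensities in 0..255.
image = [[10, 200],
         [90, 130]]
labels = threshold_segment(image, threshold=128)
```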

Sensor fusion

Sensor fusion is the process of combining sensor data or data derived from disparate sources such that the resulting information has less uncertainty than would be possible when these sources were used individually. For instance, one could potentially obtain a more accurate location estimate of an indoor object by combining multiple data sources such as video cameras and WiFi localization signals.
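The claim that the fused result has less uncertainty than either source can be checked with inverse-variance weighting, one common way to combine two estimates. The sketch below assumes the two measurement errors are independent and unbiased; the function name `fuse` and the example numbers are hypothetical.

```python
def fuse(est_a, var_a, est_b, var_b):
    """Combine two estimates of the same quantity by inverse-variance weighting.

    Each estimate is weighted by 1/variance, so the more certain source
    contributes more. Assumes independent, unbiased errors (an assumption
    of this sketch, not a property of any particular sensor).
    Returns (fused_estimate, fused_variance).
    """
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    fused = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)
    return fused, fused_var


# Hypothetical position estimates (metres) from a camera and from WiFi.
position, variance = fuse(10.0, 4.0, 12.0, 1.0)
```

Under these assumptions the fused variance is always smaller than the smaller of the two input variances, which is the sense in which fusion reduces uncertainty.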

Stereophonic sound

Stereophonic sound, or more commonly stereo, is a method of sound reproduction that recreates a multi-directional, 3-dimensional audible perspective. This is usually achieved by using two independent audio channels through a configuration of two loudspeakers (or stereo headphones) in such a way as to create the impression of sound heard from various directions, as in natural hearing. Because the multi-dimensional perspective is the crucial aspect, the term stereophonic also applies to systems with more than two channels or speakers such as quadraphonic and surround sound.

Independent component analysis

In signal processing, independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents. This is done by assuming that at most one subcomponent is Gaussian and that the subcomponents are statistically independent from each other. ICA is a special case of blind source separation. A common example application is the "cocktail party problem" of listening in on one person's speech in a noisy room.

Image stitching

Image stitching or photo stitching is the process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high-resolution image. Commonly performed through the use of computer software, most approaches to image stitching require nearly exact overlaps between images and identical exposures to produce seamless results, although some stitching algorithms actually benefit from differently exposed images by doing high-dynamic-range imaging in regions of overlap.

Medical image computing

Medical image computing (MIC) is an interdisciplinary field at the intersection of computer science, information engineering, electrical engineering, physics, mathematics and medicine. This field develops computational and mathematical methods for solving problems pertaining to medical images and their use for biomedical research and clinical care. The main goal of MIC is to extract clinically relevant information or knowledge from medical images.

Image registration

Image registration is the process of transforming different sets of data into one coordinate system. Data may be multiple photographs, data from different sensors, times, depths, or viewpoints. It is used in computer vision, medical imaging, military automatic target recognition, and compiling and analyzing images and data from satellites. Registration is necessary in order to be able to compare or integrate the data obtained from these different measurements.

Principal component analysis

Principal component analysis (PCA) is a popular technique for analyzing large datasets containing a high number of dimensions/features per observation, increasing the interpretability of data while preserving the maximum amount of information, and enabling the visualization of multidimensional data. Formally, PCA is a statistical technique for reducing the dimensionality of a dataset. This is accomplished by linearly transforming the data into a new coordinate system where (most of) the variation in the data can be described with fewer dimensions than the initial data.
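The linear change of coordinates can be made concrete in two dimensions, where the principal axis of the covariance matrix has a closed form: its angle is half of atan2(2·s_xy, s_xx − s_yy). The sketch below is restricted to 2-D data for that reason; the names `pca_2d` and `reconstruct` and the sample points are assumptions for illustration.

```python
import math


def pca_2d(points):
    """Project 2-D points onto their first principal axis.

    Returns (mean, axis, scores): the data mean, the unit vector of the
    direction of greatest variance, and each point's 1-D coordinate along it.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the sample covariance matrix.
    sxx = sum(cx * cx for cx, _ in centered) / (n - 1)
    syy = sum(cy * cy for _, cy in centered) / (n - 1)
    sxy = sum(cx * cy for cx, cy in centered) / (n - 1)
    # Orientation of the dominant eigenvector of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    axis = (math.cos(theta), math.sin(theta))
    scores = [cx * axis[0] + cy * axis[1] for cx, cy in centered]
    return (mean_x, mean_y), axis, scores


def reconstruct(mean, axis, scores):
    """Map 1-D scores back into the original 2-D coordinate system."""
    return [(mean[0] + s * axis[0], mean[1] + s * axis[1]) for s in scores]


# Hypothetical data lying exactly on the line y = 2x.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, axis, scores = pca_2d(points)
approx = reconstruct(mean, axis, scores)
```

Because these sample points lie on a single line, one dimension captures all of the variation and the reconstruction matches the input up to rounding; with noisy data the reconstruction would only be approximate.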

Sound recording and reproduction

Sound recording and reproduction is the electrical, mechanical, electronic, or digital inscription and re-creation of sound waves, such as spoken voice, singing, instrumental music, or sound effects. The two main classes of sound recording technology are analog recording and digital recording. Sound recording is the transcription of invisible vibrations in air onto a storage medium such as a phonograph disc. The process is reversed in sound reproduction, and the variations stored on the medium are transformed back into sound waves.

Dictionary

A dictionary is a listing of lexemes from the lexicon of one or more specific languages, often arranged alphabetically (or by consonantal root for Semitic languages or radical and stroke for logographic languages), which may include information on definitions, usage, etymologies, pronunciations, translation, etc. It is a lexicographical reference that shows inter-relationships among the data. A broad distinction is made between general and specialized dictionaries.

Linear network coding

In computer networking, linear network coding is a program in which intermediate nodes transmit data from source nodes to sink nodes by means of linear combinations. Linear network coding may be used to improve a network's throughput, efficiency, and scalability, as well as to reduce exposure to attacks and eavesdropping. The nodes of a network take several packets and combine them for transmission. This process may be used to attain the maximum possible information flow in a network.
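The classic butterfly-network example shows how combining packets helps: an intermediate node forwards the bitwise XOR of two packets (a linear combination over the two-element field), and each sink recovers the packet it is missing from the one it received directly. The sketch below is illustrative only; the helper `xor_bytes` and the packet contents are assumptions.

```python
def xor_bytes(x, y):
    """Return the bytewise XOR of two equal-length packets."""
    if len(x) != len(y):
        raise ValueError("packets must have the same length")
    return bytes(p ^ q for p, q in zip(x, y))


# Two source packets (hypothetical payloads of equal length).
packet_a = b"left"
packet_b = b"rght"

# The shared intermediate link carries one coded packet instead of two.
coded = xor_bytes(packet_a, packet_b)

# Sink 1 received packet_a directly and decodes packet_b from the coded packet.
recovered_b_at_sink1 = xor_bytes(packet_a, coded)
# Sink 2 received packet_b directly and decodes packet_a from the coded packet.
recovered_a_at_sink2 = xor_bytes(packet_b, coded)
```

Both sinks end up with both packets even though the bottleneck link was used only once, which is how coding can raise throughput compared with plain forwarding.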