Publication

Low-latency speaker spotting with online diarization and detection

Related concepts (32)

Speaker recognition

Speaker recognition is the identification of a person from characteristics of voices. It is used to answer the question "Who is speaking?" The term voice recognition can refer to speaker recognition or speech recognition. Speaker verification (also called speaker authentication) contrasts with identification, and speaker recognition differs from speaker diarisation (recognizing when the same speaker is speaking).

Vector space

In mathematics and physics, a vector space (also called a linear space) is a set whose elements, often called vectors, may be added together and multiplied ("scaled") by numbers called scalars. Scalars are often real numbers, but can be complex numbers or, more generally, elements of any field. The operations of vector addition and scalar multiplication must satisfy certain requirements, called vector axioms. The terms real vector space and complex vector space are often used to specify the nature of the scalars: real coordinate space or complex coordinate space.

Euclidean vector

In mathematics, physics, and engineering, a Euclidean vector or simply a vector (sometimes called a geometric vector or spatial vector) is a geometric object that has magnitude (or length) and direction. Vectors can be added to other vectors according to vector algebra. A Euclidean vector is frequently represented by a directed line segment, or graphically as an arrow connecting an initial point A with a terminal point B, and denoted by $\overrightarrow{AB}$. A vector is what is needed to "carry" the point A to the point B; the Latin word vector means "carrier".

Unit vector

In mathematics, a unit vector in a normed vector space is a vector (often a spatial vector) of length 1. A unit vector is often denoted by a lowercase letter with a circumflex, or "hat", as in $\hat{\mathbf{v}}$ (pronounced "v-hat"). The term direction vector, commonly denoted as d, is used to describe a unit vector being used to represent spatial direction and relative direction. 2D spatial directions are numerically equivalent to points on the unit circle, and spatial directions in 3D are equivalent to points on the unit sphere.
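
As a small illustration of the definition above, the following Python sketch turns a nonzero vector into a unit vector by dividing it by its Euclidean length. The function name `normalize` is chosen here for illustration only, and the Euclidean norm is assumed; other norms would give different unit vectors.

```python
import math


def normalize(v):
    """Return the unit vector pointing in the same direction as v.

    Assumes the Euclidean norm; the zero vector has no direction,
    so it is rejected explicitly.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("the zero vector cannot be normalized")
    return [x / norm for x in v]


v_hat = normalize([3.0, 4.0])
length = math.sqrt(sum(x * x for x in v_hat))
print(v_hat, length)
```

The result lies on the unit circle, which matches the statement that 2D directions correspond to points on that circle.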

Speech recognition

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.

Smart speaker

A smart speaker is a type of loudspeaker and voice command device with an integrated virtual assistant that offers interactive actions and hands-free activation with the help of one "hot word" (or several "hot words"). Some smart speakers can also act as a smart device that utilizes Wi-Fi, Bluetooth, and other protocol standards to extend usage beyond audio playback, such as to control home automation devices. This can include, but is not limited to, features such as compatibility across a number of services and platforms, peer-to-peer connection through mesh networking, virtual assistants, and others.

Speech processing

Speech processing is the study of speech signals and the processing methods of signals. The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing, applied to speech signals. Aspects of speech processing include the acquisition, manipulation, storage, transfer and output of speech signals. Different speech processing tasks include speech recognition, speech synthesis, speaker diarization, speech enhancement, speaker recognition, etc.

Audio deepfake

An audio deepfake (also known as voice cloning) is a type of artificial intelligence used to create convincing speech sentences that sound like specific people saying things they did not say. This technology was initially developed for various applications to improve human life. For example, it can be used to produce audiobooks, and also to help people who have lost their voices (due to throat disease or other medical problems) to get them back. Commercially, it has opened the door to several opportunities.

Fire alarm system

A fire alarm system is a building system designed to detect and alert occupants and emergency forces of the presence of smoke, fire, carbon monoxide, or other fire-related emergencies. Fire alarm systems are required in most commercial buildings. They may include smoke detectors, heat detectors, and manual fire alarm activation devices, all of which are connected to a Fire Alarm Control Panel (FACP) normally found in an electrical room or panel room. Fire alarm systems generally use visual and audio signalization to warn the occupants of the building.

Open source

Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized software development model that encourages open collaboration. A main principle of open-source software development is peer production, with products such as source code, blueprints, and documentation freely available to the public.

Vector area

In 3-dimensional geometry and vector calculus, an area vector is a vector combining an area quantity with a direction, thus representing an oriented area in three dimensions. Every bounded surface in three dimensions can be associated with a unique area vector called its vector area. It is equal to the surface integral of the surface normal, and distinct from the usual (scalar) surface area. Vector area can be seen as the three dimensional generalization of signed area in two dimensions.

Bivector

In mathematics, a bivector or 2-vector is a quantity in exterior algebra or geometric algebra that extends the idea of scalars and vectors. If a scalar is considered a degree-zero quantity, and a vector is a degree-one quantity, then a bivector can be thought of as being of degree two. Bivectors have applications in many areas of mathematics and physics. They are related to complex numbers in two dimensions and to both pseudovectors and quaternions in three dimensions.

Open-source software

Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software and its source code to anyone and for any purpose. Open-source software may be developed in a collaborative, public manner. Open-source software is a prominent example of open collaboration, meaning any capable user is able to participate online in development, making the number of possible contributors indefinite.

Amazon Echo

Amazon Echo, often shortened to Echo, is an American brand of smart speakers developed by Amazon. Echo devices connect to the voice-controlled intelligent personal assistant service Alexa, which will respond when a user says "Alexa". Users may change this wake word to "Amazon", "Echo", "Computer", as well as some other options. The features of the device include voice interaction, music playback, making to-do lists, setting alarms, streaming podcasts, and playing audiobooks, in addition to providing weather, traffic and other real-time information.

Four-vector

In special relativity, a four-vector (or 4-vector) is an object with four components, which transform in a specific way under Lorentz transformations. Specifically, a four-vector is an element of a four-dimensional vector space considered as a representation space of the standard representation of the Lorentz group, the (1/2,1/2) representation. It differs from a Euclidean vector in how its magnitude is determined.
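
To make the last point concrete, here is a minimal Python sketch contrasting the Euclidean squared length of four components with the Minkowski squared magnitude of a four-vector. It assumes the (+, -, -, -) sign convention and units in which c = 1; the opposite sign convention is equally common, and the function names are illustrative only.

```python
def euclidean_norm_squared(x):
    """Sum of squares of all four components."""
    return sum(c * c for c in x)


def minkowski_norm_squared(x):
    """Squared magnitude with signature (+, -, -, -) and c = 1 (assumed)."""
    t, x1, x2, x3 = x
    return t * t - x1 * x1 - x2 * x2 - x3 * x3


event = (5.0, 3.0, 0.0, 0.0)       # time-like separation
light_ray = (1.0, 1.0, 0.0, 0.0)   # light-like separation

print(euclidean_norm_squared(event), minkowski_norm_squared(event))
print(minkowski_norm_squared(light_ray))
```

Unlike a Euclidean length, the Minkowski magnitude can be zero for a nonzero vector (the light-like case) and can be negative under this convention for space-like vectors.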

Smartphone

A smartphone is a portable computer device that combines mobile telephone functions and computing functions into one unit. Smartphones are distinguished from older-design feature phones by their more advanced hardware capabilities and extensive mobile operating systems, which facilitate wider software, access to the internet (including web browsing over mobile broadband), and multimedia functionality (including music, video, cameras, and gaming), alongside core phone functions such as voice calls and text messaging.

Applications of artificial intelligence

Artificial intelligence (AI) has been used in applications to alleviate certain problems throughout industry and academia. AI, like electricity or computers, is a general purpose technology that has a multitude of applications. It has been used in fields of language translation, image recognition, credit scoring, e-commerce and other domains. One example is the recommendation system, which predicts the "rating" or "preference" a user would give to an item.

Image segmentation

In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.
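
The "label per pixel" view can be sketched with the simplest possible method, global thresholding, in which every pixel at or above a brightness threshold receives label 1 and every other pixel receives label 0. This is only one elementary technique chosen for illustration; the function name, the toy image, and the threshold value are all assumptions.

```python
def threshold_segment(image, threshold):
    """Assign label 1 to pixels >= threshold and label 0 otherwise.

    `image` is a list of rows of grayscale intensities; the result has
    the same shape and holds one label per pixel.
    """
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]


# Toy 3x4 grayscale image: a bright region on a dark background.
image = [
    [10, 12, 200, 210],
    [11, 13, 205, 220],
    [ 9, 14,  15, 215],
]

labels = threshold_segment(image, threshold=128)
for row in labels:
    print(row)
```

Pixels that share a label here share one characteristic, namely being brighter or darker than the threshold; practical segmentation methods use richer criteria.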

Open-source license

Open-source licenses facilitate free and open-source software (FOSS) development. Intellectual property (IP) laws restrict the modification and sharing of creative works. Free and open-source software licenses use these existing legal structures for the inverse purpose of granting freedoms that promote sharing and collaboration. They grant the recipient the rights to use the software, examine the source code, modify it, and distribute the modifications. These licenses target computer software where source code can be necessary to create modifications.

Deep learning

Deep learning is part of a broader family of machine learning methods, which is based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.