Publication

Multi-parametric source-filter separation of speech and prosodic voice restoration

Related concepts (32)

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.

Speech perception

Speech perception is the process by which the sounds of language are heard, interpreted, and understood. The study of speech perception is closely linked to the fields of phonology and phonetics in linguistics and cognitive psychology and perception in psychology. Research in speech perception seeks to understand how human listeners recognize speech sounds and use this information to understand spoken language.

Speech synthesis

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition. Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database.

Speech

Speech is a human vocal communication using language. Each language uses phonetic combinations of vowel and consonant sounds that form the sound of its words (that is, all English words sound different from all French words, even if they are the same word, e.g., "role" or "hotel"), and using those words in their semantic character as words in the lexicon of a language according to the syntactic constraints that govern lexical words' function in a sentence. In speaking, speakers perform many different intentional speech acts, e.

Speech coding

Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream. Common applications of speech coding are mobile telephony and voice over IP (VoIP).

Human voice

The human voice consists of sound made by a human being using the vocal tract, including talking, singing, laughing, crying, screaming, shouting, humming or yelling. The human voice frequency is specifically a part of human sound production in which the vocal folds (vocal cords) are the primary sound source. (Other sound production mechanisms produced from the same general area of the body involve the production of unvoiced consonants, clicks, whistling and whispering.

Kalman filter

For statistics and control theory, Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, including statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone, by estimating a joint probability distribution over the variables for each timeframe. The filter is named after Rudolf E. Kálmán, who was one of the primary developers of its theory.

Speech processing

Speech processing is the study of speech signals and the processing methods of signals. The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing, applied to speech signals. Aspects of speech processing includes the acquisition, manipulation, storage, transfer and output of speech signals. Different speech processing tasks include speech recognition, speech synthesis, speaker diarization, speech enhancement, speaker recognition, etc.

Hoarse voice

A hoarse voice, also known as dysphonia or hoarseness, is when the voice involuntarily sounds breathy, raspy, or strained, or is softer in volume or lower in pitch. A hoarse voice can be associated with a feeling of unease or scratchiness in the throat. Hoarseness is often a symptom of problems in the vocal folds of the larynx. It may be caused by laryngitis, which in turn may be caused by an upper respiratory infection, a cold, or allergies.

Voice user interface

A voice-user interface (VUI) makes spoken human interaction with computers possible, using speech recognition to understand spoken commands and answer questions, and typically text to speech to play a reply. A voice command device is a device controlled with a voice user interface. Voice user interfaces have been added to automobiles, home automation systems, computer operating systems, home appliances like washing machines and microwave ovens, and television remote controls.

Estimation theory

Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the measured data. An estimator attempts to approximate the unknown parameters using the measurements.

Prosody (linguistics)

In linguistics, prosody (ˈprɒsədi,_ˈprɒzədi) is the study of elements of speech that are not individual phonetic segments (vowels and consonants) but which are properties of syllables and larger units of speech, including linguistic functions such as intonation, stress, and rhythm. Such elements are known as suprasegmentals. Prosody may reflect features of the speaker or the utterance: their emotional state; the form of utterance (statement, question, or command); the presence of irony or sarcasm; emphasis, contrast, and focus.

Environmental noise

Environmental noise is an accumulation of noise pollution that occurs outside. This noise can be caused by transport, industrial, and recreational activities. Noise is frequently described as 'unwanted sound'. Within this context, environmental noise is generally present in some form in all areas of human, animal, or environmental activity. The effects in humans of exposure to environmental noise may vary from emotional to physiological and psychological. Noise at low levels is not necessarily harmful.

Electronic filter

Electronic filters are a type of signal processing filter in the form of electrical circuits. This article covers those filters consisting of lumped electronic components, as opposed to distributed-element filters. That is, using components and interconnections that, in analysis, can be considered to exist at a single point. These components can be in discrete packages or part of an integrated circuit. Electronic filters remove unwanted frequency components from the applied signal, enhance wanted ones, or both.

Euler method

In mathematics and computational science, the Euler method (also called the forward Euler method) is a first-order numerical procedure for solving ordinary differential equations (ODEs) with a given initial value. It is the most basic explicit method for numerical integration of ordinary differential equations and is the simplest Runge–Kutta method. The Euler method is named after Leonhard Euler, who first proposed it in his book Institutionum calculi integralis (published 1768–1870).

Iterative method

In computational mathematics, an iterative method is a mathematical procedure that uses an initial value to generate a sequence of improving approximate solutions for a class of problems, in which the n-th approximation is derived from the previous ones. A specific implementation with termination criteria for a given iterative method like gradient descent, hill climbing, Newton's method, or quasi-Newton methods like BFGS, is an algorithm of the iterative method.

Estimation

Estimation (or estimating) is the process of finding an estimate or approximation, which is a value that is usable for some purpose even if input data may be incomplete, uncertain, or unstable. The value is nonetheless usable because it is derived from the best information available. Typically, estimation involves "using the value of a statistic derived from a sample to estimate the value of a corresponding population parameter".

Efficiency (statistics)

In statistics, efficiency is a measure of quality of an estimator, of an experimental design, or of a hypothesis testing procedure. Essentially, a more efficient estimator needs fewer input data or observations than a less efficient one to achieve the Cramér–Rao bound. An efficient estimator is characterized by having the smallest possible variance, indicating that there is a small deviance between the estimated value and the "true" value in the L2 norm sense.

Chebyshev filter

Chebyshev filters are analog or digital filters that have a steeper roll-off than Butterworth filters, and have either passband ripple (type I) or stopband ripple (type II). Chebyshev filters have the property that they minimize the error between the idealized and the actual filter characteristic over the operating frequency range of the filter, but they achieve this with ripples in the passband. This type of filter is named after Pafnuty Chebyshev because its mathematical characteristics are derived from Chebyshev polynomials.

Airstream mechanism

In phonetics, the airstream mechanism is the method by which airflow is created in the vocal tract. Along with phonation and articulation, it is one of three main components of speech production. The airstream mechanism is mandatory for most sound production and constitutes the first part of this process, which is called initiation. The organ generating the airstream is called the initiator and there are three initiators used phonemically in non-disordered human oral languages: the diaphragm together with the ribs and lungs (pulmonic mechanisms), the glottis (glottalic mechanisms), and the tongue (lingual or "velaric" mechanisms).