Concept

Speech synthesis

Related concepts (32)

HAL 9000

HAL 9000 (or simply HAL or Hal) is a fictional artificial intelligence character and the main antagonist in Arthur C. Clarke's Space Odyssey series. First appearing in the 1968 film 2001: A Space Odyssey, HAL (Heuristically Programmed Algorithmic Computer) is a sentient artificial general intelligence computer that controls the systems of the Discovery One spacecraft and interacts with the ship's astronaut crew. While part of HAL's hardware is shown toward the end of the film, he is mostly depicted as a camera lens containing a red and yellow dot, with such units located throughout the ship.

Linguistics

Linguistics is the scientific study of language. The modern-day scientific study of linguistics takes all aspects of language into account — i.e., the cognitive, the social, the cultural, the psychological, the environmental, the biological, the literary, the grammatical, the paleographical, and the structural. Linguistics is based on a theoretical as well as descriptive study of language, and is also interlinked with the applied fields of language studies and language learning, which entails the study of specific languages.

Vocoder

A vocoder (/ˈvoʊkoʊdər/, a portmanteau of voice and encoder) is a category of speech coding that analyzes and synthesizes the human voice signal for audio data compression, multiplexing, voice encryption or voice transformation. The vocoder was invented in 1938 by Homer Dudley at Bell Labs as a means of synthesizing human speech. This work was developed into the channel vocoder, which was used as a voice codec in telecommunications, where speech coding conserves transmission bandwidth.

Linear predictive coding

Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model. LPC is the most widely used method in speech coding and speech synthesis. It is a powerful speech analysis technique, and a useful method for encoding good quality speech at a low bit rate.
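The linear predictive model mentioned above can be illustrated with a short sketch. It assumes the common autocorrelation method solved with the Levinson-Durbin recursion, which is one of several ways to estimate LPC coefficients; the function names `autocorrelation` and `lpc_coefficients` are hypothetical and chosen only for this example.

```python
def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], where r[lag] = sum of signal[i] * signal[i + lag]."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc_coefficients(signal, order):
    """Estimate predictor coefficients a[1..order] so that
    x[n] is approximated by sum(a[k] * x[n - k]) for k = 1..order.

    Returns (coefficients, residual_energy). Illustrative only: no
    windowing or pre-emphasis is applied, unlike typical speech coders.
    """
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]             # prediction error energy at order 0
    for i in range(1, order + 1):
        if error == 0.0:
            break  # silent or perfectly predicted signal
        # Reflection coefficient for this model order.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        # Update the lower-order coefficients, then append the new one.
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


if __name__ == "__main__":
    # A decaying exponential behaves like a first-order autoregressive signal.
    samples = [0.9 ** n for n in range(200)]
    coeffs, residual = lpc_coefficients(samples, order=2)
    print(coeffs, residual)
```

The compact representation comes from storing only the few coefficients (plus a description of the residual) per frame rather than every sample.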

Speech processing

Speech processing is the study of speech signals and the processing methods of signals. The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing, applied to speech signals. Aspects of speech processing include the acquisition, manipulation, storage, transfer and output of speech signals. Different speech processing tasks include speech recognition, speech synthesis, speaker diarization, speech enhancement, speaker recognition, etc.

Deep learning

Deep learning is part of a broader family of machine learning methods, which is based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.

Audio signal processing

Audio signal processing is a subfield of signal processing that is concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waves—longitudinal waves which travel through air, consisting of compressions and rarefactions. The energy contained in audio signals or sound level is typically measured in decibels. As audio signals may be represented in either digital or analog format, processing may occur in either domain.
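As a rough illustration of measuring level in decibels, the sketch below computes the root-mean-square (RMS) amplitude of a block of digital samples and expresses it relative to a reference amplitude. The function names and the choice of 1.0 (full scale) as the default reference are assumptions made for this example.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_db(samples, reference=1.0):
    """Level of the samples in decibels relative to `reference`.

    Uses 20 * log10 because the inputs are amplitudes, not powers.
    Silence is reported as negative infinity.
    """
    value = rms(samples)
    if value == 0.0:
        return float("-inf")
    return 20.0 * math.log10(value / reference)


if __name__ == "__main__":
    print(level_db([0.5, -0.5, 0.5, -0.5]))
```

Halving the amplitude lowers the level by about 6 dB, which is why the half-scale square wave in the usage line sits roughly 6 dB below full scale.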

Speech recognition

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.

Speech

Speech is human vocal communication using language. Each language uses phonetic combinations of vowel and consonant sounds that form the sound of its words (that is, all English words sound different from all French words, even if they are the same word, e.g., "role" or "hotel"), and using those words in their semantic character as words in the lexicon of a language according to the syntactic constraints that govern lexical words' function in a sentence. In speaking, speakers perform many different intentional speech acts.

Formant

In speech science and phonetics, a formant is the broad spectral maximum that results from an acoustic resonance of the human vocal tract. In acoustics, a formant is usually defined as a broad peak, or local maximum, in the spectrum. For harmonic sounds, with this definition, the formant frequency is sometimes taken as that of the harmonic that is most augmented by a resonance. The difference between these two definitions resides in whether "formants" characterise the production mechanisms of a sound or the produced sound itself.

Robotics

Robotics is an interdisciplinary branch of electronics and communication, computer science and engineering. Robotics involves the design, construction, operation, and use of robots. The goal of robotics is to design machines that can help and assist humans. Robotics integrates fields of mechanical engineering, electrical engineering, information engineering, mechatronics engineering, electronics, biomedical engineering, computer engineering, control systems engineering, software engineering, mathematics, etc.

Screen reader

A screen reader is a form of assistive technology (AT) that renders text and image content as speech or braille output. Screen readers are essential to people who are blind, and are useful to people who are visually impaired, illiterate, or have a learning disability. Screen readers are software applications that attempt to convey what people with normal eyesight see on a display to their users via non-visual means, like text-to-speech, sound icons, or a braille device.

Google Translate

Google Translate is a multilingual neural machine translation service developed by Google to translate text, documents and websites from one language into another. It offers a website interface, a mobile app for Android and iOS, as well as an API that helps developers build browser extensions and software applications. As of 2022, Google Translate supports languages at various levels; it claimed over 500 million total users, with more than 100 billion words translated daily, after the company stated in May 2013 that it served over 200 million people daily.

Speaker recognition

Speaker recognition is the identification of a person from characteristics of voices. It is used to answer the question "Who is speaking?" The term voice recognition can refer to speaker recognition or speech recognition. Speaker verification (also called speaker authentication) contrasts with identification, and speaker recognition differs from speaker diarisation (recognizing when the same speaker is speaking).

Synthesizer

A synthesizer (also spelled synthesiser) is an electronic musical instrument that generates audio signals. Synthesizers typically create sounds by generating waveforms through methods including subtractive synthesis, additive synthesis and frequency modulation synthesis. These sounds may be altered by components such as filters, which cut or boost frequencies; envelopes, which control articulation, or how notes begin and end; and low-frequency oscillators, which modulate parameters such as pitch, volume, or filter characteristics affecting timbre.
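Of the methods listed above, additive synthesis is the simplest to sketch: a tone is built by summing sine waves at integer multiples of a fundamental frequency. The function name `additive_tone`, its parameters, and the 8000 Hz default sample rate are hypothetical choices for this example; filters, envelopes and low-frequency oscillators are left out.

```python
import math


def additive_tone(fundamental_hz, harmonic_amplitudes, duration_s, sample_rate=8000):
    """Generate samples by summing sine-wave harmonics.

    harmonic_amplitudes[0] scales the fundamental, harmonic_amplitudes[1]
    scales the second harmonic (twice the fundamental), and so on.
    """
    count = round(duration_s * sample_rate)
    samples = []
    for i in range(count):
        t = i / sample_rate
        value = sum(
            amplitude * math.sin(2.0 * math.pi * fundamental_hz * (k + 1) * t)
            for k, amplitude in enumerate(harmonic_amplitudes)
        )
        samples.append(value)
    return samples


if __name__ == "__main__":
    # A 220 Hz tone with progressively weaker second and third harmonics.
    tone = additive_tone(220.0, [1.0, 0.5, 0.25], duration_s=0.5)
    print(len(tone))
```

Changing the relative amplitudes of the harmonics changes the timbre while the perceived pitch stays at the fundamental.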

Speech coding

Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream. Common applications of speech coding are mobile telephony and voice over IP (VoIP).

Electronic game

An electronic game is a game that uses electronics to create an interactive system with which a player can play. Video games are the most common form today, and for this reason the two terms are often used interchangeably. There are other common forms of electronic game including handheld electronic games, standalone systems (e.g. pinball, slot machines, or electro-mechanical arcade games), and exclusively non-visual products (e.g. audio games). The earliest form of computer game to achieve any degree of mainstream use was the text-based Teletype game.

Recurrent neural network

A recurrent neural network (RNN) is one of the two broad types of artificial neural network, characterized by the direction of the flow of information between its layers. In contrast to the uni-directional feedforward neural network, it is a bi-directional artificial neural network, meaning that it allows the output from some nodes to affect subsequent input to the same nodes. The ability of RNNs to use internal state (memory) to process arbitrary sequences of inputs makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition.
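The role of internal state can be shown with a minimal sketch of a simple (Elman-style) recurrent layer, in which each new hidden state depends on both the current input and the previous hidden state. The function names, the tanh activation, and the weight layout are assumptions for this example, not a description of any particular library.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b_h):
    """One recurrent update: h = tanh(W_xh x + W_hh h_prev + b_h).

    w_xh has one row per hidden unit and one column per input value;
    w_hh has one row and one column per hidden unit.
    """
    h = []
    for i in range(len(b_h)):
        total = b_h[i]
        total += sum(w_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(w_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        h.append(math.tanh(total))
    return h


def run_rnn(inputs, w_xh, w_hh, b_h):
    """Process a sequence of input vectors, returning every hidden state."""
    h = [0.0] * len(b_h)  # initial state: no memory yet
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, b_h)
        states.append(h)
    return states


if __name__ == "__main__":
    # One input value, one hidden unit; the second input is zero, yet the
    # second state is non-zero because it carries over the first state.
    states = run_rnn([[1.0], [0.0]], w_xh=[[1.0]], w_hh=[[0.5]], b_h=[0.0])
    print(states)
```

Because the same weights are reused at every step, the loop handles sequences of any length.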

Visual impairment

Visual impairment, also known as vision impairment, is a medical definition primarily measured based on an individual's better eye visual acuity. In the absence of treatment such as corrective eyewear, assistive devices, and medical treatment, visual impairment may cause the individual difficulties with normal daily tasks, including reading and walking. Low vision is a functional definition of visual impairment that is chronic, uncorrectable with treatment or conventional corrective lenses, and impacts daily living.

Ebook

An ebook (short for electronic book), also known as an e-book or eBook, is a book publication made available in digital form, consisting of text, images, or both readable on the flat-panel display of computers or other electronic devices. Although sometimes defined as "an electronic version of a printed book", some e-books exist without a printed equivalent. E-books can be read on dedicated e-reader devices, but also on any computer device that features a controllable viewing screen, including desktop computers, laptops, tablets and smartphones.