Publication

Articulatory feature based continuous speech recognition using probabilistic lexical modeling

Related concepts (21)

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.

Speech perception

Speech perception is the process by which the sounds of language are heard, interpreted, and understood. The study of speech perception is closely linked to the fields of phonology and phonetics in linguistics and cognitive psychology and perception in psychology. Research in speech perception seeks to understand how human listeners recognize speech sounds and use this information to understand spoken language.

Speech production

Speech production is the process by which thoughts are translated into speech. This includes the selection of words, the organization of relevant grammatical forms, and then the articulation of the resulting sounds by the motor system using the vocal apparatus. Speech production can be spontaneous such as when a person creates the words of a conversation, reactive such as when they name a picture or read aloud a written word, or imitative, such as in speech repetition.

English phonology

English phonology is the system of speech sounds used in spoken English. Like many other languages, English has wide variation in pronunciation, both historically and from dialect to dialect. In general, however, the regional dialects of English share a largely similar (but not identical) phonological system. Among other things, most dialects have vowel reduction in unstressed syllables and a complex set of phonological features that distinguish fortis and lenis consonants (stops, affricates, and fricatives).

Motor theory of speech perception

The motor theory of speech perception is the hypothesis that people perceive spoken words by identifying the vocal tract gestures with which they are pronounced rather than by identifying the sound patterns that speech generates. It originally claimed that speech perception is done through a specialized module that is innate and human-specific. Though the idea of a module has been qualified in more recent versions of the theory, the idea remains that the role of the speech motor system is not only to produce speech articulations but also to detect them.

Broca's area

Broca's area, or the Broca area (ˈbroʊkə, also UKˈbrɒkə, USˈbroʊkɑː), is a region in the frontal lobe of the dominant hemisphere, usually the left, of the brain with functions linked to speech production. Language processing has been linked to Broca's area since Pierre Paul Broca reported impairments in two patients. They had lost the ability to speak after injury to the posterior inferior frontal gyrus (pars triangularis) (BA45) of the brain.

Speech coding

Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream. Common applications of speech coding are mobile telephony and voice over IP (VoIP).

Emotion recognition

Emotion recognition is the process of identifying human emotion. People vary widely in their accuracy at recognizing the emotions of others. Use of technology to help people with emotion recognition is a relatively nascent research area. Generally, the technology works best if it uses multiple modalities in context. To date, the most work has been conducted on automating the recognition of facial expressions from video, spoken expressions from audio, written expressions from text, and physiology as measured by wearables.

Distinctive feature

In linguistics, a distinctive feature is the most basic unit of phonological structure that distinguishes one sound from another within a language. For example, the feature [voice] distinguishes the two bilabial plosives: [p] and [b]. There are many different ways of defining and arranging features into feature systems: some deal with only one language while others are developed to apply to all languages. Distinctive features are grouped into categories according to the natural classes of segments they describe: major class features, laryngeal features, manner features, and place features.

Speech synthesis

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition. Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database.

Embodied cognition

Embodied cognition is the theory that many features of cognition, whether human or otherwise, are shaped by aspects of an organism's entire body. The cognitive features include high-level mental constructs (such as concepts and categories) and performance on various cognitive tasks (such as reasoning or judgment). The bodily aspects involve the motor system, the perceptual system, the bodily interactions with the environment (situatedness), and the assumptions about the world built the functional structure of organism's brain and body.

Current source

A current source is an electronic circuit that delivers or absorbs an electric current which is independent of the voltage across it. A current source is the dual of a voltage source. The term current sink is sometimes used for sources fed from a negative voltage supply. Figure 1 shows the schematic symbol for an ideal current source driving a resistive load. There are two types. An independent current source (or sink) delivers a constant current. A dependent current source delivers a current which is proportional to some other voltage or current in the circuit.

Sentence processing

Sentence processing takes place whenever a reader or listener processes a language utterance, either in isolation or in the context of a conversation or a text. Many studies of the human language comprehension process have focused on reading of single utterances (sentences) without context. Extensive research has shown that language comprehension is affected by context preceding a given utterance as well as many other factors. Sentence comprehension has to deal with ambiguity in spoken and written utterances, for example lexical, structural, and semantic ambiguities.

Pattern recognition

Pattern recognition is the automated recognition of patterns and regularities in data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess (PR) capabilities but their primary function is to distinguish and create emergent pattern. PR has applications in statistical data analysis, signal processing, , information retrieval, bioinformatics, data compression, computer graphics and machine learning.

Speaker recognition

Speaker recognition is the identification of a person from characteristics of voices. It is used to answer the question "Who is speaking?" The term voice recognition can refer to speaker recognition or speech recognition. Speaker verification (also called speaker authentication) contrasts with identification, and speaker recognition differs from speaker diarisation (recognizing when the same speaker is speaking).

Speech

Speech is a human vocal communication using language. Each language uses phonetic combinations of vowel and consonant sounds that form the sound of its words (that is, all English words sound different from all French words, even if they are the same word, e.g., "role" or "hotel"), and using those words in their semantic character as words in the lexicon of a language according to the syntactic constraints that govern lexical words' function in a sentence. In speaking, speakers perform many different intentional speech acts, e.

Affective computing

Affective computing is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects. It is an interdisciplinary field spanning computer science, psychology, and cognitive science. While some core ideas in the field may be traced as far back as to early philosophical inquiries into emotion, the more modern branch of computer science originated with Rosalind Picard's 1995 paper on affective computing and her book Affective Computing published by MIT Press.

Standard English

In an English-speaking country, Standard English (SE) is the variety of English that has undergone substantial regularisation and is associated with formal schooling, language assessment, and official print publications, such as public service announcements and newspapers of record, etc. All linguistic features are subject to the effects of standardisation, including morphology, phonology, syntax, lexicon, register, discourse markers, pragmatics, as well as written features such as spelling conventions, punctuation, capitalisation and abbreviation practices.

Old English phonology

Old English phonology is necessarily somewhat speculative since Old English is preserved only as a written language. Nevertheless, there is a very large corpus of the language, and the orthography apparently indicates phonological alternations quite faithfully, so it is not difficult to draw certain conclusions about the nature of Old English phonology. Old English had a distinction between short and long (doubled) consonants, at least between vowels (as seen in sunne "sun" and sunu "son", stellan "to put" and stelan "to steal"), and a distinction between short vowels and long vowels in stressed syllables.

Probabilistic context-free grammar

Grammar theory to model symbol strings originated from work in computational linguistics aiming to understand the structure of natural languages. Probabilistic context free grammars (PCFGs) have been applied in probabilistic modeling of RNA structures almost 40 years after they were introduced in computational linguistics. PCFGs extend context-free grammars similar to how hidden Markov models extend regular grammars. Each production is assigned a probability.