Concept

Collation

Related concepts (24)

A digraph or digram (from the δίς , "double" and γράφω , "to write") is a pair of characters used in the orthography of a language to write either a single phoneme (distinct sound), or a sequence of phonemes that does not correspond to the normal values of the two characters combined. Some digraphs represent phonemes that cannot be represented with a single character in the writing system of a language, like the English sh in ship and fish. Other digraphs represent phonemes that can also be represented by single characters.

Latin script

The Latin script, also known as the Roman script, is an alphabetic writing system based on the letters of the classical Latin alphabet, derived from a form of the Greek alphabet which was in use in the ancient Greek city of Cumae, in southern Italy (Magna Graecia). The Greek alphabet was adopted by the Etruscans, and subsequently their alphabet was adopted by the Romans. Several Latin-script alphabets exist, which differ in graphemes, collation and phonetic values from the classical Latin alphabet.

Unicode

Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, which is maintained by the Unicode Consortium, defines as of the current version (15.0) 149,186 characters covering 161 modern and historic scripts, as well as symbols, thousands of emoji (including in colours), and non-visual control and formatting codes.

Chinese characters

Chinese characters are logograms developed for the writing of Chinese. Chinese characters are the oldest continuously used system of writing in the world. By virtue of their widespread current use throughout East Asia and Southeast Asia, as well as their profound historic use throughout the Sinosphere, Chinese characters are among the most widely adopted writing systems in the world by number of users. The total number of Chinese characters ever to appear in a dictionary is in the tens of thousands, though most are graphic variants, were used historically and passed out of use, or are of a specialized nature.

Letter case

Letter case is the distinction between the letters that are in larger uppercase or capitals (or more formally majuscule) and smaller lowercase (or more formally minuscule) in the written representation of certain languages. The writing systems that distinguish between the upper- and lowercase have two parallel sets of letters: each in the majuscule set has a counterpart in the minuscule set. Some counterpart letters have the same shape, and differ only in size (e.g. {C,c} or {S,s}), but for others the shapes are different (e.

Ligature (writing)

In writing and typography, a ligature occurs where two or more graphemes or letters are joined to form a single glyph. Examples are the characters æ and œ used in English and French, in which the letters 'a' and 'e' are joined for the first ligature and the letters 'o' and 'e' are joined for the second ligature. For stylistic and legibility reasons, 'f' and 'i' are often merged to create 'fi' (where the tittle on the 'i' merges with the hood of the 'f'); the same is true of 's' and 't' to create 'st'.

Alphabetical order

Alphabetical order is a system whereby character strings are placed in order based on the position of the characters in the conventional ordering of an alphabet. It is one of the methods of collation. In mathematics, a lexicographical order is the generalization of the alphabetical order to other data types, such as sequences of numbers or other ordered mathematical objects. When applied to strings or sequences that may contain digits, numbers or more elaborate types of elements, in addition to alphabetical characters, the alphabetical order is generally called a lexicographical order.

Japanese language

is the principal language of the Japonic language family spoken by the Japanese people. It has around 128 million speakers, primarily in Japan, the only country where it is the national language, and within Japanese diaspora worldwide. Japonic family also includes the Ryukyuan languages and the variously classified Hachijō language. There have been many attempts to group the Japonic languages with other families such as the Ainu, Austroasiatic, Koreanic, and the now-discredited Altaic, but none of these proposals has gained widespread acceptance.

Brahmic scripts

The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout the Indian subcontinent, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient India and are used by various languages in several language families in South, East and Southeast Asia: Indo-Aryan, Dravidian, Tibeto-Burman, Mongolic, Austroasiatic, Austronesian, and Tai. They were also the source of the dictionary order (gojūon) of Japanese kana.

Grapheme

In linguistics, a grapheme is the smallest functional unit of a writing system. The word grapheme is derived and the suffix -eme by analogy with phoneme and other names of emic units. The study of graphemes is called graphemics. The concept of graphemes is abstract and similar to the notion in computing of a character. By comparison, a specific shape that represents any particular grapheme in a given typeface is called a glyph. There are two main opposing grapheme concepts.

Hangul

The Korean alphabet, known as Hangul (ˈhɑːnguːl ; ) in South Korea and Chosŏn'gŭl () in North Korea, is the modern official writing system for the Korean language. The letters for the five basic consonants reflect the shape of the speech organs used to pronounce them, and they are systematically modified to indicate phonetic features; similarly, the vowel letters are systematically modified for related sounds, making Hangul a featural writing system.

Dictionary

A dictionary is a listing of lexemes from the lexicon of one or more specific languages, often arranged alphabetically (or by consonantal root for Semitic languages or radical and stroke for logographic languages), which may include information on definitions, usage, etymologies, pronunciations, translation, etc. It is a lexicographical reference that shows inter-relationships among the data. A broad distinction is made between general and specialized dictionaries.

Devanagari

Devanāgarī or Devanagari (ˌdeɪvəˈnɑːɡəri ; देवनागरी, , Sanskrit pronunciation: deːʋɐˈnaːɡɐriː), also called Nāgarī (), is a left-to-right abugida (a type of segmental writing system), based on the ancient Brāhmī script, used in the northern Indian subcontinent. It is one of the official scripts of the Republic of India and Nepal. It was developed and in regular use by the 7th century CE. The Devanāgarī script, composed of 48 primary characters, including 14 vowels and 34 consonants, is the fourth most widely adopted writing system in the world, being used for over 120 languages.

Letter (alphabet)

A letter is a segmental symbol of a phonemic writing system. The inventory of all letters forms an alphabet. Letters broadly correspond to phonemes in the spoken form of the language, although there is rarely a consistent and exact correspondence between letters and phonemes. The word letter, borrowed from Old French letre, entered Middle English around 1200 AD, eventually displacing the Old English term bōcstæf (bookstaff). Letter is descended from the Latin littera, which may have descended from the Greek "διφθέρα" (, writing tablet), via Etruscan.

String (computer science)

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Umlaut (diacritic)

The umlaut (ˈʊmlaʊt) is the diacritical mark used to indicate in writing (as part of the letters , , and ) the result of the historical sound shift due to which former back vowels are now pronounced as front vowels (for example a, ɔ, and ʊ as ɛ, œ, and ʏ). (The term [Germanic] umlaut is also used for the underlying historical sound shift process.) In its contemporary printed form, the mark consists of two dots placed over the letter to represent the changed vowel sound.

Space (punctuation)

In writing, a space () is a blank area that separates words, sentences, syllables (in syllabification) and other written or printed glyphs (characters). Conventions for spacing vary among languages, and in some languages the spacing rules are complex. Inter-word spaces ease the reader's task of identifying words, and avoid outright ambiguities such as "now here" vs. "nowhere". They also provide convenient guides for where a human or program may start new lines.

Spanish language

Spanish (español or idioma español) or Castilian (castellano) is a Romance language of the Indo-European language family that evolved from colloquial Latin spoken on the Iberian Peninsula of Europe. Today, it is a global language with about 474.7 million native speakers, mainly in the Americas and Spain. Spanish is the official language of 20 countries. It is the world's second-most spoken native language after Mandarin Chinese; the world's fourth most spoken language overall after English, Mandarin Chinese, and Hindustani (Hindi-Urdu); and the world's most widely spoken Romance language.

Brahmi script

Brahmi (ˈbrɑːmi; ; ISO: Brāhmī; Sometimes also known as dhamma lipi) is a writing system of ancient South Asia that appeared as a fully developed script in the third century BCE. Its descendants, the Brahmic scripts, continue to be used today across South and Southeast Asia. Brahmi is an abugida which uses a system of diacritical marks to associate vowels with consonant symbols.

Welsh language

Welsh (Cymraeg kəmˈraːiɡ or y Gymraeg ə ɡəmˈraːiɡ) is a Celtic language of the Brittonic subgroup that is native to the Welsh people. Welsh is spoken natively in Wales, by some in England, and in Y Wladfa (the Welsh colony in Chubut Province, Argentina). It is spoken by smaller numbers of people in Canada and the United States descended from Welsh immigrants, within their households (especially in Nova Scotia). Historically, it has also been known in English as "British", "Cambrian", "Cambric" and "Cymric".