Character encoding

Related concepts (30)

ASCII

ASCII (/ˈæskiː/), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of the technical limitations of computer systems at the time it was invented, ASCII has just 128 code points, of which only 95 are printable characters, which severely limited its scope. Many computer systems instead use Unicode, which has more than a million code points, the first 128 of which are the same as the ASCII set.
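The overlap between ASCII and Unicode can be checked with a short script. This is a minimal sketch using only Python built-ins; the variable names are illustrative, and the printable range 0x20 to 0x7E is stated here as an assumption consistent with the 95-character count above.

```python
# All 128 ASCII code points, expressed as Python (Unicode) strings.
ascii_chars = [chr(n) for n in range(128)]

# Each one encodes to the same single byte in ASCII and in UTF-8,
# and that byte value equals its Unicode code point.
same_bytes = all(
    c.encode("ascii") == c.encode("utf-8") == bytes([ord(c)])
    for c in ascii_chars
)

# The printable characters run from 0x20 (space) through 0x7E (tilde);
# the remaining code points are control characters.
printable = [chr(n) for n in range(0x20, 0x7F)]
printable_count = len(printable)

print(same_bytes, printable_count)
```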

UTF-8

UTF-8 is a variable-length character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode (or Universal Coded Character Set) Transformation Format - 8-bit. UTF-8 is capable of encoding all 1,112,064 valid character code points in Unicode using one to four one-byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes.
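The variable-length behaviour described above can be illustrated as follows. This is a sketch, not a full encoder: `utf8_length` is a hypothetical helper written for this example, and the sample characters are arbitrary. The figure of 1,112,064 is reproduced by subtracting the 2,048 surrogate code points (U+D800 to U+DFFF) from the full range of 0x110000 code points.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for a code point (illustrative helper)."""
    if code_point < 0x80:
        return 1   # same byte values as ASCII
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4       # up to U+10FFFF

# Code points that cannot be encoded: the surrogate range U+D800..U+DFFF.
surrogates = 0xDFFF - 0xD800 + 1
valid_code_points = 0x110000 - surrogates

samples = ["A", "é", "€", "😀"]  # U+0041, U+00E9, U+20AC, U+1F600
for ch in samples:
    encoded = ch.encode("utf-8")
    # The built-in encoder agrees with the threshold rule above.
    assert len(encoded) == utf8_length(ord(ch))
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} byte(s))")
```

Lower code points take fewer bytes, so text that is mostly ASCII stays compact.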

Unicode

Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard is maintained by the Unicode Consortium and, as of version 15.0, defines 149,186 characters covering 161 modern and historic scripts, as well as symbols, thousands of emoji (including coloured ones), and non-visual control and formatting codes.

Punctuation

Punctuation marks are marks indicating how a piece of written text should be read (silently or aloud) and, consequently, understood. The oldest known examples of punctuation marks were found in the Mesha Stele from the 9th century BC, consisting of points between the words and horizontal strokes between sections. Alphabet-based writing began with no spaces, no capitalization, no vowels (see abjad), and only a few punctuation marks, as it was mostly aimed at recording business transactions.

Slash (punctuation)

The slash is the oblique slanting line punctuation mark /. It is also known as a stroke, a solidus, a forward slash, or by several other historical or technical names, including oblique and virgule. Once used to mark periods and commas, the slash is now used to represent division and fractions, exclusive 'or' and inclusive 'or', and as a date separator. A slash in the reverse direction is known as a backslash. Slashes may be found in early writing as a variant form of dashes, vertical strokes, etc.

Full stop

The full stop (Commonwealth English), period (North American English), or full point is a punctuation mark. It is used for several purposes, most often to mark the end of a declarative sentence (as distinguished from a question or exclamation). This sentence-ending use, alone, defines the strictest sense of full stop. Although full stop technically applies only when the mark is used to end a sentence, the distinction – drawn since at least 1897 – is not maintained by all modern style guides and dictionaries.

Dash

The dash is a punctuation mark consisting of a long horizontal line. It is similar in appearance to the hyphen but is longer and sometimes higher from the baseline. The most common versions are the en dash (U+2013), generally longer than the hyphen but shorter than the minus sign; the em dash (U+2014), longer than either the en dash or the minus sign; and the horizontal bar (U+2015), whose length varies across typefaces but tends to be between those of the en and em dashes.

Colon (punctuation)

The colon, :, is a punctuation mark consisting of two equally sized dots aligned vertically. A colon often precedes an explanation, a list, or a quoted sentence. It is also used between hours and minutes in time, between certain elements in medical journal citations, between chapter and verse in Bible citations, and, in the US, for salutations in business letters and other formal letter writing. In Ancient Greek, in rhetoric and prosody, the term κῶλον (kôlon, 'limb, member of a body') did not refer to punctuation, but to a member or section of a complete thought or passage; see also Colon (rhetoric).

ISO/IEC 8859-1

ISO/IEC 8859-1:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. ISO/IEC 8859-1 encodes what it refers to as "Latin alphabet no. 1", consisting of 191 characters from the Latin script. This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa.
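The single-byte nature of this encoding can be sketched in Python. Two assumptions are worth stating: the 191-character count is reproduced here as the byte ranges 0x20 to 0x7E plus 0xA0 to 0xFF, and Python's built-in `latin-1` codec is used as a stand-in, although it also maps the control-code byte ranges that the 191 graphic characters do not include. The sample word is arbitrary.

```python
# Graphic (visible or spacing) characters: 95 inherited from ASCII
# plus 96 in the upper half of the 8-bit range.
graphic = list(range(0x20, 0x7F)) + list(range(0xA0, 0x100))
graphic_count = len(graphic)

text = "café"
latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # 'é' needs two bytes here

# Reading UTF-8 bytes as if they were ISO/IEC 8859-1 produces garbled text,
# because each byte is taken as a separate character.
misread = utf8_bytes.decode("latin-1")

print(graphic_count, latin1_bytes, utf8_bytes, misread)
```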

Newline

A newline (frequently called line ending, end of line (EOL), next line (NEL) or line break) is a control character or sequence of control characters in character encoding specifications such as ASCII, EBCDIC, Unicode, etc. This character, or sequence of characters, is used to signify the end of a line of text and the start of a new one. In the mid-1800s, long before the advent of teleprinters and teletype machines, Morse code operators or telegraphists invented and used Morse code prosigns to encode white space text formatting in formal written text messages.
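Because a newline may be a single control character or a sequence of them, software usually has to accept several forms. The sketch below is illustrative only: the `NEWLINES` table lists a few common conventions chosen for this example, and Python's `str.splitlines` is used because it recognises all of them.

```python
# A few newline conventions, as control characters or sequences of them.
NEWLINES = {
    "LF": "\n",      # line feed
    "CR": "\r",      # carriage return
    "CRLF": "\r\n",  # carriage return followed by line feed
    "NEL": "\x85",   # next line
}

# Text that mixes several conventions.
text = "first\nsecond\r\nthird\x85fourth"

# str.splitlines treats each of these as a line boundary,
# and treats CRLF as one boundary rather than two.
lines = text.splitlines()

# Rejoin with a single convention to normalise the text.
normalized = "\n".join(lines)

print(lines)
```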

Quotation mark

Quotation marks are punctuation marks used in pairs in various writing systems to set off direct speech, a quotation, or a phrase. The pair consists of an opening quotation mark and a closing quotation mark, which may or may not be the same character. Quotation marks have a variety of forms in different languages and in different media. The single quotation mark is traced to Ancient Greek practice, adopted and adapted by monastic copyists. Isidore of Seville, in his seventh-century encyclopedia, Etymologiae, described their use of the Greek diplé (a chevron).

Semicolon

The semicolon or semi-colon is a symbol commonly used as orthographic punctuation. In the English language, a semicolon is most commonly used to link (in a single sentence) two independent clauses that are closely related in thought, such as when restating the preceding idea with a different expression. When a semicolon joins two or more ideas in one sentence, those ideas are then given equal rank. Semicolons can also be used in place of commas to separate items in a list, particularly when the elements of the list themselves have embedded commas.

Question mark

The question mark (also known as interrogation point, query, or eroteme in journalism) is a punctuation mark that indicates an interrogative clause or phrase in many languages. In the fifth century, Syriac Bible manuscripts used question markers, according to a 2011 theory by manuscript specialist Chip Coakley: he believes the zagwa elaya ("upper pair"), a vertical double dot over a word at the start of a sentence, indicates that the sentence is a question.

Keyboard layout

A keyboard layout is any specific physical, visual or functional arrangement of the keys, legends, or key-meaning associations (respectively) of a computer keyboard, mobile phone, or other computer-controlled typographic keyboard. Physical layout is the actual positioning of keys on a keyboard. Visual layout is the arrangement of the legends (labels, markings, engravings) that appear on those keys. Functional layout is the arrangement of the key-meaning association or keyboard mapping, determined in software, of all the keys of a keyboard; it is this (rather than the legends) that determines the actual response to a key press.

Ligature (writing)

In writing and typography, a ligature occurs where two or more graphemes or letters are joined to form a single glyph. Examples are the characters æ and œ used in English and French, in which the letters 'a' and 'e' are joined for the first ligature and the letters 'o' and 'e' are joined for the second ligature. For stylistic and legibility reasons, 'f' and 'i' are often merged to create 'ﬁ' (where the tittle on the 'i' merges with the hood of the 'f'); the same is true of 's' and 't' to create 'ﬆ'.

Bracket

A bracket, as used in British English, is either of two tall fore- or back-facing punctuation marks commonly used to isolate a segment of text or data from its surroundings. Typically deployed in symmetric pairs, an individual bracket may be identified as a 'left' or 'right' bracket or, alternatively, an "opening bracket" or "closing bracket", respectively, depending on the directionality of the context. There are four primary types of brackets.

Character encoding

Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers. The numerical values that make up a character encoding are known as "code points" and collectively comprise a "code space", a "code page", or a "character map". Early character codes associated with the optical or electrical telegraph could only represent a subset of the characters used in written languages, sometimes restricted to upper case letters, numerals and some punctuation only.
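The idea of assigning numbers to characters can be sketched with a toy character map. Everything named here (`TOY_REPERTOIRE`, `toy_encode`, `toy_decode`) is hypothetical and does not correspond to any real standard; the repertoire is deliberately limited to upper case letters, numerals and some punctuation, in the spirit of the early codes mentioned above.

```python
# A made-up repertoire: the position of each character is its code point.
TOY_REPERTOIRE = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,?"

TOY_ENCODE = {ch: number for number, ch in enumerate(TOY_REPERTOIRE)}
TOY_DECODE = {number: ch for ch, number in TOY_ENCODE.items()}


def toy_encode(text: str) -> list[int]:
    """Map each character to its number; raises KeyError if it is not in the map."""
    return [TOY_ENCODE[ch] for ch in text]


def toy_decode(codes: list[int]) -> str:
    """Map numbers back to characters."""
    return "".join(TOY_DECODE[code] for code in codes)


message = "HELLO 42"
codes = toy_encode(message)
print(codes, toy_decode(codes))

# Python exposes the Unicode mapping through the built-ins ord and chr.
print(ord("A"), chr(65))
```

A lower-case letter passed to `toy_encode` raises `KeyError`, which mirrors how a limited code can represent only a subset of written characters.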

Escape character

In computing and telecommunication, an escape character is a character that invokes an alternative interpretation of the following characters in a character sequence. An escape character is a particular kind of metacharacter. Whether a given character acts as an escape character generally depends on the context. In the telecommunications field, escape characters are used to indicate that the following characters are encoded differently.
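Python string literals offer one concrete illustration, with the backslash as the escape character. This is only an example of the general idea in a single context; other languages and protocols use different escape characters and rules.

```python
# The backslash changes how the following character is interpreted.
newline = "a\nb"   # backslash + n is read as one newline character
literal = "a\\nb"  # an escaped backslash, then an ordinary 'n'

# Context matters: in a raw string literal the backslash is not an escape.
raw = r"a\nb"

print(len(newline), len(literal), raw == literal)
```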

Tilde

The tilde (/ˈtɪldeɪ, -di, -də, ˈtɪld/), ˜ or ~, is a grapheme with several uses. The name of the character came into English from Spanish, which in turn came from the Latin titulus, meaning "title" or "superscription". Its primary use is as a diacritic (accent) in combination with a base letter; but for historical reasons, it is also used in standalone form within a variety of contexts. The tilde was originally written over an omitted letter or several letters as a scribal abbreviation, or "mark of suspension" and "mark of contraction", shown as a straight line when used with capitals.

Interpunct

An interpunct (·), also known as an interpoint, middle dot, middot, centered dot or centred dot, is a punctuation mark consisting of a vertically centered dot used for interword separation in Classical Latin. (Word-separating spaces did not appear until some time between 600 and 800 CE.) It appears in a variety of uses in some modern languages and is present in Unicode as U+00B7 · MIDDLE DOT. The multiplication dot (Unicode U+22C5 ⋅ DOT OPERATOR) is frequently used in mathematical and scientific notation, and it may differ in appearance from the interpunct.
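Unicode encodes the interpunct and the multiplication dot as two separate characters, which can be confirmed with Python's standard `unicodedata` module. The code points U+00B7 and U+22C5 are supplied here for illustration; the variable names are arbitrary.

```python
import unicodedata

interpunct = "\u00b7"          # the middle dot used as an interpunct
multiplication_dot = "\u22c5"  # the dot used in mathematical notation

# Look up the official Unicode character names.
interpunct_name = unicodedata.name(interpunct)
multiplication_name = unicodedata.name(multiplication_dot)

# The two look similar but are distinct code points, so they never compare equal.
are_same = interpunct == multiplication_dot

print(interpunct_name, multiplication_name, are_same)
```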