Space (punctuation)In writing, a space () is a blank area that separates words, sentences, syllables (in syllabification) and other written or printed glyphs (characters). Conventions for spacing vary among languages, and in some languages the spacing rules are complex. Inter-word spaces ease the reader's task of identifying words, and avoid outright ambiguities such as "now here" vs. "nowhere". They also provide convenient guides for where a human or program may start new lines.
Command-line interfaceA command-line interface (CLI) is a means of interacting with a device or computer program with commands from a user or client, and responses from the device or program, in the form of lines of text. Such access was first provided by computer terminals starting in the mid-1960s. This provided an interactive environment not available with punched cards or other input methods. Operating system command-line interfaces are often implemented with command-line interpreters or command-line processors.
Tab keyThe tab key (abbreviation of tabulator key or tabular key) on a keyboard is used to advance the cursor to the next tab stop. The word tab derives from the word tabulate, which means "to arrange data in a tabular, or table, form." When a person wanted to type a table (of numbers or text) on a typewriter, there was a lot of time-consuming and repetitive use of the space bar and backspace key. To simplify this, a horizontal bar was placed in the mechanism called the tabulator rack.
Indentation (typesetting)FORCETOC In the written form of many languages, an indentation or indent is an empty space at the beginning of a line to signal the start of a new paragraph. Many computer languages have adopted this technique to designate "paragraphs" or other logical blocks in the program. For example, the following lines are indented, using between one and six spaces: This paragraph is indented by 1 space. This paragraph is indented by 3 spaces. This paragraph is indented by 6 spaces.
Off-side ruleA computer programming language is said to adhere to the off-side rule of syntax if blocks in that language are expressed by their indentation. The term was coined by Peter Landin, possibly as a pun on the offside rule in association football. This is contrasted with free-form languages, notably curly-bracket programming languages, where indentation has no computational meaning and indent style is only a matter of coding conventions and formatting. Off-side-rule languages are also described as having significant indentation.
NewlineA newline (frequently called line ending, end of line (EOL), next line (NEL) or line break) is a control character or sequence of control characters in character encoding specifications such as ASCII, EBCDIC, Unicode, etc. This character, or a sequence of characters, is used to signify the end of a and the start of a new one. In the mid-1800s, long before the advent of teleprinters and teletype machines, Morse code operators or telegraphists invented and used Morse code prosigns to encode white space text formatting in formal written text messages.
Control characterIn computing and telecommunication, a control character or non-printing character (NPC) is a code point in a character set that does not represent a written character or symbol. They are used as in-band signaling to cause effects other than the addition of a symbol to the text. All other characters are mainly graphic characters, also known as printing characters (or printable characters), except perhaps for "space" characters. In the ASCII standard there are 33 control characters, such as code 7, , which rings a terminal bell.
Indentation styleIn computer programming, an indentation style is a convention governing the indentation of blocks of code to convey program structure. This article largely addresses the free-form languages, such as C and its descendants, but can be (and often is) applied to most other programming languages (especially those in the curly bracket family), where whitespace is otherwise insignificant. Indentation style is only one aspect of programming style. Indentation is not a requirement of most programming languages, where it is used as secondary notation.
UTF-8UTF-8 is a variable-length character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode (or Universal Coded Character Set) Transformation Format - 8-bit. UTF-8 is capable of encoding all 1,112,064 valid character code points in Unicode using one to four one-byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes.
Programming styleProgramming style, also known as code style, is a set of rules or guidelines used when writing the source code for a computer program. It is often claimed that following a particular programming style will help programmers read and understand source code conforming to the style, and help to avoid introducing errors. A classic work on the subject was The Elements of Programming Style, written in the 1970s, and illustrated with examples from the Fortran and PL/I languages prevalent at the time.
HaskellHaskell (ˈhæskəl) is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell has pioneered a number of programming language features such as type classes, which enable type-safe operator overloading, and monadic input/output (IO). It is named after logician Haskell Curry. Haskell's main implementation is the Glasgow Haskell Compiler (GHC).
XMLExtensible Markup Language (XML) is a markup language and for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML. The design goals of XML emphasize simplicity, generality, and usability across the Internet.
DelimiterA delimiter is a sequence of one or more characters for specifying the boundary between separate, independent regions in plain text, mathematical expressions or other data streams. An example of a delimiter is the comma character, which acts as a field delimiter in a sequence of comma-separated values. Another example of a delimiter is the time gap used to separate letters and words in the transmission of Morse code. In mathematics, delimiters are often used to specify the scope of an operation, and can occur both as isolated symbols (e.
Text editorA text editor is a type of computer program that edits plain text. Such programs are sometimes known as "notepad" software (e.g. Windows Notepad). Text editors are provided with operating systems and software development packages, and can be used to change files such as s, documentation files and programming language source code. Plain text and Rich text There are important differences between plain text (created and edited by text editors) and rich text (such as that created by word processors or desktop publishing software).
Comment (computer programming)In computer programming, a comment is a programmer-readable explanation or annotation in the source code of a computer program. They are added with the purpose of making the source code easier for humans to understand, and are generally ignored by compilers and interpreters. The syntax of comments in various programming languages varies considerably. Comments are sometimes also processed in various ways to generate documentation external to the source code itself by documentation generators, or used for integration with source code management systems and other kinds of external programming tools.
Character (computing)In computer and machine-based telecommunications terminology, a character is a unit of information that roughly corresponds to a grapheme, grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language. Examples of characters include letters, numerical digits, common punctuation marks (such as "." or "-"), and whitespace. The concept also includes control characters, which do not correspond to visible symbols but rather to instructions to format or process the text.
String literalA string literal or anonymous string is a literal for a string value in the source code of a computer program. Modern programming languages commonly use a quoted sequence of characters, formally "bracketed delimiters", as in x = "foo", where "foo" is a string literal with value foo. Methods such as escape sequences can be used to avoid the problem of delimiter collision (issues with brackets) and allow the delimiters to be embedded in a string. There are many alternate notations for specifying string literals especially in complicated cases.
Lexical analysisLexical tokenization is conversion of a text into (semantically or syntactically) meaningful lexical tokens belonging to categories defined by a "lexer" program. In case of a natural language, those categories include nouns, verbs, adjectives, punctuations etc. In case of a programming language, the categories include identifiers, operators, grouping symbols and data types. Lexical tokenization is not the same process as the probabilistic tokenization, used for large language model's data preprocessing, that encode text into numerical tokens, using byte pair encoding.
Regular expressionA regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. Regular expression techniques are developed in theoretical computer science and formal language theory. The concept of regular expressions began in the 1950s, when the American mathematician Stephen Cole Kleene formalized the concept of a regular language.
ASCIIASCII (ˈæskiː ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of technical limitations of computer systems at the time it was invented, ASCII has just 128 code points, of which only 95 are , which severely limited its scope. Many computer systems instead use Unicode, which has millions of code points, but the first 128 of these are the same as the ASCII set.