Publication

Trustworthy Numerical Computation in Scala

Concepts associés (32)

vignette|Comme la notation scientifique, le nombre à virgule flottante a une mantisse et un exposant. La virgule flottante est une méthode d'écriture de nombres fréquemment utilisée dans les ordinateurs, équivalente à la notation scientifique en numération binaire. Elle consiste à représenter un nombre par : un signe (égal à −1 ou 1) ; une mantisse (aussi appelée significande) ; et un exposant (entier relatif, généralement borné).

Single-precision floating-point format

Single-precision floating-point format (sometimes called FP32 or float32) is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. A floating-point variable can represent a wider range of numbers than a fixed-point variable of the same bit width at the cost of precision. A signed 32-bit integer variable has a maximum value of 231 − 1 = 2,147,483,647, whereas an IEEE 754 32-bit base-2 floating-point variable has a maximum value of (2 − 2−23) × 2127 ≈ 3.

Double-precision floating-point format

Double-precision floating-point format (sometimes called FP64 or float64) is a floating-point number format, usually occupying 64 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. Floating point is used to represent fractional values, or when a wider range is needed than is provided by fixed point (of the same bit width), even if at the cost of precision. Double precision may be chosen when the range or precision of single precision would be insufficient.

IEEE 754

En informatique, l’IEEE 754 est une norme sur l'arithmétique à virgule flottante mise au point par le Institute of Electrical and Electronics Engineers. Elle est la norme la plus employée actuellement pour le calcul des nombres à virgule flottante avec les CPU et les FPU. La norme définit les formats de représentation des nombres à virgule flottante (signe, mantisse, exposant, nombres dénormalisés) et valeurs spéciales (infinis et NaN), en même temps qu’un ensemble d’opérations sur les nombres flottants.

Bfloat16 floating-point format

The bfloat16 (brain floating point) floating-point format is a computer number format occupying 16 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. This format is a truncated (16-bit) version of the 32-bit IEEE 754 single-precision floating-point format (binary32) with the intent of accelerating machine learning and near-sensor computing. It preserves the approximate dynamic range of 32-bit floating-point numbers by retaining 8 exponent bits, but supports only an 8-bit precision rather than the 24-bit significand of the binary32 format.

Quadruple-precision floating-point format

In computing, quadruple precision (or quad precision) is a binary floating point–based computer number format that occupies 16 bytes (128 bits) with precision at least twice the 53-bit double precision. This 128-bit quadruple precision is designed not only for applications requiring results in higher than double precision, but also, as a primary function, to allow the computation of double precision results more reliably and accurately by minimising overflow and round-off errors in intermediate calculations and scratch variables.

Langage de programmation

thumb|Fragment de code écrit dans le langage de programmation JavaScript. Un langage de programmation est un langage informatique destiné à formuler des algorithmes et produire des programmes informatiques qui les appliquent. D'une manière similaire à une langue naturelle, un langage de programmation est composé d'un alphabet, d'un vocabulaire, de règles de grammaire, de significations, mais aussi d'un environnement de traduction censé rendre sa syntaxe compréhensible par la machine.

Half-precision floating-point format

In computing, half precision (sometimes called FP16 or float16) is a binary floating-point computer number format that occupies 16 bits (two bytes in modern computers) in computer memory. It is intended for storage of floating-point values in applications where higher precision is not essential, in particular and neural networks. Almost all modern uses follow the IEEE 754-2008 standard, where the 16-bit base-2 format is referred to as binary16, and the exponent uses 5 bits.

C (langage)

C est un langage de programmation impératif, généraliste et de bas niveau. Inventé au début des années 1970 pour réécrire Unix, C est devenu un des langages les plus utilisés, encore de nos jours. De nombreux langages plus modernes comme C++, C#, Java et PHP ou JavaScript ont repris une syntaxe similaire au C et reprennent en partie sa logique. C offre au développeur une marge de contrôle importante sur la machine (notamment sur la gestion de la mémoire) et est de ce fait utilisé pour réaliser les « fondations » (compilateurs, interpréteurs.

Nombre réel

En mathématiques, un nombre réel est un nombre qui peut être représenté par une partie entière et une liste finie ou infinie de décimales. Cette définition s'applique donc aux nombres rationnels, dont les décimales se répètent de façon périodique à partir d'un certain rang, mais aussi à d'autres nombres dits irrationnels, tels que la racine carrée de 2, π et e.

Decimal floating point

Decimal floating-point (DFP) arithmetic refers to both a representation and operations on decimal floating-point numbers. Working directly with decimal (base-10) fractions can avoid the rounding errors that otherwise typically occur when converting between decimal fractions (common in human-entered data, such as measurements or financial information) and binary (base-2) fractions. The advantage of decimal floating-point representation over decimal fixed-point and integer representation is that it supports a much wider range of values.

Theory of computation

In theoretical computer science and mathematics, the theory of computation is the branch that deals with what problems can be solved on a model of computation, using an algorithm, how efficiently they can be solved or to what degree (e.g., approximate solutions versus precise ones). The field is divided into three major branches: automata theory and formal languages, computability theory, and computational complexity theory, which are linked by the question: "What are the fundamental capabilities and limitations of computers?".

Langage de programmation exotique

Un langage de programmation exotique est un langage de programmation imaginé comme un test des limites de la création de langages de programmation, un exercice intellectuel ou encore une blague, sans aucune intention de créer un langage réellement utile. De tels langages sont souvent un passe-temps pour les hackers ou les programmeurs. L'adjectif « exotique » permet de distinguer ces langages de ceux communément utilisés dans l'industrie.

Comparison of programming languages

Programming languages are used for controlling the behavior of a machine (often a computer). Like natural languages, programming languages follow rules for syntax and semantics. There are thousands of programming languages and new ones are created every year. Few languages ever become sufficiently popular that they are used by more than a few people, but professional programmers may use dozens of languages in a career. Most programming languages are not standardized by an international (or national) standard, even widely used ones, such as Perl or Standard ML (despite the name).

Erreur d'arrondi

Une erreur d'arrondi est la différence entre la valeur approchée calculée d'un nombre et sa valeur mathématique exacte. Des erreurs d'arrondi naissent généralement lorsque des nombres exacts sont représentés dans un système incapable de les exprimer exactement. Les erreurs d'arrondi se propagent au cours des calculs avec des valeurs approchées ce qui peut augmenter l'erreur du résultat final. Dans le système décimal des erreurs d'arrondi sont engendrées, lorsqu'avec une troncature, un grand nombre (peut-être une infinité) de décimales ne sont pas prises en considération.

Model of computation

In computer science, and more specifically in computability theory and computational complexity theory, a model of computation is a model which describes how an output of a mathematical function is computed given an input. A model describes how units of computations, memories, and communications are organized. The computational complexity of an algorithm can be measured given a model of computation. Using a model allows studying the performance of algorithms independently of the variations that are specific to particular implementations and specific technology.

Langage de programmation de bas niveau

vignette|Language de programmation Un langage de programmation de bas niveau ne fournit que peu d'abstraction par rapport au jeu d'instructions du processeur de la machine. Les langages de bas niveau sont à opposer aux langages de haut niveau, qui permettent de créer un programme sans tenir compte des caractéristiques particulières (registres, etc) de l'ordinateur censé exécuter le programme. Le langage machine et le langage d'assemblage sont les archétypes de langages de bas niveau, puisqu'ils permettent de manipuler explicitement des registres, des adresses mémoires, des instructions machines.

Analyse numérique

L’analyse numérique est une discipline à l'interface des mathématiques et de l'informatique. Elle s’intéresse tant aux fondements qu’à la mise en pratique des méthodes permettant de résoudre, par des calculs purement numériques, des problèmes d’analyse mathématique. Plus formellement, l’analyse numérique est l’étude des algorithmes permettant de résoudre numériquement par discrétisation les problèmes de mathématiques continues (distinguées des mathématiques discrètes).

System programming language

A system programming language is a programming language used for system programming; such languages are designed for writing system software, which usually requires different development approaches when compared with application software. Edsger Dijkstra refers to these languages as machine oriented high order languages, or mohol. General-purpose programming languages tend to focus on generic features to allow programs written in the language to use the same code on different platforms.

Floating-point error mitigation

Floating-point error mitigation is the minimization of errors caused by the fact that real numbers cannot, in general, be accurately represented in a fixed space. By definition, floating-point error cannot be eliminated, and, at best, can only be managed. Huberto M. Sierra noted in his 1956 patent "Floating Decimal Point Arithmetic Control Means for Calculator": Thus under some conditions, the major portion of the significant data digits may lie beyond the capacity of the registers.