Rank–size distribution: Rank–size distribution is the distribution of size by rank, in decreasing order of size. For example, if a data set consists of items of sizes 5, 100, 5, and 8, the rank-size distribution is 100, 8, 5, 5 (ranks 1 through 4). This is also known as the rank–frequency distribution when the source data are from a frequency distribution. These distributions are of particular interest when the data vary significantly in scale, such as city size or word frequency.
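The worked example above can be sketched in a few lines of Python. The function name `rank_size` is an illustrative choice, not an established API; ties simply receive consecutive ranks here, which is one convention among several.

```python
def rank_size(sizes):
    """Return (rank, size) pairs sorted by decreasing size; ranks start at 1."""
    ordered = sorted(sizes, reverse=True)
    return list(enumerate(ordered, start=1))


# The data set from the example: sizes 5, 100, 5, and 8.
pairs = rank_size([5, 100, 5, 8])
print(pairs)  # [(1, 100), (2, 8), (3, 5), (4, 5)]
```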
Algebra of random variables: In statistics, the algebra of random variables provides rules for the symbolic manipulation of random variables while avoiding delving too deeply into the mathematically sophisticated ideas of probability theory. Its symbolism allows the treatment of sums, products, ratios and general functions of random variables, as well as operations such as finding the probability distributions and the expectations (or expected values), variances and covariances of such combinations.
Parallel coordinates: Parallel coordinates are a common way of visualizing and analyzing high-dimensional datasets. To show a set of points in an n-dimensional space, a backdrop is drawn consisting of n parallel lines, typically vertical and equally spaced. A point in n-dimensional space is represented as a polyline with vertices on the parallel axes; the position of the vertex on the i-th axis corresponds to the i-th coordinate of the point.
Long tail: In statistics and business, a long tail of some distributions of numbers is the portion of the distribution having many occurrences far from the "head" or central part of the distribution. The distribution could involve popularities, random numbers of occurrences of events with various probabilities, etc. The term is often used loosely, with no definition or an arbitrary definition, but precise definitions are possible. In statistics, the term long-tailed distribution has a narrow technical meaning, and is a subtype of heavy-tailed distribution.
Chart: A chart (sometimes known as a graph) is a graphical representation for data visualization, in which "the data is represented by symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart". A chart can represent tabular numeric data, functions, or some kinds of qualitative structure, and different chart types convey different information. The term "chart" as a graphical representation of data has multiple meanings: A data chart is a type of diagram or graph that organizes and represents a set of numerical or qualitative data.
Correlation function (statistical mechanics): In statistical mechanics, the correlation function is a measure of the order in a system, as characterized by a mathematical correlation function. Correlation functions describe how microscopic variables, such as spin and density, at different positions are related. More specifically, correlation functions quantify how microscopic variables co-vary with one another on average across space and time. A classic example of such spatial correlations is in ferro- and antiferromagnetic materials, where the spins prefer to align parallel and antiparallel with their nearest neighbors, respectively.
Log–log plot: In science and engineering, a log–log graph or log–log plot is a two-dimensional graph of numerical data that uses logarithmic scales on both the horizontal and vertical axes. Power functions, that is, relationships of the form $y = ax^k$, appear as straight lines in a log–log graph, with the exponent $k$ corresponding to the slope and the coefficient $a$ corresponding to the intercept (since $\log y = k \log x + \log a$). These graphs are therefore very useful for recognizing such relationships and estimating their parameters. Any base can be used for the logarithm, though base 10 (common logs) is used most commonly.
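As a minimal sketch of parameter estimation on a log–log scale, assuming a power function of the form y = a·x^k with positive data: taking base-10 logs gives log y = k·log x + log a, so an ordinary least-squares line through the logged points recovers the exponent k as the slope and log a as the intercept. The function name `fit_power_law` and the sample data are illustrative assumptions.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares line through (log10 x, log10 y).

    Returns (a, k) for the assumed model y = a * x**k.
    All xs and ys must be positive.
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    k = sxy / sxx                      # slope on the log-log plot
    a = 10 ** (mean_y - k * mean_x)    # intercept is log10(a)
    return a, k


# Synthetic, noise-free data generated from y = 3 * x**2.
xs = [1, 2, 4, 8, 16]
ys = [3 * x ** 2 for x in xs]
a, k = fit_power_law(xs, ys)
print(round(a, 6), round(k, 6))
```

On real, noisy data a straight-line fit in log space is only a rough estimate, and it weights small and large values differently from a fit in the original scale.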
Visual analytics: Visual analytics is an outgrowth of the fields of information visualization and scientific visualization. It has been defined as "the science of analytical reasoning facilitated by interactive visual interfaces." It can attack certain problems whose size, complexity, and need for closely coupled human and machine analysis may make them otherwise intractable.
Plot (graphics): A plot is a graphical technique for representing a data set, usually as a graph showing the relationship between two or more variables. The plot can be drawn by hand or by a computer. In the past, mechanical or electronic plotters were sometimes used. Graphs are a visual representation of the relationship between variables; they let readers quickly grasp patterns that might not be apparent from lists of values.
Information design: Information design is the practice of presenting information in a way that fosters an efficient and effective understanding of the information. The term has come to be used for a specific area of graphic design related to displaying information effectively, rather than just attractively or for artistic expression. Information design is closely related to the field of data visualization and is often taught as part of graphic design courses.
Power law: In statistics, a power law is a functional relationship between two quantities in which a relative change in one quantity results in a relative change in the other that is proportional to a power of the change, independent of the initial size of those quantities: one quantity varies as a power of another. For instance, considering the area of a square in terms of the length of its side, if the length is doubled, the area is multiplied by a factor of four.
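The square example can be checked numerically. The sketch below, with the illustrative helper `area`, shows that doubling the side multiplies the area by the same factor of four whatever the starting size, which is the scale-independence the definition describes.

```python
def area(side):
    """Area of a square: a power law in the side length with exponent 2."""
    return side ** 2


# Doubling the side gives the same relative change at every starting size.
ratios = [area(2 * side) / area(side) for side in (1, 3, 10, 250)]
print(ratios)  # [4.0, 4.0, 4.0, 4.0]
```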
Pareto principle: The Pareto principle states that for many outcomes, roughly 80% of consequences come from 20% of causes (the "vital few"). Other names for this principle are the 80/20 rule, the law of the vital few, or the principle of factor sparsity. Management consultant Joseph M. Juran developed the concept in the context of quality control and improvement after reading the works of Italian sociologist and economist Vilfredo Pareto, who wrote about the 80/20 connection while teaching at the University of Lausanne.
Edward Tufte: Edward Rolf Tufte (ˈtʌfti; born March 14, 1942), sometimes known as "ET", is an American statistician and professor emeritus of political science, statistics, and computer science at Yale University. He is noted for his writings on information design and as a pioneer in the field of data visualization. Tufte was born in Kansas City, Missouri, to Virginia Tufte (1918–2020) and Edward E. Tufte (1912–1999). He grew up in Beverly Hills, California, where his father was a longtime city official, and he graduated from Beverly Hills High School.
Law of total probability: In probability theory, the law (or formula) of total probability is a fundamental rule relating marginal probabilities to conditional probabilities. It expresses the total probability of an outcome which can be realized via several distinct events, hence the name. The law of total probability is a theorem that states, in its discrete case, that if $\{B_n : n = 1, 2, 3, \ldots\}$ is a finite or countably infinite partition of a sample space (in other words, a set of pairwise disjoint events whose union is the entire sample space) and each event $B_n$ is measurable, then for any event $A$ of the same sample space: $P(A) = \sum_n P(A \cap B_n)$ or, alternatively, $P(A) = \sum_n P(A \mid B_n)\,P(B_n)$, where, for any $n$ for which $P(B_n) = 0$, these terms are simply omitted from the summation, because $P(A \mid B_n)$ is finite.
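A small numeric sketch of the discrete case, writing the partition as B_1, B_2, B_3 and the event of interest as A, so that P(A) is the sum of P(A | B_n)·P(B_n). The three-machine scenario and every number in it are made up for illustration: each machine produces a share of total output, and A is "an item is defective". Exact fractions avoid rounding noise.

```python
from fractions import Fraction as F

# Hypothetical partition: share of output from each machine, P(B_n).
p_b = {"B1": F(1, 2), "B2": F(3, 10), "B3": F(1, 5)}
# Hypothetical conditional defect rates, P(A | B_n).
p_a_given_b = {"B1": F(1, 100), "B2": F(2, 100), "B3": F(5, 100)}


def total_probability(p_b, p_a_given_b):
    """Sum P(A | B_n) * P(B_n) over the partition, skipping zero-probability parts."""
    if sum(p_b.values()) != 1:
        raise ValueError("the events B_n must partition the sample space")
    return sum(p_a_given_b[b] * p for b, p in p_b.items() if p > 0)


p_a = total_probability(p_b, p_a_given_b)
print(p_a)  # 21/1000
```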
Law of total expectation: The proposition in probability theory known as the law of total expectation, the law of iterated expectations (LIE), Adam's law, the tower rule, and the smoothing theorem, among other names, states that if $X$ is a random variable whose expected value $\operatorname{E}(X)$ is defined, and $Y$ is any random variable on the same probability space, then $\operatorname{E}(X) = \operatorname{E}(\operatorname{E}(X \mid Y))$, i.e., the expected value of the conditional expected value of $X$ given $Y$ is the same as the expected value of $X$.
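The identity can be verified on a small discrete example, taking X as the variable of interest and Y as the conditioning variable, so that E(X) should equal E(E(X | Y)). The joint distribution below is invented purely for illustration; the sketch computes E(X) directly and again by averaging the conditional means E(X | Y = y) over the distribution of Y.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical joint pmf P(X = x, Y = y); the values are made up.
joint = {(0, "a"): F(1, 8), (1, "a"): F(3, 8), (0, "b"): F(1, 4), (2, "b"): F(1, 4)}

# Direct expectation E(X).
e_x = sum(x * p for (x, _), p in joint.items())

# Marginal P(Y = y) and the partial sums needed for E(X | Y = y).
p_y = defaultdict(F)
sum_x = defaultdict(F)
for (x, y), p in joint.items():
    p_y[y] += p
    sum_x[y] += x * p

cond_mean = {y: sum_x[y] / p_y[y] for y in p_y}      # E(X | Y = y)
e_of_cond = sum(cond_mean[y] * p_y[y] for y in p_y)  # E(E(X | Y))

print(e_x, e_of_cond)  # 7/8 7/8
```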
Law of total variance: In probability theory, the law of total variance, also called the variance decomposition formula, the conditional variance formula, the law of iterated variances, or Eve's law, states that if $X$ and $Y$ are random variables on the same probability space, and the variance of $Y$ is finite, then $\operatorname{Var}(Y) = \operatorname{E}[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(\operatorname{E}[Y \mid X])$. In language perhaps better known to statisticians than to probability theorists, the two terms are the "unexplained" and the "explained" components of the variance respectively (cf. fraction of variance unexplained, explained variation).
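A numeric check of the decomposition Var(Y) = E[Var(Y | X)] + Var(E[Y | X]) on a small discrete example, where X is the conditioning variable and Y the variable whose variance is decomposed. The joint distribution is invented for illustration, and the helper names `mean` and `variance` are illustrative; each operates on a pmf stored as a dict from value to probability.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical joint pmf P(X = x, Y = y); the values are made up.
joint = {("a", 0): F(1, 8), ("a", 1): F(3, 8), ("b", 0): F(1, 4), ("b", 2): F(1, 4)}


def mean(pmf):
    return sum(v * p for v, p in pmf.items())


def variance(pmf):
    m = mean(pmf)
    return sum((v - m) ** 2 * p for v, p in pmf.items())


# Marginal pmfs of X and Y.
p_x, p_y = defaultdict(F), defaultdict(F)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

# Conditional pmf of Y given X = x.
cond = {x: {y: p / p_x[x] for (xx, y), p in joint.items() if xx == x} for x in p_x}

# "Unexplained" component: E[Var(Y | X)].
unexplained = sum(p_x[x] * variance(cond[x]) for x in p_x)

# "Explained" component: Var(E[Y | X]), the variance of the conditional means.
cond_mean_pmf = defaultdict(F)
for x in p_x:
    cond_mean_pmf[mean(cond[x])] += p_x[x]
explained = variance(cond_mean_pmf)

total = variance(p_y)
print(total, unexplained, explained)  # 39/64 19/32 1/64
```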
Table (information): A table is an arrangement of information or data, typically in rows and columns, or possibly in a more complex structure. Tables are widely used in communication, research, and data analysis. Tables appear in print media, handwritten notes, computer software, architectural ornamentation, traffic signs, and many other places. The precise conventions and terminology for describing tables vary depending on the context. Further, tables differ significantly in variety, structure, flexibility, notation, representation and use.
Data and information visualization: Data and information visualization (data viz or info viz) is the practice of designing and creating easy-to-communicate and easy-to-understand graphic or visual representations of a large amount of complex quantitative and qualitative data and information with the help of static, dynamic or interactive visual items.
Benford's law: Benford's law, also known as the Newcomb–Benford law, the law of anomalous numbers, or the first-digit law, is an observation that in many real-life sets of numerical data, the leading digit is likely to be small. In sets that obey the law, the number 1 appears as the leading significant digit about 30% of the time, while 9 appears as the leading significant digit less than 5% of the time. If the digits were distributed uniformly, they would each occur about 11.1% of the time.
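The percentages quoted above follow from the usual statement of the law, P(d) = log10(1 + 1/d) for leading digit d from 1 to 9; that formula is not given in the entry itself and is assumed here. The sketch tabulates it and, as an informal comparison, counts leading digits among the first 1000 powers of 2, a sequence commonly used to illustrate the law.

```python
import math
from collections import Counter


def benford(d):
    """Benford probability of leading digit d (1..9), assuming P(d) = log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)


probs = {d: benford(d) for d in range(1, 10)}
for d, p in probs.items():
    print(d, f"{p:.3f}")

# Informal comparison: leading digits of 2**0 .. 2**999 (exact integer arithmetic).
counts = Counter(int(str(2 ** n)[0]) for n in range(1000))
freq_of_one = counts[1] / 1000
print(freq_of_one)
```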
Lévy flight: A Lévy flight is a random walk in which the step-lengths have a stable distribution, a probability distribution that is heavy-tailed. When defined as a walk in a space of dimension greater than one, the steps made are in isotropic random directions. Later researchers have extended the use of the term "Lévy flight" to also include cases where the random walk takes place on a discrete grid rather than on a continuous space. The term "Lévy flight" was coined by Benoît Mandelbrot, who used this for one specific definition of the distribution of step sizes.