Nonlinear dimensionality reductionNonlinear dimensionality reduction, also known as manifold learning, refers to various related techniques that aim to project high-dimensional data onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa) itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.
GeodesicIn geometry, a geodesic (ˌdʒiː.əˈdɛsɪk,-oʊ-,-ˈdiːsɪk,_-zɪk) is a curve representing in some sense the shortest path (arc) between two points in a surface, or more generally in a Riemannian manifold. The term also has meaning in any differentiable manifold with a connection. It is a generalization of the notion of a "straight line". The noun geodesic and the adjective geodetic come from geodesy, the science of measuring the size and shape of Earth, though many of the underlying principles can be applied to any ellipsoidal geometry.
Geodesics on an ellipsoidThe study of geodesics on an ellipsoid arose in connection with geodesy specifically with the solution of triangulation networks. The figure of the Earth is well approximated by an oblate ellipsoid, a slightly flattened sphere. A geodesic is the shortest path between two points on a curved surface, analogous to a straight line on a plane surface. The solution of a triangulation network on an ellipsoid is therefore a set of exercises in spheroidal trigonometry .
Dimensionality reductionDimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable (hard to control or deal with).
Nearest neighbor searchNearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values. Formally, the nearest-neighbor (NN) search problem is defined as follows: given a set S of points in a space M and a query point q ∈ M, find the closest point in S to q. Donald Knuth in vol.
K-nearest neighbors algorithmIn statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression. In both cases, the input consists of the k closest training examples in a data set. The output depends on whether k-NN is used for classification or regression: In k-NN classification, the output is a class membership.
Multidimensional scalingMultidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. MDS is used to translate "information about the pairwise 'distances' among a set of objects or individuals" into a configuration of points mapped into an abstract Cartesian space. More technically, MDS refers to a set of related ordination techniques used in information visualization, in particular to display the information contained in a distance matrix. It is a form of non-linear dimensionality reduction.
Intrinsic metricIn the mathematical study of metric spaces, one can consider the arclength of paths in the space. If two points are at a given distance from each other, it is natural to expect that one should be able to get from the first point to the second along a path whose arclength is equal to (or very close to) that distance. The distance between two points of a metric space relative to the intrinsic metric is defined as the infimum of the lengths of all paths from the first point to the second.
Euclidean distanceIn mathematics, the Euclidean distance between two points in Euclidean space is the length of a line segment between the two points. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem, therefore occasionally being called the Pythagorean distance. These names come from the ancient Greek mathematicians Euclid and Pythagoras, although Euclid did not represent distances as numbers, and the connection from the Pythagorean theorem to distance calculation was not made until the 18th century.
Geodesics in general relativityIn general relativity, a geodesic generalizes the notion of a "straight line" to curved spacetime. Importantly, the world line of a particle free from all external, non-gravitational forces is a particular type of geodesic. In other words, a freely moving or falling particle always moves along a geodesic. In general relativity, gravity can be regarded as not a force but a consequence of a curved spacetime geometry where the source of curvature is the stress–energy tensor (representing matter, for instance).
Gaussian curvatureIn differential geometry, the Gaussian curvature or Gauss curvature Κ of a smooth surface in three-dimensional space at a point is the product of the principal curvatures, κ1 and κ2, at the given point: The Gaussian radius of curvature is the reciprocal of Κ. For example, a sphere of radius r has Gaussian curvature 1/r2 everywhere, and a flat plane and a cylinder have Gaussian curvature zero everywhere. The Gaussian curvature can also be negative, as in the case of a hyperboloid or the inside of a torus.
Elastic mapElastic maps provide a tool for nonlinear dimensionality reduction. By their construction, they are a system of elastic springs embedded in the data space. This system approximates a low-dimensional manifold. The elastic coefficients of this system allow the switch from completely unstructured k-means clustering (zero elasticity) to the estimators located closely to linear PCA manifolds (for high bending and low stretching modules). With some intermediate values of the elasticity coefficients, this system effectively approximates non-linear principal manifolds.
Curse of dimensionalityThe curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic programming. Dimensionally cursed phenomena occur in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining and databases.
DimensionIn physics and mathematics, the dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any point within it. Thus, a line has a dimension of one (1D) because only one coordinate is needed to specify a point on it - for example, the point at 5 on a number line. A surface, such as the boundary of a cylinder or sphere, has a dimension of two (2D) because two coordinates are needed to specify a point on it - for example, both a latitude and longitude are required to locate a point on the surface of a sphere.
Euclidean geometryEuclidean geometry is a mathematical system attributed to ancient Greek mathematician Euclid, which he described in his textbook on geometry, Elements. Euclid's approach consists in assuming a small set of intuitively appealing axioms (postulates) and deducing many other propositions (theorems) from these. Although many of Euclid's results had been stated earlier, Euclid was the first to organize these propositions into a logical system in which each result is proved from axioms and previously proved theorems.
Low-dimensional topologyIn mathematics, low-dimensional topology is the branch of topology that studies manifolds, or more generally topological spaces, of four or fewer dimensions. Representative topics are the structure theory of 3-manifolds and 4-manifolds, knot theory, and braid groups. This can be regarded as a part of geometric topology. It may also be used to refer to the study of topological spaces of dimension 1, though this is more typically considered part of continuum theory.
Geographical distanceGeographical distance or geodetic distance is the distance measured along the surface of the earth. The formulae in this article calculate distances between points which are defined by geographical coordinates in terms of latitude and longitude. This distance is an element in solving the second (inverse) geodetic problem. Calculating the distance between geographical coordinates is based on some level of abstraction; it does not provide an exact distance, which is unattainable if one attempted to account for every irregularity in the surface of the earth.
Random forestRandom forests or random decision forests is an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the mean or average prediction of the individual trees is returned. Random decision forests correct for decision trees' habit of overfitting to their training set.
Hausdorff dimensionIn mathematics, Hausdorff dimension is a measure of roughness, or more specifically, fractal dimension, that was introduced in 1918 by mathematician Felix Hausdorff. For instance, the Hausdorff dimension of a single point is zero, of a line segment is 1, of a square is 2, and of a cube is 3. That is, for sets of points that define a smooth shape or a shape that has a small number of corners—the shapes of traditional geometry and science—the Hausdorff dimension is an integer agreeing with the usual sense of dimension, also known as the topological dimension.
Point (geometry)In classical Euclidean geometry, a point is a primitive notion that models an exact location in space, and has no length, width, or thickness. In modern mathematics, a point refers more generally to an element of some set called a space. Being a primitive notion means that a point cannot be defined in terms of previously defined objects. That is, a point is defined only by some properties, called axioms, that it must satisfy; for example, "there is exactly one line that passes through two different points".