Cluster analysisCluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, , information retrieval, bioinformatics, data compression, computer graphics and machine learning.
K-means clusteringk-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances.
Single-linkage clusteringIn statistics, single-linkage clustering is one of several methods of hierarchical clustering. It is based on grouping clusters in bottom-up fashion (agglomerative clustering), at each step combining two clusters that contain the closest pair of elements not yet belonging to the same cluster as each other. This method tends to produce long thin clusters in which nearby elements of the same cluster have small distances, but elements at opposite ends of a cluster may be much farther from each other than two elements of other clusters.
Hierarchical clusteringIn data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two categories: Agglomerative: This is a "bottom-up" approach: Each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. Divisive: This is a "top-down" approach: All observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
Symmetric matrixIn linear algebra, a symmetric matrix is a square matrix that is equal to its transpose. Formally, Because equal matrices have equal dimensions, only square matrices can be symmetric. The entries of a symmetric matrix are symmetric with respect to the main diagonal. So if denotes the entry in the th row and th column then for all indices and Every square diagonal matrix is symmetric, since all off-diagonal elements are zero. Similarly in characteristic different from 2, each diagonal element of a skew-symmetric matrix must be zero, since each is its own negative.
Matrix decompositionIn the mathematical discipline of linear algebra, a matrix decomposition or matrix factorization is a factorization of a matrix into a product of matrices. There are many different matrix decompositions; each finds use among a particular class of problems. In numerical analysis, different decompositions are used to implement efficient matrix algorithms. For instance, when solving a system of linear equations , the matrix A can be decomposed via the LU decomposition.
Lloyd's algorithmIn electrical engineering and computer science, Lloyd's algorithm, also known as Voronoi iteration or relaxation, is an algorithm named after Stuart P. Lloyd for finding evenly spaced sets of points in subsets of Euclidean spaces and partitions of these subsets into well-shaped and uniformly sized convex cells. Like the closely related k-means clustering algorithm, it repeatedly finds the centroid of each set in the partition and then re-partitions the input according to which of these centroids is closest.
Similarity measureIn statistics and related fields, a similarity measure or similarity function or similarity metric is a real-valued function that quantifies the similarity between two objects. Although no single definition of a similarity exists, usually such measures are in some sense the inverse of distance metrics: they take on large values for similar objects and either zero or a negative value for very dissimilar objects. Though, in more broad terms, a similarity function may also satisfy metric axioms.
Correlation clusteringClustering is the problem of partitioning data points into groups based on their similarity. Correlation clustering provides a method for clustering a set of objects into the optimum number of clusters without specifying that number in advance. Cluster analysis In machine learning, correlation clustering or cluster editing operates in a scenario where the relationships between the objects are known instead of the actual representations of the objects.
Heterogeneous computingHeterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually incorporating specialized processing capabilities to handle particular tasks. Usually heterogeneity in the context of computing referred to different instruction-set architectures (ISA), where the main processor has one and other processors have another - usually a very different - architecture (maybe more than one), not just a different microarchitecture (floating point number processing is a special case of this - not usually referred to as heterogeneous).
Matrix (mathematics)In mathematics, a matrix (plural matrices) is a rectangular array or table of numbers, symbols, or expressions, arranged in rows and columns, which is used to represent a mathematical object or a property of such an object. For example, is a matrix with two rows and three columns. This is often referred to as a "two by three matrix", a " matrix", or a matrix of dimension . Without further specifications, matrices represent linear maps, and allow explicit computations in linear algebra.
Square root of a matrixIn mathematics, the square root of a matrix extends the notion of square root from numbers to matrices. A matrix B is said to be a square root of A if the matrix product BB is equal to A. Some authors use the name square root or the notation A1/2 only for the specific case when A is positive semidefinite, to denote the unique matrix B that is positive semidefinite and such that BB = BTB = A (for real-valued matrices, where BT is the transpose of B).
Cholesky decompositionIn linear algebra, the Cholesky decomposition or Cholesky factorization (pronounced ʃəˈlɛski ) is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose, which is useful for efficient numerical solutions, e.g., Monte Carlo simulations. It was discovered by André-Louis Cholesky for real matrices, and posthumously published in 1924. When it is applicable, the Cholesky decomposition is roughly twice as efficient as the LU decomposition for solving systems of linear equations.
Social networkA social network is a social structure made up of a set of social actors (such as individuals or organizations), sets of dyadic ties, and other social interactions between actors. The social network perspective provides a set of methods for analyzing the structure of whole social entities as well as a variety of theories explaining the patterns observed in these structures. The study of these structures uses social network analysis to identify local and global patterns, locate influential entities, and examine network dynamics.
BiclusteringBiclustering, block clustering, Co-clustering or two-mode clustering is a data mining technique which allows simultaneous clustering of the rows and columns of a matrix. The term was first introduced by Boris Mirkin to name a technique introduced many years earlier, in 1972, by John A. Hartigan. Given a set of samples represented by an -dimensional feature vector, the entire dataset can be represented as rows in columns (i.e., an matrix). The Biclustering algorithm generates Biclusters.
Polar decompositionIn mathematics, the polar decomposition of a square real or complex matrix is a factorization of the form , where is a unitary matrix and is a positive semi-definite Hermitian matrix ( is an orthogonal matrix and is a positive semi-definite symmetric matrix in the real case), both square and of the same size. Intuitively, if a real matrix is interpreted as a linear transformation of -dimensional space , the polar decomposition separates it into a rotation or reflection of , and a scaling of the space along a set of orthogonal axes.
Rotation matrixIn linear algebra, a rotation matrix is a transformation matrix that is used to perform a rotation in Euclidean space. For example, using the convention below, the matrix rotates points in the xy plane counterclockwise through an angle θ about the origin of a two-dimensional Cartesian coordinate system. To perform the rotation on a plane point with standard coordinates v = (x, y), it should be written as a column vector, and multiplied by the matrix R: If x and y are the endpoint coordinates of a vector, where x is cosine and y is sine, then the above equations become the trigonometric summation angle formulae.
Matrix similarityIn linear algebra, two n-by-n matrices A and B are called similar if there exists an invertible n-by-n matrix P such that Similar matrices represent the same linear map under two (possibly) different bases, with P being the change of basis matrix. A transformation A ↦ P−1AP is called a similarity transformation or conjugation of the matrix A. In the general linear group, similarity is therefore the same as conjugacy, and similar matrices are also called conjugate; however, in a given subgroup H of the general linear group, the notion of conjugacy may be more restrictive than similarity, since it requires that P be chosen to lie in H.
Clustering coefficientIn graph theory, a clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterised by a relatively high density of ties; this likelihood tends to be greater than the average probability of a tie randomly established between two nodes (Holland and Leinhardt, 1971; Watts and Strogatz, 1998). Two versions of this measure exist: the global and the local.
Microscopic scaleThe microscopic scale () is the scale of objects and events smaller than those that can easily be seen by the naked eye, requiring a lens or microscope to see them clearly. In physics, the microscopic scale is sometimes regarded as the scale between the macroscopic scale and the quantum scale. Microscopic units and measurements are used to classify and describe very small objects. One common microscopic length scale unit is the micrometre (also called a micron) (symbol: μm), which is one millionth of a metre.