Publication

Efficient Greedy Coordinate Descent for Composite Problems

Related concepts (29)

Smooth number

In number theory, an n-smooth (or n-friable) number is an integer whose prime factors are all less than or equal to n. For example, a 7-smooth number is a number whose every prime factor is at most 7, so 49 = 7² and 15750 = 2 × 3² × 5³ × 7 are both 7-smooth, while 11 and 702 = 2 × 3³ × 13 are not 7-smooth. The term seems to have been coined by Leonard Adleman. Smooth numbers are especially important in cryptography, which relies on factorization of integers.
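The definition can be checked directly by dividing out every factor up to n and seeing whether anything remains. The following is a minimal sketch; the function name `is_smooth` is a hypothetical choice for illustration, and 1 is treated as smooth by convention.

```python
def is_smooth(m: int, n: int) -> bool:
    """Return True if every prime factor of m is at most n."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    # Dividing by composites is harmless: their prime factors
    # have already been stripped out by the time they are reached.
    for p in range(2, n + 1):
        while m % p == 0:
            m //= p
    # Whatever is left is a product of primes larger than n.
    return m == 1


if __name__ == "__main__":
    for value in (49, 15750, 11, 702):
        print(value, is_smooth(value, 7))
```

This runs in time proportional to n plus the number of divisions, which is fine for small bounds but not for the sizes used in cryptography.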

Rate of convergence

In numerical analysis, the order of convergence and the rate of convergence of a convergent sequence are quantities that represent how quickly the sequence approaches its limit. A sequence $(x_n)$ that converges to $L$ is said to have order of convergence $q \geq 1$ and rate of convergence $\mu$ if $\lim_{n \to \infty} \frac{|x_{n+1} - L|}{|x_n - L|^{q}} = \mu$. The rate of convergence $\mu$ is also called the asymptotic error constant. Note that this terminology is not standardized and some authors use rate where this article uses order.
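When the limit is known, the order of convergence can be estimated numerically from three consecutive errors: if each error behaves like a constant times the previous error raised to the power q, then q is approximately log(e_next / e_curr) / log(e_curr / e_prev). The sketch below assumes this standard estimate and uses Newton's iteration for the square root of 2 as a test sequence; the helper names `estimate_order` and `newton_sqrt2` are hypothetical.

```python
import math


def estimate_order(xs, limit):
    """Estimate the order of convergence from consecutive error triples."""
    errors = [abs(x - limit) for x in xs]
    estimates = []
    for k in range(1, len(errors) - 1):
        e_prev, e_curr, e_next = errors[k - 1], errors[k], errors[k + 1]
        # Skip triples where the logarithms would be undefined.
        if min(e_prev, e_curr, e_next) == 0.0 or e_curr == e_prev:
            continue
        estimates.append(math.log(e_next / e_curr) / math.log(e_curr / e_prev))
    return estimates


def newton_sqrt2(steps):
    """Newton's iteration x <- (x + 2/x) / 2, started at x = 1."""
    x = 1.0
    xs = [x]
    for _ in range(steps):
        x = 0.5 * (x + 2.0 / x)
        xs.append(x)
    return xs


if __name__ == "__main__":
    print(estimate_order(newton_sqrt2(4), math.sqrt(2.0)))
```

The later estimates approach 2, consistent with the quadratic convergence of Newton's method near a simple root. Running many more steps is not useful here because the error reaches floating-point precision.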

Regularization (mathematics)

In mathematics, statistics, finance, computer science, particularly in machine learning and inverse problems, regularization is a process that changes the resulting answer to be "simpler". It is often used to obtain results for ill-posed problems or to prevent overfitting. Although regularization procedures can be divided in many ways, the following delineation is particularly helpful: explicit regularization is regularization in which one explicitly adds a term to the optimization problem.

Ridge regression

Ridge regression is a method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated. It has been used in many fields including econometrics, chemistry, and engineering. Also known as Tikhonov regularization, named for Andrey Tikhonov, it is a method of regularization of ill-posed problems. It is particularly useful to mitigate the problem of multicollinearity in linear regression, which commonly occurs in models with large numbers of parameters.

Elastic net regularization

In statistics and, in particular, in the fitting of linear or logistic regression models, the elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods. The elastic net method overcomes the limitations of the LASSO (least absolute shrinkage and selection operator) method, which uses a penalty function based on the L1 norm $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$. Use of this penalty function has several limitations. For example, in the "large p, small n" case (high-dimensional data with few examples), the LASSO selects at most n variables before it saturates.
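The linear combination of the two penalties can be written out in a few lines. This is only a sketch: the weights `lam1` and `lam2` and the function name are hypothetical, and libraries parameterize the elastic net differently (for example with a single strength and a mixing ratio).

```python
def elastic_net_penalty(beta, lam1, lam2):
    """Return lam1 * ||beta||_1 + lam2 * ||beta||_2^2 for a coefficient list."""
    l1_term = sum(abs(b) for b in beta)   # lasso-style penalty
    l2_term = sum(b * b for b in beta)    # ridge-style penalty
    return lam1 * l1_term + lam2 * l2_term


if __name__ == "__main__":
    print(elastic_net_penalty([1.0, -2.0], lam1=0.5, lam2=0.25))
```

Setting `lam2` to zero recovers the lasso penalty and setting `lam1` to zero recovers the ridge penalty.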

Smoothness

In mathematical analysis, the smoothness of a function is a property measured by the number of continuous derivatives it has over some domain, called differentiability class. At the very minimum, a function could be considered smooth if it is differentiable everywhere (hence continuous). At the other end, it might also possess derivatives of all orders in its domain, in which case it is said to be infinitely differentiable and referred to as a C-infinity function (or $C^{\infty}$ function).

Coordinate descent

Coordinate descent is an optimization algorithm that successively minimizes along coordinate directions to find the minimum of a function. At each iteration, the algorithm determines a coordinate or coordinate block via a coordinate selection rule, then exactly or inexactly minimizes over the corresponding coordinate hyperplane while fixing all other coordinates or coordinate blocks. A line search along the coordinate direction can be performed at the current iterate to determine the appropriate step size.
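As an illustration of a selection rule followed by exact coordinate minimization, the sketch below applies coordinate descent to a quadratic f(x) = ½ xᵀAx − bᵀx with a symmetric positive definite matrix A. The greedy rule used here (pick the coordinate with the largest absolute gradient entry, often called the Gauss-Southwell rule) is one possible choice and is an assumption of this example, not necessarily the rule used in the publication above; the function name is hypothetical.

```python
def greedy_coordinate_descent(A, b, iters=200, tol=1e-12):
    """Minimize 0.5 * x^T A x - b^T x for symmetric positive definite A."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        # Gradient of the quadratic: A x - b.
        grad = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        # Greedy selection: coordinate with the largest |gradient| entry.
        i = max(range(n), key=lambda k: abs(grad[k]))
        if abs(grad[i]) < tol:
            break
        # Exact minimization along coordinate i (all others held fixed).
        x[i] -= grad[i] / A[i][i]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(greedy_coordinate_descent(A, b))
```

For this quadratic the exact coordinate step has a closed form, so no line search is needed. The sketch recomputes the full gradient each iteration for clarity; practical implementations update it incrementally.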

Lasso (statistics)

In statistics and machine learning, lasso (least absolute shrinkage and selection operator; also Lasso or LASSO) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the resulting statistical model. It was originally introduced in geophysics, and later by Robert Tibshirani, who coined the term. Lasso was originally formulated for linear regression models. This simple case reveals a substantial amount about the estimator.
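One way to see how lasso performs variable selection is the one-dimensional problem of minimizing ½(z − β)² + λ|β|, whose solution is the soft-thresholding operator sign(z)·max(|z| − λ, 0). The sketch below assumes this scalar setting; the function name is hypothetical.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Minimizer of 0.5 * (z - beta)**2 + lam * abs(beta) over beta."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    if z > lam:
        return z - lam      # shrink positive values toward zero
    if z < -lam:
        return z + lam      # shrink negative values toward zero
    return 0.0              # small values are set exactly to zero


if __name__ == "__main__":
    for z in (3.0, -3.0, 0.5):
        print(z, soft_threshold(z, 1.0))
```

Values whose magnitude is below the threshold become exactly zero, which is the mechanism behind the selection of variables; larger values are shrunk by a constant amount.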

Hill climbing

In numerical analysis, hill climbing is a mathematical optimization technique which belongs to the family of local search. It is an iterative algorithm that starts with an arbitrary solution to a problem, then attempts to find a better solution by making an incremental change to the solution. If the change produces a better solution, another incremental change is made to the new solution, and so on until no further improvements can be found. For example, hill climbing can be applied to the travelling salesman problem.
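The loop described above can be sketched generically. The `score` and `neighbors` callables are hypothetical parameters supplied by the caller, and this variant moves to the best neighbor at each step (steepest ascent), which is one of several common choices.

```python
def hill_climb(score, neighbors, start):
    """Move to the best-scoring neighbor until no neighbor improves the score."""
    current = start
    while True:
        best = max(neighbors(current), key=score)
        if score(best) <= score(current):
            return current  # local optimum: no neighbor is strictly better
        current = best


if __name__ == "__main__":
    # Toy problem: maximize -(x - 3)^2 over the integers, stepping by 1.
    result = hill_climb(lambda x: -(x - 3) ** 2,
                        lambda x: [x - 1, x + 1],
                        start=0)
    print(result)
```

Because only strict improvements are accepted, the loop terminates whenever the score is bounded above on the reachable states, but it may stop at a local rather than global optimum.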

Mathematical optimization

Mathematical optimization (alternatively spelled optimisation) or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfields: discrete optimization and continuous optimization. Optimization problems arise in all quantitative disciplines from computer science and engineering to operations research and economics, and the development of solution methods has been of interest in mathematics for centuries.

Conjecture

In mathematics, a conjecture is a conclusion or a proposition that is proffered on a tentative basis without proof. Some conjectures, such as the Riemann hypothesis (still a conjecture) or Fermat's Last Theorem (a conjecture until proven in 1995 by Andrew Wiles), have shaped much of mathematical history as new areas of mathematics are developed in order to prove them. Formal mathematics is based on provable truth.

Group selection

Group selection is a proposed mechanism of evolution in which natural selection acts at the level of the group, instead of at the level of the individual or gene. Early authors such as V. C. Wynne-Edwards and Konrad Lorenz argued that the behavior of animals could affect their survival and reproduction as groups, speaking for instance of actions for the good of the species. In the 1930s, R.A. Fisher and J.B.S.

Simpson's rule

In numerical integration, Simpson's rules are several approximations for definite integrals, named after Thomas Simpson (1710–1761). The most basic of these rules, called Simpson's 1/3 rule, or just Simpson's rule, reads $\int_a^b f(x)\,dx \approx \frac{b-a}{6}\left[f(a) + 4f\left(\frac{a+b}{2}\right) + f(b)\right]$. In German and some other languages, it is named after Johannes Kepler, who derived it in 1615 after seeing it used for wine barrels (barrel rule, Keplersche Fassregel). The approximate equality in the rule becomes exact if f is a polynomial up to and including 3rd degree.
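A minimal sketch of Simpson's 1/3 rule on a single interval, using the endpoint and midpoint values of f, is given below; the function name `simpson` is a hypothetical choice. The demonstration integrates a cubic, for which the rule is exact up to rounding.

```python
def simpson(f, a: float, b: float) -> float:
    """Simpson's 1/3 rule on [a, b] using f(a), f(midpoint), and f(b)."""
    mid = (a + b) / 2.0
    return (b - a) * (f(a) + 4.0 * f(mid) + f(b)) / 6.0


if __name__ == "__main__":
    # The exact integral of x^3 over [0, 2] is 4.
    print(simpson(lambda x: x ** 3, 0.0, 2.0))
```

For functions that are not low-degree polynomials, accuracy is usually improved by splitting the interval into subintervals and applying the rule on each.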

Feature selection

Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Stylometry and DNA microarray analysis are two cases where feature selection is used. It should be distinguished from feature extraction. Feature selection techniques are used for several reasons: simplification of models to make them easier to interpret by researchers/users, shorter training times, to avoid the curse of dimensionality, improve data's compatibility with a learning model class, encode inherent symmetries present in the input space.

Convex optimization

Convex optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets (or, equivalently, maximizing concave functions over convex sets). Many classes of convex optimization problems admit polynomial-time algorithms, whereas mathematical optimization is in general NP-hard.

Trapezoidal rule

In calculus, the trapezoidal rule (also known as the trapezoid rule or trapezium rule; see Trapezoid for more information on terminology) is a technique for approximating the definite integral. The trapezoidal rule works by approximating the region under the graph of the function as a trapezoid and calculating its area. It follows that $\int_a^b f(x)\,dx \approx (b-a)\cdot\frac{f(a)+f(b)}{2}$. The trapezoidal rule may be viewed as the result obtained by averaging the left and right Riemann sums, and is sometimes defined this way.
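The relationship to Riemann sums can be checked numerically. The sketch below implements the trapezoidal rule on n equal subintervals together with left and right Riemann sums; all function names are hypothetical and chosen for illustration.

```python
def trapezoid(f, a: float, b: float, n: int = 1) -> float:
    """Composite trapezoidal rule on n equal subintervals of [a, b]."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    total = (f(a) + f(b)) / 2.0          # endpoints carry half weight
    for k in range(1, n):
        total += f(a + k * h)            # interior points carry full weight
    return h * total


def left_riemann(f, a: float, b: float, n: int) -> float:
    h = (b - a) / n
    return h * sum(f(a + k * h) for k in range(n))


def right_riemann(f, a: float, b: float, n: int) -> float:
    h = (b - a) / n
    return h * sum(f(a + k * h) for k in range(1, n + 1))


if __name__ == "__main__":
    f = lambda x: x * x
    average = (left_riemann(f, 0.0, 1.0, 10) + right_riemann(f, 0.0, 1.0, 10)) / 2.0
    print(trapezoid(f, 0.0, 1.0, 10), average)
```

The two printed values agree up to rounding, since averaging the left and right sums gives each interior point full weight and each endpoint half weight.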

Least squares

The method of least squares is a standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns) by minimizing the sum of the squares of the residuals (a residual being the difference between an observed value and the fitted value provided by a model) made in the results of each individual equation. The most important application is in data fitting.
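For the data-fitting case, the simplest instance is fitting a straight line y = intercept + slope · x to more than two points. The sketch below uses the standard closed-form solution for this one-predictor case; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # minimizes the sum of squared residuals
    intercept = mean_y - slope * mean_x   # fitted line passes through the means
    return intercept, slope


if __name__ == "__main__":
    # Three equations, two unknowns: an overdetermined system.
    print(fit_line([0.0, 1.0, 2.0], [0.0, 1.0, 1.0]))
```

No line passes through all three points in the example, so the fit returns the line with the smallest sum of squared residuals instead.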

Millennium Prize Problems

The Millennium Prize Problems are seven well-known complex mathematical problems selected by the Clay Mathematics Institute in 2000. The Clay Institute has pledged a US$1 million prize for the first correct solution to each problem. The Clay Mathematics Institute officially designated the title Millennium Problem for the seven unsolved mathematical problems, the Birch and Swinnerton-Dyer conjecture, Hodge conjecture, Navier–Stokes existence and smoothness, P versus NP problem, Riemann hypothesis, Yang–Mills existence and mass gap, and the Poincaré conjecture at the Millennium Meeting held on May 24, 2000.

Integer factorization

In number theory, integer factorization is the decomposition, when possible, of a positive integer into a product of smaller integers. If the factors are further restricted to be prime numbers, the process is called prime factorization, and includes the test whether the given integer is prime (in this case, one has a "product" of a single factor). When the numbers are sufficiently large, no efficient non-quantum integer factorization algorithm is known. However, it has not been proven that such an algorithm does not exist.
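For small inputs, prime factorization can be done by trial division, as in the sketch below (the function name `prime_factors` is hypothetical). Its running time grows with the square root of the input, which is why it says nothing about the hardness of factoring large numbers.

```python
def prime_factors(n: int) -> list:
    """Return the prime factors of n (n >= 2) in non-decreasing order."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        # The remaining cofactor has no divisor up to its square root,
        # so it is prime. For a prime input this is the single factor.
        factors.append(n)
    return factors


if __name__ == "__main__":
    print(prime_factors(15750))
    print(prime_factors(11))
```

A prime input yields a list with one element, matching the "product" of a single factor mentioned above.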

Linear form

In mathematics, a linear form (also known as a linear functional, a one-form, or a covector) is a linear map from a vector space to its field of scalars (often, the real numbers or the complex numbers). If V is a vector space over a field k, the set of all linear functionals from V to k is itself a vector space over k with addition and scalar multiplication defined pointwise. This space is called the dual space of V, or sometimes the algebraic dual space, when a topological dual space is also considered.