Publication

Distributed Learning in Non-Convex Environments-Part II: Polynomial Escape From Saddle-Points

Related concepts (31)

Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data).

Polynomial

In mathematics, a polynomial is an expression consisting of indeterminates (also called variables) and coefficients, that involves only the operations of addition, subtraction, multiplication, and positive-integer powers of variables. An example of a polynomial of a single indeterminate x is x2 − 4x + 7. An example with three indeterminates is x3 + 2xyz2 − yz + 1. Polynomials appear in many areas of mathematics and science.

Polynomial ring

In mathematics, especially in the field of algebra, a polynomial ring or polynomial algebra is a ring (which is also a commutative algebra) formed from the set of polynomials in one or more indeterminates (traditionally also called variables) with coefficients in another ring, often a field. Often, the term "polynomial ring" refers implicitly to the special case of a polynomial ring in one indeterminate over a field. The importance of such polynomial rings relies on the high number of properties that they have in common with the ring of the integers.

Monic polynomial

In algebra, a monic polynomial is a non-zero univariate polynomial (that is, a polynomial in a single variable) in which the leading coefficient (the nonzero coefficient of highest degree) is equal to 1. That is to say, a monic polynomial is one that can be written as with Monic polynomials are widely used in algebra and number theory, since they produce many simplifications and they avoid divisions and denominators. Here are some examples. Every polynomial is associated to a unique monic polynomial.

Deep learning

Deep learning is part of a broader family of machine learning methods, which is based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.

Gradient descent

In mathematics, gradient descent (also often called steepest descent) is a iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the gradient (or approximate gradient) of the function at the current point, because this is the direction of steepest descent. Conversely, stepping in the direction of the gradient will lead to a local maximum of that function; the procedure is then known as gradient ascent.

Short-term memory

Short-term memory (or "primary" or "active memory") is the capacity for holding a small amount of information in an active, readily available state for a short interval. For example, short-term memory holds a phone number that has just been recited. The duration of short-term memory (absent rehearsal or active maintenance) is estimated to be on the order of seconds. The commonly cited capacity of 7 items, found in Miller's Law, has been superseded by 4±1 items. In contrast, long-term memory holds information indefinitely.

Mandelbrot set

The Mandelbrot set (ˈmændəlbroʊt,_-brɒt) is a two dimensional set with a relatively simple definition that exhibits great complexity, especially as it is magnified. It is popular for its aesthetic appeal and fractal structures. The set is defined in the complex plane as the complex numbers for which the function does not diverge to infinity when iterated starting at , i.e., for which the sequence , , etc., remains bounded in absolute value. This set was first defined and drawn by Robert W.

Homogeneous polynomial

In mathematics, a homogeneous polynomial, sometimes called quantic in older texts, is a polynomial whose nonzero terms all have the same degree. For example, is a homogeneous polynomial of degree 5, in two variables; the sum of the exponents in each term is always 5. The polynomial is not homogeneous, because the sum of exponents does not match from term to term. The function defined by a homogeneous polynomial is always a homogeneous function. An algebraic form, or simply form, is a function defined by a homogeneous polynomial.

Feature learning

In machine learning, feature learning or representation learning is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task. Feature learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process.

Symmetric polynomial

In mathematics, a symmetric polynomial is a polynomial P(X1, X2, ..., Xn) in n variables, such that if any of the variables are interchanged, one obtains the same polynomial. Formally, P is a symmetric polynomial if for any permutation σ of the subscripts 1, 2, ..., n one has P(Xσ(1), Xσ(2), ..., Xσ(n)) = P(X1, X2, ..., Xn). Symmetric polynomials arise naturally in the study of the relation between the roots of a polynomial in one variable and its coefficients, since the coefficients can be given by polynomial expressions in the roots, and all roots play a similar role in this setting.

System of polynomial equations

A system of polynomial equations (sometimes simply a polynomial system) is a set of simultaneous equations f1 = 0, ..., fh = 0 where the fi are polynomials in several variables, say x1, ..., xn, over some field k. A solution of a polynomial system is a set of values for the xis which belong to some algebraically closed field extension K of k, and make all equations true. When k is the field of rational numbers, K is generally assumed to be the field of complex numbers, because each solution belongs to a field extension of k, which is isomorphic to a subfield of the complex numbers.

Distributed artificial intelligence

Distributed Artificial Intelligence (DAI) also called Decentralized Artificial Intelligence is a subfield of artificial intelligence research dedicated to the development of distributed solutions for problems. DAI is closely related to and a predecessor of the field of multi-agent systems. Multi-agent systems and distributed problem solving are the two main DAI approaches. There are numerous applications and tools. Distributed Artificial Intelligence (DAI) is an approach to solving complex learning, planning, and decision-making problems.

Learning rate

In machine learning and statistics, the learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function. Since it influences to what extent newly acquired information overrides old information, it metaphorically represents the speed at which a machine learning model "learns". In the adaptive control literature, the learning rate is commonly referred to as gain. In setting a learning rate, there is a trade-off between the rate of convergence and overshooting.

Irreducible polynomial

In mathematics, an irreducible polynomial is, roughly speaking, a polynomial that cannot be factored into the product of two non-constant polynomials. The property of irreducibility depends on the nature of the coefficients that are accepted for the possible factors, that is, the field to which the coefficients of the polynomial and its possible factors are supposed to belong. For example, the polynomial x2 − 2 is a polynomial with integer coefficients, but, as every integer is also a real number, it is also a polynomial with real coefficients.

Julia set

In the context of complex dynamics, a branch of mathematics, the Julia set and the Fatou set are two complementary sets (Julia "laces" and Fatou "dusts") defined from a function. Informally, the Fatou set of the function consists of values with the property that all nearby values behave similarly under repeated iteration of the function, and the Julia set consists of values such that an arbitrarily small perturbation can cause drastic changes in the sequence of iterated function values.

Long short-term memory

Long short-term memory (LSTM) network is a recurrent neural network (RNN), aimed to deal with the vanishing gradient problem present in traditional RNNs. Its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models and other sequence learning methods. It aims to provide a short-term memory for RNN that can last thousands of timesteps, thus "long short-term memory".

External ray

An external ray is a curve that runs from infinity toward a Julia or Mandelbrot set. Although this curve is only rarely a half-line (ray) it is called a ray because it is an image of a ray. External rays are used in complex analysis, particularly in complex dynamics and geometric function theory. External rays were introduced in Douady and Hubbard's study of the Mandelbrot set Criteria for classification : plane : parameter or dynamic map bifurcation of dynamic rays Stretching landing External rays of (connected) Julia sets on dynamical plane are often called dynamic rays.

Long-term memory

Long-term memory (LTM) is the stage of the Atkinson–Shiffrin memory model in which informative knowledge is held indefinitely. It is defined in contrast to short-term and working memory, which persist for only about 18 to 30 seconds. LTM is commonly labelled as "explicit memory" (declarative), as well as "episodic memory," "semantic memory," "autobiographical memory," and "implicit memory" (procedural memory). The idea of separate memories for short- and long-term storage originated in the 19th century.

Saddle point

In mathematics, a saddle point or minimax point is a point on the surface of the graph of a function where the slopes (derivatives) in orthogonal directions are all zero (a critical point), but which is not a local extremum of the function. An example of a saddle point is when there is a critical point with a relative minimum along one axial direction (between peaks) and at a relative maximum along the crossing axis. However, a saddle point need not be in this form.