Publication

AKSEL: Fast Byzantine SGD

Related concepts (31)

Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data).

Computational complexity

In computer science, the computational complexity or simply complexity of an algorithm is the amount of resources required to run it. Particular focus is given to computation time (generally measured by the number of needed elementary operations) and memory storage requirements. The complexity of a problem is the complexity of the best algorithms that allow solving the problem. The study of the complexity of explicitly given algorithms is called analysis of algorithms, while the study of the complexity of problems is called computational complexity theory.

Time complexity

In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to be related by a constant factor.

Exponential time hypothesis

In computational complexity theory, the exponential time hypothesis is an unproven computational hardness assumption that was formulated by . It states that satisfiability of 3-CNF Boolean formulas cannot be solved in subexponential time, i.e., for all constant , where n is the number of variables in the formula. The exponential time hypothesis, if true, would imply that P ≠ NP, but it is a stronger statement.

Gradient descent

In mathematics, gradient descent (also often called steepest descent) is a iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the gradient (or approximate gradient) of the function at the current point, because this is the direction of steepest descent. Conversely, stepping in the direction of the gradient will lead to a local maximum of that function; the procedure is then known as gradient ascent.

Reduction (complexity)

In computability theory and computational complexity theory, a reduction is an algorithm for transforming one problem into another problem. A sufficiently efficient reduction from one problem to another may be used to show that the second problem is at least as difficult as the first. Intuitively, problem A is reducible to problem B, if an algorithm for solving problem B efficiently (if it existed) could also be used as a subroutine to solve problem A efficiently. When this is true, solving A cannot be harder than solving B.

NP (complexity)

In computational complexity theory, NP (nondeterministic polynomial time) is a complexity class used to classify decision problems. NP is the set of decision problems for which the problem instances, where the answer is "yes", have proofs verifiable in polynomial time by a deterministic Turing machine, or alternatively the set of problems that can be solved in polynomial time by a nondeterministic Turing machine. NP is the set of decision problems solvable in polynomial time by a nondeterministic Turing machine.

Communication complexity

In theoretical computer science, communication complexity studies the amount of communication required to solve a problem when the input to the problem is distributed among two or more parties. The study of communication complexity was first introduced by Andrew Yao in 1979, while studying the problem of computation distributed among several machines. The problem is usually stated as follows: two parties (traditionally called Alice and Bob) each receive a (potentially different) -bit string and .

Computation

A computation is any type of arithmetic or non-arithmetic calculation that is well-defined. Common examples of computations are mathematical equations and computer algorithms. Mechanical or electronic devices (or, historically, people) that perform computations are known as computers. The study of computation is the field of computability, itself a sub-field of computer science. The notion that mathematical statements should be ‘well-defined’ had been argued by mathematicians since at least the 1600s, but agreement on a suitable definition proved elusive.

Computational complexity theory

In theoretical computer science and mathematics, computational complexity theory focuses on classifying computational problems according to their resource usage, and relating these classes to each other. A computational problem is a task solved by a computer. A computation problem is solvable by mechanical application of mathematical steps, such as an algorithm. A problem is regarded as inherently difficult if its solution requires significant resources, whatever the algorithm used.

Theory of computation

In theoretical computer science and mathematics, the theory of computation is the branch that deals with what problems can be solved on a model of computation, using an algorithm, how efficiently they can be solved or to what degree (e.g., approximate solutions versus precise ones). The field is divided into three major branches: automata theory and formal languages, computability theory, and computational complexity theory, which are linked by the question: "What are the fundamental capabilities and limitations of computers?".

Online machine learning

In computer science, online machine learning is a method of machine learning in which data becomes available in a sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once. Online learning is a common technique used in areas of machine learning where it is computationally infeasible to train over the entire dataset, requiring the need of out-of-core algorithms.

Four-dimensional space

Four-dimensional space (4D) is the mathematical extension of the concept of three-dimensional space (3D). Three-dimensional space is the simplest possible abstraction of the observation that one needs only three numbers, called dimensions, to describe the sizes or locations of objects in the everyday world. For example, the volume of a rectangular box is found by measuring and multiplying its length, width, and height (often labeled x, y, and z).

Three-dimensional space

In geometry, a three-dimensional space (3D space, 3-space or, rarely, tri-dimensional space) is a mathematical space in which three values (coordinates) are required to determine the position of a point. Most commonly, it is the three-dimensional Euclidean space, the Euclidean n-space of dimension n=3 that models physical space. More general three-dimensional spaces are called 3-manifolds. Technically, a tuple of n numbers can be understood as the Cartesian coordinates of a location in a n-dimensional Euclidean space.

Dimension

In physics and mathematics, the dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any point within it. Thus, a line has a dimension of one (1D) because only one coordinate is needed to specify a point on it - for example, the point at 5 on a number line. A surface, such as the boundary of a cylinder or sphere, has a dimension of two (2D) because two coordinates are needed to specify a point on it - for example, both a latitude and longitude are required to locate a point on the surface of a sphere.

Multiclass classification

In machine learning and statistical classification, multiclass classification or multinomial classification is the problem of classifying instances into one of three or more classes (classifying instances into one of two classes is called binary classification). While many classification algorithms (notably multinomial logistic regression) naturally permit the use of more than two classes, some are by nature binary algorithms; these can, however, be turned into multinomial classifiers by a variety of strategies.

Conjugate gradient method

In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is positive-definite. The conjugate gradient method is often implemented as an iterative algorithm, applicable to sparse systems that are too large to be handled by a direct implementation or other direct methods such as the Cholesky decomposition. Large sparse systems often arise when numerically solving partial differential equations or optimization problems.

Polynomial-time approximation scheme

In computer science (particularly algorithmics), a polynomial-time approximation scheme (PTAS) is a type of approximation algorithm for optimization problems (most often, NP-hard optimization problems). A PTAS is an algorithm which takes an instance of an optimization problem and a parameter ε > 0 and produces a solution that is within a factor 1 + ε of being optimal (or 1 – ε for maximization problems). For example, for the Euclidean traveling salesman problem, a PTAS would produce a tour with length at most (1 + ε)L, with L being the length of the shortest tour.

Byzantine fault

A Byzantine fault (also Byzantine generals problem, interactive consistency, source congruency, error avalanche, Byzantine agreement problem, and Byzantine failure) is a condition of a computer system, particularly distributed computing systems, where components may fail and there is imperfect information on whether a component has failed. The term takes its name from an allegory, the "Byzantine generals problem", developed to describe a situation in which, to avoid catastrophic failure of the system, the system's actors must agree on a concerted strategy, but some of these actors are unreliable.

Computational science

Computational science, also known as scientific computing, technical computing or scientific computation (SC), is a division of science that uses advanced computing capabilities to understand and solve complex physical problems. This includes Algorithms (numerical and non-numerical): mathematical models, computational models, and computer simulations developed to solve sciences (e.