Publication

Learning search policies from humans in a partially observable context

Related concepts (32)

Markov decision process

In mathematics, a Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming. MDPs were known at least as early as the 1950s; a core body of research on Markov decision processes resulted from Ronald Howard's 1960 book, Dynamic Programming and Markov Processes.
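As a rough illustration of how dynamic programming applies to an MDP, the sketch below runs value iteration on a made-up two-state problem. The states, actions, rewards, discount factor `gamma` and tolerance `tol` are all assumptions chosen for the example; none of them come from the publication.

```python
# Hypothetical two-state MDP, invented for illustration only.
# TRANSITIONS[state][action] is a list of (probability, next_state, reward).
TRANSITIONS = {
    "s0": {
        "stay": [(1.0, "s0", 0.0)],
        "go": [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
    },
    "s1": {
        "stay": [(1.0, "s1", 2.0)],
        "go": [(1.0, "s0", 0.0)],
    },
}


def q_value(values, outcomes, gamma):
    """Expected discounted return of one action under the current estimates."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Repeat Bellman optimality backups until the largest change is below tol."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            best = max(q_value(values, o, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Greedy policy: in each state pick the action with the highest value.
    policy = {
        s: max(actions, key=lambda a: q_value(values, actions[a], gamma))
        for s, actions in transitions.items()
    }
    return values, policy


values, policy = value_iteration(TRANSITIONS)
```

In this toy problem the random part is the 20% chance that "go" fails in `s0`, and the controlled part is the choice between "stay" and "go".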

Decision-making

In psychology, decision-making (also spelled decision making and decisionmaking) is regarded as the cognitive process resulting in the selection of a belief or a course of action among several possible alternative options. It could be either rational or irrational. The decision-making process is a reasoning process based on assumptions of values, preferences and beliefs of the decision-maker. Every decision-making process produces a final choice, which may or may not prompt action.

Markov model

In probability theory, a Markov model is a stochastic model used to model pseudo-randomly changing systems. It is assumed that future states depend only on the current state, not on the events that occurred before it (that is, it assumes the Markov property). Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable. For this reason, in the fields of predictive modelling and probabilistic forecasting, it is desirable for a given model to exhibit the Markov property.

Decision problem

In computability theory and computational complexity theory, a decision problem is a computational problem that can be posed as a yes–no question of the input values. An example of a decision problem is deciding by means of an algorithm whether a given natural number is prime. Another is the problem "given two numbers x and y, does x evenly divide y?". The answer is either 'yes' or 'no' depending upon the values of x and y. A method for solving a decision problem, given in the form of an algorithm, is called a decision procedure for that problem.
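The two examples above can be written as decision procedures, that is, functions that always return a yes or no answer. This is a minimal sketch using trial division; the function names are hypothetical and the method is chosen for readability, not efficiency.

```python
def is_prime(n: int) -> bool:
    """Decide whether the natural number n is prime (trial division)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False  # found a divisor strictly between 1 and n
        i += 1
    return True


def divides(x: int, y: int) -> bool:
    """Decide whether x evenly divides y."""
    if x == 0:
        return y == 0  # by convention, 0 divides only 0
    return y % x == 0
```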

Knapsack problem

The knapsack problem is the following problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine which items to include in the collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items.
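One common variant is the 0/1 knapsack problem, in which each item is either taken once or left out. The sketch below assumes that variant and integer weights, and solves it with a standard dynamic program over capacities; the item list used in the example is invented.

```python
def knapsack(items, capacity):
    """Solve 0/1 knapsack for items given as (weight, value) pairs.

    Returns (best_total_value, indices_of_chosen_items).
    Assumes non-negative integer weights and an integer capacity.
    """
    best = [0] * (capacity + 1)                  # best[c]: max value within weight c
    chosen = [[] for _ in range(capacity + 1)]   # chosen[c]: items achieving best[c]
    for idx, (weight, value) in enumerate(items):
        # Go through capacities downwards so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            candidate = best[c - weight] + value
            if candidate > best[c]:
                best[c] = candidate
                chosen[c] = chosen[c - weight] + [idx]
    return best[capacity], chosen[capacity]


# Hypothetical items as (weight, value) pairs, with a weight limit of 15.
ITEMS = [(12, 4), (2, 2), (1, 1), (1, 2), (4, 10)]
total_value, picked = knapsack(ITEMS, 15)
```

The table has one entry per capacity, so the running time grows with the number of items times the capacity.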

Automated planning and scheduling

Automated planning and scheduling, sometimes denoted as simply AI planning, is a branch of artificial intelligence that concerns the realization of strategies or action sequences, typically for execution by intelligent agents, autonomous robots and unmanned vehicles. Unlike classical control and classification problems, the solutions are complex and must be discovered and optimized in multidimensional space. Planning is also related to decision theory. In known environments with available models, planning can be done offline.

Risk

In simple terms, risk is the possibility of something bad happening. Risk involves uncertainty about the effects/implications of an activity with respect to something that humans value (such as health, well-being, wealth, property or the environment), often focusing on negative, undesirable consequences. Many different definitions have been proposed. The international standard definition of risk for common understanding in different applications is "effect of uncertainty on objectives".

Propagation of uncertainty

In statistics, propagation of uncertainty (or propagation of error) is the effect of variables' uncertainties (or errors, more specifically random errors) on the uncertainty of a function based on them. When the variables are the values of experimental measurements they have uncertainties due to measurement limitations (e.g., instrument precision) which propagate due to the combination of variables in the function. The uncertainty u can be expressed in a number of ways. It may be defined by the absolute error Δx.
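For uncorrelated input variables, a widely used first-order approximation combines the uncertainties in quadrature, weighting each by the partial derivative of the function with respect to that variable. The sketch below assumes that approximation and estimates the derivatives numerically; the function name, step size and measurement values are hypothetical.

```python
import math


def propagate_uncertainty(f, values, uncertainties, rel_step=1e-6):
    """First-order uncertainty of f(*values), assuming uncorrelated inputs.

    Computes sqrt(sum((df/dx_i * u_i) ** 2)) with central differences.
    """
    total = 0.0
    for i, (x, u) in enumerate(zip(values, uncertainties)):
        h = rel_step * max(abs(x), 1.0)
        up = list(values)
        down = list(values)
        up[i] = x + h
        down[i] = x - h
        partial = (f(*up) - f(*down)) / (2 * h)  # numerical df/dx_i
        total += (partial * u) ** 2
    return math.sqrt(total)


# Hypothetical measurement: a rectangle with sides 2.0 +/- 0.1 and 3.0 +/- 0.2.
area_uncertainty = propagate_uncertainty(
    lambda w, h: w * h, [2.0, 3.0], [0.1, 0.2]
)
```

For the product of the two sides this gives sqrt((3 × 0.1)² + (2 × 0.2)²) = 0.5, so the area would be reported as 6.0 ± 0.5.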

Risk aversion

In economics and finance, risk aversion is the tendency of people to prefer outcomes with low uncertainty to those outcomes with high uncertainty, even if the average outcome of the latter is equal to or higher in monetary value than the more certain outcome. Risk aversion explains the inclination to agree to a situation with a more predictable, but possibly lower payoff, rather than another situation with a highly unpredictable, but possibly higher payoff.

Problem solving

Problem solving is the process of achieving a goal by overcoming obstacles, a frequent part of most activities. Problems in need of solutions range from simple personal tasks (e.g. how to turn on an appliance) to complex issues in business and technical fields. The former is an example of simple problem solving (SPS) addressing one issue, whereas the latter is complex problem solving (CPS) with multiple interrelated obstacles.

Markov chain

A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happens next depends only on the state of affairs now." A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC).
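A discrete-time Markov chain over a finite state space can be described by a table of transition probabilities. The sketch below uses a hypothetical two-state weather chain (the states and probabilities are invented) and computes the distribution over states after a given number of steps.

```python
# Hypothetical transition probabilities: P[current][next].
P = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}


def step(dist, transition):
    """Advance a probability distribution over states by one time step."""
    nxt = {s: 0.0 for s in transition}
    for s, prob in dist.items():
        for s2, p in transition[s].items():
            nxt[s2] += prob * p
    return nxt


def distribution_after(start, transition, n):
    """Distribution over states after n steps, starting in state `start`."""
    dist = {s: (1.0 if s == start else 0.0) for s in transition}
    for _ in range(n):
        dist = step(dist, transition)
    return dist


after_one = distribution_after("rainy", P, 1)
long_run = distribution_after("rainy", P, 200)
```

Each update uses only the current distribution, never the path that led to it. For this particular chain, the long-run distribution approaches 5/6 sunny and 1/6 rainy regardless of the starting state.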

Machine learning

Machine learning (ML) is an umbrella term for solving problems for which development of algorithms by human programmers would be cost-prohibitive, and instead the problems are solved by helping machines 'discover' their 'own' algorithms, without needing to be explicitly told what to do by any human-developed algorithms. Recently, generative artificial neural networks have been able to surpass results of many previous approaches.

Group decision-making

Group decision-making (also known as collaborative decision-making or collective decision-making) is a situation faced when individuals collectively make a choice from the alternatives before them. The decision is then no longer attributable to any single individual who is a member of the group. This is because all the individuals and social group processes such as social influence contribute to the outcome. The decisions made by groups are often different from those made by individuals.

Risk neutral preferences

In economics and finance, risk neutral preferences are preferences that are neither risk averse nor risk seeking. A risk neutral party's decisions are not affected by the degree of uncertainty in a set of outcomes, so a risk neutral party is indifferent between choices with equal expected payoffs even if one choice is riskier. In the context of the theory of the firm, a risk neutral firm facing risk about the market price of its product, and caring only about profit, would maximize the expected value of its profit (with respect to its choices of labor input usage, output produced, etc.).

Decisional balance sheet

A decisional balance sheet or decision balance sheet is a tabular method for representing the pros and cons of different choices and for helping someone decide what to do in a certain circumstance. It is often used in working with ambivalence in people who are engaged in behaviours that are harmful to their health (for example, problematic substance use or excessive eating), as part of psychological approaches such as those based on the transtheoretical model of change, and in certain circumstances in motivational interviewing.

Consensus decision-making

Consensus decision-making or consensus process (often abbreviated to consensus) are group decision-making processes in which participants develop and decide on proposals with the aim, or requirement, of acceptance by all. The focus on establishing agreement of at least the majority or the supermajority and avoiding unproductive opinion differentiates consensus from unanimity, which requires all participants to support a decision. The word consensus is Latin meaning "agreement, accord", derived from consentire meaning "feel together".

Navigation

Navigation is a field of study that focuses on the process of monitoring and controlling the movement of a craft or vehicle from one place to another. The field of navigation includes four general categories: land navigation, marine navigation, aeronautic navigation, and space navigation. It is also the term of art used for the specialized knowledge used by navigators to perform navigation tasks. All navigational techniques involve locating the navigator's position compared to known locations or patterns.

Risk-seeking

In accounting, finance, and economics, a risk-seeker or risk-lover is a person who has a preference for risk. While most investors are considered risk averse, one could view casino-goers as risk-seeking. A common example of risk-seeking behaviour: if offered two choices, either $50 as a sure thing or a 50% chance each of either $100 or nothing, a risk-seeking person would prefer the gamble. Even though the gamble and the "sure thing" have the same expected value, the preference for risk makes the gamble's expected utility for the individual higher.
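The choice between a sure 50 and an even chance of 100 or nothing can be worked through with expected utility. The sketch below assumes three illustrative utility functions (square root for a risk-averse agent, identity for a risk-neutral one, square for a risk-seeking one); these particular functions are assumptions for the example, not part of the definition.

```python
import math

SURE_AMOUNT = 50.0
GAMBLE = [(0.5, 100.0), (0.5, 0.0)]  # (probability, payoff) pairs


def expected_utility(lottery, utility):
    """Probability-weighted average of the utility of each payoff."""
    return sum(p * utility(x) for p, x in lottery)


def preferred_option(utility):
    """Compare the utility of the sure amount with the gamble's expected utility."""
    sure = utility(SURE_AMOUNT)
    gamble = expected_utility(GAMBLE, utility)
    if math.isclose(sure, gamble):
        return "indifferent"
    return "sure thing" if sure > gamble else "gamble"


risk_averse = preferred_option(math.sqrt)          # concave utility
risk_neutral = preferred_option(lambda x: x)       # linear utility
risk_seeking = preferred_option(lambda x: x ** 2)  # convex utility
```

Both options have an expected payoff of 50; only the curvature of the utility function changes which one is preferred.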

Computational problem

In theoretical computer science, a computational problem is a problem that may be solved by an algorithm. For example, the problem of factoring "Given a positive integer n, find a nontrivial prime factor of n." is a computational problem. A computational problem can be viewed as a set of instances or cases together with a, possibly empty, set of solutions for every instance/case. For example, in the factoring problem, the instances are the integers n, and solutions are prime numbers p that are the nontrivial prime factors of n.
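The factoring example can be made concrete by computing, for a given instance n, its set of solutions. This is a minimal sketch using trial division; the function name is hypothetical, and "nontrivial" is taken here to mean a prime factor other than n itself.

```python
def nontrivial_prime_factors(n: int) -> set:
    """Return the set of prime factors of n other than n itself.

    The set is empty when n is 1 or prime, so some instances have no solution.
    """
    factors = set()
    remaining, p = n, 2
    while p * p <= remaining:
        if remaining % p == 0:
            factors.add(p)
            while remaining % p == 0:
                remaining //= p  # strip every copy of this prime
        p += 1
    if remaining > 1:
        factors.add(remaining)   # whatever is left over is itself prime
    factors.discard(n)           # n itself is the trivial factor
    return factors
```

For the instance 12 the solutions are 2 and 3, while the instance 97 has an empty solution set because 97 is prime.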

Markov property

In probability theory and statistics, the term Markov property refers to the memoryless property of a stochastic process, which means that, given its present state, its future evolution is independent of its history. It is named after the Russian mathematician Andrey Markov. The term strong Markov property is similar to the Markov property, except that the meaning of "present" is defined in terms of a random variable known as a stopping time. The term Markov assumption is used to describe a model where the Markov property is assumed to hold, such as a hidden Markov model.
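For a discrete-time process X_0, X_1, ... with a countable state space (an assumption made here to keep the notation simple), the memoryless property can be written as follows, for all states for which the conditioning event has positive probability:

```latex
\Pr\left(X_{n+1} = x \mid X_n = x_n, X_{n-1} = x_{n-1}, \dots, X_0 = x_0\right)
  = \Pr\left(X_{n+1} = x \mid X_n = x_n\right)
```

The left-hand side conditions on the whole history and the right-hand side only on the present state, so the equality says that the earlier states add no information about the next one.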