Publication

Impact of quantum-chemical metrics on the machine learning prediction of electron density

Related concepts (32)

Electron density or electronic density is the measure of the probability of an electron being present at an infinitesimal element of space surrounding any given point. It is a scalar quantity depending upon three spatial variables and is typically denoted as either or . The density is determined, through definition, by the normalised -electron wavefunction which itself depends upon variables ( spatial and spin coordinates). Conversely, the density determines the wave function modulo up to a phase factor, providing the formal foundation of density functional theory.

Linear regression

In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.

Charge density

In electromagnetism, charge density is the amount of electric charge per unit length, surface area, or volume. Volume charge density (symbolized by the Greek letter ρ) is the quantity of charge per unit volume, measured in the SI system in coulombs per cubic meter (C⋅m−3), at any point in a volume. Surface charge density (σ) is the quantity of charge per unit area, measured in coulombs per square meter (C⋅m−2), at any point on a surface charge distribution on a two dimensional surface.

Automated machine learning

Automated machine learning (AutoML) is the process of automating the tasks of applying machine learning to real-world problems. AutoML potentially includes every stage from beginning with a raw dataset to building a machine learning model ready for deployment. AutoML was proposed as an artificial intelligence-based solution to the growing challenge of applying machine learning. The high degree of automation in AutoML aims to allow non-experts to make use of machine learning models and techniques without requiring them to become experts in machine learning.

Machine learning

Machine learning (ML) is an umbrella term for solving problems for which development of algorithms by human programmers would be cost-prohibitive, and instead the problems are solved by helping machines 'discover' their 'own' algorithms, without needing to be explicitly told what to do by any human-developed algorithms. Recently, generative artificial neural networks have been able to surpass results of many previous approaches.

Density functional theory

Density-functional theory (DFT) is a computational quantum mechanical modelling method used in physics, chemistry and materials science to investigate the electronic structure (or nuclear structure) (principally the ground state) of many-body systems, in particular atoms, molecules, and the condensed phases. Using this theory, the properties of a many-electron system can be determined by using functionals, i.e. functions of another function. In the case of DFT, these are functionals of the spatially dependent electron density.

Regression analysis

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables' or 'features'). The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion.

Logistic regression

In statistics, the logistic model (or logit model) is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. In regression analysis, logistic regression (or logit regression) is estimating the parameters of a logistic model (the coefficients in the linear combination).

Computational chemistry

Computational chemistry is a branch of chemistry that uses computer simulation to assist in solving chemical problems. It uses methods of theoretical chemistry, incorporated into computer programs, to calculate the structures and properties of molecules, groups of molecules, and solids. It is essential because, apart from relatively recent results concerning the hydrogen molecular ion (dihydrogen cation, see references therein for more details), the quantum many-body problem cannot be solved analytically, much less in closed form.

Free electron model

In solid-state physics, the free electron model is a quantum mechanical model for the behaviour of charge carriers in a metallic solid. It was developed in 1927, principally by Arnold Sommerfeld, who combined the classical Drude model with quantum mechanical Fermi–Dirac statistics and hence it is also known as the Drude–Sommerfeld model.

Protein structure prediction

Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its secondary and tertiary structure from primary structure. Structure prediction is different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by computational biology; and it is important in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes).

Gene prediction

In computational biology, gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding genes as well as RNA genes, but may also include prediction of other functional elements such as regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced. In its earliest days, "gene finding" was based on painstaking experimentation on living cells and organisms.

Online machine learning

In computer science, online machine learning is a method of machine learning in which data becomes available in a sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once. Online learning is a common technique used in areas of machine learning where it is computationally infeasible to train over the entire dataset, requiring the need of out-of-core algorithms.

Area density

The area density (also known as areal density, surface density, superficial density, areic density, mass thickness, column density, or density thickness) of a two-dimensional object is calculated as the mass per unit area. The SI derived unit is the kilogram per square metre (kg·m−2). A related area number density can be defined by replacing mass in by number of particles or other countable quantity. In the paper and fabric industries, it is called grammage and is expressed in grams per square meter (g/m2); for paper in particular, it may be expressed as pounds per ream of standard sizes ("basis ream").

Electron

The electron (_Electron or _beta-) is a subatomic particle with a negative one elementary electric charge. Electrons belong to the first generation of the lepton particle family, and are generally thought to be elementary particles because they have no known components or substructure. The electron's mass is approximately 1/1836 that of the proton. Quantum mechanical properties of the electron include an intrinsic angular momentum (spin) of a half-integer value, expressed in units of the reduced Planck constant, ħ.

Robust regression

In robust statistics, robust regression seeks to overcome some limitations of traditional regression analysis. A regression analysis models the relationship between one or more independent variables and a dependent variable. Standard types of regression, such as ordinary least squares, have favourable properties if their underlying assumptions are true, but can give misleading results otherwise (i.e. are not robust to assumption violations).

Explosive

An explosive (or explosive material) is a reactive substance that contains a great amount of potential energy that can produce an explosion if released suddenly, usually accompanied by the production of light, heat, sound, and pressure. An explosive charge is a measured quantity of explosive material, which may either be composed solely of one ingredient or be a mixture containing at least two substances.

Multinomial logistic regression

In statistics, multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems, i.e. with more than two possible discrete outcomes. That is, it is a model that is used to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables (which may be real-valued, binary-valued, categorical-valued, etc.).

Density of states

In solid-state physics and condensed matter physics, the density of states (DOS) of a system describes the number of modes per unit frequency range. The density of states is defined as , where is the number of states in the system of volume whose energies lie in the range from to . It is mathematically represented as a distribution by a probability density function, and it is generally an average over the space and time domains of the various states occupied by the system.

Errors-in-variables models

In statistics, errors-in-variables models or measurement error models are regression models that account for measurement errors in the independent variables. In contrast, standard regression models assume that those regressors have been measured exactly, or observed without error; as such, those models account only for errors in the dependent variables, or responses. In the case when some regressors have been measured with errors, estimation based on the standard assumption leads to inconsistent estimates, meaning that the parameter estimates do not tend to the true values even in very large samples.