Publication

Chemiscope: interactive structure-property explorer for materials and molecules

Related concepts (32)

Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable (hard to control or deal with).

Chemical element

A chemical element is a chemical substance that cannot be broken down into other substances. The basic particle that constitutes a chemical element is the atom, and each chemical element is distinguished by the number of protons in the nuclei of its atoms, known as its atomic number. For example, oxygen has an atomic number of 8, meaning that each oxygen atom has 8 protons in its nucleus. This is in contrast to chemical compounds and mixtures, which contain atoms with more than one atomic number.

Unsupervised learning

Unsupervised learning, is paradigm in machine learning where, in contrast to supervised learning and semi-supervised learning, algorithms learn patterns exclusively from unlabeled data. Neural network tasks are often categorized as discriminative (recognition) or generative (imagination). Often but not always, discriminative tasks use supervised methods and generative tasks use unsupervised (see Venn diagram); however, the separation is very hazy. For example, object recognition favors supervised learning but unsupervised learning can also cluster objects into groups.

Nonlinear dimensionality reduction

Nonlinear dimensionality reduction, also known as manifold learning, refers to various related techniques that aim to project high-dimensional data onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa) itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.

Machine learning

Machine learning (ML) is an umbrella term for solving problems for which development of algorithms by human programmers would be cost-prohibitive, and instead the problems are solved by helping machines 'discover' their 'own' algorithms, without needing to be explicitly told what to do by any human-developed algorithms. Recently, generative artificial neural networks have been able to surpass results of many previous approaches.

Curse of dimensionality

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic programming. Dimensionally cursed phenomena occur in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining and databases.

Names for sets of chemical elements

There are currently 118 known chemical elements exhibiting many different physical and chemical properties. Amongst this diversity, scientists have found it useful to use names for various sets of elements, that illustrate similar properties, or their trends of properties. Many of these sets are formally recognized by the standards body IUPAC. The following collective names are recommended by IUPAC: Alkali metals – The metals of group 1: Li, Na, K, Rb, Cs, Fr. Alkaline earth metals – The metals of group 2: Be, Mg, Ca, Sr, Ba, Ra.

Feature learning

In machine learning, feature learning or representation learning is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task. Feature learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process.

Chemical symbol

Chemical symbols are the abbreviations used in chemistry for chemical elements, functional groups and chemical compounds. Element symbols for chemical elements normally consist of one or two letters from the Latin alphabet and are written with the first letter capitalised. Earlier symbols for chemical elements stem from classical Latin and Greek vocabulary. For some elements, this is because the material was known in ancient times, while for others, the name is a more recent invention.

Molecular geometry

Molecular geometry is the three-dimensional arrangement of the atoms that constitute a molecule. It includes the general shape of the molecule as well as bond lengths, bond angles, torsional angles and any other geometrical parameters that determine the position of each atom. Molecular geometry influences several properties of a substance including its reactivity, polarity, phase of matter, color, magnetism and biological activity. The angles between bonds that an atom forms depend only weakly on the rest of molecule, i.

K-nearest neighbors algorithm

In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression. In both cases, the input consists of the k closest training examples in a data set. The output depends on whether k-NN is used for classification or regression: In k-NN classification, the output is a class membership.

Artificial neural network

Artificial neural networks (ANNs, also shortened to neural networks (NNs) or neural nets) are a branch of machine learning models that are built using principles of neuronal organization discovered by connectionism in the biological neural networks constituting animal brains. An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal to other neurons.

Deep learning

Deep learning is part of a broader family of machine learning methods, which is based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.

Multilinear subspace learning

Multilinear subspace learning is an approach for disentangling the causal factor of data formation and performing dimensionality reduction. The Dimensionality reduction can be performed on a data tensor that contains a collection of observations have been vectorized, or observations that are treated as matrices and concatenated into a data tensor. Here are some examples of data tensors whose observations are vectorized or whose observations are matrices concatenated into data tensor s (2D/3D), video sequences (3D/4D), and hyperspectral cubes (3D/4D).

Periodic table

The periodic table, also known as the periodic table of the elements, arranges the chemical elements into rows ("periods") and columns ("groups"). It is an organizing icon of chemistry and is widely used in physics and other sciences. It is a depiction of the periodic law, which says that when the elements are arranged in order of their atomic numbers an approximate recurrence of their properties is evident. The table is divided into four roughly rectangular areas called blocks.

Supervised learning

Supervised learning (SL) is a paradigm in machine learning where input objects (for example, a vector of predictor variables) and a desired output value (also known as human-labeled supervisory signal) train a model. The training data is processed, building a function that maps new data on expected output values. An optimal scenario will allow for the algorithm to correctly determine output values for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way (see inductive bias).

Intensive and extensive properties

Physical properties of materials and systems can often be categorized as being either intensive or extensive, according to how the property changes when the size (or extent) of the system changes. According to IUPAC, an intensive quantity is one whose magnitude is independent of the size of the system, whereas an extensive quantity is one whose magnitude is additive for subsystems. The terms "intensive and extensive quantities" were introduced into physics by German writer Georg Helm in 1898, and by American physicist and chemist Richard C.

Abundance of the chemical elements

The abundance of the chemical elements is a measure of the occurrence of the chemical elements relative to all other elements in a given environment. Abundance is measured in one of three ways: by mass fraction (in commercial contexts often called weight fraction), by mole fraction (fraction of atoms by numerical count, or sometimes fraction of molecules in gases), or by volume fraction. Volume fraction is a common abundance measure in mixed gases such as planetary atmospheres, and is similar in value to molecular mole fraction for gas mixtures at relatively low densities and pressures, and ideal gas mixtures.

Simplified molecular-input line-entry system

The simplified molecular-input line-entry system (SMILES) is a specification in the form of a line notation for describing the structure of chemical species using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules. The original SMILES specification was initiated in the 1980s. It has since been modified and extended. In 2007, an open standard called OpenSMILES was developed in the open source chemistry community.

Chemical substance

A chemical substance is a form of matter having constant chemical composition and characteristic properties. Chemical substances can be simple substances (substances consisting of a single chemical element), chemical compounds, or alloys. Chemical substances that cannot be separated into their simpler constituent elements by physical means are said to be 'pure'; this notion intended to set them apart from mixtures.