Protein structureProtein structure is the three-dimensional arrangement of atoms in an amino acid-chain molecule. Proteins are polymers - specifically polypeptides - formed from sequences of amino acids, which are the monomers of the polymer. A single amino acid monomer may also be called a residue, which indicates a repeating unit of a polymer. Proteins form by amino acids undergoing condensation reactions, in which the amino acids lose one water molecule per reaction in order to attach to one another with a peptide bond.
Supervised learningSupervised learning (SL) is a paradigm in machine learning where input objects (for example, a vector of predictor variables) and a desired output value (also known as human-labeled supervisory signal) train a model. The training data is processed, building a function that maps new data on expected output values. An optimal scenario will allow for the algorithm to correctly determine output values for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way (see inductive bias).
Feature learningIn machine learning, feature learning or representation learning is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task. Feature learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process.
Unsupervised learningUnsupervised learning, is paradigm in machine learning where, in contrast to supervised learning and semi-supervised learning, algorithms learn patterns exclusively from unlabeled data. Neural network tasks are often categorized as discriminative (recognition) or generative (imagination). Often but not always, discriminative tasks use supervised methods and generative tasks use unsupervised (see Venn diagram); however, the separation is very hazy. For example, object recognition favors supervised learning but unsupervised learning can also cluster objects into groups.
Self-supervised learningSelf-supervised learning (SSL) is a paradigm in machine learning for processing data of lower quality, rather than improving ultimate outcomes. Self-supervised learning more closely imitates the way humans learn to classify objects. The typical SSL method is based on an artificial neural network or other model such as a decision list. The model learns in two steps. First, the task is solved based on an auxiliary or pretext classification task using pseudo-labels which help to initialize the model parameters.
ProteinProteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, responding to stimuli, providing structure to cells and organisms, and transporting molecules from one location to another. Proteins differ from one another primarily in their sequence of amino acids, which is dictated by the nucleotide sequence of their genes, and which usually results in protein folding into a specific 3D structure that determines its activity.
Protein structure predictionProtein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its secondary and tertiary structure from primary structure. Structure prediction is different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by computational biology; and it is important in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes).
ZeoliteZeolites are microporous, crystalline aluminosilicate materials commonly used as commercial adsorbents and catalysts. They mainly consist of silicon, aluminium, oxygen, and have the general formula M1/nn+(AlO2)-(SiO2)x・yH2O where M1/nn+ is either a metal ion or H+. These positive ions can be exchanged for others in a contacting electrolyte solution. H+ exchanged zeolites are particularly useful as solid acid catalysts.
Structural alignmentStructural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RNA molecules. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions.
Machine learningMachine learning (ML) is an umbrella term for solving problems for which development of algorithms by human programmers would be cost-prohibitive, and instead the problems are solved by helping machines 'discover' their 'own' algorithms, without needing to be explicitly told what to do by any human-developed algorithms. Recently, generative artificial neural networks have been able to surpass results of many previous approaches.
Biomolecular structureBiomolecular structure is the intricate folded, three-dimensional shape that is formed by a molecule of protein, DNA, or RNA, and that is important to its function. The structure of these molecules may be considered at any of several length scales ranging from the level of individual atoms to the relationships among entire protein subunits. This useful distinction among scales is often expressed as a decomposition of molecular structure into four levels: primary, secondary, tertiary, and quaternary.
Protein tertiary structureProtein tertiary structure is the three dimensional shape of a protein. The tertiary structure will have a single polypeptide chain "backbone" with one or more protein secondary structures, the protein domains. Amino acid side chains may interact and bond in a number of ways. The interactions and bonds of side chains within a particular protein determine its tertiary structure. The protein tertiary structure is defined by its atomic coordinates. These coordinates may refer either to a protein domain or to the entire tertiary structure.
Beta sheetThe beta sheet, (β-sheet) (also β-pleated sheet) is a common motif of the regular protein secondary structure. Beta sheets consist of beta strands (β-strands) connected laterally by at least two or three backbone hydrogen bonds, forming a generally twisted, pleated sheet. A β-strand is a stretch of polypeptide chain typically 3 to 10 amino acids long with backbone in an extended conformation. The supramolecular association of β-sheets has been implicated in the formation of the fibrils and protein aggregates observed in amyloidosis, Alzheimer's disease and other proteinopathies.
Protein secondary structureProtein secondary structure is the local spatial conformation of the polypeptide backbone excluding the side chains. The two most common secondary structural elements are alpha helices and beta sheets, though beta turns and omega loops occur as well. Secondary structure elements typically spontaneously form as an intermediate before the protein folds into its three dimensional tertiary structure. Secondary structure is formally defined by the pattern of hydrogen bonds between the amino hydrogen and carboxyl oxygen atoms in the peptide backbone.
Intrinsically disordered proteinsIn molecular biology, an intrinsically disordered protein (IDP) is a protein that lacks a fixed or ordered three-dimensional structure, typically in the absence of its macromolecular interaction partners, such as other proteins or RNA. IDPs range from fully unstructured to partially structured and include random coil, molten globule-like aggregates, or flexible linkers in large multi-domain proteins. They are sometimes considered as a separate class of proteins along with globular, fibrous and membrane proteins.
Alpha helixAn alpha helix (or α-helix) is a sequence of amino acids in a protein that are twisted into a coil (a helix). The alpha helix is the most common structural arrangement in the secondary structure of proteins. It is also the most extreme type of local structure, and it is the local structure that is most easily predicted from a sequence of amino acids. The alpha helix has a right hand-helix conformation in which every backbone N−H group hydrogen bonds to the backbone C=O group of the amino acid that is four residues earlier in the protein sequence.
Structural biologyStructural biology is a field that is many centuries old which, as defined by the Journal of Structural Biology, deals with structural analysis of living material (formed, composed of, and/or maintained and refined by living cells) at every level of organization. Early structural biologists throughout the 19th and early 20th centuries were primarily only able to study structures to the limit of the naked eye's visual acuity and through magnifying glasses and light microscopes.
X-ray crystallographyX-ray crystallography is the experimental science determining the atomic and molecular structure of a crystal, in which the crystalline structure causes a beam of incident X-rays to diffract into many specific directions. By measuring the angles and intensities of these diffracted beams, a crystallographer can produce a three-dimensional picture of the density of electrons within the crystal. From this electron density, the mean positions of the atoms in the crystal can be determined, as well as their chemical bonds, their crystallographic disorder, and various other information.
Ramachandran plotIn biochemistry, a Ramachandran plot (also known as a Rama plot, a Ramachandran diagram or a [φ,ψ] plot), originally developed in 1963 by G. N. Ramachandran, C. Ramakrishnan, and V. Sasisekharan, is a way to visualize energetically allowed regions for backbone dihedral angles ψ against φ of amino acid residues in protein structure. The figure on the left illustrates the definition of the φ and ψ backbone dihedral angles (called φ and φ' by Ramachandran).
Protein foldingProtein folding is the physical process where a protein chain is translated into its native three-dimensional structure, typically a "folded" conformation, by which the protein becomes biologically functional. Via an expeditious and reproducible process, a polypeptide folds into its characteristic three-dimensional structure from a random coil. Each protein exists first as an unfolded polypeptide or random coil after being translated from a sequence of mRNA into a linear chain of amino acids.