Protein domainIn molecular biology, a protein domain is a region of a protein's polypeptide chain that is self-stabilizing and that folds independently from the rest. Each domain forms a compact folded three-dimensional structure. Many proteins consist of several domains, and a domain may appear in a variety of different proteins. Molecular evolution uses domains as building blocks and these may be recombined in different arrangements to create proteins with different functions.
Sequence logoIn bioinformatics, a sequence logo is a graphical representation of the sequence conservation of nucleotides (in a strand of DNA/RNA) or amino acids (in protein sequences). A sequence logo is created from a collection of aligned sequences and depicts the consensus sequence and diversity of the sequences. Sequence logos are frequently used to depict sequence characteristics such as protein-binding sites in DNA or functional units in proteins. A sequence logo consists of a stack of letters at each position.
Sequence alignmentIn bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns.
Codon usage biasCodon usage bias refers to differences in the frequency of occurrence of synonymous codons in coding DNA. A codon is a series of three nucleotides (a triplet) that encodes a specific amino acid residue in a polypeptide chain or for the termination of translation (stop codons). There are 64 different codons (61 codons encoding for amino acids and 3 stop codons) but only 20 different translated amino acids. The overabundance in the number of codons allows many amino acids to be encoded by more than one codon.
HomeoboxA homeobox is a DNA sequence, around 180 base pairs long, that regulates large-scale anatomical features in the early stages of embryonic development. Mutations in a homeobox may change large-scale anatomical features of the full-grown organism. Homeoboxes are found within genes that are involved in the regulation of patterns of anatomical development (morphogenesis) in animals, fungi, plants, and numerous single cell eukaryotes.
Hox geneHox genes, a subset of homeobox genes, are a group of related genes that specify regions of the body plan of an embryo along the head-tail axis of animals. Hox proteins encode and specify the characteristics of 'position', ensuring that the correct structures form in the correct places of the body. For example, Hox genes in insects specify which appendages form on a segment (for example, legs, antennae, and wings in fruit flies), and Hox genes in vertebrates specify the types and shape of vertebrae that will form.
Computational phylogeneticsComputational phylogenetics is the application of computational algorithms, methods, and programs to phylogenetic analyses. The goal is to assemble a phylogenetic tree representing a hypothesis about the evolutionary ancestry of a set of genes, species, or other taxa. For example, these techniques have been used to explore the family tree of hominid species and the relationships between specific genes shared by many types of organisms.
Multiple sequence alignmentMultiple sequence alignment (MSA) may refer to the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences are assumed to have an evolutionary relationship by which they share a linkage and are descended from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences' shared evolutionary origins.
Protein familyA protein family is a group of evolutionarily related proteins. In many cases, a protein family has a corresponding gene family, in which each gene encodes a corresponding protein with a 1:1 relationship. The term "protein family" should not be confused with family as it is used in taxonomy. Proteins in a family descend from a common ancestor and typically have similar three-dimensional structures, functions, and significant sequence similarity.
Sequence homologySequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can have shared ancestry because of three phenomena: either a speciation event (orthologs), or a duplication event (paralogs), or else a horizontal (or lateral) gene transfer event (xenologs). Homology among DNA, RNA, or proteins is typically inferred from their nucleotide or amino acid sequence similarity.
Protein superfamilyA protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred (see homology). Usually this common ancestry is inferred from structural alignment and mechanistic similarity, even if no sequence similarity is evident. Sequence homology can then be deduced even if not apparent (due to low sequence similarity). Superfamilies typically contain several protein families which show sequence similarity within each family.
BacteriaBacteria (bækˈtɪəriə; : bacterium) are ubiquitous, mostly free-living organisms often consisting of one biological cell. They constitute a large domain of prokaryotic microorganisms. Typically a few micrometres in length, bacteria were among the first life forms to appear on Earth, and are present in most of its habitats. Bacteria inhabit soil, water, acidic hot springs, radioactive waste, and the deep biosphere of Earth's crust. Bacteria play a vital role in many stages of the nutrient cycle by recycling nutrients and the fixation of nitrogen from the atmosphere.
Sequence motifIn biology, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and usually assumed to be related to biological function of the macromolecule. For example, an N-glycosylation site motif can be defined as Asn, followed by anything but Pro, followed by either Ser or Thr, followed by anything but Pro residue. When a sequence motif appears in the exon of a gene, it may encode the "structural motif" of a protein; that is a stereotypical element of the overall structure of the protein.
BioinformaticsBioinformatics (ˌbaɪ.oʊˌɪnfɚˈmætɪks) is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.
Molecular evolutionMolecular evolution is the process of change in the sequence composition of cellular molecules such as DNA, RNA, and proteins across generations. The field of molecular evolution uses principles of evolutionary biology and population genetics to explain patterns in these changes. Major topics in molecular evolution concern the rates and impacts of single nucleotide changes, neutral evolution vs. natural selection, origins of new genes, the genetic nature of complex traits, the genetic basis of speciation, the evolution of development, and ways that evolutionary forces influence genomic and phenotypic changes.
Protein structure predictionProtein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its secondary and tertiary structure from primary structure. Structure prediction is different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by computational biology; and it is important in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes).
Evolutionary developmental biologyEvolutionary developmental biology (informally, evo-devo) is a field of biological research that compares the developmental processes of different organisms to infer how developmental processes evolved. The field grew from 19th-century beginnings, where embryology faced a mystery: zoologists did not know how embryonic development was controlled at the molecular level. Charles Darwin noted that having similar embryos implied common ancestry, but little progress was made until the 1970s.
GenomicsGenomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes as well as its hierarchical, three-dimensional structural configuration. In contrast to genetics, which refers to the study of individual genes and their roles in inheritance, genomics aims at the collective characterization and quantification of all of an organism's genes, their interrelations and influence on the organism.
AnimalAnimals are multicellular, eukaryotic organisms in the biological kingdom Animalia. With few exceptions, animals consume organic material, breathe oxygen, have myocytes and are able to move, can reproduce sexually, and grow from a hollow sphere of cells, the blastula, during embryonic development. As of 2022, 2.16 million living animal species have been described—of which around 1.05 million are insects, over 85,000 are molluscs, and around 65,000 are vertebrates—but it has been estimated there are around 7.
InsectInsects (from Latin insectum) are pancrustacean hexapod invertebrates of the class Insecta. They are the largest group within the arthropod phylum. Insects have a chitinous exoskeleton, a three-part body (head, thorax and abdomen), three pairs of jointed legs, compound eyes and one pair of antennae. Their blood is not totally contained in vessels; some circulates in an open cavity known as the haemocoel. Insects are the most diverse group of animals; they include more than a million described species and represent more than half of all known living organisms.