DNA sequencingDNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery. Knowledge of DNA sequences has become indispensable for basic biological research, DNA Genographic Projects and in numerous applied fields such as medical diagnosis, biotechnology, forensic biology, virology and biological systematics.
Sanger sequencingSanger sequencing is a method of DNA sequencing that involves electrophoresis and is based on the random incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication. After first being developed by Frederick Sanger and colleagues in 1977, it became the most widely used sequencing method for approximately 40 years. It was first commercialized by Applied Biosystems in 1986. More recently, higher volume Sanger sequencing has been replaced by next generation sequencing methods, especially for large-scale, automated genome analyses.
SequencingIn genetics and biochemistry, sequencing means to determine the primary structure (sometimes incorrectly called the primary sequence) of an unbranched biopolymer. Sequencing results in a symbolic linear depiction known as a sequence which succinctly summarizes much of the atomic-level structure of the sequenced molecule. DNA sequencing DNA sequencing is the process of determining the nucleotide order of a given DNA fragment. So far, most DNA sequencing has been performed using the chain termination method developed by Frederick Sanger.
Exome sequencingExome sequencing, also known as whole exome sequencing (WES), is a genomic technique for sequencing all of the protein-coding regions of genes in a genome (known as the exome). It consists of two steps: the first step is to select only the subset of DNA that encodes proteins. These regions are known as exons—humans have about 180,000 exons, constituting about 1% of the human genome, or approximately 30 million base pairs. The second step is to sequence the exonic DNA using any high-throughput DNA sequencing technology.
Massive parallel sequencingMassive parallel sequencing or massively parallel sequencing is any of several high-throughput approaches to DNA sequencing using the concept of massively parallel processing; it is also called next-generation sequencing (NGS) or second-generation sequencing. Some of these technologies emerged between 1993 and 1998 and have been commercially available since 2005. These technologies use miniaturized and parallelized platforms for sequencing of 1 million to 43 billion short reads (50 to 400 bases each) per instrument run.
Mycobacterium tuberculosisMycobacterium tuberculosis (M. tb), also known as Koch's bacillus, is a species of pathogenic bacteria in the family Mycobacteriaceae and the causative agent of tuberculosis. First discovered in 1882 by Robert Koch, M. tuberculosis has an unusual, waxy coating on its cell surface primarily due to the presence of mycolic acid. This coating makes the cells impervious to Gram staining, and as a result, M. tuberculosis can appear weakly Gram-positive. Acid-fast stains such as Ziehl–Neelsen, or fluorescent stains such as auramine are used instead to identify M.
Whole genome sequencingWhole genome sequencing (WGS), also known as full genome sequencing, complete genome sequencing, or entire genome sequencing, is the process of determining the entirety, or nearly the entirety, of the DNA sequence of an organism's genome at a single time. This entails sequencing all of an organism's chromosomal DNA as well as DNA contained in the mitochondria and, for plants, in the chloroplast. Whole genome sequencing has largely been used as a research tool, but was being introduced to clinics in 2014.
Transcription (biology)Transcription is the process of copying a segment of DNA into RNA. The segments of DNA transcribed into RNA molecules that can encode proteins are said to produce messenger RNA (mRNA). Other segments of DNA are copied into RNA molecules called non-coding RNAs (ncRNAs). mRNA comprises only 1–3% of total RNA samples. Less than 2% of the human genome can be transcribed into mRNA (Human genome#Coding vs. noncoding DNA), while at least 80% of mammalian genomic DNA can be actively transcribed (in one or more types of cells), with the majority of this 80% considered to be ncRNA.
Third-generation sequencingThird-generation sequencing (also known as long-read sequencing) is a class of DNA sequencing methods currently under active development. Third generation sequencing technologies have the capability to produce substantially longer reads than second generation sequencing, also known as next-generation sequencing. Such an advantage has critical implications for both genome science and the study of biology in general. However, third generation sequencing data have much higher error rates than previous technologies, which can complicate downstream genome assembly and analysis of the resulting data.
TuberculosisTuberculosis (TB) is an infectious disease usually caused by Mycobacterium tuberculosis (MTB) bacteria. Tuberculosis generally affects the lungs, but it can also affect other parts of the body. Most infections show no symptoms, in which case it is known as latent tuberculosis. Around 10% of latent infections progress to active disease which, if left untreated, kill about half of those affected. Typical symptoms of active TB are chronic cough with blood-containing mucus, fever, night sweats, and weight loss.
ChIP sequencingChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. Previously, ChIP-on-chip was the most common technique utilized to study these protein–DNA relations. ChIP-seq is primarily used to determine how transcription factors and other chromatin-associated proteins influence phenotype-affecting mechanisms.
Shotgun sequencingIn genetics, shotgun sequencing is a method used for sequencing random DNA strands. It is named by analogy with the rapidly expanding, quasi-random shot grouping of a shotgun. The chain-termination method of DNA sequencing ("Sanger sequencing") can only be used for short DNA strands of 100 to 1000 base pairs. Due to this size limit, longer sequences are subdivided into smaller fragments that can be sequenced separately, and these sequences are assembled to give the overall sequence.
Transcription factorIn molecular biology, a transcription factor (TF) (or sequence-specific DNA-binding factor) is a protein that controls the rate of transcription of genetic information from DNA to messenger RNA, by binding to a specific DNA sequence. The function of TFs is to regulate—turn on and off—genes in order to make sure that they are expressed in the desired cells at the right time and in the right amount throughout the life of the cell and the organism.
Comparative genomicsComparative genomics is a field of biological research in which the genomic features of different organisms are compared. The genomic features may include the DNA sequence, genes, gene order, regulatory sequences, and other genomic structural landmarks. In this branch of genomics, whole or large parts of genomes resulting from genome projects are compared to study basic biological similarities and differences as well as evolutionary relationships between organisms.
Transcriptional regulationIn molecular biology and genetics, transcriptional regulation is the means by which a cell regulates the conversion of DNA to RNA (transcription), thereby orchestrating gene activity. A single gene can be regulated in a range of ways, from altering the number of copies of RNA that are transcribed, to the temporal control of when the gene is transcribed. This control allows the cell or organism to respond to a variety of intra- and extracellular signals and thus mount a response.
Clinical metagenomic sequencingClinical metagenomic next-generation sequencing (mNGS) is the comprehensive analysis of microbial and host genetic material (DNA or RNA) in clinical samples from patients by next-generation sequencing. It uses the techniques of metagenomics to identify and characterize the genome of bacteria, fungi, parasites, and viruses without the need for a prior knowledge of a specific pathogen directly from clinical specimens.
DNA microarrayA DNA microarray (also commonly known as DNA chip or biochip) is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Each DNA spot contains picomoles (10−12 moles) of a specific DNA sequence, known as probes (or reporters or oligos). These can be a short section of a gene or other DNA element that are used to hybridize a cDNA or cRNA (also called anti-sense RNA) sample (called target) under high-stringency conditions.
Mycobacterium bovisMycobacterium bovis is a slow-growing (16- to 20-hour generation time) aerobic bacterium and the causative agent of tuberculosis in cattle (known as bovine TB). It is related to Mycobacterium tuberculosis, the bacterium which causes tuberculosis in humans. M. bovis can jump the species barrier and cause tuberculosis-like infection in humans and other mammals. The bacteria are curved or straight rods. They sometimes form filaments, which fragment into bacilli or cocci once disturbed.
Genetic diversityGenetic diversity is the total number of genetic characteristics in the genetic makeup of a species, it ranges widely from the number of species to differences within species and can be attributed to the span of survival for a species. It is distinguished from genetic variability, which describes the tendency of genetic characteristics to vary. Genetic diversity serves as a way for populations to adapt to changing environments. With more variation, it is more likely that some individuals in a population will possess variations of alleles that are suited for the environment.
Functional genomicsFunctional genomics is a field of molecular biology that attempts to describe gene (and protein) functions and interactions. Functional genomics make use of the vast data generated by genomic and transcriptomic projects (such as genome sequencing projects and RNA sequencing). Functional genomics focuses on the dynamic aspects such as gene transcription, translation, regulation of gene expression and protein–protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or structures.