double helix. The sugar-phosphate backbone chains run in opposite directions with the bases pointing inward, base-pairing A to T and C to G with hydrogen bonds.
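The pairing rule in the caption above (A with T, C with G, on backbones that run in opposite directions) can be illustrated with a minimal Python sketch. The names `PAIRS` and `reverse_complement` are hypothetical, chosen for illustration only, and the sketch assumes the input contains just the four letters A, C, G and T.

```python
# Illustrative sketch only; the names here are hypothetical.

# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction.

    Because the two backbones run in opposite directions, the partner
    is obtained by complementing each base and reversing the order.
    Raises KeyError for any character other than A, C, G or T.
    """
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


if __name__ == "__main__":
    top = "ATGC"
    print(top, "pairs with", reverse_complement(top))
```

Here `reverse_complement("ATGC")` evaluates to `"GCAT"`, and applying the function twice returns the original strand, which reflects the fact that each strand determines the other.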
===DNA===
The vast majority of organisms encode their genes in long strands of DNA (deoxyribonucleic acid). DNA consists of a chain made from four types of nucleotide subunits, each composed of: a five-carbon sugar (2-deoxyribose), a phosphate group, and one of the four bases adenine, cytosine, guanine, and thymine.

Two chains of DNA twist around each other to form a DNA double helix with the phosphate–sugar backbone spiralling around the outside, and the bases pointing inward with adenine base pairing to thymine and guanine to cytosine. The specificity of base pairing occurs because adenine and thymine align to form two hydrogen bonds, whereas cytosine and guanine form three hydrogen bonds. The two strands in a double helix must, therefore, be complementary, with their sequence of bases matching such that the adenines of one strand are paired with the thymines of the other strand, and so on.

The expression of genes encoded in DNA begins by transcribing the gene into RNA, a second type of nucleic acid that is very similar to DNA, but whose monomers contain the sugar ribose rather than deoxyribose. RNA also contains the base uracil in place of thymine. RNA molecules are less stable than DNA and are typically single-stranded. Genes that encode proteins are composed of a series of three-nucleotide sequences called codons, which serve as the "words" in the genetic "language". The genetic code specifies the correspondence during protein translation between codons and
amino acids. The genetic code is nearly the same for all known organisms.

Figure: Karyogram of a human, with annotated bands and sub-bands. It shows dark and white regions on G banding. It shows 22 homologous chromosomes, both the male (XY) and female (XX) versions of the sex chromosome (bottom right), as well as the mitochondrial genome (at bottom left).

The total complement of genes in an organism or cell is known as its
genome, which may be stored on one or more chromosomes. A chromosome consists of a single, very long DNA helix on which thousands of genes are encoded. The centromere is required for binding spindle fibres to separate sister chromatids into daughter cells during cell division.

Whereas the chromosomes of prokaryotes are relatively gene-dense, those of eukaryotes often contain regions of DNA that serve no obvious function. Simple single-celled eukaryotes have relatively small amounts of such DNA, whereas the genomes of complex multicellular organisms, including humans, contain an absolute majority of DNA without an identified function. This DNA has often been referred to as "junk DNA". However, more recent analyses suggest that, although protein-coding DNA makes up barely 2% of the human genome, about 80% of the bases in the genome may be expressed, so the term "junk DNA" may be a misnomer.

==Structure and function==
===Structure===
The structure of a protein-coding gene consists of many elements of which the actual protein-coding sequence is often only a small part. These include introns and untranslated regions of the mature mRNA. Noncoding genes can also contain introns that are removed during processing to produce the mature functional RNA.

All genes are associated with regulatory sequences that are required for their expression. First, genes require a promoter sequence. The promoter is recognized and bound by transcription factors that recruit and help RNA polymerase bind to the region to initiate transcription. Highly transcribed genes have "strong" promoter sequences that form strong associations with transcription factors, thereby initiating transcription at a high rate. Other genes have "weak" promoters that form weak associations with transcription factors and initiate transcription less frequently. Additional regulatory sequences tune expression further: enhancers increase transcription by binding an activator protein, which then helps to recruit the RNA polymerase to the promoter; conversely, silencers bind repressor proteins and make the DNA less available for RNA polymerase.

The mature messenger RNA produced from protein-coding genes contains
untranslated regions at both ends, which contain binding sites for ribosomes, RNA-binding proteins, and miRNA, as well as a terminator and start and stop codons. In addition, most eukaryotic open reading frames contain untranslated introns, which are removed, and exons, which are connected together in a process known as RNA splicing. Finally, the ends of gene transcripts are defined by cleavage and polyadenylation (CPA) sites, where newly produced pre-mRNA is cleaved and a string of about 200 adenosine monophosphates is added at the 3' end. The poly(A) tail protects mature mRNA from degradation and has other functions, affecting translation, localization, and transport of the transcript from the nucleus. Splicing, followed by CPA, generates the final mature mRNA, which encodes the protein or RNA product. Many noncoding genes in eukaryotes have different transcription termination mechanisms and do not have poly(A) tails.

Many prokaryotic genes are organized into
operons, with multiple protein-coding sequences that are transcribed as a unit. The genes in an operon are transcribed as a continuous messenger RNA, referred to as a polycistronic mRNA. The term cistron in this context is equivalent to gene. The transcription of an operon's mRNA is often controlled by a repressor that can occur in an active or inactive state depending on the presence of specific metabolites. When active, the repressor binds to a DNA sequence at the beginning of the operon, called the operator region, and represses transcription of the operon; when the repressor is inactive, transcription of the operon can occur (see e.g. the lac operon). The products of operon genes typically have related functions and are involved in the same regulatory network.

Introns can even have other genes nested inside them. Associated enhancers may be many kilobases away, or even on entirely different chromosomes, operating via physical contact between two chromosomes. A single gene can encode multiple different functional products by alternative splicing, and conversely a gene may be split across chromosomes, with its transcripts concatenated back together into a functional sequence by trans-splicing. It is also possible for overlapping genes to share some of their DNA sequence, either on opposite strands or the same strand (in a different reading frame, or even the same reading frame).

==Gene expression==