Transcription is divided into initiation, promoter escape, elongation, and termination.
=== Setting up for transcription ===

==== Enhancers, transcription factors, Mediator complex, and DNA loops in mammalian transcription ====
Setting up for transcription in mammals is regulated by many cis-regulatory elements, including core promoter and promoter-proximal elements that are located near the transcription start sites of genes. Core promoters combined with general transcription factors are sufficient to direct transcription initiation, but they generally have low basal activity. Other important cis-regulatory modules are located in DNA regions distant from the transcription start sites. These include enhancers, silencers, insulators, and tethering elements. Among this constellation of elements, enhancers and their associated transcription factors have a leading role in the initiation of gene transcription. An enhancer located far from the promoter of a gene can have a very large effect on that gene's transcription: some genes show up to a 100-fold increase in transcription when their enhancer is activated.

Enhancers are major gene-regulatory elements of the genome. They control cell-type-specific transcription programs, most often by looping across long distances to come into physical proximity with the promoters of their target genes. Although the genome contains hundreds of thousands of enhancer regions, in a given tissue only specific enhancers are brought into proximity with the promoters they regulate. In one study of brain cortical neurons, 24,937 loops were found bringing enhancers to their target promoters. Several cell-function-specific transcription factors (the human genome encodes about 1,600 transcription factors) generally bind to specific motifs on an enhancer, and a small combination of these enhancer-bound factors, when brought close to a promoter by a DNA loop, governs the level of transcription of the target gene.
Mediator (a complex usually consisting of about 26 proteins in an interacting structure) communicates regulatory signals from enhancer-bound transcription factors directly to the RNA polymerase II (pol II) enzyme bound to the promoter. When active, enhancers are generally transcribed from both strands of DNA, with RNA polymerases moving in opposite directions, producing two enhancer RNAs (eRNAs), as illustrated in the Figure. An inactive enhancer may be bound by an inactive transcription factor. Phosphorylation of that transcription factor may activate it, and the activated factor may then activate the enhancer to which it is bound (see the small red star representing phosphorylation of the enhancer-bound transcription factor in the illustration). An activated enhancer begins transcribing its own RNA before it activates transcription of messenger RNA from its target gene.
==== CpG island methylation and demethylation ====
Transcription regulation at about 60% of promoters is also controlled by methylation of cytosines within CpG dinucleotides, also called CpG sites, in which a cytosine is followed by a guanine in the 5' → 3' direction.
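To make the CpG definition concrete, the following Python sketch locates CpG sites in a DNA string and computes the fraction of them marked as methylated. It is a minimal illustration under simplifying assumptions: sequences are plain A/C/G/T strings written 5' → 3', and the `methylated_positions` set is hypothetical input standing in for real methylation-assay calls. Because 5'-CG-3' is its own reverse complement, every CpG site on one strand is also a CpG site on the opposite strand, which is why the counts match on both strands.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, also written 5' -> 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def cpg_sites(dna: str) -> list[int]:
    """0-based positions of the C in every 5'-CG-3' dinucleotide on this strand."""
    dna = dna.upper()
    return [i for i in range(len(dna) - 1) if dna[i:i + 2] == "CG"]


def methylated_fraction(dna: str, methylated_positions: set[int]) -> float:
    """Fraction of CpG cytosines whose position appears in methylated_positions.

    methylated_positions is hypothetical input (e.g. calls from a methylation assay).
    """
    sites = cpg_sites(dna)
    if not sites:
        return 0.0
    return sum(i in methylated_positions for i in sites) / len(sites)


seq = "ACGTTCGACG"
print(cpg_sites(seq))                           # [1, 5, 8]
print(len(cpg_sites(reverse_complement(seq))))  # 3 (CpG is palindromic)
print(methylated_fraction(seq, {1, 5}))         # 0.6666666666666666
```

Averaged over a whole genome, the same ratio is what the 70% to 80% methylation figures for mammalian tissues describe.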
5-Methylcytosine (5-mC) is a methylated form of the DNA base cytosine (see Figure). 5-mC is an epigenetic marker found predominantly within CpG sites. About 28 million CpG dinucleotides occur in the human genome. In most mammalian tissues, on average 70% to 80% of CpG cytosines are methylated (forming 5-methylCpG, or 5-mCpG). However, unmethylated cytosines within 5'-CG-3' sequences often occur in clusters, called CpG islands, at active promoters. About 60% of promoter sequences contain a CpG island, whereas only about 6% of enhancer sequences do. CpG islands act as regulatory sequences: when the CpG islands in a gene's promoter become methylated, transcription of that gene can be reduced or silenced.
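The text does not say how a CpG island is recognized in a sequence. One widely used rule of thumb, assumed here, is the Gardiner-Garden and Frommer criteria: a stretch of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6, where observed/expected = (CpG count × length) / (C count × G count). The Python sketch below slides a 200-bp window along a sequence and reports the start positions of windows meeting those thresholds. It does not merge overlapping windows into island boundaries, and the thresholds are exposed as parameters because other studies use stricter values.

```python
def cpg_window_stats(window: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")  # non-overlapping count is exact: "CG" cannot overlap itself
    gc_fraction = (c + g) / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def cpg_island_windows(seq: str, window: int = 200, min_gc: float = 0.5,
                       min_obs_exp: float = 0.6) -> list[int]:
    """Start positions of windows passing the assumed Gardiner-Garden & Frommer thresholds."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        gc, obs_exp = cpg_window_stats(seq[start:start + window])
        if gc > min_gc and obs_exp > min_obs_exp:
            hits.append(start)
    return hits


# 300 bp of A followed by a 300-bp CG-rich stretch
demo = "A" * 300 + "CG" * 150
hits = cpg_island_windows(demo)
print(hits[0], hits[-1])  # 201 400
```

Windows starting before position 201 contain too many A bases to exceed 50% GC, so the reported region begins just where the CG-rich stretch dominates the window.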
DNA methylation regulates gene transcription through interaction with methyl-CpG-binding domain (MBD) proteins, such as MeCP2, MBD1, and MBD2. These MBD proteins bind most strongly to highly methylated CpG islands, and each has both a methyl-CpG-binding domain and a transcription repression domain.

About 94% of transcription factor binding sites (TFBSs) associated with signal-responsive genes occur in enhancers, while only about 6% occur in promoters. For example, the mammalian genome contains about 12,000 binding sites for the transcription factor EGR1, about half located in promoters and half in enhancers. Production of EGR1 protein in various cell types can be stimulated by growth factors, neurotransmitters, hormones, stress, and injury. Methylation itself is also regulated by signals. DNMT3A2, a splice isoform of the DNA methyltransferase DNMT3A, behaves like the product of a classical immediate-early gene: for instance, it is robustly and transiently produced after neuronal activation. Where DNMT3A2 binds and adds methyl groups to cytosines appears to be determined by histone post-translational modifications. In contrast, neuronal activation causes degradation of the isoform DNMT3A1, accompanied by reduced methylation of at least one evaluated target promoter.
=== Initiation ===
Transcription begins when RNA polymerase and one or more general transcription factors bind to a DNA promoter sequence to form an RNA polymerase-promoter closed complex. In the closed complex, the promoter DNA is still fully double-stranded. The polymerase then unwinds roughly 12 to 14 base pairs around the transcription start site to form an open complex (the transcription bubble), exposing the template strand so that RNA synthesis can begin.

In archaea and eukaryotes, RNA polymerase contains subunits homologous to each of the five bacterial RNA polymerase subunits, as well as subunits unique to these lineages. Instead of a single sigma factor, multiple general transcription factors are required to initiate transcription. Transcription initiation is regulated by additional proteins, known as activators and repressors, and in some cases by associated coactivators or corepressors, which modulate the formation and function of the transcription initiation complex.

=== Promoter escape ===
Once the open complex has formed, the polymerase synthesizes short RNAs, many of which are released while the polymerase remains bound to the promoter; this is called abortive initiation. It continues until an RNA product reaches a threshold length of approximately 10 nucleotides, at which point promoter escape occurs and a transcription elongation complex is formed. Mechanistically, promoter escape occurs through DNA scrunching, which provides the energy needed to break the interactions between the RNA polymerase holoenzyme and the promoter.

In bacteria, it was historically thought that the sigma factor is always released after promoter clearance. This idea was known as the obligate release model. However, later data showed that, upon and following promoter clearance, the sigma factor is released stochastically, according to what is known as the stochastic release model. In eukaryotes, at an RNA polymerase II-dependent promoter, TFIIH phosphorylates serine 5 on the carboxy-terminal domain of RNA polymerase II upon promoter clearance, leading to the recruitment of capping enzyme (CE). The exact mechanism by which CE induces promoter clearance in eukaryotes is not yet known.
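The threshold behaviour of abortive initiation can be illustrated with a toy Monte Carlo model in Python. The per-nucleotide release probability `p_abort` is a hypothetical parameter invented for this sketch, and the model ignores DNA scrunching, sigma-factor release, and every other mechanistic detail; it only shows how repeated short, released transcripts eventually give way to one product that reaches the approximately 10-nucleotide escape length.

```python
import random

ESCAPE_LENGTH_NT = 10  # approximate threshold length given in the text


def simulate_initiation(p_abort: float, rng: random.Random,
                        escape_length: int = ESCAPE_LENGTH_NT) -> tuple[list[int], int]:
    """Toy model: repeat initiation attempts until one RNA reaches escape_length.

    p_abort is a hypothetical probability that the growing RNA is released
    after each nucleotide added before the threshold is reached.
    Returns (lengths of abortive transcripts, length at promoter escape).
    """
    if not 0.0 <= p_abort < 1.0:
        raise ValueError("p_abort must be in [0, 1)")
    abortive: list[int] = []
    while True:
        length = 0
        while length < escape_length:
            length += 1  # polymerase adds one nucleotide
            if length < escape_length and rng.random() < p_abort:
                abortive.append(length)  # short RNA released; start a new attempt
                break
        else:
            # Inner loop finished without a release: the RNA reached the threshold.
            return abortive, length


aborted, escaped_at = simulate_initiation(0.15, random.Random(42))
print(f"{len(aborted)} abortive transcripts before escape at {escaped_at} nt")
```

Raising `p_abort` increases the number of abortive products per successful escape; the escape length itself does not change, because in this toy it is an input rather than something the model predicts.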
=== Elongation ===
One strand of the DNA, the template strand (or noncoding strand), is used as a template for RNA synthesis. As transcription proceeds, RNA polymerase traverses the template strand and uses base-pairing complementarity with the DNA template to create an RNA copy, which lengthens as the polymerase moves. Although RNA polymerase traverses the template strand from 3' → 5', the coding (non-template) strand and the newly formed RNA can also be used as reference points, so transcription can be described as occurring 5' → 3'. The result is an RNA molecule, synthesized 5' → 3', that is an exact copy of the coding strand except for two differences: thymines are replaced with uracils, and the nucleotides contain the sugar ribose, whereas DNA has deoxyribose (which lacks the oxygen atom at the 2' position) in its sugar-phosphate backbone.

In eukaryotes, nucleosomes act as major barriers to transcribing polymerases during elongation. Elongation also involves a proofreading mechanism that can replace incorrectly incorporated bases. In eukaryotes, this may correspond to short pauses during transcription that allow appropriate RNA editing factors to bind. These pauses may be intrinsic to the RNA polymerase or due to chromatin structure.

Double-strand breaks in actively transcribed regions of DNA are repaired by homologous recombination during the S and G2 phases of the cell cycle. Because transcription increases the accessibility of DNA to exogenous chemicals and internal metabolites that can cause recombinogenic lesions, homologous recombination of a particular DNA sequence may be strongly stimulated by transcription.
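The strand relationships described for elongation can be checked with a short Python sketch. It assumes plain A/C/G/T strings and Watson-Crick pairing only, and the function names are illustrative. Reading the template strand 3' → 5' and pairing each base yields the RNA 5' → 3', which equals the coding strand with T replaced by U.

```python
# Base pairing used when RNA polymerase reads the DNA template strand.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template_3to5(template: str) -> str:
    """Template strand written 3'->5' in; RNA written 5'->3' out."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template.upper())


def transcribe_template_5to3(template: str) -> str:
    """Template strand written 5'->3' (the usual convention): read it backwards."""
    return transcribe_template_3to5(template[::-1])


def rna_from_coding(coding: str) -> str:
    """The RNA equals the coding strand (5'->3') with T replaced by U."""
    return coding.upper().replace("T", "U")


coding = "ATGGCTTAA"         # 5'-ATGGCTTAA-3'
template_3to5 = "TACCGAATT"  # 3'-TACCGAATT-5', complementary to the coding strand
print(transcribe_template_3to5(template_3to5))  # AUGGCUUAA
print(rna_from_coding(coding))                  # AUGGCUUAA
```

The two printed strings agree, which is why the coding strand is the convenient reference when transcription is described as running 5' → 3'.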
=== Termination ===
Bacteria use two different strategies for transcription termination: Rho-independent termination and Rho-dependent termination. In Rho-independent transcription termination, RNA transcription stops when the newly synthesized RNA molecule forms a G-C-rich hairpin loop followed by a run of Us. When the hairpin forms, the mechanical stress breaks the weak rU-dA base pairs that now fill the DNA–RNA hybrid. This pulls the poly-U transcript out of the active site of the RNA polymerase, terminating transcription. In Rho-dependent termination, Rho, a protein factor, destabilizes the interaction between the template and the mRNA, thus releasing the newly synthesized mRNA from the elongation complex.

Transcription termination in eukaryotes is less well understood than in bacteria, but it involves cleavage of the new transcript followed by template-independent addition of adenines at its new 3' end, in a process called polyadenylation.

Beyond termination at a terminator sequence (which is part of a gene), transcription may also need to be terminated when the polymerase encounters obstacles such as DNA damage or an active replication fork. In bacteria, the Mfd ATPase can remove an RNA polymerase stalled at a lesion by prying open its clamp. It also recruits nucleotide excision repair machinery to repair the lesion. Mfd is also proposed to resolve conflicts between DNA replication and transcription. In eukaryotes, the ATPase TTF2 helps suppress the action of RNAP I and II during mitosis, preventing errors in chromosome segregation. In archaea, the Eta ATPase is proposed to play a similar role.
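The sequence signature of Rho-independent termination (a G-C-rich hairpin followed by a run of Us) lends itself to a simple pattern search. The Python sketch below is a string-matching heuristic under assumed thresholds: a perfectly paired stem of 5 to 10 bp, a loop of 3 to 8 nt, at least 60% G+C in the stem, and at least four Us after the stem. These numbers are illustrative choices, not values from the text. The sketch ignores G·U wobble pairs, mismatches, and folding free energy, which dedicated terminator predictors take into account, and it says nothing about Rho-dependent sites.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp_rna(rna: str) -> str:
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(RNA_PAIR[base] for base in reversed(rna))


def find_intrinsic_terminators(rna: str, stem_lengths=range(10, 4, -1),
                               loop_lengths=range(3, 9), min_u: int = 4,
                               min_stem_gc: float = 0.6) -> list[tuple[int, int]]:
    """Return (hairpin_start, u_run_start) pairs for candidate terminators.

    All thresholds are illustrative assumptions, not established cut-offs.
    """
    rna = rna.upper()
    hits = []
    for u_start in range(1, len(rna)):
        # Only consider the first U of a run of at least min_u Us.
        if rna[u_start - 1] == "U" or rna[u_start:u_start + min_u] != "U" * min_u:
            continue
        found = False
        for stem in stem_lengths:  # longer stems are tried first
            for loop in loop_lengths:
                start = u_start - (2 * stem + loop)
                if start < 0:
                    continue
                left = rna[start:start + stem]      # 5' arm of the stem
                right = rna[u_start - stem:u_start]  # 3' arm, just before the U run
                gc = (left.count("G") + left.count("C")) / stem
                if gc >= min_stem_gc and right == revcomp_rna(left):
                    hits.append((start, u_start))
                    found = True
                    break
            if found:
                break
    return hits


# 5' tail | stem | loop | stem' | U run | 3' tail
example = "AAAAA" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUU" + "ACAC"
print(find_intrinsic_terminators(example))  # [(5, 21)]
```

The hit reports a hairpin starting at position 5 whose 3' arm ends immediately before the U run at position 21, matching the hairpin-then-poly-U arrangement described above.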
=== Transcription increases susceptibility to DNA damage ===
Genome damage occurs with high frequency: an estimated tens of thousands to hundreds of thousands of DNA lesions arise in each cell every day. Transcription is a major source of DNA damage because it forms single-stranded DNA intermediates that are vulnerable to damage. Transcriptional regulation that uses base excision repair and/or topoisomerases to cut and remodel the genome also increases the vulnerability of DNA to damage.

== Role of RNA polymerase in post-transcriptional changes in RNA ==