[Figure: overview of the human genome on G banding, a method that includes Giemsa staining; lighter-staining regions are generally more transcriptionally active, whereas darker regions are more inactive.]
The added complexity of generating a eukaryotic cell carries with it an increase in the complexity of transcriptional regulation. Eukaryotes have three RNA polymerases, known as
Pol I,
Pol II, and
Pol III. Each polymerase has specific targets and activities, and is regulated by independent mechanisms. There are a number of additional mechanisms through which polymerase activity can be controlled. These mechanisms can be generally grouped into three main areas:
• Control over polymerase access to the gene. This is perhaps the broadest of the three control mechanisms. It includes the functions of histone remodeling enzymes, transcription factors, enhancers and repressors, and many other complexes.
• Productive elongation of the RNA transcript. Once polymerase is bound to a promoter, it requires another set of factors to allow it to escape the promoter complex and begin successfully transcribing RNA.
• Termination of the polymerase. A number of factors have been found to control how and when termination occurs, which dictates the fate of the RNA transcript.
All three of these systems work in concert to integrate signals from the cell and change the transcriptional program accordingly.
While in prokaryotic systems the basal transcription state can be thought of as nonrestrictive (that is, "on" in the absence of modifying factors), eukaryotes have a restrictive basal state which requires the recruitment of other factors in order to generate RNA transcripts. This difference is largely due to the compaction of the eukaryotic genome by winding DNA around histones to form higher-order structures. This compaction makes the gene promoter inaccessible without the assistance of other factors in the nucleus, and thus chromatin structure is a common site of regulation. Similar to the sigma factors in prokaryotes, the general transcription factors (GTFs) are a set of factors in eukaryotes that are required for all transcription events. These factors are responsible for stabilizing binding interactions and opening the DNA helix to allow the RNA polymerase to access the template, but generally lack specificity for different promoter sites. A large part of gene regulation occurs through transcription factors that either recruit or inhibit the binding of the general transcription machinery and/or the polymerase. This can be accomplished through close interactions with core promoter elements, or over long distances through
enhancer elements. Once a polymerase is successfully bound to a DNA template, it often requires the assistance of other proteins in order to leave the stable promoter complex and begin elongating the nascent RNA strand. This process is called promoter escape, and is another step at which regulatory elements can act to accelerate or slow the transcription process. Similarly, protein and
nucleic acid factors can associate with the elongation complex and modulate the rate at which the polymerase moves along the DNA template.
===At the level of chromatin state===
In eukaryotes, genomic DNA is highly compacted so that it fits into the nucleus. This is accomplished by winding the DNA around protein octamers called
histones, which has consequences for the physical accessibility of parts of the genome at any given time. Significant portions are silenced through histone modifications, and thus are inaccessible to the polymerases or their cofactors. The highest level of transcription regulation occurs through the rearrangement of histones in order to expose or sequester genes, because these processes can render entire regions of a chromosome inaccessible, as occurs in imprinting. Histone rearrangement is facilitated by
post-translational modifications to the tails of the core histones. A wide variety of modifications can be made by enzymes such as the
histone acetyltransferases (HATs),
histone methyltransferases (HMTs), and
histone deacetylases (HDACs), among others. These enzymes can add or remove covalent modifications such as methyl groups, acetyl groups, phosphates, and ubiquitin. Histone modifications serve to recruit other proteins, which can either increase the compaction of the chromatin and sequester promoter elements, or increase the spacing between histones and allow the association of transcription factors or polymerase on open DNA. For example, H3K27 trimethylation by the
polycomb complex PRC2 causes chromosomal compaction and
gene silencing. These histone modifications may be created by the cell, or inherited in an
epigenetic fashion from a parent.
===At the level of cytosine methylation===
[Figure: a cytosine single-ring base with a methyl group added to the 5 carbon. In mammals, DNA methylation occurs almost exclusively at a cytosine that is followed by a guanine.]
Transcription regulation at about 60% of
promoters is controlled by methylation of cytosines within CpG dinucleotides (where 5' cytosine is followed by 3' guanine or
CpG sites).
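As a minimal sketch of the CpG definition above, CpG sites can be located by scanning a strand read 5' to 3' for a C immediately followed by a G. The function names and the example sequence are hypothetical and serve only as illustration.

```python
def cpg_positions(seq: str) -> list[int]:
    """Return 0-based indices of each C that is immediately followed by G (5'->3')."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def cpg_fraction(seq: str) -> float:
    """Fraction of all cytosines in the sequence that sit in a CpG context."""
    s = seq.upper()
    total_c = s.count("C")
    return len(cpg_positions(s)) / total_c if total_c else 0.0


example = "ATCGCGTTACGGCCA"  # made-up sequence, not taken from any genome
print(cpg_positions(example))   # indices of the C in each CpG
print(cpg_fraction(example))    # share of cytosines that are in CpG sites
```

Only one strand is scanned here; because CpG is its own reverse complement, every site found also marks a CpG on the opposite strand.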
5-methylcytosine (5-mC) is a
methylated form of the
DNA base
cytosine (see Figure). 5-mC is an
epigenetic marker found predominantly within CpG sites. About 28 million CpG dinucleotides occur in the human genome. In most tissues of mammals, on average, 70% to 80% of CpG cytosines are methylated (forming 5-methylCpG or 5-mCpG). Methylated cytosines within 5' cytosine-guanine 3' sequences often occur in groups, called
CpG islands. About 60% of
promoter sequences have a CpG island while only about 6% of
enhancer sequences have a CpG island. CpG islands constitute regulatory sequences, since if CpG islands are methylated in the promoter of a gene this can reduce or silence gene transcription. DNA methylation regulates gene transcription through interaction with
methyl binding domain (MBD) proteins, such as
MeCP2,
MBD1 and
MBD2. These MBD proteins bind most strongly to highly methylated
CpG islands. These MBD proteins have both a methyl-CpG-binding domain and a transcription repression domain.
About 94% of transcription factor binding sites (TFBSs) that are associated with signal-responsive genes occur in enhancers, while only about 6% of such TFBSs occur in promoters. There are about 12,000 binding sites for EGR1 in the mammalian genome; about half of them are located in promoters and half in enhancers. Expression of EGR1 transcription factor proteins, in various types of cells, can be stimulated by growth factors, neurotransmitters, hormones, stress and injury.
The splice isoform DNMT3A2 behaves like the product of a classical immediate-early gene and, for instance, is robustly and transiently produced after neuronal activation. Where the DNA methyltransferase isoform DNMT3A2 binds and adds methyl groups to cytosines appears to be determined by histone post-translational modifications. On the other hand, neural activation causes degradation of DNMT3A1, accompanied by reduced methylation of at least one evaluated targeted promoter.
===Through transcription factors and enhancers===
====Transcription factors====
Transcription factors are proteins that bind to specific DNA sequences in order to regulate the expression of a given gene. There are approximately 1,400 transcription factors encoded in the human genome and they constitute about 6% of all human protein-coding genes. In addition, they are often at the end of a
signal transduction pathway that functions to change something about the factor, like its subcellular localization or its activity. Post-translational modifications to transcription factors located in the
cytosol can cause them to translocate to the
nucleus where they can interact with their corresponding enhancers. Other transcription factors are already in the nucleus, and are modified to enable the interaction with partner transcription factors. Some post-translational modifications known to regulate the functional state of transcription factors are
phosphorylation,
acetylation,
SUMOylation and
ubiquitylation. Transcription factors can be divided into two main categories:
activators and
repressors. While activators can interact directly or indirectly with the core machinery of transcription through enhancer binding, repressors predominantly recruit co-repressor complexes, leading to transcriptional repression by chromatin condensation of enhancer regions. A repressor may also act by competing directly with a particular activator: overlapping DNA-binding motifs for activators and repressors cause physical competition to occupy the binding site. If the repressor has a higher affinity for its motif than the activator, transcription is effectively blocked in the presence of the repressor. Tight regulatory control is achieved by the highly dynamic nature of transcription factors. Again, many different mechanisms exist to control whether a transcription factor is active. These mechanisms include control over protein localization or control over whether the protein can bind DNA. An example of this is the protein
HSF1, which remains bound to
Hsp70 in the cytosol and is only translocated into the nucleus upon cellular stress such as heat shock. Thus the genes under the control of this transcription factor will remain untranscribed unless the cell is subjected to stress.
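The competition between an activator and a repressor for overlapping motifs, described earlier in this section, can be illustrated with a standard equilibrium competitive-binding model. This model is an assumption introduced here for illustration, not something the text specifies: a single site, mutually exclusive binding, and one dissociation constant per factor. All names and numbers are hypothetical.

```python
def site_occupancy(activator: float, repressor: float,
                   kd_activator: float, kd_repressor: float) -> tuple[float, float]:
    """Equilibrium occupancy of one site that binds either factor, never both.

    Concentrations and dissociation constants share the same (arbitrary) units.
    Returns (fraction bound by activator, fraction bound by repressor).
    """
    a = activator / kd_activator
    r = repressor / kd_repressor
    z = 1.0 + a + r  # statistical weights: empty, activator-bound, repressor-bound
    return a / z, r / z


# Hypothetical values: equal concentrations, repressor binds 100x more tightly.
alone, _ = site_occupancy(10.0, 0.0, kd_activator=10.0, kd_repressor=0.1)
occ_a, occ_r = site_occupancy(10.0, 10.0, kd_activator=10.0, kd_repressor=0.1)
print(f"activator alone: {alone:.3f}")
print(f"with repressor:  activator {occ_a:.3f}, repressor {occ_r:.3f}")
```

Under these assumed values the activator occupies the site half the time on its own, but about 1% of the time once the higher-affinity repressor is present, which is the sense in which transcription is effectively blocked.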
====Enhancers====
Enhancers or
cis-regulatory modules/elements (CRM/CRE) are
non-coding DNA sequences containing multiple activator and repressor binding sites. Enhancers range from 200 bp to 1 kb in length and can be either proximal (5' upstream of the promoter or within the first intron of the regulated gene) or distal (in introns of neighboring genes or in intergenic regions far away from the locus). Through DNA looping, active enhancers contact the promoter in a manner that depends on the DNA-binding-motif specificity of the core promoter. Promoter-enhancer dichotomy provides the basis for the functional interaction between transcription factors and transcriptional core machinery to trigger RNA Pol II escape from the promoter. Although one might expect a 1:1 enhancer-promoter ratio, studies of the human genome predict that an active promoter interacts with 4 to 5 enhancers. Similarly, enhancers can regulate more than one gene without linkage restriction and are said to "skip" neighboring genes to regulate more distant ones. Although infrequent, transcriptional regulation can involve elements located on a different chromosome from the one where the promoter resides. Proximal enhancers or promoters of neighboring genes can serve as platforms to recruit more distal elements.
====Enhancer activation and implementation====
[Figure: an enhancer brought into contact with a DNA regulatory sequence of its target gene by formation of a chromosome loop. This can initiate messenger RNA (mRNA) synthesis by RNA polymerase II (RNAP II) bound to the promoter at the transcription start site of the gene. The loop is stabilized by one architectural protein anchored to the enhancer and one anchored to the promoter, and these proteins are joined to form a dimer (red zigzags). Specific regulatory transcription factors bind to DNA sequence motifs on the enhancer. General transcription factors bind to the promoter. When a transcription factor is activated by a signal (here indicated as phosphorylation, shown by a small red star on a transcription factor on the enhancer), the enhancer is activated and can now activate its target promoter. The active enhancer is transcribed on each strand of DNA in opposite directions by bound RNAP IIs. Mediator (a complex consisting of about 26 proteins in an interacting structure) communicates regulatory signals from the enhancer DNA-bound transcription factors to the promoter.]
Up-regulated expression of genes in mammals can be initiated when signals are transmitted to the promoters associated with the genes.
Cis-regulatory DNA sequences that are located in DNA regions distant from the promoters of genes can have very large effects on gene expression, with some genes undergoing up to 100-fold increased expression due to such a cis-regulatory sequence. These cis-regulatory sequences include
enhancers,
silencers,
insulators and tethering elements. Among this constellation of sequences, enhancers and their associated
transcription factor proteins have a leading role in the regulation of gene expression.
Enhancers are sequences of the genome that are major gene-regulatory elements. Enhancers control cell-type-specific gene expression programs, most often by looping through long distances to come into physical proximity with the promoters of their target genes. In a study of brain cortical neurons, 24,937 loops were found, bringing enhancers to promoters. Several cell-function-specific transcription factor proteins generally bind to specific motifs on an enhancer (in 2018, Lambert et al. indicated there were about 1,600 transcription factors in a human cell), and a small combination of these enhancer-bound transcription factors, when brought close to a promoter by a DNA loop, governs the level of transcription of the target gene.
Mediator (a coactivator complex usually consisting of about 26 proteins in an interacting structure) communicates regulatory signals from enhancer DNA-bound transcription factors directly to the RNA polymerase II (RNAP II) enzyme bound to the promoter. Enhancers, when active, are generally transcribed from both strands of DNA with RNA polymerases acting in two different directions, producing two enhancer RNAs (eRNAs) as illustrated in the Figure. An inactive enhancer may be bound by an inactive transcription factor. Phosphorylation of the transcription factor may activate it, and that activated transcription factor may then activate the enhancer to which it is bound (see the small red star representing phosphorylation of a transcription factor bound to an enhancer in the illustration). An activated enhancer begins transcription of its RNA before activating a promoter to initiate transcription of messenger RNA from its target gene. Typical enhancers are often 151–240 base pairs in size.
====Super-enhancers====
While enhancers are needed for transcription of a gene above a low level, clusters of enhancers, known as
super-enhancers, can cause transcription of a target gene at a very high level. Super-enhancers are a group of typical enhancers, all located within a region of 10,000 to 60,000 nucleotides. The typical enhancers within a super-enhancer simultaneously loop over from a distance to strongly increase initiation and transcription of a gene. The illustration in this section shows a super-enhancer of about 12,000 base pairs in length with four typical enhancers within its length. The enhancers are each associated with the same gene, transmitting signals from the transcription factors on each enhancer through a mediator protein complex to the promoter of the gene. Each typical enhancer within the cluster interacts with its own mediator multi-protein complex. The protein
BRD4 complexes with each typical enhancer and stabilizes the super-enhancer structure. Thus, a large number of proteins are present in close association on a super-enhancer (including BRD4 proteins, transcription factors, and about 26 mediator proteins for each enhancer). Most of these proteins have a structured domain as well as a tail with an intrinsically disordered region. The intrinsically disordered regions of these proteins interact with each other and usually form a water-excluding gel (phase-separated condensate) around the super-enhancer. In the case of the mouse Wap super-enhancer, the three typical enhancers, acting together, increase transcription of the
Wap gene by 1000-fold. In many types of cells, there are usually thousands of active typical enhancers and a few hundred super-enhancers. Super-enhancers (SEs) usually drive 2% to 4% of the actively transcribed regions of the genome. For instance, immune-system non-stimulated B cells have 140 super-enhancers (SEs) and 4,290 typical enhancers (TEs) (3.2% SEs). Similarly, in mouse embryonic stem cells, there are 231 SEs compared to 8,794 TEs (2.6% SEs). In neural stem cells there are 445 SEs and 9,436 TEs (4.7% SEs). While super-enhancers are only active at 2-4% of actively transcribed sites in a cell, they strongly recruit transcription machinery. The super-enhancers in a cell generally utilize 12% to 36% of the RNA polymerases, mediator proteins, BRD4 proteins, and other transcription machinery of the cell.
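The percentages quoted above can be reproduced with a short calculation. The sketch below uses only the counts given in the text; the two helper functions are hypothetical names for two possible readings of "% SEs". The quoted 3.2% and 2.6% agree with super-enhancers as a share of all enhancers (SEs plus TEs), whereas the quoted 4.7% for neural stem cells agrees with SEs relative to the TE count; as a share of all enhancers that figure comes to about 4.5%.

```python
# (super-enhancers, typical enhancers) as quoted in the text
counts = {
    "non-stimulated B cells": (140, 4290),
    "mouse embryonic stem cells": (231, 8794),
    "neural stem cells": (445, 9436),
}


def se_share_of_all(se: int, te: int) -> float:
    """Super-enhancers as a percentage of all enhancers (SEs + TEs)."""
    return 100.0 * se / (se + te)


def se_relative_to_te(se: int, te: int) -> float:
    """Super-enhancers as a percentage of the typical-enhancer count."""
    return 100.0 * se / te


for cell_type, (se, te) in counts.items():
    print(f"{cell_type}: {se_share_of_all(se, te):.1f}% of all enhancers, "
          f"{se_relative_to_te(se, te):.1f}% relative to TEs")
```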
====Regulatory landscape====
Transcriptional initiation, termination and regulation are mediated by "DNA looping", which brings together promoters, enhancers, transcription factors and RNA processing factors to accurately regulate gene expression. Chromosome conformation capture (3C) and more recently Hi-C techniques provided evidence that active chromatin regions are "compacted" in nuclear domains or bodies where transcriptional regulation is enhanced. The configuration of the genome is essential for enhancer-promoter proximity. Cell-fate decisions are mediated by highly dynamic genomic reorganizations at interphase that modularly switch entire gene regulatory networks on or off through short- to long-range chromatin rearrangements. Related studies demonstrate that metazoan genomes are partitioned into structural and functional units around a megabase long called
topologically associating domains (TADs), containing dozens of genes regulated by hundreds of enhancers distributed within large genomic regions containing only non-coding sequences. The function of TADs is to group enhancers and promoters that interact with each other within a single large functional domain instead of having them spread across different TADs. However, studies of mouse development point out that two adjacent TADs may regulate the same gene cluster. The most relevant study on limb evolution shows that the TAD on the 5' side of the HoxD gene cluster in tetrapod genomes drives its expression in the distal limb bud of embryos, giving rise to the hand, while the one located on the 3' side does so in the proximal limb bud, giving rise to the arm. Still, it is not known whether TADs are an adaptive strategy to enhance regulatory interactions or an effect of the constraints on these same interactions. TAD boundaries are often composed of housekeeping genes, tRNAs, other highly expressed sequences and Short Interspersed Elements (SINE). While these genes may take advantage of their border position to be ubiquitously expressed, they are not directly linked with TAD edge formation. The specific molecules identified at boundaries of TADs are called insulators or architectural proteins because they not only block leaky enhancer expression but also ensure an accurate compartmentalization of cis-regulatory inputs to the targeted promoter. These
insulators are DNA-binding proteins like CTCF and TFIIIC that help recruit structural partners such as cohesins and condensins. The localization and binding of architectural proteins to their corresponding binding sites is regulated by post-translational modifications. DNA-binding motifs recognized by architectural proteins are either of high occupancy and around a megabase from each other, or of low occupancy and inside TADs. High-occupancy sites are usually conserved and static, while intra-TAD sites are dynamic according to the state of the cell; TADs themselves are therefore compartmentalized into subdomains, called subTADs, that range from a few kb up to the length of a TAD (19). When architectural binding sites are less than 100 kb from each other, Mediator proteins are the architectural proteins that cooperate with cohesin. For subTADs larger than 100 kb and for TAD boundaries, CTCF is the typical insulator found to interact with cohesin.
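The distance rule in the last two sentences can be written as a small lookup. This is only a sketch of the rule of thumb as stated in the text; the function name, its parameters, and the treatment of a separation of exactly 100 kb (which the text leaves open) are assumptions made for illustration.

```python
def typical_loop_partners(distance_bp: int, is_tad_boundary: bool = False) -> tuple[str, str]:
    """Architectural proteins typically found for a pair of binding sites.

    distance_bp: separation between the two architectural binding sites.
    is_tad_boundary: True if the sites delimit a TAD rather than a subTAD.
    """
    threshold_bp = 100_000  # the 100 kb figure quoted in the text
    if is_tad_boundary or distance_bp > threshold_bp:
        return ("CTCF", "cohesin")
    # Assumption: a separation of exactly 100 kb is grouped with the short-range case.
    return ("Mediator", "cohesin")


print(typical_loop_partners(40_000))        # short-range contact inside a TAD
print(typical_loop_partners(600_000))       # large subTAD
print(typical_loop_partners(40_000, True))  # TAD boundary
```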
===Of the pre-initiation complex and promoter escape===
In eukaryotes, ribosomal RNA (rRNA) and the tRNAs involved in translation are transcribed by RNA polymerase I (Pol I) and RNA polymerase III (Pol III). RNA polymerase II (Pol II) is responsible for the production of messenger RNA (mRNA) within the cell. Particularly for Pol II, many of the regulatory checkpoints in the transcription process occur in the assembly and escape of the
pre-initiation complex. A gene-specific combination of transcription factors will recruit
TFIID and/or
TFIIA to the core promoter, followed by the association of
TFIIB, creating a stable complex onto which the rest of the
general transcription factors (GTFs) can assemble. This complex is relatively stable, and can undergo multiple rounds of transcription initiation. After the binding of TFIIB and TFIID, Pol II and the rest of the GTFs can assemble. This assembly is marked by the post-translational modification (typically phosphorylation) of the C-terminal domain (CTD) of Pol II by a number of kinases. The CTD is a large, unstructured domain extending from the
Rpb1 subunit of Pol II, and consists of many repeats of the heptad sequence YSPTSPS.
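Residues within each heptad are conventionally numbered 1 to 7, so the serines of YSPTSPS fall at positions 2, 5 and 7. As a minimal sketch, the helper below (a hypothetical name) lists where a given serine falls along an idealized CTD built from perfect repeats; real CTDs contain degenerate repeats and the repeat count varies between species, so the three-repeat example is illustrative only.

```python
HEPTAD = "YSPTSPS"  # consensus repeat named in the text


def serine_positions(n_repeats: int, serine_in_heptad: int) -> list[int]:
    """1-based positions, in an idealized CTD of n_repeats perfect heptads,
    of the serine found at the given 1-based position of each heptad."""
    if not 1 <= serine_in_heptad <= len(HEPTAD):
        raise ValueError("heptad positions run from 1 to 7")
    if HEPTAD[serine_in_heptad - 1] != "S":
        raise ValueError("that heptad position is not a serine")
    return [k * len(HEPTAD) + serine_in_heptad for k in range(n_repeats)]


ctd = HEPTAD * 3  # three repeats, chosen only to keep the example short
print(ctd)
print(serine_positions(3, 5))  # positions of serine 5 in each repeat
print(serine_positions(3, 2))  # positions of serine 2 in each repeat
```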
TFIIH, the helicase that remains associated with Pol II throughout transcription, also contains a subunit with kinase activity which phosphorylates serine 5 of the heptad sequence. Similarly, both
CDK8 (a subunit of the massive multiprotein Mediator complex) and
CDK9 (a subunit of the
P-TEFb elongation factor) have kinase activity towards other residues on the CTD. These phosphorylation events promote the transcription process and serve as sites of recruitment for mRNA processing machinery. All three of these kinases respond to upstream signals, and failure to phosphorylate the CTD can lead to a stalled polymerase at the promoter.
==In cancer==