Overlapping genes occur in all
domains of life, though with varying frequencies. They are especially common in
viral genomes.
=== Viruses ===
Figure: the RNA silencing suppressor p19 from tomato bushy stunt virus, a protein encoded by an overprinted gene. The protein specifically binds siRNAs produced as part of the plant's RNA silencing defense against viruses.
Analysis of the fully sequenced 5386-nucleotide genome of bacteriophage ΦX174 showed extensive overlap between coding regions, revealing that some genes (such as genes D and E) are translated from the same DNA sequence but in different reading frames. It was concluded that other undiscovered sites of polypeptide synthesis could be hidden throughout the genome due to overlapping genes. An identified de novo gene of another overlapping
gene locus was shown to express a novel protein that induces lysis of E. coli by inhibiting biosynthesis of its cell wall[56], suggesting that de novo protein creation through the process of overprinting can be a significant factor in the evolution of
pathogenicity of viruses.
Overlapping genes are particularly common in viral genomes. Some studies attribute this to selective pressure toward small genome sizes, imposed by the physical constraints of packaging the genome in a capsid. However, other studies dispute this explanation and argue that the distribution of overlaps in viral genomes is more likely to reflect overprinting as the evolutionary origin of overlapping viral genes. Overprinting is a common source of
de novo genes in viruses. Segmented viruses, whose genomes are split into separate pieces packaged either all in the same
capsid or in separate capsids, are more likely to contain an overlapping sequence than non-segmented viruses. The lower mutation rate of DNA viruses, relative to RNA viruses, facilitates greater genomic novelty and evolutionary exploration within a structurally constrained genome and may be the primary driver of the evolution of overlapping genes. Studies of overprinted viral genes suggest that their protein products tend to be accessory proteins which are not
essential to viral proliferation, but contribute to
pathogenicity. Overprinted proteins often have unusual
amino acid distributions and high levels of intrinsic
disorder. In some cases overprinted proteins do have well-defined, but novel, three-dimensional structures; one example is the
RNA silencing suppressor p19 found in
Tombusviruses, which has both a novel
protein fold and a novel binding mode in recognizing
siRNAs.
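The idea that two genes can be read from the same DNA in different reading frames, as described above for genes D and E, can be illustrated with a minimal sketch. The sequence `TOY_DNA` and the helper names `translate` and `first_orf` are hypothetical and chosen only for illustration; the toy sequence is not taken from any real viral genome, and only the standard genetic code is assumed.

```python
# Minimal sketch: one DNA string, two reading frames, two different proteins.
# TOY_DNA is an invented sequence, not a real viral gene.

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(dna: str, frame: int) -> str:
    """Translate dna starting at offset `frame` (0, 1 or 2); '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def first_orf(dna: str, frame: int) -> str:
    """Return the first peptide running from a start codon (M) to the next stop."""
    protein = translate(dna, frame)
    start = protein.find("M")
    if start == -1:
        return ""
    stop = protein.find("*", start)
    return protein[start:] if stop == -1 else protein[start:stop]


TOY_DNA = "ATGAATGGCTTAAGTTGA"

if __name__ == "__main__":
    for frame in (0, 1):
        print(frame, translate(TOY_DNA, frame), first_orf(TOY_DNA, frame))
```

In this toy case the shorter open reading frame in frame 1 lies entirely inside the longer one in frame 0, which is the arrangement an overprinted gene would have relative to its ancestral gene.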
=== Prokaryotes ===
Estimates of gene overlap in
bacterial genomes typically find that around one third of bacterial genes overlap a neighboring gene, though usually only by a few base pairs. Most studies of overlap in bacterial genomes find evidence that overlap serves a function in
gene regulation, permitting the overlapped genes to be
transcriptionally and
translationally co-regulated. Long overlaps of greater than 60
base pairs are more common for convergent genes; however, putative long overlaps have very high rates of
misannotation. Robustly validated examples of long overlaps in bacterial genomes are rare; in the well-studied
model organism Escherichia coli, only four gene pairs are well validated as having long, overprinted overlaps.
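As a rough illustration of how overlaps between annotated genes might be measured and sorted by orientation, the following sketch assumes 1-based inclusive coordinates and a `+`/`-` strand label per gene. The `Gene` class, the function names, and the example coordinates are hypothetical; only the 60 base pair threshold for a "long" overlap and the unidirectional and convergent categories come from the text, while the "divergent" and "nested antiparallel" labels are added here for completeness.

```python
from dataclasses import dataclass

LONG_OVERLAP_BP = 60  # "long" overlaps are those greater than 60 base pairs


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record: 1-based inclusive coordinates."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlap_length(a: Gene, b: Gene) -> int:
    """Number of base pairs shared by the two annotated intervals."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def overlap_kind(a: Gene, b: Gene) -> str:
    """Classify an overlap by the relative orientation of the two genes."""
    if overlap_length(a, b) == 0:
        return "none"
    if a.strand == b.strand:
        return "unidirectional"
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    plus_in_minus = minus.start <= plus.start and plus.end <= minus.end
    minus_in_plus = plus.start <= minus.start and minus.end <= plus.end
    if plus_in_minus or minus_in_plus:
        return "nested antiparallel"
    # Partial overlap on opposite strands: if the "+" gene lies to the left,
    # the 3' ends of the two genes overlap (convergent); otherwise the
    # 5' ends overlap (divergent).
    return "convergent" if plus.start < minus.start else "divergent"


def describe_overlap(a: Gene, b: Gene) -> tuple[str, int, bool]:
    """Return (kind, length in bp, whether the overlap counts as long)."""
    length = overlap_length(a, b)
    return overlap_kind(a, b), length, length > LONG_OVERLAP_BP


if __name__ == "__main__":
    short_pair = (Gene("geneA", 100, 300, "+"), Gene("geneB", 297, 500, "+"))
    long_pair = (Gene("geneC", 100, 400, "+"), Gene("geneD", 300, 600, "-"))
    print(describe_overlap(*short_pair))
    print(describe_overlap(*long_pair))
```

A check like this only reports what the annotation claims. Given the high misannotation rate noted above for putative long overlaps, any pair flagged as long would still need independent validation.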
=== Eukaryotes ===
Compared to prokaryotic genomes, eukaryotic genomes are often poorly annotated, so identifying genuine overlaps is relatively challenging. Eukaryotes differ from prokaryotes in the distribution of overlap types: while unidirectional (i.e., same-strand) overlaps are most common in prokaryotes, opposite-strand (antiparallel) overlaps are more common in eukaryotes. Among the opposite-strand overlaps, convergent orientation is most common. Overlap with older or less taxonomically restricted genes is also a common feature of genes likely to have originated de novo in a given eukaryotic lineage.
== Function ==