Copy number variation

Copy number variation (CNV) is a phenomenon in which sections of the genome are repeated and the number of repeats varies between individuals. Copy number variation is a type of structural variation: specifically, it is a duplication or deletion event that affects a considerable number of base pairs. Approximately two-thirds of the entire human genome may be composed of repeats, and 4.8–9.5% of the human genome can be classified as copy number variations. In mammals, copy number variations play an important role in generating necessary variation in the population as well as in disease phenotypes.

Types and chromosomal rearrangements

One of the best-known examples of a short copy number variation is the CAG trinucleotide repeat in the huntingtin gene, which is responsible for the neurological disorder Huntington's disease. In this case, once the CAG trinucleotide repeats more than 36 times in a trinucleotide repeat expansion, Huntington's disease will likely develop in the individual and will likely be inherited by his or her offspring. These short repeats are often thought to arise from errors in polymerase activity during replication, including polymerase slippage, template switching, and fork switching, which will be discussed in detail later. The short repeat size of these copy number variations lends itself to polymerase errors: the repeated regions are prone to misrecognition by the polymerase, and regions that have already been replicated may be replicated again, leading to extra copies of the repeat. In addition, if these trinucleotide repeats are in the same reading frame in the coding portion of a gene, they may produce a long chain of the same amino acid, possibly creating protein aggregates in the cell.

Copy number variations can also involve whole genes, as with the AMY1 gene. Although the specific mechanism that allows the AMY1 gene to increase or decrease its copy number is still a topic of debate, some hypotheses suggest that non-homologous end joining or microhomology-mediated end joining is likely responsible for these whole-gene repeats.

In terms of the structural architecture of copy number variations, research has suggested and defined hotspot regions in the genome where copy number variations are four times more enriched. With respect to genomic location, however, recent genome-wide studies have found little spatial bias. Although most chromosomal rearrangement hotspots are found in the subtelomeric and pericentromeric regions, there is no considerable increase in copy number variations in those regions. Furthermore, these chromosomal rearrangement hotspots do not have decreased gene numbers, again implying that there is minimal spatial bias in the genomic location of copy number variations.
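The CAG repeat threshold described above for the huntingtin gene can be illustrated with a short sketch. It is a minimal example, not a diagnostic method: the function names and the test sequences are hypothetical, and the only value taken from the text is the cutoff of more than 36 repeats. The code finds the longest uninterrupted run of CAG units in a DNA string and compares it with that cutoff.

```python
import re

# Cutoff taken from the text: "more than 36" CAG repeats.
EXPANSION_THRESHOLD = 36


def longest_cag_run(sequence: str) -> int:
    """Return the number of CAG units in the longest uninterrupted CAG run.

    Hypothetical helper for illustration; it scans the raw string and does
    not consider reading frame or sequencing errors.
    """
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def exceeds_threshold(sequence: str, threshold: int = EXPANSION_THRESHOLD) -> bool:
    """True if the longest CAG run is strictly longer than the threshold."""
    return longest_cag_run(sequence) > threshold


if __name__ == "__main__":
    # Made-up toy sequences, not real huntingtin alleles.
    short_allele = "ATG" + "CAG" * 20 + "TAA"
    long_allele = "ATG" + "CAG" * 40 + "TAA"
    print(longest_cag_run(short_allele), exceeds_threshold(short_allele))
    print(longest_cag_run(long_allele), exceeds_threshold(long_allele))
```

Only uninterrupted runs are counted here, which is a simplifying assumption; a real analysis would need to handle interruptions within the repeat tract and read-level noise.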

Detection and identification

Through cytogenetic observations, copy number variation was initially thought to occupy an extremely small, negligible portion of the genome. Copy number variations were generally associated only with small tandem repeats or specific genetic disorders, so they were initially examined only at specific loci. However, technological developments have led to an increasing number of highly accurate ways of identifying and studying copy number variations. Copy number variations were originally studied with cytogenetic techniques, which allow one to observe the physical structure of the chromosome. Bacterial artificial chromosomes (BACs) can also detect copy number variations in rearrangement hotspots, which allowed for the detection of 119 novel copy number variations. Sequencing the end reads provides adequate information to align the reference sequence to the sequence of interest, and any misalignments are readily noticeable and are therefore concluded to be copy number variations within that region of the clone. Because human recombination is relatively rare and many recombination events occur in specific regions of the genome known as recombination hotspots, linkage disequilibrium can also be used to identify copy number variations.
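As a rough illustration of the linkage disequilibrium idea mentioned above, the sketch below computes the standard r² statistic between a biallelic marker and a copy number variant treated as a second biallelic locus (for example, deletion present or absent). This is an assumption-laden toy: the function name, the encoding of alleles as 0 and 1, and the use of phased haplotype counts as input are all hypothetical choices for illustration, not a method described in the text.

```python
def ld_r_squared(n_00: int, n_01: int, n_10: int, n_11: int) -> float:
    """Compute r^2 between a marker and a CNV from phased haplotype counts.

    n_xy is the number of haplotypes carrying marker allele x (0 or 1)
    and CNV allele y (0 or 1). Hypothetical helper for illustration.
    """
    total = n_00 + n_01 + n_10 + n_11
    if total == 0:
        raise ValueError("no haplotypes supplied")

    p_marker = (n_10 + n_11) / total  # frequency of marker allele 1
    p_cnv = (n_01 + n_11) / total     # frequency of CNV allele 1
    p_both = n_11 / total             # frequency of the 1-1 haplotype

    # D measures the departure from independent association.
    d = p_both - p_marker * p_cnv
    denominator = p_marker * (1 - p_marker) * p_cnv * (1 - p_cnv)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denominator


if __name__ == "__main__":
    # Made-up counts: the marker allele always travels with the CNV allele.
    print(ld_r_squared(50, 0, 0, 50))
    # Made-up counts: the marker and the CNV are independent.
    print(ld_r_squared(25, 25, 25, 25))
```

A marker with r² close to 1 would be a good proxy for the copy number variant, whereas a value near 0 would carry little information about it.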

Molecular mechanism

There are two main types of molecular mechanism for the formation of copy number variations: homologous-based and non-homologous-based. During meiotic recombination, homologous chromosomes pair up and form two-ended double-stranded breaks leading to Holliday junctions. However, in the aberrant mechanism, the double-stranded breaks are misaligned during the formation of Holliday junctions and the crossover lands in non-allelic positions on the same chromosome. When the Holliday junction is resolved, the unequal crossing-over event allows transfer of genetic material between the two homologous chromosomes, and as a result, a portion of the DNA on both homologues is repeated. When a double-stranded break occurs unexpectedly in the genome, the cell activates pathways that mediate the repair of the break. If cohesin activity is affected for any reason, such as activation of ribosomal RNA, there may be a local increase in double-stranded break repair errors. Non-homologous mechanisms are also involved in repairing double-stranded breaks but require no homology or only limited microhomology. It is proposed that these sister chromatids will fuse together to form one dicentric chromosome and then segregate into two different nuclei. During normal DNA replication, the polymerase on the lagging strand is required to unclamp and re-clamp the replication region continuously.

Alpha-amylase gene

Amylase is an enzyme in saliva that is responsible for the breakdown of starch into simpler sugars, and one type of amylase is encoded by the alpha-amylase gene (AMY1). In European Americans, the concentration of salivary amylase is closely correlated with the copy number of the AMY1 gene. However, there is currently no evidence to support this theory, and this hypothesis therefore remains conjecture. The recent origin of the multi-copy AMY1 gene implies that, depending on the environment, the AMY1 gene copy number can increase and decrease very rapidly relative to genes that do not interact as directly with the environment. The AMY1 gene is an excellent example of how gene dosage affects the survival of an organism in a given environment. The multiple copies of the AMY1 gene give those who rely more heavily on high-starch diets an evolutionary advantage, so the high gene copy number persists in the population.

Brain cells

Among the neurons in the human brain, somatically derived copy number variations are frequent. Estimates of their prevalence vary widely (9 to 100% of brain neurons in different studies). Most alterations are between 2 and 10 Mb in size, with deletions far outnumbering amplifications. Copy number variants in the RCL1 gene are associated with a range of neuropsychiatric phenotypes in children.

Gene families and natural selection

Recently, there has been discussion connecting copy number variations to gene families. Gene families are defined as sets of related genes that serve similar functions but have minor temporal or spatial differences; these genes likely derive from one ancestral gene. The genes in the globin family, for example, are all well conserved and differ only in a small portion of the gene, indicating that they were derived from a common ancestral gene, perhaps through duplication of the initial globin gene. It was suggested that the gene dosage effect accompanying copy number variation may lead to detrimental effects if essential cellular functions are disrupted, so proteins involved in cellular pathways are subjected to strong purifying selection. It was explained that proteins in the periphery of a pathway interact with fewer proteins, so a change in protein dosage caused by a change in copy number may have a smaller effect on the overall outcome of the cellular pathway.

Source: Wikipedia
