Whole genome bisulfite sequencing

Whole genome bisulfite sequencing is a next-generation sequencing technology used to determine the DNA methylation status of single cytosines by treating the DNA with sodium bisulfite before high-throughput DNA sequencing. The DNA methylation status at various genes can reveal information regarding gene regulation and transcriptional activities. This technique was developed in 2009 along with reduced representation bisulfite sequencing after bisulfite sequencing became the gold standard for DNA methylation analysis.

History

Prior to the development of whole genome bisulfite sequencing, genome methylation analysis relied heavily on early non-specific and differential methods such as paper chromatography, high-performance liquid chromatography, and thin-layer chromatography. These methods were limited because methylated DNA could not be amplified in vitro by polymerase chain reaction without losing its methylation status. Since the technique was introduced, many protocol variants have been developed to improve the efficiency and efficacy of its single-base mapping. As the costs of next-generation sequencing have decreased, whole genome bisulfite sequencing has become more widely used in clinical and experimental research. Multiple public datasets of genomic data have since been established, and the technique has recognized and tested approximately 95% of all cytosines in known genomes.

Method

The following steps are derived from one potential workflow of conventional whole genome bisulfite sequencing: target DNA extraction, bisulfite conversion, library amplification, and bioinformatics analysis. However, sequencing systems and analysis tools often adapt the technical parameters and the order of these steps to optimize assay coverage and efficacy.

After fragmentation, end-repair enzymes and complementary adapters are applied to the DNA in an end-prep polymerase chain reaction and an adapter ligation reaction, respectively. Size selection occurs before the DNA is treated with sodium bisulfite. Conventional methods of eukaryotic DNA preparation use a wide range of DNA input amounts, from as little as 10 ng for newer NGS library alternatives, such as the tagmentation approach, to as much as 500-1000 ng.

Bisulfite conversion

The adapter-ligated DNA sample is treated with sodium bisulfite, a chemical compound that converts unmethylated cytosines into uracil at low pH and high temperatures. The chemical reaction is depicted in Figure 1: sulfonation occurs at the carbon-6 position of cytosine to produce the intermediate cytosine sulfonate. This intermediate then undergoes irreversible hydrolytic deamination to create uracil sulfonate. Under alkaline conditions, uracil sulfonate desulfonates to generate uracil. Because methylated cytosines (5-methylcytosine) resist bisulfite treatment, they can be distinguished from the unmethylated cytosines that were converted to uracil. During amplification by polymerase chain reaction, the uracils are copied as thymines, while methylated cytosines are still read as cytosines. The locations of the methylated cytosines are then identified by comparing the bisulfite-treated sequence with the original DNA sequence. Following bisulfite treatment, the sample must be purified to remove unwanted products, including bisulfite salts.
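
The read-out logic of bisulfite conversion can be illustrated with a small in silico sketch. It is not part of any published protocol: the function names, the example sequence, and the set of methylated positions are hypothetical, and only a single forward strand is modelled. Unmethylated cytosines are reported as thymines (C to U to T after amplification), methylated cytosines stay as cytosines, and comparing the converted sequence with the original recovers the methylated positions.

```python
from __future__ import annotations


def bisulfite_convert(sequence: str, methylated_positions: set[int]) -> str:
    """Simulate bisulfite treatment plus PCR on one DNA strand.

    Unmethylated C is read as T; methylated C (listed by 0-based index
    in `methylated_positions`) is protected and stays C.
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def infer_methylated_positions(original: str, converted: str) -> list[int]:
    """Return positions where a C in the original is still a C after conversion."""
    return [
        index
        for index, (ref, obs) in enumerate(zip(original.upper(), converted.upper()))
        if ref == "C" and obs == "C"
    ]


# Hypothetical example: cytosines at indices 1 and 9 are methylated.
original = "ACGTCCGATCG"
methylated = {1, 9}
converted = bisulfite_convert(original, methylated)
recovered = infer_methylated_positions(original, converted)
print(converted)
print(recovered)
```

Real data are noisier than this sketch suggests: conversion is not always complete, and the complementary strand shows G-to-A rather than C-to-T changes, so practical tools handle both strands and estimate conversion efficiency.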

Library amplification

To amplify the epigenome library, bisulfite-treated DNA is primed to generate DNA with a specific tagging sequence. The 3' end of this sequence is then tagged again, creating DNA fragments with markers on either end. These fragments are amplified in a final polymerase chain reaction, after which the library is prepared for sequencing-by-synthesis. This is demonstrated in Figure 2, in which high-throughput sequencing systems developed by the biotechnology company Illumina perform comprehensive assays based on sequencing-by-synthesis of base pairs.

Bioinformatics analysis

Following library amplification, a series of analyses can be performed on the expanded library to determine various methylation characteristics or to map a genome-wide methylation profile. One such analysis aligns the new reads against the reference genome in order to directly compare the locations of methylated cytosines and C-T mismatches. This requires alignment software such as SOAP. Another analysis is methylated cytosine calling, which computes methylated cytosine ratios from mapping probabilities based on read quality. This helps determine methylated cytosine locations across the genome. Finally, global trends of the methylome can be analyzed by calculating the distribution ratios of the CG, CHG, and CHH contexts (where H is A, C, or T) among methylated cytosines across the genome. These ratios can reflect features of the whole genome methylation maps of certain species.
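
The calling and context steps described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the method used by SOAP or any other named tool: reads are assumed to be already aligned without gaps to the forward strand, read quality is ignored, and a site is called methylated when at least half of its covering reads show C (the `min_ratio` threshold and all function names are hypothetical).

```python
from collections import Counter
from typing import Optional


def cytosine_context(reference: str, position: int) -> Optional[str]:
    """Classify a forward-strand cytosine as CG, CHG or CHH (H = A, C or T)."""
    if reference[position] != "C":
        return None
    following = reference[position + 1 : position + 3]
    if following[:1] == "G":
        return "CG"
    if len(following) < 2 or following[0] not in "ACT":
        return None  # too close to the sequence end, or an ambiguous base
    if following[1] == "G":
        return "CHG"
    if following[1] in "ACT":
        return "CHH"
    return None


def call_methylation(reference, aligned_reads):
    """Count (methylated, total) reads at every reference cytosine.

    `aligned_reads` is a list of (start, sequence) pairs. A read base of C
    at a reference C counts as methylated, T as unmethylated; any other
    base is treated as a sequencing error and skipped.
    """
    counts = {}
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "C":
                continue
            if base not in "CT":
                continue
            meth, total = counts.get(pos, (0, 0))
            counts[pos] = (meth + (1 if base == "C" else 0), total + 1)
    return counts


def context_distribution(reference, counts, min_ratio=0.5):
    """Share of CG, CHG and CHH among sites called methylated."""
    contexts = Counter()
    for pos, (meth, total) in counts.items():
        if total and meth / total >= min_ratio:
            context = cytosine_context(reference, pos)
            if context is not None:
                contexts[context] += 1
    called = sum(contexts.values())
    return {
        name: (contexts[name] / called if called else 0.0)
        for name in ("CG", "CHG", "CHH")
    }


# Hypothetical toy data: cytosines at index 1 (CG), 4 (CHG) and 8 (CHH).
reference = "TCGACAGTCTTA"
reads = [(0, "TCGATAGTTTTA"), (0, "TCGACAGTTTTA"), (0, "TTGACAGTTTTA")]
counts = call_methylation(reference, reads)
distribution = context_distribution(reference, counts)
print(counts)
print(distribution)
```

Production pipelines additionally handle the reverse strand, weight calls by base and mapping quality, and correct for incomplete bisulfite conversion.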

Applications

Because it screens methylation status at single-nucleotide resolution across a given genome, whole genome bisulfite sequencing has become an increasingly promising tool for fundamental epigenomics research, for testing novel hypotheses on DNA methylation, and for future large-scale epidemiological studies. In plants, whole genome bisulfite sequencing was used to examine CG, CHH, and CHG methylation. It was then discovered that the plant germline conserved CG and CHG methylation while losing CHH methylation in microspores and sperm cells.

Other fields

The breadth of data provided by a whole-genome approach has spurred many novel hypotheses on how whole genome bisulfite sequencing could be used in other fields, including disease diagnosis and forensic science. Studies have shown that whole genome bisulfite sequencing can detect abnormal methylation, more specifically hypermethylated tumor suppressor genes, which is often seen in cancers including leukemia. Additionally, whole genome bisulfite sequencing has been applied to blood spot samples in forensic investigations to generate high-quality DNA methylation analyses from dried stains.

Limitations

Technical concerns

The widespread use of whole genome bisulfite sequencing has been limited primarily by its high cost, complex data output, and minimum required coverage. Due to the high amount and subsequent cost of DNA input, many studies using whole genome bisulfite sequencing assays are run with few or no biological replicates. For human samples, the US National Institutes of Health (NIH) Roadmap Epigenomics Project recommends a minimum of 30x coverage sequencing to achieve accurate results, corresponding to approximately 800 million aligned, high-quality reads. Consequently, large-scale studies for genome-wide methylation profiling remain less cost-effective, often requiring the entire genome to be re-sequenced multiple times for every experiment. Studies are under way to reduce the conventional minimum coverage requirements while maintaining mapping accuracy. Finally, the technique is also limited by the complexity of its data and the lack of sufficiently advanced analytical tools for downstream computation. The bioinformatics requirements for accurate data interpretation are ahead of existing technology, which limits the accessibility of sequencing results to the general public.

Biases and over-representation of DNA methylation

There are also biological limitations concerning various steps in the standard protocol, particularly in the library preparation method. One of the biggest concerns is the potential for bias in the base composition of sequences and over-representation of methylated DNA data following bioinformatics analyses. Bias can arise from multiple unintended effects of bisulfite conversion, including DNA degradation. This degradation can cause uneven sequence coverage by misrepresenting genomic sequences and overestimating 5-methylcytosine values. Additionally, the bisulfite conversion process only distinguishes unmethylated cytosine from 5-methylcytosine. As a result, specificity between 5-methylcytosine and 5-hydroxymethylcytosine is limited. Another potential source of bias arises from polymerase chain reaction amplification of the library, which affects sequences with highly skewed base compositions due to high rates of polymerase sequence errors in high AT-content, bisulfite-converted DNA.
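
The coverage recommendation discussed under technical concerns can be related to a read count with simple arithmetic: average coverage is roughly the number of aligned reads multiplied by the read length and divided by the genome size. The sketch below is a back-of-the-envelope illustration only. The genome size of about 3.1 billion base pairs and the 100 bp single-end read length are assumptions chosen for the example, not values taken from the text, and the function names are hypothetical.

```python
import math


def reads_for_coverage(genome_size_bp: int, read_length_bp: int, coverage: float) -> int:
    """Aligned reads needed to reach an average coverage (rounded up)."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


def coverage_from_reads(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average coverage obtained from a given number of aligned reads."""
    return n_reads * read_length_bp / genome_size_bp


# Assumed values for illustration: ~3.1 Gb human genome, 100 bp reads.
HUMAN_GENOME_BP = 3_100_000_000
READ_LENGTH_BP = 100

needed = reads_for_coverage(HUMAN_GENOME_BP, READ_LENGTH_BP, coverage=30)
print(f"{needed:,} aligned reads for 30x coverage")
print(f"{coverage_from_reads(needed, READ_LENGTH_BP, HUMAN_GENOME_BP):.1f}x")
```

The estimate scales inversely with read length, so longer or paired-end reads lower the number of reads required for the same coverage, while reads lost to filtering or failed alignment raise it.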

Source: Wikipedia
