Hi-C (genomic analysis technique)

Hi-C is a high-throughput genomic and epigenomic technique to capture chromatin conformation (3C). In general, Hi-C is considered a derivative of a series of chromosome conformation capture technologies, including but not limited to 3C, 4C, and 5C. Hi-C comprehensively detects genome-wide chromatin interactions in the cell nucleus by combining 3C and next-generation sequencing (NGS) approaches, and it has been considered a qualitative leap in C-technology development and the beginning of 3D genomics.

History

At its inception, Hi-C was a low-resolution, high-noise technology that was only capable of describing chromatin interaction regions within a bin size of 1 million base pairs (Mb), and the datasets themselves were low in both output and reproducibility. Nevertheless, Hi-C data offered new insights into chromatin conformation as well as nuclear and genomic architecture, and these prospects motivated scientists to refine the technique over the past decade. Between 2012 and 2015, several modifications to the Hi-C protocol took place, including 4-cutter digestion and deeper sequencing to obtain higher resolution. The use of restriction endonucleases that cut more frequently, or of DNase I and micrococcal nuclease, also significantly increased the resolution of the method. More recently (2017), Belaghzal et al. described a Hi-C 2.0 protocol that was able to achieve kilobase (kb) resolution. In terms of cross-linking chemistry, while formaldehyde captures the amino and imino groups of both proteins and DNA, the NHS-esters in disuccinimidyl glutarate (DSG) react with primary amines on proteins and can capture amine-amine interactions. These updates to the base protocol allowed scientists to look at more detailed conformational structures such as chromosomal compartments and topologically associating domains (TADs), as well as high-resolution conformational features such as DNA loops.

To date, a variety of derivatives of Hi-C have emerged, including in situ Hi-C, low Hi-C, SAFE Hi-C, and Micro-C, with distinctive features related to different aspects of standard Hi-C, but the basic principle has remained the same.

Traditional Hi-C

The outline of the classical Hi-C workflow is as follows: cells are cross-linked with formaldehyde; chromatin is digested with a restriction enzyme that generates a 5' overhang; the 5' overhang is filled with biotinylated bases and the resulting blunt-ended DNA is ligated.

Biotin on fragment ends that were not ligated is removed using the exonuclease activity of T4 DNA polymerase, which removes nucleotides from the ends of such fragments. This step ensures that none of these unligated fragments are selected for library preparation. The reaction is stopped with EDTA and the DNA is purified once again using phenol-chloroform DNA extraction.

The ideal size of DNA fragments for the sequencing library depends on the sequencing platform that will be used. DNA can first be sheared to fragments around 300–500 bp long using sonication. Fragments of this size are suitable for high-throughput sequencing. Following sonication, fragments can be size selected using AMPure XP beads from Beckman Coulter to obtain ligation products with a size distribution between 150 and 300 bp. This is the optimal fragment size window for HiSeq cluster formation. DNA shearing causes asymmetric DNA breaks, which must be repaired before biotin pull-down and sequencing adaptor ligation. This is achieved by using a combination of enzymes that fill in 5' overhangs, add 5' phosphate groups, and adenylate the 3' ends of fragments to allow for ligation of sequencing adaptors.

Biotin pull-down

Using an excess of streptavidin beads, such as the My-One C1 streptavidin bead solution from Dynabeads, biotinylated Hi-C ligation products can be pulled down and enriched. Ligation of the Illumina paired-end adaptors is performed while the DNA fragments are bound to the streptavidin beads. Adsorption to the beads increases the efficiency of the ligation of these blunt-ended DNA fragments to the adaptors, as it decreases their mobility.

Library preparation and sequencing

After the ligation of the adaptors is complete, PCR amplification of the library is performed. Over-amplification in the PCR step can introduce a high number of duplicates in a low-complexity Hi-C ligation product sample. This results in very few unique interactions being captured, often because the input sample contained too few cells. It is important to titrate the number of cycles required to get at least 50 ng of Hi-C library DNA for sequencing. The fewer the cycles, the better, so that PCR artifacts (such as off-target amplicons and non-specific products) are avoided. The ideal range of PCR cycles is 9–15, and it is preferable to pool multiple PCR reactions to get enough DNA for sequencing rather than to increase the number of cycles for one PCR reaction. The PCR products are purified again using AMPure beads to remove primer dimers and then quantified before being sequenced.

Regions of chromatin that interact with each other are then identified by paired-end sequencing of the biotinylated, ligated products. Any platform that allows the ligated fragments to be sequenced across the NheI junction (Roche 454) or by paired-end or mate-paired reads (Illumina GA and HiSeq platforms) would be suitable for Hi-C. Before high-throughput sequencing, the quality of the library should be verified using Sanger sequencing, wherein the long sequencing read will read through the biotin junction. Reads of 36 or 50 bp are sufficient to identify most chromatin interacting pairs using Illumina paired-end sequencing. Since the average size of fragments in the library is 250 bp, 50 bp paired-end reads have been found to be optimal for Hi-C library sequencing.

Quality control of Hi-C libraries

There are several well-documented critical points throughout the workflow of Hi-C sample preparation. DNA at various stages can be run on 0.8% agarose gels to assay the size distribution of fragments. This is particularly important after shearing or size selection steps. Degradation of DNA can also be monitored, as it appears on gels as smears of low-molecular-weight products. Degradation can occur due to insufficient protease inhibitors during lysis, endogenous nuclease activity, or thermal degradation due to incorrect icing. 3C PCR reactions can be performed to test for the formation of proximity ligation products.
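The PCR over-amplification concern described above can also be checked after sequencing by estimating what fraction of mapped read pairs are duplicates. The following is a minimal sketch, not a standard tool: the function name `duplicate_fraction` and the six-field tuple layout are hypothetical, and it assumes that two read pairs are duplicates when both ends share the same chromosome, position, and strand.

```python
from collections import Counter


def duplicate_fraction(read_pairs):
    """Estimate the fraction of read pairs that are PCR duplicates.

    read_pairs: iterable of (chrom1, pos1, strand1, chrom2, pos2, strand2).
    This layout is an assumption made for illustration only.
    """
    def canonical(pair):
        # Order the two ends so that (A, B) and (B, A) count as the same contact.
        end_a, end_b = tuple(pair[:3]), tuple(pair[3:])
        return (end_a, end_b) if end_a <= end_b else (end_b, end_a)

    counts = Counter(canonical(p) for p in read_pairs)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every copy beyond the first of each distinct pair is treated as a duplicate.
    return (total - len(counts)) / total


example_pairs = [
    ("chr1", 100, "+", "chr2", 500, "-"),
    ("chr2", 500, "-", "chr1", 100, "+"),  # same contact, ends swapped
    ("chr1", 100, "+", "chr2", 501, "-"),
    ("chr3", 10, "+", "chr3", 900, "-"),
]
print(duplicate_fraction(example_pairs))
```

A high duplicate fraction suggests a low-complexity library, in which case pooling several low-cycle PCR reactions may be preferable to adding cycles.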

Variants

Standard Hi-C has a high input cell number cost, requires deep sequencing, generates low-resolution data, and suffers from the formation of redundant molecules that contribute to low-complexity libraries when cell numbers are low. The use of DNase I has been shown to greatly improve the efficiency and resolution of Hi-C.

The protocol is similar to standard Hi-C in terms of the basic workflow outline but differs in other ways. Several techniques that have adapted the concept of in situ Hi-C exist, including sis-Hi-C, OCEAN-C and in situ capture Hi-C.

This method makes use of minor changes, including the volumes and concentrations used and the timing and order of certain experimental steps, to allow for the generation of high-quality Hi-C libraries from cell numbers as low as 1000 cells.

Micro-C was first developed for use in yeast and was shown to conserve the structural data obtained from a standard Hi-C but with a greater signal-to-noise ratio.

Figure 5. Micro-C is an adaptation of Hi-C that uses MNase to resolve fine-scale chromatin organisation.

• Fragmentation and Ligation: Due to the inherent fragmentation of ancient DNA, PaleoHi-C utilizes optimized ligation protocols to capture chromatin interactions even in highly degraded samples.
• Data Analysis: Advanced computational tools process the interaction data, reconstructing chromatin structures and identifying features like topologically associating domains (TADs) and chromatin compartments.

Applications

PaleoHi-C has opened new avenues in paleogenomics, including:

• Genome Reconstruction: It has been used to map the three-dimensional genome architecture of extinct species, such as the 52,000-year-old woolly mammoth (Mammuthus primigenius), revealing similarities with modern relatives like the Asian elephant (Elephas maximus).
• Epigenetic Insights: By identifying preserved chromatin interactions, PaleoHi-C provides a unique window into the regulation of genes in ancient organisms. Studies have demonstrated that chromatin organization, including Barr bodies representing inactive X chromosomes, can remain intact in ancient nuclei.
• Evolutionary Studies: The technique aids in understanding how genome organization has evolved over time and across species.

Significance

The adaptation of Hi-C for ancient DNA has transformed the field of paleogenomics, allowing for detailed studies of extinct species at a molecular level. By preserving and analyzing chromatin interactions, PaleoHi-C sheds light on genome structure, evolution, and adaptation in ancient ecosystems.

Limitations

PaleoHi-C is constrained by the availability of well-preserved samples and the inherent challenges of working with highly degraded DNA. However, advances in sequencing technologies and computational methods continue to expand its potential applications.

Data analysis

The chimeric DNA ligation products generated by Hi-C represent pairwise chromatin interactions or physical 3D contacts within the nucleus. The resulting data are typically summarized as contact maps, and several different methods can then be employed to analyze these maps to identify chromosomal structural patterns and their biological interpretations. Many of these data analysis approaches also apply to 3C-sequencing or other equivalent data.

Read mapping

Hi-C data produced by deep sequencing is in the form of a traditional FASTQ file, and the reads can be aligned to the genome of interest using sequence alignment software (e.g. Bowtie, bwa, etc.). Long-read aligners often support chimeric alignment and can be directly applied to long-read Hi-C data. Short-read Hi-C alignment is more challenging. Notably, Hi-C generates ligation junctions of varying sizes, but the exact position of the ligation site is not measured. Some pipelines, such as HiC-Pro, HIPPIE, HiCUP, and TADbit, map the two portions of a paired-end read separately when the two portions match distinct genomic positions, thus addressing the challenge of reads that span ligation junctions. Other pipelines (including that of the 4D Nucleome Data Portal) often align short Hi-C reads with an alignment algorithm capable of chimeric alignment, such as bwa-mem, chromap and dragmap. This procedure calls alignment once and is simpler than iterative mapping.

Fragment assignment and filtering

The mapped reads are then each assigned a single genomic alignment location according to their 5' mapped position in the genome. After binning, Hi-C data will be stored in a symmetrical matrix format.

QuASAR offers additional quality assessment: it compares replicate scores of the samples (provided that replicates are included in the experiment) to find the maximum usable resolution. Some publications have also tried to score interaction frequencies at the single-fragment level, where a higher coverage can be achieved even with a lower number of reads. HiCPlus, a tool developed by Zhang et al. in 2018, is able to impute Hi-C matrices similar to the original ones using only 1/16 of the original reads.

Matrix-balancing normalization assumes that all loci have equal visibility and attempts to balance the symmetrical matrix under that assumption by equalizing the sum of every row and column in the matrix. Balancing methods include the Knight-Ruiz matrix-balancing approach and iterative correction and eigenvector decomposition (ICE) normalization.

Several approaches exist to statistically characterize the properties of loci pairs separated by a given distance, but discrete binning and fitting continuous functions are two common ways to analyze the distance-dependent interaction frequencies between datapoints.

Although tools such as HiTC R each have their own differences and optimizations relative to the original 2009 approach, their base protocols still rely on principal component analysis.

4. Topologically associating domains (TADs)

TADs are sub-Mb structures that may harbor gene-regulatory features, such as local promoter-enhancer interactions. Thus, TADs represent regulatory microenvironments and usually show up on a Hi-C map as blocks of highly self-interacting regions in which interaction frequencies within the region are significantly higher than interaction frequencies between two adjacent regions. Another approach is to calculate the average interaction frequencies crossing over each bin, again within some predetermined genomic range. The resulting value is referred to as the insulation score and can be thought of as the average of a square sliding along the diagonal of the matrix (Crane et al.). Resolution-specific domains can be identified, and a consensus set of domains conserved across resolutions can be calculated. Tools such as MrTADFinder, 3DNetMod, and Matryoshka have also been developed to achieve better computing performance on higher-resolution datasets.

5. Point interactions

Biologically, regulatory interactions usually occur at a much smaller scale than TADs, and two genomic elements can activate or inhibit the expression of a gene within as small a distance as 1 kb. Therefore, point interactions are important in interpreting Hi-C maps, and are expected to appear as local enrichments in contact probability. However, current methodologies for the identification of point interactions are all implicit in nature, in that they do not specify what a point interaction should look like. Instead, point interactions are identified as outliers with higher interaction frequencies than expected within the Hi-C matrix, given that the background model consists only of the strongest signals such as the distance-decay functions. The background model can be estimated and constructed using both local signal distributions and global approaches (i.e. chromosome-wide or genome-wide). Many of the aforementioned bioinformatics packages incorporate algorithms to identify point interactions. In short, the significance of each individual pairwise interaction is calculated, and significantly high outliers are corrected for multiple testing before they are recognized as truly informative point interactions. It is helpful to complement identified point interactions with additional evidence, such as analysis of enrichment scores and biological replicates, to indicate that these interactions are indeed of biological significance.
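The outlier-based idea described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the algorithm of any particular package: it assumes NumPy is available, uses a global distance-decay background (the mean of each diagonal), models counts as Poisson, and applies a Benjamini-Hochberg correction. The function names and parameters (`min_distance`, `fdr`) are hypothetical.

```python
import math

import numpy as np


def expected_by_distance(matrix):
    """Mean contact count on each diagonal: a global distance-decay background."""
    n = matrix.shape[0]
    return np.array([np.diagonal(matrix, d).mean() for d in range(n)])


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed by summing the lower tail."""
    k = int(k)
    if lam <= 0:
        return 1.0 if k <= 0 else 0.0
    lower = sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k)
    )
    return max(0.0, 1.0 - lower)


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q


def call_point_interactions(matrix, min_distance=2, fdr=0.05):
    """Flag bin pairs whose counts exceed the distance-decay expectation.

    matrix: symmetric 2D array of binned contact counts (assumed input format).
    Returns a list of (i, j, q_value) for the upper triangle, i < j.
    """
    n = matrix.shape[0]
    expected = expected_by_distance(matrix)
    pairs, pvals = [], []
    for i in range(n):
        for j in range(i + min_distance, n):
            pairs.append((i, j))
            pvals.append(poisson_sf(matrix[i, j], expected[j - i]))
    if not pairs:
        return []
    qvals = benjamini_hochberg(pvals)
    return [(i, j, q) for (i, j), q in zip(pairs, qvals) if q <= fdr]
```

Published callers typically add local background estimates around each candidate pixel; the global model here only illustrates the significance-then-correction logic.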

Uses

Development

1. Cell division

Hi-C can reveal chromatin conformation changes during cell division. In interphase, chromatin is generally loose and dynamic so that transcription regulation and other regulatory activities can take place. When entering mitosis and cell division, chromatin becomes compactly folded into dense cylindrical chromosomes. When mitotic division is completed and the cell re-enters interphase, chromatin 3D structures are observed to be re-established, and transcription regulation is restored.

3. Growth and development

Mammalian somatic growth and development starts with the fertilization of the oocyte by sperm, followed by the zygote stage, the 2-cell, 4-cell, and 8-cell stages, the blastocyst stage, and finally the embryo stage. Hi-C made it possible to explore the comprehensive genomic architecture during growth and development, as both sis-Hi-C and in situ Hi-C have reported that TADs and genomic A and B compartments are not obviously present and appear to be less well-structured in oocyte cells.

along with the CTCF factor in the chromatin domain evolution. Other factors, however, have been revealed by Hi-C techniques to experience structural evolutions in 3D architecture. These include codon usage frequency similarity (CUFS), paralog gene co-regulation, and spatially co-evolving orthologous modules (SCOMs). For large-scale domain evolution, chromosomal translocations, syntenic regions, as well as genomic rearrangement regions were all relatively conserved. These findings imply that Hi-C technologies are capable of providing an alternative point of view on the eukaryotic tree of life.

Fang et al. have also shown, using in situ Hi-C, that there are T-ALL-specific gains or losses of chromatin insulation, which alter the strength of the TAD architecture of the genome.
Low-C has been used to map the chromatin structure of primary B cells from a diffuse large B-cell lymphoma patient, revealing high chromosome structural variation between the patient's cells and healthy B cells. Overall, the application of Hi-C and its variants in cancer research provides unique insight into the molecular underpinnings of the driving factors of cell abnormality. It can help explain biological phenomena (such as high MYC expression in T-ALL) and aid drug development that targets mechanisms unique to cancerous cells.

Source: Wikipedia
