Genome architecture mapping

In molecular biology, genome architecture mapping (GAM) is a cryosectioning method for mapping colocalized DNA regions in a ligation-independent manner. It overcomes some limitations of chromosome conformation capture (3C) methods, which rely on digestion and ligation to capture interacting DNA segments. GAM is the first genome-wide method for capturing three-dimensional proximities between any number of genomic loci without ligation.

Cryosection and laser microdissection

Cryosections are produced according to the Tokuyasu method, which involves stringent fixation to preserve nuclear and cellular architecture and cryoprotection with a sucrose-PBS solution before freezing in liquid nitrogen. In genome architecture mapping, sectioning is a necessary step for exploring the 3D topology of the genome. Laser microdissection then isolates each nuclear profile before DNA extraction and sequencing.

Data analysis - bioinformatic tools

GAMtools

GAMtools is a collection of software utilities for genome architecture mapping data, developed by Robert Beagrie. Bowtie2 must be installed before running GAMtools, and the input is sequencing data in FASTQ format. The exact commands depend on the analysis being performed, but most features require a segregation table, so for most users the first steps are to download or create input data and perform the sequence mapping. This generates a segregation table, which can then be used for the other operations outlined below. For further information, see the GAMtools documentation.

Mapping the sequencing data

The GAMtools command process_nps maps the raw sequence data from the nuclear profiles (NPs) and generates a segregation table. Quality control can be enabled by adding the -c or --do-qc flag, in which case GAMtools tries to exclude poor-quality NPs. The GAMtools command for this step is:

gamtools process_nps --do-qc -g [ ...]

Windows calling and segregation table

After mapping, GAMtools counts the number of reads from each NP that overlap each genomic window, using a default window size of 50 kb. This step is performed by the same process_nps command and produces a segregation table, which records the presence or absence of each window in every NP.

Producing proximity matrices

The GAMtools command for this step is matrix. Its input is the segregation table produced by the windows calling step.
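As a rough illustration of how a segregation table feeds the proximity matrix step, the Python sketch below thresholds per-window read counts into a presence/absence table and then computes a normalized linkage disequilibrium value for each pair of windows. It is not GAMtools' implementation: the function names, the fixed min_reads cutoff, and the toy read counts are assumptions made for this example, and the normalization shown is the standard D' form of linkage disequilibrium.

```python
def call_windows(read_counts, min_reads=1):
    """Turn per-NP read counts into a binary segregation table.

    read_counts maps an NP identifier to a list of read counts, one per
    genomic window. min_reads is a hypothetical fixed cutoff chosen for
    illustration; it is not GAMtools' threshold selection.
    """
    return {
        np_id: [1 if count >= min_reads else 0 for count in counts]
        for np_id, counts in read_counts.items()
    }


def normalized_linkage(col_a, col_b):
    """Normalized linkage disequilibrium (D') between two windows.

    col_a and col_b are equal-length, non-empty 0/1 lists with one entry
    per NP. Returns None when either window is detected in no NP or in
    every NP, because the normalization is undefined in that case.
    """
    n = len(col_a)
    f_a = sum(col_a) / n                                  # detection frequency of window A
    f_b = sum(col_b) / n                                  # detection frequency of window B
    f_ab = sum(a & b for a, b in zip(col_a, col_b)) / n   # co-segregation frequency
    d = f_ab - f_a * f_b
    if d < 0:
        d_max = min(f_a * f_b, (1 - f_a) * (1 - f_b))
    else:
        d_max = min(f_a * (1 - f_b), f_b * (1 - f_a))
    if d_max == 0:
        return None
    return d / d_max


def linkage_matrix(segregation):
    """Symmetric window-by-window matrix of D' values (diagonal left as None)."""
    columns = [list(col) for col in zip(*segregation.values())]
    size = len(columns)
    matrix = [[None] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            matrix[i][j] = matrix[j][i] = normalized_linkage(columns[i], columns[j])
    return matrix


# Toy data: four NPs, three windows. Windows 0 and 1 always co-occur,
# while window 2 never appears together with them.
counts = {"NP1": [5, 3, 0], "NP2": [4, 6, 0], "NP3": [0, 0, 7], "NP4": [0, 0, 2]}
table = call_windows(counts)
matrix = linkage_matrix(table)  # matrix[0][1] is 1.0, matrix[0][2] is -1.0
```

Values close to 1 indicate windows that are detected together more often than their individual detection frequencies would predict, and values close to -1 indicate windows that tend to exclude each other.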
GAMtools calculates these matrices using normalized linkage disequilibrium: it counts how many times each pair of windows is detected in the same NP, then normalizes the result by how often each window is detected across all NPs. The figure below shows an example of a proximity matrix heatmap produced using GAMtools. The GAMtools command for this step is:

gamtools matrix [OPTIONS] -s -r [ ...]

Calculating chromatin compaction

The GAMtools command compaction estimates chromatin compaction. Compaction is a value assigned to a genomic locus that reflects the volume the locus occupies: the level of compaction is inversely proportional to the locus volume. Loci with a low volume are said to have a high level of compaction, and loci with a high volume have a low level of compaction. As shown in the figure, loci with a low compaction level are expected to be intersected more often by the cryosection slices. GAMtools uses this information to assign a compaction value to each locus based on its detection frequency across many nuclear profiles.

The compaction of a locus is not static and changes throughout the life of the cell. Genomic loci are thought to be de-compacted when the genes they contain are active, so low compaction is thought to be related to transcriptional activity. This allows a researcher to form hypotheses about which genes are currently active in a cell from the GAMtools results.

The time complexity of the compaction command is O(m × n), where m is the number of genomic windows and n is the number of nuclear profiles. The GAMtools command for this step is:

gamtools compaction [OPTIONS] -s -o

Calculating radial position

GAMtools can be used to calculate the radial position of NPs. The radial position of an NP is a measure of how near or far that NP is from the equator or center of the nucleus.
NPs that are close to the center of the nucleus are considered equatorial, whereas NPs that are closer to the edge of the nucleus are considered apical. The GAMtools command to calculate radial positioning is radial_pos, which requires a previously generated segregation table. The radial position is estimated from the average size of the NPs that contain a given chromatin region. Chromatin that is closer to the periphery will typically be intersected by smaller, more apical NPs, whereas central chromatin will be intersected by larger, equatorial NPs. To estimate the size of each NP, GAMtools looks at the number of windows each NP detected, as NPs that detected more windows can be assumed to be larger in volume. This is very similar to the method used to estimate chromatin compaction.

The figure to the right illustrates how GAMtools uses each NP's detection rate to estimate its volume, in order to determine the compaction or the radial position. The first NP intersects all three windows, so it can be estimated to be one of the largest NPs. The second NP intersects two of the three windows, so it is estimated to be smaller than the first. The third NP intersects only one of the three windows, so it is estimated to be the smallest. These size estimates then give the radial position: assuming that larger NPs are more equatorial, the first NP is the most equatorial, the second NP is the second most equatorial, and the third NP is the most apical.
The GAMtools command for this step is:

gamtools radial_pos [OPTIONS] -s -o

Here is some pseudocode that illustrates how one might calculate the radial position of a list of NPs:

    // data is a 2D matrix whose rows correspond to NPs and whose columns correspond to windows,
    // so data[1][2] == 1 means that NP 1 contains window 2

    // Largest number of windows detected by a single NP
    LET MAXWINDOW = 0
    // Number of windows detected by each NP, used later to determine the radial position
    LET RADIAL_POS = []

    // Loop through all NPs
    FOR NP FROM 1 TO NUM_NPS:
        LET WINCOUNT = 0
        // Count the number of windows this NP saw
        FOR WIN FROM 1 TO NUM_WINDOWS:
            IF data[NP][WIN] == 1:
                WINCOUNT = WINCOUNT + 1
        // Check whether the current NP has seen the most windows so far
        IF WINCOUNT > MAXWINDOW:
            MAXWINDOW = WINCOUNT
        // Add the count for the current NP to the array
        RADIAL_POS.APPEND( WINCOUNT )

    // Divide each NP's window count by the largest count to estimate the radial position
    // (skipped when no NP detected any window, to avoid dividing by zero)
    IF MAXWINDOW > 0:
        FOR NP FROM 1 TO NUM_NPS:
            RADIAL_POS[NP] = RADIAL_POS[NP] / MAXWINDOW

This pseudocode creates a list of values from 0 to 1 that estimate the radial position of each NP, where 1 is the most equatorial and 0 is the most apical.

The time complexity of this pseudocode is O(n × m), where n is the number of NPs and m is the number of windows. The first loop goes through n iterations and contains an inner loop of m iterations, so it takes O(n × m) time. The second loop has n iterations, so it takes O(n) time. The overall time complexity is therefore O(n × m + n), which reduces to O(n × m).
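The pseudocode above translates fairly directly into Python. The sketch below is one possible rendering, not the GAMtools implementation: the function names are hypothetical, the segregation table is assumed to be a plain list of 0/1 rows (one row per NP), and the second function adds the per-window step described earlier, averaging the estimated sizes of the NPs that contain each window.

```python
def np_radial_positions(segregation):
    """Estimate a 0-to-1 radial position for each NP.

    segregation is a list of rows, one per NP; each row is a list of 0/1
    flags, one per genomic window (1 means the NP contains the window).
    A value of 1.0 marks the NP that detected the most windows.
    """
    window_counts = [sum(row) for row in segregation]
    max_count = max(window_counts, default=0)
    if max_count == 0:
        # No NP detected any window, so no relative size can be estimated.
        return [0.0 for _ in window_counts]
    return [count / max_count for count in window_counts]


def window_radial_positions(segregation):
    """Average the estimated positions of the NPs that contain each window.

    Returns None for a window that no NP contains.
    """
    np_positions = np_radial_positions(segregation)
    num_windows = len(segregation[0]) if segregation else 0
    result = []
    for window in range(num_windows):
        containing = [
            np_positions[i] for i, row in enumerate(segregation) if row[window]
        ]
        result.append(sum(containing) / len(containing) if containing else None)
    return result


# The three-NP example from the text: NP 1 sees all three windows,
# NP 2 sees two of them, and NP 3 sees one.
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
]
per_np = np_radial_positions(example)
per_window = window_radial_positions(example)
```

In this example the first NP receives the largest value and the third NP the smallest, matching the ordering from most equatorial to most apical described above.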

Data analysis methods

Overview

The above flowchart shows a general process of how data may be derived from GAM analysis. Circles represent processes that may be performed, and squares represent pieces of data. The first step of GAM analysis is the cryosectioning and examination of cells. This process results in a collection of nucleus slices (nuclear profiles) which contain pieces of DNA (genomic windows). These nuclear profiles are then examined so that a segregation table may be formed. Segregation tables are the foundation of GAM analysis: they record which genomic loci appear within each nuclear profile.

One form of data analysis not covered below is clustering. For example, nuclear profiles that contain similar genomic loci could be grouped by k-means clustering or a variation of it. K-means suits this problem in the sense that it assigns every nuclear profile to a cluster according to a similarity measure, but it also has drawbacks. The time complexity of the standard k-means heuristic is O(tknd), where t is the number of iterations, k is the number of means, n is the number of data points, and d is the number of dimensions of each data point. Finding an optimal k-means clustering is NP-hard in general, and even the heuristic becomes costly when the numbers of nuclear profiles and genomic windows are large, so it is better suited to subsets of the data.

For further analysis, GAMtools may be used. In this article, community analysis is focused on centrality. Centrality-based communities can be thought of as analogous to celebrities and their fan bases on a social media network. The fans may not interact with each other very much, but they do interact with the celebrity, who is the "center." There are several types of centrality, including degree centrality, eigenvector centrality, and betweenness centrality, and each may result in different communities being defined.
In the social network analogy above, eigenvector centrality may not be accurate, because one person who follows many celebrities may not have any influence over them. In that case, the graph may be seen as directed. In GAM analysis, the graph is generally assumed to be undirected, so eigenvector centrality would be accurate if it were used. Both clique and centrality calculations are computationally expensive and, like the clustering mentioned above, do not scale well to large problems.

SLICE

SLICE (StatisticaL Inference of Co-sEgregation) plays a key role in GAM data analysis.

Estimating interaction probabilities of pairs

Based on the detection efficiency and the probabilities u_0, u_1, and u_2, SLICE estimates the likelihood that a pair of genomic loci are interacting. These values are the probabilities of detecting zero, one, or both loci in a nuclear profile when the loci are not interacting.

In the normalized linkage heatmap, the blank, light bands intersecting near the center of the map (around window 44) indicate regions where data was filtered or unmappable. Visualizing the normalized linkage matrix as a heatmap provides a clear foundation for analyzing chromosomal contacts, allowing the data to be used for further 3D architectural modeling.

Graph analysis approach

Graph analysis can be used to identify related subsets, or "communities", of genomic windows after pairwise relationships have been summarized in a normalized linkage matrix.

Constructing a graph from normalized linkage data

Once pairwise relationships between genomic windows have been summarized in a normalized linkage matrix, the matrix can be converted into a graph representation. Each genomic window is treated as a node, and an undirected edge is added between two windows when their normalized linkage exceeds a selected threshold.
In the example shown here, this threshold is set to the third quartile (Q3) of the normalized linkage values. Because a genomic window is not connected to itself, diagonal entries are set to 0. The resulting adjacency matrix is therefore symmetric, consistent with an undirected graph. This graph representation can then be used for subsequent analyses such as centrality measurement and community detection.

Assess centrality of windows

Once the adjacency matrix has been established, the windows can be assessed using several measures of centrality: betweenness centrality, closeness centrality, eigenvector centrality, and degree centrality. Each measure highlights different areas of the network and different structural roles of genomic windows within it.

Betweenness centrality is calculated by considering the shortest paths between pairs of nodes and determining how many of these paths pass through the node being observed, excluding paths for which it is itself an end node. This measure can help identify nodes that connect different parts of a network.

Closeness centrality is calculated by dividing the number of nodes in the network minus one by the sum of the shortest-path distances from the observed node to every other node. It can help identify nodes that are, on average, closer to the rest of the graph. See the included Figure 1 for an example.

Eigenvector centrality measures not only how many connections a node has, but also whether it is connected to other highly connected nodes. In this way, it can help identify nodes that are located in more influential or highly interconnected parts of the network.
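The thresholding step and the closeness centrality definition described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the linkage matrix is assumed to be a symmetric list of lists of numbers, the quartile is computed with the default method of the standard library's statistics.quantiles (other quartile conventions give slightly different thresholds), and a node that cannot reach every other node is given a closeness of 0.

```python
from collections import deque
from statistics import quantiles


def adjacency_from_linkage(linkage):
    """Threshold a symmetric normalized linkage matrix at its third quartile.

    Only off-diagonal entries are considered, and each unordered pair is
    counted once. At least two window pairs are required.
    """
    size = len(linkage)
    pair_values = [linkage[i][j] for i in range(size) for j in range(i + 1, size)]
    q3 = quantiles(pair_values, n=4)[2]  # third quartile of the pairwise values
    adjacency = [[0] * size for _ in range(size)]  # diagonal stays 0
    for i in range(size):
        for j in range(i + 1, size):
            if linkage[i][j] > q3:
                adjacency[i][j] = adjacency[j][i] = 1  # undirected edge
    return adjacency


def closeness_centrality(adjacency, source):
    """(n - 1) divided by the sum of shortest-path distances from source.

    Distances are found by breadth-first search. If some node is
    unreachable, the distance sum is treated as infinite and 0.0 is
    returned (an assumption made for this sketch).
    """
    size = len(adjacency)
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in range(size):
            if adjacency[node][neighbor] and neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    if size < 2 or len(distance) < size:
        return 0.0
    return (size - 1) / sum(distance.values())


# A three-node path graph 0 - 1 - 2: the middle node is closest to the rest.
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
middle = closeness_centrality(path, 1)  # 2 / (1 + 1)
end = closeness_centrality(path, 0)     # 2 / (1 + 2)
```

Because the threshold is the third quartile, roughly a quarter of the window pairs become edges, so the resulting graph is often sparse and may be disconnected.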
Degree centrality is calculated by dividing the number of edges connected to a node by the total number of nodes minus one:

C_D(i) = \frac{\sum_{j=1}^{n} a_{ij}}{n-1}

where a_{ij} represents whether node i is connected to node j in the adjacency matrix, and n is the total number of nodes in the graph. The numerator counts the total number of connections of node i, and the denominator scales the value by the maximum possible number of neighbors. See the included Figure 2 for an example of this calculation.

The centrality of a node can be a good indicator of its potential to be influential in the dataset, based on its position and connections within the network.

Community detection

Once centrality values have been calculated, it becomes possible to infer related subsets of the data. These related subsets are called "communities": clusters of nodes that are more closely linked to one another than to the rest of the network. While community detection is commonly used in social network analysis and the mapping of social connections, it can also be applied to problems such as genomic interactions.

A relatively simple graph-based method for approximating communities is to identify several significant nodes using a centrality measure, such as degree centrality, and then build communities around them. In one simple graph-based analysis of the Hist1 interaction network, the five nodes with the largest degree centrality were treated as hubs, and each community was defined as a hub together with its directly connected neighbors. If a node was connected to more than one hub, it was assigned to the community of the hub with which it had the strongest normalized linkage. This type of approximation can help identify groups of genomic windows with relatively strong local interaction patterns and may highlight potential chromatin interactions or other relationships that warrant further study.
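The degree centrality formula and the hub-based approximation described above can be sketched in Python as follows. The function names and the toy graph are hypothetical, and two details are assumptions made for this example rather than part of the described analysis: ties in degree centrality are broken by node index, and a hub always stays in its own community even when it is connected to another hub.

```python
def degree_centrality(adjacency):
    """C_D(i) = (sum over j of a_ij) / (n - 1) for every node i."""
    n = len(adjacency)
    if n < 2:
        return [0.0] * n
    return [sum(row) / (n - 1) for row in adjacency]


def hub_communities(adjacency, linkage, num_hubs=5):
    """Build one community around each of the highest-degree nodes.

    adjacency is a symmetric 0/1 matrix and linkage is the normalized
    linkage matrix it was derived from. Returns a dict that maps each hub
    to the list of nodes in its community (the hub itself comes first).
    """
    centrality = degree_centrality(adjacency)
    # Rank nodes by decreasing centrality; ties are broken by node index.
    ranked = sorted(range(len(adjacency)), key=lambda i: (-centrality[i], i))
    hubs = ranked[:num_hubs]
    communities = {hub: [hub] for hub in hubs}
    for node in range(len(adjacency)):
        if node in communities:
            continue  # assumption: a hub belongs only to its own community
        linked_hubs = [hub for hub in hubs if adjacency[node][hub]]
        if linked_hubs:
            # A node linked to several hubs joins the one with the
            # strongest normalized linkage.
            best = max(linked_hubs, key=lambda hub: linkage[node][hub])
            communities[best].append(node)
    return communities


def symmetric_matrix(size, entries, default=0):
    """Helper for the toy example: fill a symmetric matrix from pair values."""
    matrix = [[default] * size for _ in range(size)]
    for (i, j), value in entries.items():
        matrix[i][j] = matrix[j][i] = value
    return matrix


# Toy graph with six windows; nodes 0 and 4 have the most connections.
weights = {(0, 1): 0.7, (0, 2): 0.9, (0, 3): 0.5, (3, 4): 0.8, (4, 5): 0.6, (2, 4): 0.6}
toy_linkage = symmetric_matrix(6, weights, default=0.0)
toy_adjacency = symmetric_matrix(6, {pair: 1 for pair in weights})
toy_communities = hub_communities(toy_adjacency, toy_linkage, num_hubs=2)
```

In the toy graph, nodes 2 and 3 are each connected to both hubs; node 2 joins the community of hub 0 and node 3 joins the community of hub 4, because those are the hubs with which they have the stronger linkage.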
The resulting communities can be visualized either as subgraphs centered on hub windows or as local patterns in an adjacency heatmap. The subgraph view emphasizes the hub-and-neighbor structure of the community, while the heatmap view shows the same community in matrix form and highlights which genomic windows are connected within the selected subnetwork.

Advantages

Compared with chromosome conformation capture (3C)-based methods such as Hi-C, genome architecture mapping (GAM) provides several advantages:

• GAM enables the detection of higher-order chromatin interactions, since it does not rely on proximity ligation between DNA fragments. In contrast, 3C-based methods such as Hi-C primarily capture pairwise contacts, whereas GAM can infer multi-way interactions (e.g., triplets or higher-order associations) based on co-segregation patterns across nuclear profiles.
• Unlike 3C-based approaches, GAM is a ligation-free method, as it does not require restriction enzyme digestion or proximity ligation. Instead, it is based on cryosectioning of individual nuclei followed by sequencing of nuclear slices, reducing biases associated with enzymatic fragmentation and ligation efficiency.
• GAM can also be applied to small numbers of cells, making it suitable for experimental conditions where biological material is limited. This is in contrast to some Hi-C protocols, which typically require large cell populations to generate high-resolution contact maps.

Disadvantages

Despite its ability to detect higher-order chromatin interactions, genome architecture mapping (GAM) has several limitations compared with chromosome conformation capture (3C)-based approaches such as Hi-C:

• GAM is a lower-throughput method, as it relies on sequencing DNA from thin cryosections of individual nuclei, known as nuclear profiles. Each profile captures only a subset of the genome, requiring a large number of profiles to reconstruct genome-wide contact probabilities. By comparison, Hi-C produces dense genome-wide contact maps from bulk cell populations in a single experiment.
• GAM relies on statistical inference rather than direct ligation counts. Contact probabilities are estimated from the co-segregation of genomic loci across multiple nuclear slices, and computational models such as SLICE are required to reconstruct interaction frequencies.
• GAM has limited scalability and adoption relative to Hi-C. It is less widely used, has fewer standardized experimental and computational pipelines, and lacks extensive large-scale reference datasets, which can reduce reproducibility and complicate cross-study comparisons.

Source: Wikipedia ↗
