===Overview===
The above flowchart shows the general process by which data may be derived from GAM analysis. Circles represent processes that may be performed, and squares represent pieces of data. The first step of GAM analysis is the cryosectioning and examination of cells. This process results in a collection of nucleus slices (nuclear profiles), which contain pieces of DNA (genomic windows). These nuclear profiles are then examined so that a segregation table can be formed. Segregation tables are the foundation of GAM analysis: they record which genomic loci appear within each nuclear profile.
One form of data analysis not covered below is clustering. For example, nuclear profiles that contain similar genomic loci could be grouped by k-means clustering or one of its variants. K-means suits this problem in the sense that it assigns every nuclear profile to a cluster according to a similarity measure, but it also has drawbacks. The standard heuristic for k-means (Lloyd's algorithm) runs in O(tknd) time, where t is the number of iterations, k is the number of means, n is the number of data points (here, nuclear profiles), and d is the number of dimensions of each data point (here, genomic windows). Finding the globally optimal k-means clustering is NP-hard, so practical implementations rely on such heuristics, and because d can be very large in GAM data, even a single run can be costly. As such, k-means can scale poorly to large data sets and is better suited to subsets of the data.
For further analysis, GAMtools may be used, but in this article community analysis focuses on centrality. Centrality-based communities can be thought of as analogous to celebrities and their fan bases on a social media network. The fans may not interact with each other very much, but they do interact with the celebrity, who is the "center." There are several types of centrality, including degree centrality, eigenvector centrality, and betweenness centrality, and each may lead to different communities being defined. Note that in the social network analogy above, eigenvector centrality may be misleading: a person who follows many celebrities does not necessarily have any influence over them, so those relationships are better modeled as a directed graph. In GAM analysis, the graph is generally assumed to be undirected, so eigenvector centrality can be applied without this concern. Both clique-based and centrality-based community calculations can be computationally expensive, and, like the clustering mentioned above, they may not scale well to large problems.
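To make the O(tknd) cost of the clustering mentioned above concrete, here is a minimal sketch of Lloyd's algorithm, the usual heuristic for k-means. It is applied to rows of a segregation table, with one 0/1 row per nuclear profile and one column per genomic window. Three things are illustrative assumptions rather than part of the original text: the toy table, the squared Euclidean distance, and the use of the first k profiles as initial centroids. Real analyses typically use random or k-means++ initialization with several restarts.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Each iteration costs O(k * n * d)."""
    # Assumption: deterministic initialization with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):  # up to t iterations
        # Assignment step: every point (n) against every centroid (k)
        # over every dimension (d).
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical segregation table: rows = nuclear profiles, columns = windows.
profiles = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]
labels, centroids = kmeans(profiles, k=2)
print(labels, centroids)
```

The assignment step is where the O(tknd) bound comes from. On every pass it compares each of the n profiles with each of the k centroids across all d windows.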
===SLICE===
SLICE (StatisticaL Inference of Co-sEgregation) plays a key role in GAM data analysis.
====Estimating interaction probabilities of pairs====
Based on the detection efficiency and the previously defined probabilities u_0, u_1, and u_2, SLICE estimates the likelihood that a pair of genomic loci is interacting. These values represent the probabilities of detecting zero, one, or both loci in a nuclear profile when the loci are not interacting:
Additionally, the blank, light bands intersecting near the center of the map (around window 44) indicate regions whose data were filtered out or that are unmappable. Visualizing the normalized linkage matrix as a heatmap provides a clear foundation for analyzing chromosomal contacts and allows the data to be used for more advanced 3D architectural modeling.
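The normalized linkage matrix itself is not defined in this section. The sketch below assumes the common definition borrowed from population genetics, Lewontin's normalized linkage disequilibrium D′. It is computed from how often windows A and B are detected alone and together across nuclear profiles: D = f_{AB} - f_A f_B, and D′ = D / D_{max}. Here D_{max} = \min(f_A f_B, (1-f_A)(1-f_B)) when D < 0 and \min(f_B(1-f_A), f_A(1-f_B)) otherwise. Windows detected in no profile, or in every profile, give D_{max} = 0, and the sketch leaves those entries as NaN. This is one way blank bands like those described above could arise. The segregation-table layout and the function name are hypothetical.

```python
import math


def normalized_linkage(seg):
    """seg[p][w] == 1 if genomic window w was detected in nuclear profile p.

    Returns a windows x windows matrix of D' values (NaN where undefined).
    """
    n_prof = len(seg)
    n_win = len(seg[0])
    # Detection frequency of each window across all nuclear profiles.
    f = [sum(row[w] for row in seg) / n_prof for w in range(n_win)]
    nl = [[math.nan] * n_win for _ in range(n_win)]
    for a in range(n_win):
        for b in range(n_win):
            # Co-segregation frequency: fraction of profiles containing both windows.
            f_ab = sum(1 for row in seg if row[a] and row[b]) / n_prof
            d = f_ab - f[a] * f[b]
            if d < 0:
                d_max = min(f[a] * f[b], (1 - f[a]) * (1 - f[b]))
            else:
                d_max = min(f[b] * (1 - f[a]), f[a] * (1 - f[b]))
            if d_max > 0:  # undefined (left as NaN) when d_max == 0
                nl[a][b] = d / d_max
    return nl


# Hypothetical table: windows 0 and 1 always co-occur, window 2 never joins them.
seg = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
print(normalized_linkage(seg))
```

Because both D and D_{max} are unchanged when A and B are swapped, the resulting matrix is symmetric.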
===Graph analysis approach===
Graph analysis can be used to identify related subsets, or "communities", of genomic windows after pairwise relationships have been summarized in a normalized linkage matrix.
====Constructing a graph from normalized linkage data====
Once pairwise relationships between genomic windows have been summarized in a normalized linkage matrix, the matrix can be converted into a graph representation. Each genomic window is treated as a node, and an undirected edge is added between two windows when their normalized linkage exceeds a selected threshold. In the example shown here, this threshold is set to the third quartile (Q3) of the normalized linkage values. Self-connections are excluded, so diagonal entries are set to 0. Since the normalized linkage between windows i and j is the same as between j and i, the resulting adjacency matrix is symmetric, consistent with an undirected graph. This graph representation can then be used for subsequent analyses such as centrality measurement and community detection.
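A minimal sketch of the thresholding step, in plain Python. It rests on four assumptions:

* The normalized linkage matrix is a symmetric list of lists, with NaN marking filtered or unmappable pairs.
* Q3 is computed over each off-diagonal pair once.
* Q3 uses linear interpolation (`statistics.quantiles` with `method="inclusive"`); other quartile conventions give slightly different thresholds.
* "Exceeds" means strictly greater than.

```python
import math
import statistics


def linkage_to_adjacency(nl):
    """Build an undirected 0/1 adjacency matrix from a normalized linkage matrix.

    nl: symmetric square list of lists; NaN marks filtered or unmappable
    pairs (assumption). Returns (adjacency, q3_threshold).
    """
    n = len(nl)
    # Take each off-diagonal pair once (upper triangle) and skip NaNs.
    values = [nl[i][j] for i in range(n) for j in range(i + 1, n)
              if not math.isnan(nl[i][j])]
    # Third quartile, linearly interpolated between order statistics.
    q3 = statistics.quantiles(values, n=4, method="inclusive")[2]
    adj = [[0] * n for _ in range(n)]  # diagonal stays 0: no self-loops
    for i in range(n):
        for j in range(i + 1, n):
            v = nl[i][j]
            if not math.isnan(v) and v > q3:
                adj[i][j] = adj[j][i] = 1  # mirror the edge: undirected graph
    return adj, q3


# Hypothetical 4-window normalized linkage matrix.
nl = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.8],
    [0.1, 0.3, 1.0, 0.4],
    [0.2, 0.8, 0.4, 1.0],
]
adj, q3 = linkage_to_adjacency(nl)
print(q3, adj)
```

In this toy matrix, only the pairs (0, 1) and (1, 3) lie above Q3, so window 2 ends up isolated. Thresholded graphs can be disconnected, and the centrality sketches below have to tolerate that.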
====Assess centrality of windows====
Once the adjacency matrix has been established, the windows can be assessed using several different measures of centrality: betweenness centrality, closeness centrality, eigenvector centrality, and degree centrality. Each of these measures can highlight different areas of the network and different structural roles of genomic windows within it.
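Betweenness centrality is calculated by taking every pair of nodes other than the node being observed and determining what fraction of the shortest paths between them pass through that node. These fractions are then summed over all such pairs: C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}, where \sigma_{st} is the number of shortest paths between nodes s and t and \sigma_{st}(v) is the number of those paths that pass through v. This measure can help identify nodes that connect different parts of a network.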
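The betweenness described above can be computed without enumerating paths explicitly. Below is a minimal sketch using Brandes' algorithm, an implementation choice not named in the text, for an unweighted, undirected graph given as a 0/1 adjacency matrix. Scores are left unnormalized. Dividing by (n-1)(n-2)/2 would scale them to the range 0 to 1.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness for an unweighted, undirected 0/1 adjacency matrix."""
    n = len(adj)
    nbrs = [[j for j in range(n) if adj[i][j]] for i in range(n)]
    cb = [0.0] * n
    for s in range(n):
        # BFS from s: count shortest paths (sigma) and record predecessors.
        order = []
        preds = [[] for _ in range(n)]
        sigma = [0] * n
        sigma[s] = 1
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in nbrs[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = [0.0] * n
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # In an undirected graph each pair is counted from both endpoints.
    return [c / 2 for c in cb]


# Hypothetical path graph 0 - 1 - 2: only node 1 lies between other nodes.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(betweenness_centrality(path))
```

For the path graph, node 1 scores 1.0 because the single shortest path between 0 and 2 passes through it. The endpoints score 0.0.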
Closeness centrality is calculated by dividing the number of nodes in the network minus one by the sum of the shortest-path distances from the node to every other node: C_C(i) = \frac{n-1}{\sum_{j \neq i} d(i,j)}, where d(i,j) is the length of the shortest path between nodes i and j. It can help identify nodes that are, on average, closer to the rest of the graph. See the included Figure 1 for an example.
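A minimal breadth-first-search sketch of closeness for an unweighted graph. A thresholded linkage graph may be disconnected, so the sketch adopts one common convention, which is an assumption rather than something stated in the text. Distances are summed only over the nodes reachable from i, and the numerator is the number of those nodes. This reduces to (n-1)/\sum_j d(i,j) when the graph is connected, and isolated nodes score 0.

```python
from collections import deque


def closeness_centrality(adj):
    """Closeness for an unweighted, undirected 0/1 adjacency matrix."""
    n = len(adj)
    nbrs = [[j for j in range(n) if adj[i][j]] for i in range(n)]
    scores = []
    for s in range(n):
        # BFS gives shortest-path distances from s (-1 = unreachable).
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in nbrs[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        reachable = [d for d in dist if d > 0]  # other nodes reachable from s
        total = sum(reachable)
        # (number of reachable nodes) / (sum of their distances); equals
        # (n - 1) / sum when the graph is connected.
        scores.append(len(reachable) / total if total > 0 else 0.0)
    return scores


# Hypothetical path graph 0 - 1 - 2.
print(closeness_centrality([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))
```

For the path graph, the middle node has distances 1 and 1, giving 2/2 = 1.0. Each endpoint has distances 1 and 2, giving 2/3.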
Eigenvector centrality measures not only how many connections a node has, but also whether it is connected to other highly connected nodes. In this way, it can help identify nodes that are located in more influential or highly interconnected parts of the network.
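Eigenvector centrality is usually defined through the principal eigenvector of the adjacency matrix: x_i = \frac{1}{\lambda} \sum_{j} a_{ij} x_j, where \lambda is the largest eigenvalue. A node's score is therefore proportional to the sum of its neighbors' scores. The sketch below estimates it by power iteration. It multiplies by A + I rather than A, a standard trick that leaves the eigenvector unchanged but stops the iteration from oscillating on bipartite graphs. The tolerance, the iteration cap, and the Euclidean normalization are illustrative choices.

```python
import math


def eigenvector_centrality(adj, max_iter=1000, tol=1e-10):
    """Power-iteration estimate of eigenvector centrality (Euclidean-normalized)."""
    n = len(adj)
    x = [1.0] * n  # positive start vector
    for _ in range(max_iter):
        # Multiply by (A + I): neighbor sum plus the node's own current score.
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x  # best estimate if not converged within max_iter


# Hypothetical path graph 0 - 1 - 2 (a bipartite graph).
print(eigenvector_centrality([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))
```

For the path graph, the exact answer is (1/2, \sqrt{2}/2, 1/2). The middle node scores highest because it is adjacent to both other nodes.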
Degree centrality is calculated by dividing the number of edges connected to a node by the total number of nodes minus one: C_D(i) = \frac{\sum_{j=1}^{n} a_{ij}}{n-1} where a_{ij} represents whether node i is connected to node j in the adjacency matrix, and n is the total number of nodes in the graph. The numerator counts the total number of connections of node i, and the denominator scales the value by the maximum possible number of neighbors. See the included Figure 2 for an example of this calculation. The centrality of a node can be a good indicator of its potential to be influential in the dataset based on its position and connections within the network.
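The degree formula translates directly into code. Below is a minimal sketch on a 0/1 adjacency matrix stored as a list of lists, using a hypothetical star graph rather than the Figure 2 example, which is not reproduced here.

```python
def degree_centrality(adj):
    """C_D(i) = (sum_j a_ij) / (n - 1) for a 0/1 adjacency matrix with a zero diagonal."""
    n = len(adj)
    return [sum(row) / (n - 1) for row in adj]


# Hypothetical star: node 0 is linked to four leaves, which share no other edges.
star = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
print(degree_centrality(star))
```

The center is connected to all n - 1 = 4 other nodes, so it scores 4/4 = 1.0. Each leaf scores 1/4 = 0.25.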
====Community detection====
Once centrality values have been calculated, it becomes possible to infer related subsets of the data. These related subsets are called "communities": clusters of nodes that are more closely linked to one another than to the rest of the network. While community detection is commonly used in social network analysis and the mapping of social connections, it can also be applied to problems such as genomic interactions. A relatively simple graph-based method for approximating communities is to identify several significant nodes using centrality measures, such as degree centrality, and then build communities around them. In one simple graph-based analysis of the Hist1 interaction network, the five nodes with the largest degree centrality were treated as hubs, and each community was defined as a hub together with its directly connected neighbors. If a node was connected to more than one hub, it was assigned to the community of the hub with which it had the strongest normalized linkage. This type of approximation can help identify groups of genomic windows with relatively strong local interaction patterns and may highlight potential chromatin interactions or other relationships that warrant further study.
The resulting communities can be visualized either as subgraphs centered on hub windows or as local patterns in an adjacency heatmap. The subgraph view emphasizes the hub-and-neighbor structure of the community, while the heatmap view shows the same community in matrix form and highlights which genomic windows are connected within the selected subnetwork.
==Advantages==