The next stage of EDA development was the use of multivariate factorizations. In this case, the joint probability distribution is usually factorized into a number of components of limited size |\pi_i| \leq K,~\forall i\in \{1,2,\dots,N\}:

p(X_1,\dots,X_N) = \prod_{i=1}^{N} p(X_i|\pi_i)

Learning PGMs that encode multivariate distributions is a computationally expensive task; therefore, it is usual for EDAs to estimate multivariate statistics from bivariate statistics. This relaxation allows the PGM to be built in polynomial time in N; however, it also limits the generality of such EDAs.
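For illustration, such a factorization can be evaluated by multiplying each variable's conditional probability given its (at most K) parents. A minimal Python sketch, assuming binary variables and conditional probability tables stored as plain dictionaries (the data layout and names are illustrative, not taken from any specific EDA implementation):

```python
# Evaluate p(x_1, ..., x_N) = prod_i p(x_i | pi_i) for one binary assignment.
# Each variable i has a parent list pi[i] and a conditional probability table
# cpt[i] mapping a tuple of parent values to p(x_i = 1 | pi_i).

def joint_probability(x, pi, cpt):
    prob = 1.0
    for i, value in enumerate(x):
        parent_values = tuple(x[j] for j in pi[i])
        p_one = cpt[i][parent_values]          # p(x_i = 1 | pi_i)
        prob *= p_one if value == 1 else 1.0 - p_one
    return prob

# Toy model over 3 variables: x_0 has no parents, x_1 and x_2 each depend
# only on x_0, so |pi_i| <= K with K = 1.
pi = {0: [], 1: [0], 2: [0]}
cpt = {
    0: {(): 0.5},
    1: {(0,): 0.1, (1,): 0.9},
    2: {(0,): 0.2, (1,): 0.8},
}
print(joint_probability([1, 1, 0], pi, cpt))  # p = 0.5 * 0.9 * (1 - 0.8)
```

Because the components have bounded size, the table for each variable holds at most 2^K entries, which is what makes estimating such models tractable.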
===Extended compact genetic algorithm (eCGA)===
The eCGA was one of the first EDAs to employ multivariate factorizations, in which high-order dependencies among decision variables can be modeled. Its approach factorizes the joint probability distribution into a product of multivariate marginal distributions. Assume T_\text{eCGA}=\{\tau_1,\dots,\tau_\Psi\} is a set of subsets, in which every \tau\in T_\text{eCGA} is a linkage set containing |\tau|\leq K variables. The factorized joint probability distribution is represented as follows:

p(X_1,\dots,X_N) = \prod_{\tau\in T_\text{eCGA}} p(\tau).

The eCGA popularized the term "linkage-learning" to denote procedures that identify linkage sets. Its linkage-learning procedure relies on two measures: (1) the model complexity (MC) and (2) the compressed population complexity (CPC). The MC quantifies the model representation size in terms of the number of bits required to store all the marginal probabilities:

MC = \log_2 (\lambda+1) \sum_{\tau\in T_\text{eCGA}} (2^{|\tau|}-1),

where \lambda is the selected population size. The CPC, on the other hand, quantifies the data compression in terms of the entropy of the marginal distribution over all partitions:

CPC = \lambda \sum_{\tau\in T_\text{eCGA}} H(\tau),

where |\tau| is the number of decision variables in the linkage set \tau and H(\tau) is the joint entropy of the variables in \tau. Linkage-learning in eCGA works as follows: (1) insert each variable into its own cluster, (2) compute CCC = MC + CPC of the current linkage sets, (3) verify the improvement in CCC obtained by joining pairs of clusters, (4) join the pair of clusters yielding the highest CCC improvement. This procedure is repeated until no CCC improvement is possible and produces a linkage model T_\text{eCGA}. The eCGA works with concrete populations; therefore, using the factorized distribution modeled by eCGA, it can be described as

P(t+1) = \beta_\mu \circ \alpha_\text{eCGA} \circ S(P(t))
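The greedy linkage-learning loop above can be sketched as follows. This is a minimal Python sketch, assuming a binary population given as a list of strings; the helper names are illustrative, and a merge is an "improvement" when it decreases CCC = MC + CPC, since eCGA seeks the minimum-description-length partition:

```python
import math
from collections import Counter
from itertools import combinations

def joint_entropy(population, tau):
    """Empirical joint entropy (in bits) of the variables indexed by tau."""
    counts = Counter(tuple(x[i] for i in tau) for x in population)
    n = len(population)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def ccc(population, partition):
    """Combined complexity CCC = MC + CPC of a candidate partition."""
    lam = len(population)
    mc = math.log2(lam + 1) * sum(2 ** len(tau) - 1 for tau in partition)
    cpc = lam * sum(joint_entropy(population, tau) for tau in partition)
    return mc + cpc

def learn_linkage(population):
    # (1) Start with each variable in its own cluster.
    partition = [[i] for i in range(len(population[0]))]
    while len(partition) > 1:
        # (3) Evaluate CCC after joining each pair of clusters.
        best = min(
            combinations(range(len(partition)), 2),
            key=lambda ab: ccc(
                population,
                [t for k, t in enumerate(partition) if k not in ab]
                + [partition[ab[0]] + partition[ab[1]]],
            ),
        )
        merged = [t for k, t in enumerate(partition) if k not in best] \
                 + [partition[best[0]] + partition[best[1]]]
        # Stop when no merge decreases CCC any further.
        if ccc(population, merged) >= ccc(population, partition):
            break
        partition = merged  # (4) Join the best pair of clusters.
    return partition

# Variables 0-1 and 2-3 are tightly linked in this toy population.
pop = ["0011", "1100", "0000", "1111", "0011", "1100", "0000", "1111"]
print(learn_linkage(pop))
```

On this toy population the procedure recovers the partition {{0, 1}, {2, 3}}, since merging perfectly correlated variables reduces the entropy term faster than it grows the model-size term.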
===Bayesian optimization algorithm (BOA)===
The BOA uses Bayesian networks to model and sample promising solutions. Bayesian networks are directed acyclic graphs, with nodes representing variables and edges representing conditional dependencies between pairs of variables. The value of a variable X_i can be conditioned on a maximum of K other variables, defined in \pi_i. BOA builds a PGM encoding a factorized joint distribution, in which the parameters of the network, i.e. the conditional probabilities, are estimated from the selected population using the maximum likelihood estimator:

p(X_1,X_2,\dots,X_N)=\prod_{i=1}^{N}p(X_i|\pi_{i}).

The Bayesian network structure, on the other hand, must be built iteratively (linkage-learning). It starts with a network without edges and, at each step, adds the edge that best improves some scoring metric (e.g. the Bayesian information criterion (BIC) or the Bayesian–Dirichlet metric with likelihood equivalence (BDe)). The scoring metric evaluates the network structure according to its accuracy in modeling the selected population. From the built network, BOA samples new promising solutions as follows: (1) it computes the ancestral ordering of the variables, with each node preceded by its parents; (2) each variable is sampled conditionally on its parents. In this scenario, every BOA step can be defined as

P(t+1) = \beta_\mu \circ \alpha_\text{BOA} \circ S(P(t))
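The two sampling steps above can be sketched as follows. This is a minimal Python sketch, assuming the network structure is given as parent lists and the (maximum-likelihood) conditional probabilities as lookup tables; the function and variable names are illustrative:

```python
import random
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def sample_solution(parents, cpt, rng=random):
    """Sample one binary solution from a Bayesian network.

    parents: dict mapping each variable to its list of parent variables.
    cpt: dict mapping each variable to {parent-values tuple: p(x = 1 | pi)}.
    """
    # (1) Ancestral ordering: every node is preceded by its parents.
    order = list(TopologicalSorter(parents).static_order())
    x = {}
    # (2) Sample each variable conditionally on its already-sampled parents.
    for i in order:
        parent_values = tuple(x[j] for j in parents[i])
        x[i] = 1 if rng.random() < cpt[i][parent_values] else 0
    return [x[i] for i in sorted(x)]

# Toy network: x_0 -> x_1 and x_0 -> x_2 (at most K = 1 parent per variable).
parents = {0: [], 1: [0], 2: [0]}
cpt = {
    0: {(): 0.5},
    1: {(0,): 0.0, (1,): 1.0},  # x_1 copies x_0
    2: {(0,): 1.0, (1,): 0.0},  # x_2 negates x_0
}
print(sample_solution(parents, cpt))
```

Sampling in ancestral order guarantees that each conditional table lookup only involves values that have already been drawn, which is why the ordering step must precede the sampling step.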
===Linkage-tree genetic algorithm (LTGA)===
The LTGA differs from most EDAs in the sense that it does not explicitly model a probability distribution but only a linkage model, called a linkage-tree. A linkage-tree T is a set of linkage sets with no probability distribution associated; therefore, there is no way to sample new solutions directly from T. The linkage model is a linkage-tree stored as a family of sets (FOS):

T_\text{LT} = \{ \{x_1\}, \{x_2\},\{x_3\},\{x_4\},\{x_1,x_2\},\{x_3,x_4\} \}.

The linkage-tree learning procedure is a hierarchical clustering algorithm, which works as follows. At each step the two closest clusters i and j are merged; this procedure repeats until only one cluster remains, and each subtree is stored as a subset \tau\in T_\text{LT}. The LTGA uses T_\text{LT} to guide an "optimal mixing" procedure, which resembles a recombination operator but only accepts improving moves. We denote it as R_\text{LTGA}, where the notation x[\tau]\gets y[\tau] indicates the transfer of the genetic material indexed by \tau from y to x.

Input: a family of subsets T_\text{LT} and a population P(t)
Output: a population P(t+1).
for each x_i \in P(t) do
    for each \tau \in T_\text{LT} do
        choose a random x_j \in P(t) : x_i \neq x_j
        f_{x_i} := f(x_i)
        b := x_i[\tau]
        x_i[\tau] := x_j[\tau]
        if f(x_i) \leq f_{x_i} then
            x_i[\tau] := b (revert the move: it did not improve fitness)
return P(t)

The LTGA does not implement typical selection operators; instead, selection is performed during recombination. Similar ideas have usually been applied in local-search heuristics and, in this sense, the LTGA can be seen as a hybrid method. In summary, one step of the LTGA is defined as

P(t+1) = R_{\text{LTGA}}(P(t)) \circ \alpha_\text{LTGA} (P(t))

==Other==