For a given set of labels \mathit{L}\, the Classifier Chain model (CC) learns \left \vert L \right \vert classifiers, as in the Binary Relevance method. All classifiers are linked in a chain through the feature space. Given a data set in which the i-th instance has the form \mathit{(x_{i}, Y_{i})}\, where \mathit{x_{i}}\, is a feature vector and \mathit{Y_{i}}\, is the set of labels assigned to the instance, the data set is transformed into \left \vert L \right \vert data sets, where instances of the j-th data set have the form ((x_{i}, l_{1}, ..., l_{j-1}), l_{j}), l_{j} \in \{0,1\}. If the j-th label was assigned to the instance, then \mathit{l_{j}}\, is 1; otherwise it is 0. Thus, the classifiers form a chain in which each of them learns
binary classification of a single label. The features given to each classifier are extended with binary values indicating which of the previous labels were assigned to the instance. When classifying new instances, the labels are again predicted by traversing the chain of classifiers. Classification begins with the first classifier \mathit{C_{1}}\, and proceeds to the last one, \mathit{C_{\left \vert L \right \vert}}\, passing label information between classifiers through the feature space. Hence, inter-label dependencies are preserved. However, the result can vary for different chain orders. For example, if a label often co-occurs with another label, then only instances of the label that comes later in the chain will have information about the other one in their feature vectors. To solve this problem and increase accuracy, it is possible to use
an ensemble of classifiers. In the Ensemble of Classifier Chains (ECC), several CC classifiers can be trained with random chain orders (i.e. random orders of labels), each on a random subset of the data set. The labels of a new instance are predicted by each classifier separately. After that, the total number of predictions, or "votes", is counted for each label. A label is accepted if the percentage of classifiers that predicted it exceeds some threshold value.

== Adaptations ==
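The steps described above, the training-set transformation, prediction along the chain, and ECC voting, can be sketched in Python. This is a minimal illustration under our own assumptions; the function names (`chain_datasets`, `chain_predict`, `ecc_predict`) and data layout are hypothetical, not taken from any particular library.

```python
# Sketch of Classifier Chains (CC) and the Ensemble of Classifier
# Chains (ECC). X is a list of feature vectors; Y is a list of label
# sets; labels is the ordered list of all labels (the chain order).

def chain_datasets(X, Y, labels):
    """Build the |L| training sets used by a classifier chain.

    For the j-th label, each input is the original feature vector x_i
    extended with binary indicators l_1 .. l_{j-1} for the previous
    labels; the target l_j is 1 iff the j-th label was assigned.
    """
    datasets = []
    for j, label in enumerate(labels):
        inputs = [list(x) + [1 if labels[k] in y else 0 for k in range(j)]
                  for x, y in zip(X, Y)]
        targets = [1 if label in y else 0 for y in Y]
        datasets.append((inputs, targets))
    return datasets

def chain_predict(classifiers, x):
    """Predict labels for one instance by traversing the chain.

    classifiers: trained binary classifiers C_1 .. C_|L|, each a
    callable mapping an extended feature vector to 0 or 1.
    """
    extended = list(x)
    preds = []
    for clf in classifiers:
        y = clf(extended)
        preds.append(y)
        extended.append(y)  # pass label information through the feature space
    return preds

def ecc_predict(chains, x, threshold=0.5):
    """ECC voting: accept a label when the fraction of chains that
    predicted it exceeds the threshold.

    chains: list of (classifiers, order) pairs, where order[pos] is the
    index of the label sitting at position pos of that chain.
    """
    votes = [0] * len(chains[0][1])
    for classifiers, order in chains:
        preds = chain_predict(classifiers, x)
        for pos, label_idx in enumerate(order):
            votes[label_idx] += preds[pos]
    return [1 if v / len(chains) > threshold else 0 for v in votes]
```

In practice the binary classifiers would be trained models (e.g. logistic regression or decision trees) rather than hand-written callables, but the chaining and voting logic is the same.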