The Kolmogorov structure function is discussed in Cover and Thomas, and is studied extensively in Vereshchagin and Vitányi, where its main properties are also resolved. It can be written as
:h_{x}(\alpha) = \min_S \{\log |S| : x \in S , K(S) \leq \alpha \}
where x is a binary string of length n with x \in S, S is a contemplated model (a set of n-length strings) for x, K(S) is the Kolmogorov complexity of S, and \alpha is a nonnegative integer value bounding the complexity of the contemplated S's. Clearly, this function is nonincreasing and reaches \log |\{x\}| = 0 for \alpha = K(x)+c, where c is the number of bits required to change x into \{x\} and K(x) is the Kolmogorov complexity of x.
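Since K is incomputable, h_x cannot be computed exactly, but its shape can be illustrated numerically. The following toy sketch is only an approximation under two loud assumptions: zlib-compressed length stands in for K, and the minimization ranges over a hand-picked family of prefix models rather than all sets; the names `K_approx` and `h` are hypothetical.

```python
import zlib
from math import log2

def K_approx(data: bytes) -> int:
    """Crude upper bound on Kolmogorov complexity: zlib-compressed length in bits."""
    return 8 * len(zlib.compress(data, 9))

# Toy data: a 32-bit string with visible regularity.
x = "01101100" * 4
n = len(x)

# Hand-picked model family: S_k = all n-bit strings sharing x's first k bits,
# so log|S_k| = n - k, and S_k is described by the k-bit prefix itself.
models = [(x[:k].encode(), n - k) for k in range(0, n + 1, 4)]

def h(alpha: int):
    """Proxy for h_x(alpha): least log|S| over listed models with K(S) <= alpha."""
    feasible = [log_size for desc, log_size in models if K_approx(desc) <= alpha]
    return min(feasible) if feasible else None

for alpha in (64, 128, 512):
    print(alpha, h(alpha))
```

Because enlarging \alpha only enlarges the set of admissible models, the printed values are nonincreasing in \alpha, mirroring the monotonicity noted above.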
==The algorithmic sufficient statistic==
We define a set S containing x such that
:K(S)+K(x|S)=K(x)+O(1).
The function h_x(\alpha) never decreases by more than a fixed independent constant below the diagonal called the sufficiency line L, defined by
:L(\alpha)+\alpha = K(x).
The graph of h_x approaches it to within a constant distance for certain arguments (for instance, for \alpha = K(x)+c). For these \alpha's we have \alpha + h_x(\alpha) = K(x)+O(1); the associated model S (a witness for h_x(\alpha)) is called an optimal set for x, and its description of K(S) \leq \alpha bits is therefore an algorithmic sufficient statistic. We say `algorithmic' for `Kolmogorov complexity' by convention. The main properties of an algorithmic
sufficient statistic are the following. If S is an algorithmic sufficient statistic for x, then
:K(S)+\log |S| = K(x)+O(1).
That is, the two-part description of x, using the model S and, as the data-to-model code, the index of x in the enumeration of S in \log |S| bits, is as concise as the shortest one-part code of x in K(x) bits. This can be seen as follows:
:K(x) \leq K(x,S)+O(1) \leq K(S)+K(x|S)+O(1) \leq K(S)+\log|S|+O(1) \leq K(x)+O(1).
Using straightforward inequalities and the sufficiency property, we find that K(x|S)=\log |S| +O(1). (For example, given S \ni x, we can describe x self-delimitingly, so that its end can be determined, in \log |S|+O(1) bits.) Therefore, the
randomness deficiency \log |S|-K(x|S) of x in S is a constant, which means that x is a typical (random) element of S. However, there can be models S containing x that are not sufficient statistics. An algorithmic sufficient statistic S for x has the additional property, apart from being a model of best fit, that K(x,S)=K(x)+O(1) and therefore by the Kolmogorov complexity
symmetry of information (the information about x in S is about the same as the information about S in x) we have K(S|x^*)=O(1): the algorithmic sufficient statistic S is a model of best fit that is almost completely determined by x. (x^* is a shortest program for x.) The algorithmic sufficient statistic associated with the least such \alpha is called the algorithmic
minimal sufficient statistic. With respect to the picture: the MDL structure function \lambda_x(\alpha) is explained below. The goodness-of-fit structure function \beta_x(\alpha) is the least randomness deficiency (see above) of any model S \ni x for x such that K(S) \leq \alpha. This structure function gives the goodness of fit of a model S (containing x) for the string x. When it is low the model fits well, and when it is high the model fits poorly. If \beta_x(\alpha) = 0 for some \alpha, then there is a model S \ni x for x such that K(S) \leq \alpha and x is typical (random) for S. That is, S is a best-fitting model for x. For more details see
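The randomness deficiency \log|S| - K(x|S) can likewise be illustrated with a compressor proxy. In this toy sketch, zlib stands in for K, models are restricted to prefix sets S_k, and K(x|S_k) is approximated by the compressed length of the remaining suffix; `deficiency` is a hypothetical helper, not a construction from the literature.

```python
import zlib

def K_approx(data: bytes) -> int:
    """zlib-compressed length in bits: a crude, hypothetical stand-in for K."""
    return 8 * len(zlib.compress(data, 9))

def deficiency(x: str, k: int) -> int:
    """Proxy randomness deficiency log|S_k| - K(x|S_k) for the model
    S_k = {n-bit strings sharing x's first k bits}.  K(x|S_k) is approximated
    by the compressed suffix length, capped at log|S_k| because
    K(x|S) <= log|S| + O(1) always holds."""
    log_size = len(x) - k
    cond = min(K_approx(x[k:].encode()), log_size)
    return log_size - cond

# A string whose tail is highly regular: it is atypical in the coarse model
# S_8 (large deficiency), but trivially typical in the singleton model S_n.
x = "10110100" + "0" * 120
print(deficiency(x, 8), deficiency(x, len(x)))
```

A large value flags a poorly fitting model, matching the reading of \beta_x above: the model's uniform distribution makes the regular tail a very atypical outcome.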
==Selection of properties==
Within the constraints that the graph goes down at an angle of at least 45 degrees, that it starts at n, and that it ends approximately at K(x), every graph is realized (up to an O(\log n) additive term in argument and value) by the structure function of some data x, and vice versa. Where the graph first hits the diagonal, the argument (complexity) is that of the minimal sufficient statistic. It is incomputable to determine this place. See.
==Main property==
It is proved that at each level \alpha of complexity the structure function allows us to select the best model S for the individual string x, to within a strip of O(\log n), with certainty, not merely with great probability.

==The MDL variant==
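In the standard formulation, the MDL structure function \lambda_x(\alpha) minimizes the two-part code length K(S) + \log|S| over models S \ni x with K(S) \leq \alpha. The sketch below is a hedged toy illustration under the same assumptions as before (zlib as a proxy for K, hand-picked prefix models), not the construction used in the literature.

```python
import zlib
from math import inf

def K_approx(data: bytes) -> int:
    """zlib-compressed length in bits, a hypothetical proxy for K."""
    return 8 * len(zlib.compress(data, 9))

x = "0110110001011010" * 8   # 128-bit toy string with visible regularity
n = len(x)

def mdl(alpha: int) -> float:
    """Proxy for lambda_x(alpha): least two-part code length K(S_k) + log|S_k|
    over prefix models S_k = {n-bit strings sharing x's first k bits}
    with K(S_k) <= alpha."""
    best = inf
    for k in range(0, n + 1, 8):
        model_cost = K_approx(x[:k].encode())      # ~K(S_k): describe the prefix
        if model_cost <= alpha:
            best = min(best, model_cost + (n - k))  # + log2|S_k| data bits
    return best

one_part = K_approx(x.encode())
# With an unbounded budget, the best two-part code is no longer than the
# one-part code, since the singleton model {x} is itself admissible.
print(mdl(10**6), one_part)
```

Raising \alpha admits richer models and so can only shorten the best two-part code; the unconstrained minimum never exceeds the one-part code length.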