A distributed algorithm is self-stabilizing if, starting from an arbitrary state, it is guaranteed to converge to a legitimate state and to remain in the set of legitimate states thereafter. A state is legitimate if, starting from this state, the algorithm satisfies its specification. The property of self-stabilization enables a distributed algorithm to recover from a
transient fault regardless of its nature. Moreover, a self-stabilizing algorithm does not have to be initialized, as it eventually starts to behave correctly regardless of its initial state. Dijkstra's paper, which introduced the concept of self-stabilization, presents an example in the context of a "token ring": a network of computers ordered in a circle. Here, each computer or processor can "see" the whole state of the one processor that immediately precedes it, and this state may imply that the processor "has a token" or "does not have a token." Since detecting and correcting errors in a distributed system were often very difficult and time-consuming, such self-recovering behavior was considered desirable. (Earlier methods collected a huge amount of information from the whole network to one place and then attempted to determine whether the collected global state is correct; even that determination alone can be a hard task.)
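Dijkstra's token-ring example can be sketched as a small simulation. The following is an illustrative rendering of his K-state algorithm under a central daemon; the function and parameter names are ours, and the random daemon merely picks one enabled processor per step:

```python
import random

def privileged(x, i):
    """True iff processor i 'has the token' (its guard is enabled)."""
    if i == 0:
        return x[0] == x[-1]      # bottom machine: equals its predecessor
    return x[i] != x[i - 1]       # other machines: differ from predecessor

def step(x, i, K):
    """Fire processor i's move; call only when it is privileged."""
    if i == 0:
        x[0] = (x[0] + 1) % K     # bottom machine advances its counter
    else:
        x[i] = x[i - 1]           # other machines copy their predecessor

def run(n=5, K=6, seed=1):
    """From an arbitrary initial state, run until exactly one token remains."""
    rng = random.Random(seed)
    x = [rng.randrange(K) for _ in range(n)]            # arbitrary start
    while True:                                         # convergence is guaranteed for K > n
        priv = [i for i in range(n) if privileged(x, i)]
        if len(priv) == 1:                              # legitimate: one token
            return x
        step(x, rng.choice(priv), K)                    # daemon picks one move
```

From any initial assignment the loop terminates, and once a single processor is privileged, every subsequent move preserves that property (closure), which is exactly the token-ring specification.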
Efficiency improvements

More recently, researchers have presented newer methods for lightweight error detection for self-stabilizing systems using local checking, and for general tasks. The term
local refers to a part of a computer network. When local detection is used, a computer in a network is not required to communicate with the entire network in order to detect an error; the error can be detected by having each computer communicate only with its nearest neighbors. These local detection methods simplified the task of designing self-stabilizing algorithms considerably, because the error detection mechanism and the recovery mechanism can be designed separately. Newer algorithms based on these detection methods also turned out to be much more efficient. Moreover, these papers suggested rather efficient general transformers that turn non-self-stabilizing algorithms into self-stabilizing ones. The idea is to:
• run the non-self-stabilizing protocol and, at the same time,
• detect faults during its execution using the above-mentioned detection methods,
• then apply a (self-stabilizing) "reset" protocol to return the system to some predetermined initial state, and, finally,
• restart the given (non-self-stabilizing) protocol.
The combination of these four parts is self-stabilizing, as long as no fault is triggered during the fault-correction phases. Initial self-stabilizing protocols were also presented in these papers; more efficient reset protocols were presented later. Additional efficiency was introduced with the notion of time-adaptive protocols. The idea behind these is that when only a small number of errors occurs, the recovery time can (and should) be made short. Dijkstra's original self-stabilization algorithms do not have this property. A useful property of self-stabilizing algorithms is that they can be composed of layers if the layers do not exhibit any circular dependencies. The stabilization time of the composition is then bounded by the sum of the individual stabilization times of each layer. New approaches to Dijkstra's work emerged later, such as the case of
Krzysztof Apt and Ehsan Shoja's proposal, which demonstrated how self-stabilization can be naturally formulated using the standard concepts of strategic games, particularly the concept of an improvement path. This work sought to demonstrate the link between self-stabilization and
game theory.
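The four-step detect/reset/restart construction described above can be sketched as follows. This is a deliberately toy instantiation (all names, the local checker, and the "protocol" are ours, not any specific published transformer):

```python
def transform(protocol_step, local_check, reset, state):
    """One scheduler tick of the detect/reset/restart construction."""
    if not local_check(state):     # step 2: a fault is detected locally
        state = reset(state)       # step 3: self-stabilizing reset
    return protocol_step(state)    # steps 1 and 4: (re)run the protocol

# Toy instantiation: a chain whose nodes should all agree on one value.
def local_check(xs):
    return all(a == b for a, b in zip(xs, xs[1:]))   # each node checks only its neighbor

def reset(xs):
    return [0] * len(xs)           # predetermined initial state

def protocol_step(xs):
    v = (xs[0] + 1) % 100          # the underlying protocol advances
    return [v] * len(xs)

state = [3, 7, 7, 7]               # corrupted by a transient fault
state = transform(protocol_step, local_check, reset, state)
```

Here `local_check` only compares adjacent nodes, mirroring the idea that each computer inspects just its nearest neighbors; a detected inconsistency triggers the reset before the protocol resumes.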
Time complexity

The time
complexity of a self-stabilizing algorithm is measured in (asynchronous) rounds or cycles.
• A round is the shortest execution trace in which each processor executes at least one step.
• Similarly, a cycle is the shortest execution trace in which each processor executes at least one complete iteration of its repeatedly executed list of commands.
To measure the output stabilization time, a subset of the state variables is defined to be externally visible (the output). Certain states of the outputs are defined to be correct (legitimate). The set of the outputs of all the components of the system is said to have stabilized at the time it starts to be correct, provided it stays correct indefinitely, unless additional faults occur. The output stabilization time is the time (the number of (asynchronous)
rounds) until the output stabilizes. (Awerbuch, Baruch; Patt-Shamir, Boaz; Varghese, George (1991). "Self-stabilization by local checking and correction". Proc. 32nd Symposium on Foundations of Computer Science (FOCS). pp. 268–277. doi:10.1109/SFCS.1991.185378.)
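To make the round-based measure concrete, here is a toy max-propagation system of our own (not from the cited literature), run synchronously so that each round every processor takes exactly one step; the output stabilizes after a number of rounds equal to the distance the maximum value must travel:

```python
def round_step(vals, adj):
    """One synchronous round: every processor executes one step
    (here: take the max over its own value and its neighbors')."""
    return [max([vals[i]] + [vals[j] for j in adj[i]])
            for i in range(len(vals))]

def output_stabilization_time(vals, adj, limit=100):
    """Number of rounds until the visible output stops changing."""
    for r in range(limit):
        nxt = round_step(vals, adj)
        if nxt == vals:            # output is correct and stays correct
            return r
        vals = nxt
    raise RuntimeError("did not stabilize within the limit")

# Path 0-1-2-3: the maximum propagates one hop per round.
adj = [[1], [0, 2], [1, 3], [2]]
```

On this path, starting from `[5, 0, 0, 0]`, the value 5 reaches the far end after three rounds, so the output stabilization time is 3; a state that is already legitimate stabilizes in 0 rounds.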