Fencing (computing)

Fencing is the process of isolating a node of a computer cluster or protecting shared resources when a node appears to be malfunctioning.

Basic concepts
A node fence (or I/O fence) is a virtual "fence" that separates a shared resource from nodes which must not have access to it. It may separate an active node from its backup. If the backup crosses the fence and, for example, tries to control the same disk array as the primary, a data hazard may occur. Mechanisms such as STONITH are designed to prevent this condition.

Isolating a node means ensuring that I/O can no longer be done from it. Fencing is typically done automatically by cluster infrastructure, such as shared disk file systems, so that a failed node cannot modify resources that processes on other active nodes are using. Mechanisms to support fencing, such as the reserve/release mechanism of SCSI, have existed since at least 1985.

Fencing is required because the rest of the cluster cannot reliably distinguish a real failure from a temporary hang. If the malfunctioning node is really down, it cannot do any damage, so in theory no action would be required (it could simply be brought back into the cluster with the usual join process). However, a malfunctioning node may itself conclude that the rest of the cluster is the part that is malfunctioning. A split-brain condition could then ensue and cause data corruption. The system therefore has to assume the worst case and always fence when a problem is detected.
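The "always fence" rule can be illustrated with a minimal Python sketch. It assumes a heartbeat-based failure detector and a caller-supplied fencing action; the names `nodes_to_fence`, `recover`, `fence`, and `take_over` are hypothetical and do not correspond to any particular cluster manager.

```python
from typing import Callable, Dict, List


def nodes_to_fence(last_heartbeat: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Return the nodes whose last heartbeat is older than `timeout` seconds.

    A silent node may be dead or merely hung; the two cases look the same
    from outside, so every silent node is treated as a fencing candidate.
    """
    return sorted(node for node, seen in last_heartbeat.items() if now - seen > timeout)


def recover(node: str, fence: Callable[[str], bool], take_over: Callable[[str], None]) -> bool:
    """Fence `node` first; take over its resources only if fencing succeeded.

    `fence` and `take_over` are assumed callbacks supplied by the caller.
    """
    if not fence(node):
        # Without a confirmed fence the node might still be writing,
        # so the shared resource is left alone.
        return False
    take_over(node)
    return True
```

The ordering is the point of the sketch: the surviving side touches the shared resource only after the fence is confirmed, which is what rules out two writers in a split-brain situation.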
Approaches to fencing
There are two classes of fencing methods: one disables the node itself, and the other disallows access to resources such as shared disks. In some cases, a node that does not respond within a given time threshold is assumed to be non-operational, although there are counterexamples, e.g. a long paging rampage.

STONITH ("Shoot The Other Node In The Head") belongs to the first class. As its name suggests, it fences a failed node by resetting it or powering it down, so that the node cannot disrupt the cluster. Uncoordinated contention between nodes can have catastrophic results, for example when two nodes both try to write to the same shared storage resource. STONITH provides effective, if rather drastic, protection against these problems. The decision to trigger STONITH can be based on various criteria, which may be implemented as customer-specific plugins.

Single-node systems use a comparable mechanism called a watchdog timer. A watchdog timer resets the node if the node does not tell the watchdog circuit that it is operating well.
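The watchdog behaviour described above can be sketched in a few lines of Python. This is a software model only, under the assumption that time is passed in explicitly; the class name `Watchdog` and its methods are hypothetical, and a real watchdog is normally a hardware circuit that triggers a reset rather than returning a flag.

```python
class Watchdog:
    """Toy model of a watchdog timer with an explicit clock."""

    def __init__(self, timeout: float, now: float = 0.0) -> None:
        self.timeout = timeout   # seconds allowed between two "I am alive" signals
        self.last_pet = now      # time of the most recent signal

    def pet(self, now: float) -> None:
        """Record that the node reported it is operating well."""
        self.last_pet = now

    def expired(self, now: float) -> bool:
        """Return True once the node has been silent for `timeout` seconds.

        In a real system this condition would trigger a reset of the node.
        """
        return now - self.last_pet >= self.timeout
```

For example, a watchdog created with `Watchdog(timeout=5.0)` and signalled at time 4.0 has not expired at time 8.0 but has expired at time 9.0.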