=== Probability (P) ===
Examine the cause of each failure mode and the likelihood of its occurrence. This can be done by analysis or calculation (e.g. FEM), or by looking at similar items or processes and the failure modes that have been documented for them in the past. A failure cause is regarded as a design weakness. All potential causes of a failure mode should be identified and documented in technical terms. Examples of causes are: human errors in handling, manufacturing-induced faults, fatigue, creep, abrasive wear, erroneous algorithms, excessive voltage, and improper operating conditions or use (depending on the ground rules used). A failure mode may be given a probability ranking with a defined number of levels; this field is also often referred to as an occurrence rating. This method allows a quantitative FTA to use the FMEA results to verify that undesired events meet acceptable levels of risk.
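The occurrence rating described above can be sketched as a lookup from an estimated failure rate to a ranking level. The thresholds and the 1–10 scale below are illustrative assumptions for demonstration, not values from any particular standard; real programs define their own levels in their ground rules.

```python
# Illustrative mapping from an estimated per-hour failure rate to an
# occurrence (probability) ranking. Thresholds and the 1-10 scale are
# assumed for demonstration only.
OCCURRENCE_LEVELS = [
    (1e-7, 1),           # remote
    (1e-6, 3),           # low
    (1e-5, 5),           # moderate
    (1e-4, 7),           # high
    (float("inf"), 10),  # very high
]

def occurrence_rating(failure_rate_per_hour: float) -> int:
    """Return the occurrence ranking for a given failure rate."""
    for threshold, rating in OCCURRENCE_LEVELS:
        if failure_rate_per_hour <= threshold:
            return rating
    return 10

print(occurrence_rating(5e-6))  # a moderate rate maps to level 5
```

A table-driven mapping like this keeps the ranking criteria in one place, which makes the FMEA ground rules easy to review and revise.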
=== Severity (S) ===
Determine the severity of the worst-case end effect (state). It is convenient to write these effects down in terms of what the user might see or experience as functional failures. Examples of end effects are: full loss of function x, degraded performance, functioning in reversed mode, functioning too late, erratic functioning, etc. Each end effect is given a severity number (S) from, say, I (no effect) to V (catastrophic), based on cost and/or loss of life or quality of life. These numbers prioritize the failure modes (together with probability and detectability). Below a typical classification is given; other classifications are possible. See also hazard analysis.
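A severity scale such as the I–V classification above can be modeled as an ordered enumeration, which makes picking the worst-case end effect a simple comparison. The intermediate level labels and the example end-effect assignments below are assumptions for illustration only:

```python
from enum import IntEnum

# Severity scale sketched from the text: I (no effect) to V (catastrophic).
# The intermediate labels are assumed; real programs tailor their own wording.
class Severity(IntEnum):
    NO_EFFECT = 1      # I
    MINOR = 2          # II
    MAJOR = 3          # III
    CRITICAL = 4       # IV
    CATASTROPHIC = 5   # V

# Hypothetical severity assignments for some example end effects.
end_effects = {
    "full loss of function x": Severity.CRITICAL,
    "degraded performance": Severity.MAJOR,
    "erratic functioning": Severity.MINOR,
}

# The worst-case end effect drives the severity entered in the worksheet.
worst_case = max(end_effects.values())
print(worst_case.name)  # prints CRITICAL
```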
=== Detection (D) ===
The means or method by which a failure is detected and isolated by the operator and/or maintainer, and the time this may take. This is important for maintainability control (availability of the system) and is especially important for multiple-failure scenarios. These may involve dormant failure modes (e.g. no direct system effect while a redundant system or item automatically takes over, or a failure that is only problematic during specific mission or system states) or latent failures (e.g. deterioration failure mechanisms, such as metal growing a crack that has not yet reached critical length). It should be made clear how the failure mode or cause can be discovered by an operator under normal system operation, or whether it can be discovered by the maintenance crew through some diagnostic action or an automatic built-in system test. A dormancy and/or latency period may be entered.
==== Dormancy or latency period ====
The average time that a failure mode may remain undetected may be entered if known. For example:
* Seconds, auto-detected by the maintenance computer
* 8 hours, detected by turn-around inspection
* 2 months, detected by scheduled maintenance block X
* 2 years, detected by overhaul task x
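For a latent fault that is only checked at a fixed inspection interval, reliability practice commonly approximates the time-averaged probability that the fault is present as λT/2 (valid when λT ≪ 1). The sketch below applies this standard approximation, which is not stated in the text, to dormancy periods like those listed above; the failure rate is an assumed value:

```python
# Time-averaged probability that a latent fault is present, for a constant
# failure rate lam (per hour) and inspection interval T (hours):
# approximately lam * T / 2 when lam * T << 1. This is a standard
# reliability approximation, applied here with an assumed failure rate.

def avg_latent_exposure(lam: float, interval_hours: float) -> float:
    return lam * interval_hours / 2.0

lam = 1e-6  # assumed failure rate, per hour
for label, hours in [("turn-around inspection (8 h)", 8),
                     ("scheduled maintenance (2 months)", 2 * 730),
                     ("overhaul (2 years)", 2 * 8760)]:
    print(f"{label}: {avg_latent_exposure(lam, hours):.2e}")
```

The comparison makes the maintainability trade-off concrete: lengthening the detection interval from a turn-around inspection to an overhaul raises the average latent exposure by several orders of magnitude.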
==== Indication ====
If the undetected failure allows the system to remain in a safe/working state, a second failure situation should be explored to determine whether or not an indication will be evident to all operators, and what corrective action they may or should take. Indications to the operator should be described as follows:
* Normal. An indication that is evident to an operator when the system or equipment is operating normally.
* Abnormal. An indication that is evident to an operator when the system has malfunctioned or failed.
* Incorrect. An erroneous indication to an operator due to the malfunction or failure of an indicator (i.e., instruments, sensing devices, visual or audible warning devices, etc.).

==== Detection coverage analysis for test processes and monitoring (from ARP4761) ====
This type of analysis is useful for determining how effective various test processes are at detecting latent and dormant faults. The method is to examine the applicable failure modes to determine whether their effects are detected, and to determine the percentage of failure rate attributable to the failure modes that are detected. The possibility that the detection means may itself fail latently should be accounted for in the coverage analysis as a limiting factor (i.e., coverage cannot be more reliable than the availability of the detection means). Including detection coverage in the FMEA can mean that each individual failure that would have fallen into one effect category now becomes a separate effect category because of the detection coverage possibilities. Alternatively, detection coverage can be included by having the FTA conservatively assume that holes in coverage, due to latent failure of the detection method, affect detection of all failures assigned to the failure effect category of concern. The FMEA can be revised if necessary for those cases where this conservative assumption does not allow the top-event probability requirements to be met. After these three basic steps the risk level may be provided.
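The coverage calculation described above can be sketched as follows. The failure-mode rates, detection flags, and monitor availability are illustrative assumptions; multiplying raw coverage by the monitor's availability is one simple way to treat the detection means as the limiting factor the text describes:

```python
# Sketch of a detection coverage calculation: coverage is the fraction of
# the total failure rate whose effects the test or monitor detects, then
# reduced by the availability of the detection means itself (coverage
# cannot be better than the monitor's availability). All numbers are
# illustrative assumptions.

failure_modes = [
    # (failure rate per hour, detected by the monitor?)
    (4e-6, True),
    (1e-6, True),
    (1e-6, False),
]

total_rate = sum(rate for rate, _ in failure_modes)
detected_rate = sum(rate for rate, detected in failure_modes if detected)
raw_coverage = detected_rate / total_rate

monitor_availability = 0.99  # assumed availability of the detection means
effective_coverage = raw_coverage * monitor_availability

print(f"raw coverage: {raw_coverage:.2%}")              # 83.33%
print(f"effective coverage: {effective_coverage:.2%}")  # 82.50%
```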
=== Risk level (P×S) and (D) ===
Risk is the combination of end-effect probability and severity, where probability and severity include the effect of non-detectability (dormancy time). Non-detectability may influence the end-effect probability of failure or the worst-case effect severity. The exact calculation may not be easy in all cases, such as those where multiple scenarios (with multiple events) are possible and detectability/dormancy plays a crucial role (as for redundant systems). In that case, fault tree analysis and/or event trees may be needed to determine exact probability and risk levels. Preliminary risk levels can be selected based on a risk matrix like that shown below, based on MIL-STD-882. The higher the risk level, the more justification and mitigation is needed to provide evidence and lower the risk to an acceptable level. High risk should be indicated to higher-level management, who are responsible for the final decision-making. After this step the FMEA has become like a FMECA.

== Enhanced Design FMEA Technique (DDMA) ==