One of the tenets of maintenance and asset management that is rarely questioned is the belief that it is essential for organisations to identify “critical” assets. While this may be true, the reality is that most organisations do this extremely poorly, and most of the effort that they put into “Equipment Criticality Assessment” is wasted. Let’s examine this proposition, and the reasons why it may be true.
What is equipment criticality?
Let’s start by seeing if there is common agreement on what equipment criticality is. More specifically, is equipment criticality the same as equipment failure risk, or is it something different?
ISO 31000:2009 – Risk Management – Principles and Guidelines defines risk as “the effect of uncertainty on objectives”. The risk assessment process starts by first identifying risk events. In turn, these risk events have two dimensions:
- The consequence of an event
- The likelihood of an event
The overall level of risk is determined by the combination of these dimensions, frequently visualised in a risk matrix.
We can consider risk to be the combination of the severity of consequences of an event, and the probability or likelihood of that event occurring. In other words, risk applies to an event, not to a physical item (such as an item of equipment). If we consider that equipment criticality is the same as equipment failure risk, then we had better be clear about which failure event(s) we are assessing. An equipment item may fail due to many different causes, and the likelihood and consequences of each of these failure events will be different. So how do we roll the risks associated with all of these individual events up to equipment level in order to arrive at an overall level of failure risk for the equipment item as a whole? There are no standards that state how this is to be done.
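To make the roll-up problem concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the 1 to 5 scoring scales, the likelihood-times-consequence score, the two roll-up rules (`rollup_max` and `rollup_sum`) and the example failure modes. None of it comes from a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    likelihood: int   # hypothetical scale: 1 (rare) to 5 (almost certain)
    consequence: int  # hypothetical scale: 1 (minor) to 5 (catastrophic)

    @property
    def risk(self) -> int:
        # Simple risk-matrix style score for a single failure event.
        return self.likelihood * self.consequence


def rollup_max(modes):
    """Equipment-level score = the worst single failure mode."""
    return max(m.risk for m in modes)


def rollup_sum(modes):
    """Equipment-level score = the sum over all failure modes."""
    return sum(m.risk for m in modes)


# Illustrative data only.
equipment = {
    "pump": [
        FailureMode("seal leak", 4, 2),
        FailureMode("bearing wear", 3, 2),
        FailureMode("impeller erosion", 3, 2),
    ],
    "compressor": [FailureMode("rotor failure", 2, 5)],
}

ranking_by_max = sorted(equipment, key=lambda e: rollup_max(equipment[e]), reverse=True)
ranking_by_sum = sorted(equipment, key=lambda e: rollup_sum(equipment[e]), reverse=True)

print(ranking_by_max)
print(ranking_by_sum)
```

With this data the two roll-up rules rank the same two items in opposite orders, which is exactly why an "overall" equipment risk is ambiguous until the roll-up rule has been agreed.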
Some standards do exist for the performance of Failure Modes, Effects and Criticality Analysis (FMECA). However, it should be noted that in this process, criticality is determined at a failure mode (failure cause) level, and is used as part of a process to identify the most appropriate action to be taken to minimise or eliminate the potential likelihood or consequences of each failure mode (cause). This process is not intended to result in an overall criticality assessment at equipment item level. Further, this risk assessment is typically done assuming that there are no controls in place to mitigate the risks associated with each failure mode – which is generally not the case once the equipment is operating.
Even the recently published ISO 55000 standard for Asset Management does not define equipment criticality, although it does define a critical asset as being an “asset having potential to significantly impact on the achievement of the organisation’s objectives”. ISO 55002 suggests that “a risk ranking process can be used to determine which assets have a significant potential to impact on the achievement of the asset management objectives, i.e. which are the critical assets”. However, once again, assessing risk implies having to assess the likelihood of an event, which in turn means that we need to be clear about exactly which events are being assessed, and how the probability and consequences associated with multiple events on an equipment item are to be rolled up to an overall failure risk associated with that equipment.
Furthermore, if we accept that Equipment Criticality is somehow derived from equipment failure risks, it is not clear whether we are intended to assess the unmitigated or mitigated risks associated with each of these failures. In other words, are we supposed to assess the risks assuming that we do not have any controls in place to minimise the likelihood or consequences of those failure events (or that they are not effective) or are we expected to assume that the controls that we currently have in place are effective when assessing equipment failure risks?
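The gap between the two views can be large. The sketch below is illustrative only: the failure rate, the cost per failure, the idea of a single "control effectiveness" fraction and the function names are all assumptions, not part of any standard or CMMS.

```python
def expected_annual_loss(failures_per_year: float, cost_per_failure: float) -> float:
    """Risk expressed as an expected cost per year for one failure event."""
    return failures_per_year * cost_per_failure


def with_control(failures_per_year: float, effectiveness: float) -> float:
    """Failure rate after a control that prevents a fraction of failures.

    effectiveness is a hypothetical value between 0.0 (no effect)
    and 1.0 (prevents every failure).
    """
    if not 0.0 <= effectiveness <= 1.0:
        raise ValueError("effectiveness must be between 0.0 and 1.0")
    return failures_per_year * (1.0 - effectiveness)


# Illustrative numbers only.
UNMITIGATED_RATE = 0.5       # failures per year with no controls in place
COST_PER_FAILURE = 200_000   # consequence of one failure
CONTROL_EFFECTIVENESS = 0.8  # assumed effect of the current PM program

unmitigated = expected_annual_loss(UNMITIGATED_RATE, COST_PER_FAILURE)
mitigated = expected_annual_loss(
    with_control(UNMITIGATED_RATE, CONTROL_EFFECTIVENESS), COST_PER_FAILURE
)

print(f"unmitigated: {unmitigated:,.0f} per year")
print(f"mitigated:   {mitigated:,.0f} per year")
```

The same failure event lands in very different places depending on which assumption is used, so an assessment that does not state its assumption about controls cannot be compared with one that does.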
Why do you need equipment criticality?
So far, so complex. So let’s go back to basics. Before we can really agree on how we should define equipment criticality, perhaps we should consider what we want to use it for. Perhaps if we understand what we want to use it for, then this will help us define it better.
Some of the potential uses for an Equipment Criticality rating (sometimes stored in a separate field against each equipment item in a CMMS) are:
- As an input to determine the overall priority for performing a maintenance task (sometimes combined with a “Work Order Priority” entered against the specific task to give an overall priority for the task)
- To determine, at a high level, the type of risk mitigation strategy to be applied to the equipment (e.g. do condition monitoring and defect elimination on high criticality items)
- As an input into determining the optimum spare parts holdings required for the equipment item
- To provide input into the capital program so that “high criticality” equipment is given a higher priority for upgrade or replacement
- To guide reliability engineers so that they focus their reliability improvement efforts on the most “critical” equipment
Let’s examine each of these applications, and discuss the value that a single “criticality” rating in a CMMS may have in assisting in each of these decision making processes.
To determine the priority for performing a maintenance task
The priority for a maintenance task should be determined by the level of risk associated with not performing that task. The level of risk associated with not doing that task is determined by both the consequences of the potential failure that may result if the task is not performed and the likelihood of that failure occurring if the task is not performed at the particular point in time at which that priority is being determined. A generic criticality rating for an equipment item will not necessarily capture the level of risk associated with that particular task at that particular point in time.
For example, let’s assume that the task we are considering is to be performed on a piece of equipment with 100% installed redundancy: the duty pump in a pair of pumps which normally operate in duty/standby mode. Because there is 100% installed redundancy, the criticality rating for this pump is likely to be moderate or low. However, let’s assume that on this particular occasion the standby pump is unserviceable (it may be undergoing some maintenance of its own). All of a sudden, this duty pump becomes highly critical, and the maintenance task to be performed on it becomes very high priority (or the maintenance work that we have already started on the standby pump becomes much higher priority). This will not (and cannot) be reflected in a static equipment criticality rating stored in the CMMS.
Further, let’s assume that the maintenance task that was being considered was to replace the nameplate on the pump, which is becoming difficult to read. What are the potential consequences associated with failure to perform this task? Alternatively, what if the task was to replace the pump seals, as they are leaking and spraying acid (this is an acid pump) into a working area? Clearly the consequences associated with not performing this task are determined more by what the task is, than by what the equipment is. So using generic “equipment criticality” to determine the priority for a maintenance task is mostly invalid, and if this is your only reason for performing equipment criticality analysis, then don’t do it.
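The pump example can be sketched in a few lines. This is a hedged illustration rather than a recommended scoring scheme: the 1 to 5 scales, the `REDUNDANCY_CREDIT` value, the `redundancy_limits_consequence` flag and the task scores are all assumptions invented for the example.

```python
from dataclasses import dataclass

# Hypothetical: how many consequence levels an available standby unit removes.
REDUNDANCY_CREDIT = 2


@dataclass(frozen=True)
class Task:
    description: str
    likelihood: int   # 1 to 5: chance of the failure if the task is NOT done now
    consequence: int  # 1 to 5: severity of that failure with no standby available
    # True when a healthy standby unit would soften the consequence
    # (for example lost production), False when it would not (for example
    # acid spraying into a working area).
    redundancy_limits_consequence: bool


def task_risk(task: Task, standby_available: bool) -> int:
    """Risk of NOT doing the task, given the plant state right now."""
    consequence = task.consequence
    if task.redundancy_limits_consequence and standby_available:
        consequence = max(1, consequence - REDUNDANCY_CREDIT)
    return task.likelihood * consequence


# Three tasks on the same duty pump, which has a single criticality rating.
nameplate = Task("replace unreadable nameplate", 1, 1, False)
seals = Task("replace leaking acid seals", 5, 5, False)
bearings = Task("overhaul duty pump bearings", 3, 4, True)

for standby_ok in (True, False):
    print(f"standby available: {standby_ok}")
    for task in (nameplate, seals, bearings):
        print(f"  {task.description}: {task_risk(task, standby_ok)}")
```

All three tasks share one equipment criticality rating, yet their scores differ widely, and the bearing overhaul changes score when the standby pump is taken out of service. Neither effect can be captured by a single field against the equipment record.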
To determine the high level risk mitigation strategy to be applied to equipment
By “high level risk mitigation strategy” we mean a framework similar to the following:
Equipment criticality | Mitigation strategy |
---|---|
Very high | Contingency plans, hold critical spare parts, predictive and preventative maintenance |
High | Hold critical spare parts, predictive and preventative maintenance |
Moderate | Predictive and preventative maintenance |
Low | Preventative maintenance |
Very low | Run to failure – corrective maintenance only |
The key issue with this type of approach (and we have seen some of our clients use something similar to this) for determining whether Predictive Maintenance, Preventive Maintenance or Run to Failure strategies are employed is that it fails to comply with the “Four Core Concepts of Preventive Maintenance Development”. In particular, the selection of maintenance strategy occurs at a task level, not an equipment level, and must be in response to the specific failure modes (causes) occurring on the equipment. There is perhaps some merit, however, in ensuring that contingency plans are in place for very high criticality equipment items. We consider using Criticality Assessment to determine whether or not to hold critical spare parts below.
To determine the optimum spare parts holdings required for the equipment
If the definition of Equipment Criticality is not clear, then the definition of “critical spare part” is no clearer. Some organisations take this to mean high value items only, while others consider that it excludes fast-moving stock. Others again confuse “critical” spare parts with “insurance” spares.
So how can we use equipment criticality to determine whether to hold critical spare parts or not? The only way that we can accurately determine the answer to that question is to consider the failure events that give rise to the need for that part. This may, or may not, be the same failure event that we considered when we were assessing the overall criticality for that equipment. If it is not, then using a “generic” assessment of equipment criticality for this exercise is likely to lead to over-stocking of spare parts. Further, we need to assess the risks (criticality) that we would be exposed to without holding the spare part in stock, and compare this with the risks that we would be exposed to if we did hold the item in stock. The level of risk reduction that results needs to be balanced against the costs of holding the spare part in stock.
Clearly, some form of risk assessment is an important part of determining what spare parts are critical – but a single rating for “equipment criticality” is of very limited value in this process.
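A part-level comparison of this kind might look like the sketch below. It is a simplified illustration under stated assumptions: downtime cost is the only consequence considered, the supplier lead time is avoided entirely when the part is on the shelf, and every number and function name is hypothetical.

```python
def expected_annual_downtime_cost(
    mean_years_between_failures: float,
    downtime_hours: float,
    cost_per_hour: float,
) -> float:
    """Expected yearly cost of the failure event that consumes the part."""
    return downtime_hours * cost_per_hour / mean_years_between_failures


def spare_holding_case(
    mean_years_between_failures: float,
    lead_time_hours: float,
    repair_hours: float,
    cost_per_hour: float,
    annual_holding_cost: float,
):
    """Return (annual risk reduction, whether holding the spare pays off)."""
    without_spare = expected_annual_downtime_cost(
        mean_years_between_failures, lead_time_hours + repair_hours, cost_per_hour
    )
    with_spare = expected_annual_downtime_cost(
        mean_years_between_failures, repair_hours, cost_per_hour
    )
    risk_reduction = without_spare - with_spare
    return risk_reduction, risk_reduction > annual_holding_cost


# Same part, same equipment, two different failure consequences.
# Illustrative numbers: one failure every 10 years, 2 week lead time, 8 hour repair.
no_redundancy = spare_holding_case(10, 336, 8, cost_per_hour=1000, annual_holding_cost=5000)
with_redundancy = spare_holding_case(10, 336, 8, cost_per_hour=10, annual_holding_cost=5000)

print(no_redundancy)
print(with_redundancy)
```

The decision flips purely on the consequence of the specific failure event that needs the part, which is information that a single equipment-level rating does not carry.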
To ensure that “high criticality” equipment is given a higher priority for upgrade or replacement
One of the key factors in determining whether to replace or retire an item of equipment is its current condition. In addition, life cycle cost analysis will determine what the appropriate replacement schedule for an item of equipment is. How does equipment criticality fit into this picture? For equipment criticality to be useful in this exercise, it needs to be reviewed and updated on a periodic basis (perhaps annually) and the likelihood of failure updated based on an assessment of current equipment condition. Few, if any, organisations, in my experience, currently do this on a routine basis.
Further, as our capital planning cycle is typically measured in years, we need to forecast when we are likely to need to replace or upgrade equipment at least one year, and most probably several years, in advance. This means that, to be valuable in this capital planning exercise, it is not sufficient for us to assess the current equipment criticality; we must also forecast its future criticality. Again, few, if any, organisations, in my experience, currently do this. And where would we store these criticality ratings? Most CMMS only have space to record one criticality rating per equipment item.
Unless it is kept current and forecast forward in this way, using Equipment Criticality for this purpose is clearly a waste of time.
To focus reliability improvement efforts on the most “critical” equipment
Here we are probably on firmer ground. If we are assessing equipment criticality in a general, “overall” sense, then this is the purpose to which it is best suited. It is not well-suited to being used for more specific purposes – in which case a more specific assessment of equipment failure-related risks is required. Equipment criticality can be valuable in terms of providing direction to those responsible for improving equipment performance as to which equipment items are more important, in an overall sense, to the business – and therefore where they should be directing their efforts.
It can also be useful in providing general guidance on the level of rigour to be applied when making some of the decisions that we have discussed previously, such as determining critical spare parts holdings. Clearly, if, in general terms, equipment is more critical to the business, then we would want to be more careful about the decisions we make for that equipment, and therefore apply more rigorous analytical approaches. An example of how this could be applied is discussed in our article “Alternative approaches for developing and optimising Preventive Maintenance”.
If there is one key message in this article it is this: please ensure that you understand what you are planning to use Equipment Criticality for before you start assessing it. If you are planning to use it for a specific purpose (such as identifying critical spare parts) then you may find that determining criticality at an equipment level is insufficient. You may also find that you need to assess both mitigated and unmitigated criticality to assist you with making some decisions. And the assumptions that you make regarding what mitigations are in place when you perform the assessment will also depend on what you are intending to use the Equipment Criticality for.
Don’t waste your and other people’s time in assessing Equipment Criticality unless you are absolutely sure about the purposes to which it is to be put, and that the process that you follow is suitable to achieve the goals that you are intending to achieve.