Asset Reliability Management: Strategy or Plan?

Managing individual, risk-oriented operation and maintenance (O&M) strategies follows the classic four stages of the Deming cycle: analysis and strategy formulation, planning, execution, and strategy adjustment. Reliability specialists typically stop once the strategy is formulated and neglect the details of business planning and the creation of production and O&M schedules. However, these details can significantly disrupt the carefully calibrated "strategic" rhythm of interventions, so that target reliability indicators, such as the planned risk reduction and the expected effect of the new strategy, are not achieved. Several key, recurring factors, including the production schedule, the availability of specific resources, the seasonality of work, and external and internal constraints, can be incorporated at the strategy formulation stage itself. This article explores this idea.
Risk-Oriented Maintenance Management: Stopping at a New Planning Strategy or Reaching a Business Plan?
In our previous articles (link to article 1 and link to article 2), we discussed the importance of implementing a risk-oriented maintenance management system based on individual risk assessment. We also touched on the problem of reliability recommendations being detached from the realities of production and business. In the classic PDCA (Plan, Do, Check, Act) reliability management scheme, the transition from conceptual strategy to real planning often occurs without the involvement of a reliability specialist. A strategic recommendation may be well thought out in terms of failure factors and still overlook sales plans and schedules, the categorical undesirability of shutting down certain equipment at specific times, financial limitations, resource and personnel constraints, seasonal work scheduling, and other critical factors.

One might argue that a strategy only sets cycles of specific actions based on risk factors and doesn’t "account" for real timelines and costs. However, this isn’t entirely true. If a recommendation states that something should be done every six months and we start doing it from the moment the strategy is approved, we’ve already determined when (down to the day!) the action should be implemented. We’ve essentially already planned the work! But planning work without considering all of the aforementioned factors is impossible. If the realities of production dictate that the strategy can’t be implemented as written, then the strategy itself needs to be revised!

It’s more accurate to say that a good reliability specialist already considers many of these factors when formulating recommendations, but these considerations are not formally documented or included in the "algorithm" for making decisions about a new strategy.

Let’s take a closer look at the factors that should be considered when formulating a maintenance strategy for a specific period (usually the following year):

  • Production schedule: This defines not only what is produced and when, but also the periods during which equipment downtime is highly undesirable. It’s no secret that the risk matrix, which includes risks of production shortfalls, is not constant over time. It’s usually determined as an average, without taking into account that the same production shortfall can carry different risks at different times. For example, a boiler’s failure to provide heat has vastly different consequences in winter and in summer. Similarly, a missed product delivery can have different consequences depending on the customer and the terms of the contract. Using the maximum consequence instead is not accurate either: some equipment would then constantly "flash red," even during periods when it’s not in operation.
  • Financial constraints: Even a well-justified maintenance strategy (and the justification itself can be problematic, as we’ll discuss later) might require funding at a moment that is logical from a reliability standpoint but very untimely for the budget. In a battle between production managers and finance teams, finance usually wins.
  • Resource (materials) constraints: Procurement cycles for some spare parts can be quite long, and sometimes unpredictable. No one wants to overstock warehouses with large inventories. Everyone wants to get everything they need "just in time." To achieve this, logistics and its limitations need to be factored in.
  • Personnel constraints: This is straightforward: you can’t schedule several jobs for the same period if you don’t have enough people to carry them out.

None of these aspects may be considered when a strategy is formulated and equipment downtime is determined solely on the basis of reliability and average risk. In practice, however, repairs are then routinely shifted from the time that is ideal for reliability to the time that is acceptable for the business, which negates the efforts of the reliability specialists.
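The boiler example from the production schedule item can be made concrete with a small calculation. The following is a minimal sketch in Python, assuming a hypothetical heating boiler whose failure consequence varies by month. All figures and names (`consequence_by_month`, `failure_probability`) are invented for illustration and do not come from a real risk matrix.

```python
# Illustrative only: all numbers below are assumptions, not measured data.
# Hypothetical consequence of losing a heating boiler, in cost units, Jan..Dec.
consequence_by_month = [100, 100, 80, 40, 10, 5, 5, 5, 10, 40, 80, 100]
failure_probability = 0.02  # assumed equal probability of failure in every month

# Risk per month = probability of failure x consequence in that month.
monthly_risk = [failure_probability * c for c in consequence_by_month]

average_risk = sum(monthly_risk) / len(monthly_risk)  # what an averaged matrix shows
peak_risk = max(monthly_risk)                         # what a worst-case matrix shows

# Months (1 = January) in which the real risk exceeds the averaged figure.
months_above_average = [
    month for month, risk in enumerate(monthly_risk, start=1) if risk > average_risk
]

print(f"average: {average_risk:.2f}, peak: {peak_risk:.2f}")
print("months above average:", months_above_average)
```

Under these assumed figures, the averaged value understates the risk in the cold months, while the peak value overstates it in summer. A per-period risk value is what allows a planner to see when downtime is actually acceptable.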

How do we envision resolving these issues?

There are two primary approaches to incorporating the realities of production when formulating a maintenance strategy:

  1. Formalized Competency-Based Approach: Any recommendation for reducing a risk factor should be based as much as possible on the company’s existing competencies, formalized in the form of diagnostic, maintenance, and repair technical cards. The introduction of a new technical card should be a carefully calculated event, considering not only the cost and duration of the intervention but also the costs of implementing the new competency. This approach will reduce the risk of overlooking the specifics and costs of the proposed intervention.
  2. Prototyping with Real Production Plans: Before approval, a new maintenance strategy for the entire company should be tested on a prototype of a real production plan for the corresponding period (typically the next year). During the development and approval of strategies for the entire asset pool, a test planning run should be carried out in the same way as during "live" business planning.

We discussed the first approach partially in our previous articles and will cover it in more detail in a separate article to be published soon.

With regard to the second approach (test planning), the following is crucial:

• The result of planning should be a schedule of interventions of all types (inspections, diagnostics, maintenance, and repairs) detailed to the day for the planned horizon (1 year or more).

• When determining the specific start date of an intervention, consider:
  • The recommended interval for the intervention, defined in the strategy and tied to a specific event (date of the last such intervention, date of the "new life" of the asset, etc.)
  • Financial, resource, and personnel constraints for each day of the production calendar. These constraints are determined during business planning for the entire company.
  • Constraints on periods of "undesirability" for shutting down technological systems/equipment, as defined during business planning.
  • Seasonality constraints for work, as defined by the technical card.
  • Requirements to minimize the duration of planned downtime, which call for combining interventions on different systems/equipment in the same period.
  • Requirements that prevent simultaneous work on two or more objects because these objects provide redundancy for each other.

• The planning algorithm should attempt to assign a start date for each intervention, considering the conditions described above. If that is impossible, the start date should be moved to a later date. To determine which object claims resources first, all objects should be ranked according to their associated risk.

• If an intervention cannot be placed within its "strategic" time interval, the affected objects should be visible for analysis and decision-making: either increase resources or shift the intervention to a "non-reliable" period (outside the interval recommended by the strategy), with a calculated risk for such a shift. The analysis of the generated plan should weigh the unmitigated risk against the potential cost of eliminating it (increased resources or shorter periods of "undesirability" of interventions).

• If, even after increasing resources and relaxing requirements for undesirable periods, there are systems/equipment that cannot be fitted into the plan (and therefore recommendations for them cannot be implemented), these objects should be sent for re-analysis and the formulation of a different strategy.
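The planning steps above can be sketched as a simple greedy procedure. This is a minimal illustration in Python under stated assumptions, not a definitive implementation. It models only a daily personnel limit and per-object "no shutdown" periods, and it leaves out financial and material constraints, seasonality, redundancy rules, and the combining of interventions. The names `Intervention`, `plan`, `tolerance_days`, and the sample assets are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Intervention:
    asset: str
    risk: float          # ranking key: higher risk claims resources first
    due: date            # ideal start date derived from the strategy interval
    tolerance_days: int  # slip still treated as inside the "strategic" interval
    duration_days: int
    crew: int            # people needed on each day of the work
    no_shutdown: list = field(default_factory=list)  # (first_day, last_day) pairs


def plan(interventions, crew_per_day, horizon_end):
    """Rank by risk, then move each start date later until the work fits."""
    used = {}  # date -> crew already committed on that day
    placed, shifted, unplaced = {}, [], []
    for item in sorted(interventions, key=lambda i: i.risk, reverse=True):
        start = item.due
        while True:
            days = [start + timedelta(days=k) for k in range(item.duration_days)]
            if days[-1] > horizon_end:
                unplaced.append(item.asset)  # candidate for re-analysis
                break
            blocked = any(a <= d <= b for d in days for a, b in item.no_shutdown)
            overloaded = any(used.get(d, 0) + item.crew > crew_per_day for d in days)
            if not blocked and not overloaded:
                for d in days:
                    used[d] = used.get(d, 0) + item.crew
                placed[item.asset] = start
                if (start - item.due).days > item.tolerance_days:
                    shifted.append(item.asset)  # outside the strategic interval
                break
            start += timedelta(days=1)
    return placed, shifted, unplaced


# Hypothetical example data.
jobs = [
    Intervention("pump-A", 9.0, date(2025, 3, 3), 7, 2, 3),
    Intervention("boiler-1", 7.0, date(2025, 3, 3), 7, 2, 3,
                 no_shutdown=[(date(2025, 3, 1), date(2025, 3, 20))]),
    Intervention("fan-C", 2.0, date(2025, 3, 4), 3, 1, 2),
]
placed, shifted, unplaced = plan(jobs, crew_per_day=4, horizon_end=date(2025, 12, 31))
print(placed, shifted, unplaced)
```

The `shifted` list corresponds to the objects that should be made visible for a decision (more resources or an accepted, calculated risk), and `unplaced` to those that should be sent for re-analysis. A real planner would also need the remaining constraint types from the list above.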

The algorithm described above should also be used during "live" business planning. However, if the planning conditions change "at the last moment," the approved maintenance strategy should not be revised. Instead, the risk carriers that have dropped out of the planned interventions should be visible to everyone, and this unmitigated risk should be included in the final business plan and accounted for in the expenses in some way.

To reiterate our main point, a risk-oriented maintenance strategy that doesn’t take into account the company’s production and business realities in a specific period might be an "empty exercise." To avoid this, reliability specialists need to take one more step: based on the formulated strategies, create a maintenance plan that takes into account the company’s production and business realities within the planning horizon. If this plan shows that the strategic recommendations cannot be fully implemented, the specialists should either go back and refine the strategies or, with full justification, stand by their recommendations and demand adjustments to the conditions, such as the resource base or production plans.