What is a System?

Risk-oriented operation and maintenance of technological equipment call for an individual approach to risk assessment and to shaping each facility's strategy. The very concept of individual risk requires us to evaluate it across the full diversity of interrelated factors, both inside the facility and outside it. How do we delimit the object correctly, so that we neither "drown" in its connections with the outside world nor miss important factors? This article reflects on ways to define the object of analysis: what is the connecting "glue" that binds separate pieces of equipment into a system suitable for reliability analysis?
Risk-oriented operation and maintenance management: What should be the Object of Analysis?
As we mentioned in the previous article, everyone involved in asset management, operation, and maintenance cannot have failed to notice the growing interest in risk-oriented approaches to organizing these processes. In that article, we explored the dialectics of moving from the general to the particular and warned against excessive enthusiasm for an individual approach to reliability analysis and strategy formation at the expense of identifying general patterns and methods of risk management.
In particular, we defined the "Reliability Model" as an analytical structure consisting of interconnected definitions of Function, Functional Failures, Mechanisms, Failure Modes / Causes, Consequences, Recommendations, and Strategies. Such a Model should describe every aspect of the reliability of the corresponding Object of analysis. The Object of analysis can be a specific piece of equipment (private model), a class / type / kind of equipment (general model), or a technological system (aggregated model). While there is a more or less general consensus on what a piece of equipment is and what a class / type / kind of equipment is, the concept of a System is clearly understood far less uniformly in reliability analysis practice!
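To make the structure of a Reliability Model more tangible, it can be pictured as a nested data structure. The sketch below is hypothetical: the class and field names (ReliabilityModel, FailureMode, ModelScope, etc.) are our own shorthand for the elements listed above, not an established schema, and the pump example is invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical schema: names are illustrative, not a standard.


class ModelScope(Enum):
    """Kind of Object of analysis that the model describes."""
    PRIVATE = "specific piece of equipment"
    GENERAL = "class / type / kind of equipment"
    AGGREGATED = "technological system"


@dataclass
class FailureMode:
    mechanism: str        # physical mechanism, e.g. wear or corrosion
    cause: str            # cause behind this failure mode
    consequence: str      # effect on output, quality, or cost
    recommendations: list[str] = field(default_factory=list)


@dataclass
class FunctionalFailure:
    description: str
    failure_modes: list[FailureMode] = field(default_factory=list)


@dataclass
class Function:
    description: str
    functional_failures: list[FunctionalFailure] = field(default_factory=list)


@dataclass
class ReliabilityModel:
    object_name: str
    scope: ModelScope
    functions: list[Function] = field(default_factory=list)
    strategy: str = ""    # resulting maintenance strategy

    def all_recommendations(self) -> list[str]:
        """Collect every recommendation from all failure modes."""
        return [
            rec
            for fn in self.functions
            for ff in fn.functional_failures
            for fm in ff.failure_modes
            for rec in fm.recommendations
        ]


# Hypothetical example: a general model for a class of centrifugal pumps
pump_model = ReliabilityModel(
    object_name="Centrifugal pump (class)",
    scope=ModelScope.GENERAL,
    functions=[
        Function(
            "Deliver process water at the required flow and head",
            [FunctionalFailure(
                "No flow",
                [FailureMode("wear", "loss of bearing lubrication",
                             "stage shutdown",
                             ["vibration monitoring", "oil analysis"])],
            )],
        )
    ],
)
```

The same classes serve all three kinds of model; only the scope and the level of detail in each element differ.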
How it was before
Looking back, O&M practice had no concept of a System at all. More precisely, there were descriptive structures that grouped installations, units, and pieces of equipment by features significant for a particular enterprise, but they were purely illustrative. The usual equipment register was built on different criteria, typically organizational and technical ones: one object "belonging" to another, or one being "included" in another.

The ultimate object of management was always a specific piece of equipment with its own O&M cycle, the failures within that cycle, and its unscheduled and emergency repairs. One of the O&M planner's main tasks was to "gather together" all these scattered deadlines simply to save on planned shutdowns and repair costs. Incidentally, in solving this problem the planner was often applying risk-oriented planning without realizing it! After all, he would be the one to blame if something "did not wait" for a repair he had postponed! Many enterprises were forced to switch to comprehensive repairs, in which all individual deadlines were consolidated into the simultaneous repair of something large that combines many pieces of equipment according to certain criteria. This "something large" is the prototype of the System we discuss below.
How it’s changing now
Risk-oriented approaches have changed more than the way O&M is planned. Formulating such "new" concepts as Function and Functional Failure required linking them clearly to product output, quality, and cost. The type of product and its processing stage do not matter here; what matters is that any piece of equipment exists only because it participates, directly or indirectly, in producing the product and affects its quality and cost!

You might say there is nothing new here: all of this is written in RCM II and similar methodologies. However, choosing the wrong object for such an analysis will most likely lead to incorrect risk assessments, overestimating the risk in one case and underestimating it in another. For example, ignoring the degree of system redundancy and the role of emergency automation can overstate the risk, while an auxiliary object that performs a deeply secondary function and is considered in isolation may be underestimated, because its influence on the output, quality, and cost of the final product is unclear.
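A small worked example of the redundancy point, under strong simplifying assumptions: units fail independently (no common-cause failures), each unit is unavailable with the same probability q over the period, and losing the Function costs a fixed damage. The helper names and all figures are hypothetical and purely illustrative.

```python
from math import comb


def function_loss_probability(n: int, k: int, q: float) -> float:
    """Probability that fewer than k of n identical, independent units are
    available, i.e. that a k-out-of-n redundant group loses its Function.
    q is the (assumed) probability that a single unit is unavailable."""
    return sum(comb(n, i) * (1 - q) ** i * q ** (n - i) for i in range(k))


def expected_damage(n: int, k: int, q: float, damage: float) -> float:
    """Unmitigated risk as probability of functional failure times damage."""
    return function_loss_probability(n, k, q) * damage


# Hypothetical figures: q = 0.05, damage of a stage outage = 1,000,000
single = expected_damage(n=1, k=1, q=0.05, damage=1_000_000)
redundant = expected_damage(n=2, k=1, q=0.05, damage=1_000_000)
print(f"single pump: {single:,.0f}, 1-out-of-2: {redundant:,.0f}")
```

This prints `single pump: 50,000, 1-out-of-2: 2,500`. Assessing each pump of a redundant pair in isolation, as if its failure alone stopped the stage, would overstate the risk twentyfold in this idealized case; common-cause failures and the reliability of switchover automation narrow that gap in practice.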
How it should be
In our opinion, the object of subsequent analysis and the reliability strategy should be defined "from the top down", hierarchically. At the first stage, we are fully entitled to treat the entire enterprise as such an object: a complex, multifunctional object intended (i.e. having a Function) to produce products of the required volume, quality, and cost. Analysis "from above" starts by defining the most important Functions of the enterprise as a whole. These Functions should correspond to the main, "general" KPIs of the enterprise: output volume, its cost, the reject rate, ESG costs, etc. A Function is the achievement of the corresponding KPI! Accordingly, top-level (also "general") Functional failures are events that prevent the KPIs from being achieved!
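A minimal sketch of the idea that a top-level Function is the achievement of a KPI and a Functional failure is whatever prevents it. The KpiFunction class, the KPI names and targets, and the at_least / at_most convention are hypothetical.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class KpiFunction:
    """A top-level Function expressed as achieving a KPI target (hypothetical)."""
    name: str
    kpi: str
    target: float
    # "at_least" for output-type KPIs, "at_most" for cost- or reject-type KPIs
    direction: Literal["at_least", "at_most"]

    def is_failed(self, actual: float) -> bool:
        """A Functional failure is any state that prevents reaching the target."""
        if self.direction == "at_least":
            return actual < self.target
        return actual > self.target


# Hypothetical enterprise-level Functions and actual values for one period
functions = [
    KpiFunction("Produce the planned volume", "output_t", 120_000, "at_least"),
    KpiFunction("Keep unit cost within budget", "unit_cost", 85.0, "at_most"),
    KpiFunction("Limit rejects", "reject_rate", 0.02, "at_most"),
]
actuals = {"output_t": 114_500, "unit_cost": 83.2, "reject_rate": 0.027}

failed = [f.name for f in functions if f.is_failed(actuals[f.kpi])]
print(failed)  # ['Produce the planned volume', 'Limit rejects']
```

Lower-level Functions can be expressed the same way, with KPIs that roll up into these enterprise-level ones.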

Clearly, we cannot stop at the level of the enterprise as a whole: it is too complex an object to analyze. It must be decomposed into "simpler" sub-objects, ending with the O&M objects themselves: pieces of equipment. This can be done in different ways, but in our opinion the most effective is to group equipment by its direct or indirect participation in a particular stage of the technological process (a technological operation), choosing stages at the end of which a clear output (a final or semi-finished product) can be identified. The entire technological process is a chain of such stages, each of which receives semi-finished products and resources at its input, performs its set of processing functions, and passes its results (products / semi-finished products, resources, waste) to another stage. Technologists and plant designers have usually already worked out how the enterprise's target technological process divides into such stages. Each stage has its own KPIs, which roll up into higher-level KPIs; each is a self-contained area of responsibility; and each can (though not necessarily) keep working "to the warehouse" even when subsequent production stages are interrupted.

We call the totality of equipment involved in each such stage a System, in the sense of an object of reliability analysis.

Here are a few rules for forming such a System:

  • The "product" of the System must be clearly defined: a semi-finished product of any stage; a resource (electricity, heat, air, water, etc.); rejects; waste. This makes it possible to identify all the Functions that truly matter for producing this product.
  • The technological process implemented by the System must be unambiguously defined: how the input resources and semi-finished products are processed into the output "product". This makes it possible to identify Functional failures properly, along with their mechanisms and causes.
  • The "product" of any System must have independent value: it can, at least notionally, be "put in the warehouse" (work in progress) or sent to a distribution network (energy, water, etc.). This makes it possible to assess properly the economic damage from a production outage at this stage.
  • Each piece of equipment should belong to exactly one System. If a piece of equipment "works" for several technological processes (participates in producing different "products"), it should be assigned to a separate System of a different type, which supplies its "product" to other Systems as an "internal supplier". This reduces analysis errors and makes it possible to formulate an effective strategy for such shared objects ("multi-machine operators").
  • Systems must be "assembled" into a single technological scheme of the enterprise (a directed graph) that carries input (purchased) resources step by step through their processing stages until the finished products are released and shipped, taking redundancy schemes into account. This scheme must "not forget" auxiliary processes: energy, preparation of process media, transport, waste disposal, etc. This makes it possible to account for the mutual influence of Systems and their reliability on one another.
  • The System should not be too big. A System consisting of hundreds of pieces of equipment is still too complex for a comprehensive analysis. Subsystems should be identified wherever possible (using the same set of rules), even if technologists have not explicitly specified the structure of operations within such a System. Usually an intermediate "product" can be identified without much of a stretch and a subsystem formed around it, especially where a large System contains elements of a parallel process. In principle, nothing prevents decomposing processes into sub-processes for several more levels, but this can be overdone: a single technological process can be "chopped" into so many "simple" elements that they become hard to put back together.
  • A System can be typical. For example, air preparation, water supply, and power supply systems, as well as standard production lines that are unique neither to the enterprise nor to the industry, can be standardized. For such a typical System, a generalized analysis can be carried out (a Reliability Model of the typical System can be built) and then reused many times in the appropriate places, yielding a Model for each specific System of this type. This can greatly simplify analysis and strategy development.
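Several of the rules above are formal enough to be checked automatically: a defined product, single membership of each piece of equipment, a size limit, and a consistent scheme. The sketch below assumes a simple in-memory representation; the System fields, the check_rules helper, the MAX_EQUIPMENT threshold of 50, and the equipment tags are hypothetical. It also assumes the scheme has no recycle loops, which real plants with recycle streams may violate. Rules that require engineering judgment (independent value of the product, typical Systems) are not checked.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter

MAX_EQUIPMENT = 50  # hypothetical "not too big" threshold


@dataclass
class System:
    name: str
    product: str                                       # what the System outputs
    equipment: set[str] = field(default_factory=set)   # O&M objects (tags)
    suppliers: set[str] = field(default_factory=set)   # upstream Systems


def check_rules(systems: list[System]) -> list[str]:
    """Return human-readable violations of the System-forming rules."""
    problems: list[str] = []
    names = {s.name for s in systems}
    owners: dict[str, list[str]] = defaultdict(list)

    for s in systems:
        if not s.product:
            problems.append(f"{s.name}: product is not defined")
        if len(s.equipment) > MAX_EQUIPMENT:
            problems.append(f"{s.name}: too large, consider subsystems")
        for unknown in sorted(s.suppliers - names):
            problems.append(f"{s.name}: unknown supplier {unknown}")
        for eq in sorted(s.equipment):
            owners[eq].append(s.name)

    # Each piece of equipment must belong to exactly one System
    for eq, owning in owners.items():
        if len(owning) > 1:
            problems.append(f"{eq} belongs to several Systems: {sorted(owning)}")

    # Assumption of this sketch: the scheme is a directed graph without cycles
    graph = {s.name: s.suppliers & names for s in systems}
    try:
        list(TopologicalSorter(graph).static_order())
    except CycleError:
        problems.append("technological scheme contains a cycle")
    return problems


systems = [
    System("Water treatment", "process water", {"P-101", "F-201"}),
    System("Grinding", "ground ore", {"M-301", "P-101"}, {"Water treatment"}),
]
print(check_rules(systems))
# ["P-101 belongs to several Systems: ['Grinding', 'Water treatment']"]
```

Following the single-membership rule, the fix here would be to move the shared pump P-101 into its own System that acts as an "internal supplier" to both.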

This decomposition of the enterprise into Systems and subsystems lets us understand each System more effectively without losing its production context. It also gives us another view of the company's assets: another hierarchical model that can "reconcile" at least the following groups: operations staff (the technological process is visible here), maintenance staff (the model is assembled from their O&M objects), process control system (APCS) specialists (the influence of control systems on the technological process and reliability is taken into account here), and business management (the links between their business KPIs and the entire production system are visible here, i.e. the so-called "Places of Origin" of KPIs).

A model that describes the enterprise's entire technological process, and is assembled from specific "low-level" pieces of equipment with their risk and reliability assessments, current and historical technical condition, and operating modes, is also an excellent basis for a variety of simulation tasks, both "what if?" and "what is needed to...?" scenarios. And when digitalizing an enterprise and creating a "digital twin", it is precisely such a model that is best suited for aggregating, displaying, and modeling data and analyzing it in many ways.
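To make the "what if?" scenario concrete, here is a minimal sketch of outage propagation through a scheme of Systems, including redundant suppliers and "to the warehouse" buffers. The scheme, the buffer hours, and the what_if helper are hypothetical, and the propagation rule is a deliberate simplification that also assumes a scheme without recycle loops; a real digital twin would rely on time-dependent simulation.

```python
from graphlib import TopologicalSorter

# Hypothetical scheme: each System lists its input groups; a group is a
# set of alternative (redundant) suppliers, any one of which suffices.
INPUTS: dict[str, list[set[str]]] = {
    "Power supply": [],
    "Water treatment": [{"Power supply"}],
    "Compressor A": [{"Power supply"}],
    "Compressor B": [{"Power supply"}],
    "Grinding": [{"Water treatment"}, {"Compressor A", "Compressor B"}],
    "Flotation": [{"Grinding"}],
}
# Hypothetical hours a System can keep running on stored input
BUFFER_H = {"Grinding": 2.0, "Flotation": 8.0}


def what_if(failed: str, outage_h: float) -> dict[str, float]:
    """Hours of downtime per System if `failed` stops for `outage_h` hours.

    Simplifications: all outages start at the same moment, buffers are full
    at the start, and a System stops for (upstream downtime - its buffer)."""
    graph = {name: set().union(*groups) for name, groups in INPUTS.items()}
    downtime: dict[str, float] = {}
    # Topological order guarantees suppliers are evaluated before consumers
    for name in TopologicalSorter(graph).static_order():
        if name == failed:
            downtime[name] = outage_h
            continue
        # A redundant group is down only as long as its best alternative
        group_down = [min(downtime[s] for s in g) for g in INPUTS[name]]
        worst = max(group_down, default=0.0)
        downtime[name] = max(0.0, worst - BUFFER_H.get(name, 0.0))
    return {n: h for n, h in downtime.items() if h > 0}
```

With these figures, `what_if("Water treatment", 6)` reports 6 h of downtime for water treatment and 4 h for grinding, while flotation rides through on its buffer; `what_if("Compressor A", 24)` stops nothing else, because Compressor B covers the load.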

For reliability analysis tasks, such a model gives us several very important opportunities:

  1. Forming a Hierarchy of Functions, from the most important business functions down to the highly specialized technical functions of each piece of equipment. This hierarchy in turn yields a Hierarchy of KPIs, also from top to bottom, giving management at all levels a genuine "symphony" of goals and objectives, however narrowly and "specifically" they are formulated.
  2. Forming a Hierarchy of Functional Failures, their mechanisms and causes, taking into account the mutual influence of equipment units within the system.
  3. Determining the Consequences of Failures and building Matrices of Unmitigated Risks with respect to the actual technological process and the analyzed object's place in it, the "product" it produces and consumes, the mutual influence of objects on one another, and redundancy.
  4. Forming Recommendations and Strategies for each piece of equipment, taking its inclusion in the System into account. Grouping maintenance actions (a package of actions on the System as a whole) by their timing and type, optimizing the timing of planned downtime, combining or absorbing work, and respecting cases where work cannot, or must, be carried out simultaneously, etc.
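Item 4 is, in essence, the planner's old task of "gathering together" scattered deadlines, now performed per System. A minimal sketch, assuming each maintenance action has a window (earliest, latest day) within which it is acceptable, and ignoring constraints such as work that must not run simultaneously: the classic greedy algorithm for covering intervals with points gives the fewest shutdowns. The Action class, group_into_shutdowns, and the example data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A maintenance action with an acceptable window (hypothetical)."""
    equipment: str
    work: str
    earliest: int  # earliest sensible day (e.g. day of the year)
    latest: int    # latest day before the risk becomes unacceptable


def group_into_shutdowns(actions: list[Action]) -> dict[int, list[Action]]:
    """Greedy interval point cover: place each new shutdown as late as the
    most urgent uncovered action allows, then absorb every following action
    whose window contains that day. This minimizes the number of shutdowns."""
    shutdowns: dict[int, list[Action]] = {}
    day = None
    for a in sorted(actions, key=lambda act: act.latest):
        if day is None or a.earliest > day:
            day = a.latest              # open a new planned shutdown
            shutdowns[day] = []
        shutdowns[day].append(a)        # a.earliest <= day <= a.latest holds
    return shutdowns


actions = [
    Action("P-101", "replace bearings", 30, 60),
    Action("M-301", "reline mill", 50, 90),
    Action("F-201", "change filter", 10, 40),
    Action("V-12", "valve overhaul", 70, 120),
]
for day, group in group_into_shutdowns(actions).items():
    print(day, [a.equipment for a in group])
# 40 ['F-201', 'P-101']
# 90 ['M-301', 'V-12']
```

Four separate deadlines collapse into two System-level shutdowns; constraints on incompatible work would have to be added on top of this simple rule.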

To sum up: yes, the RCM methodology also recommends performing reliability analysis for systems, but how such systems are formed is largely left to the discretion of the individual specialist. Our experience shows that such "freedom" is not always a good thing; some rules are needed. Everything described above is our attempt to write such rules down. How Systems built according to these rules help in the subsequent planning of production and O&M (and more) will be the subject of our next article, "Asset Reliability Management. Strategy or Plan".