Analysis of maintenance policies supported by simulation in a flexible manufacturing cell

The purpose of this article is to analyze three maintenance policies in a flexible manufacturing cell of a mechanical manufacturing company. The policies are opportunistic maintenance, in which all the machines are repaired whenever the cell stops; corrective maintenance, in which only the failed machines are repaired whenever the cell stops; and partial corrective maintenance, in which only the failed subsystem is repaired whenever the cell stops. The research method is quantitative modeling and computer simulation. The cell is part of a flexible manufacturing system (FMS) and includes two robots in parallel and a CNC machine. A previous lifetime data analysis provided reliability functions, using Weibull and exponential models. The evaluation of the policies relied on the number of repairs and the resulting MTBF of the cell. The simulation relied on twenty replications of the three policies to ground the analysis. The study concluded that two policies, corrective maintenance and partial corrective maintenance, are almost equally effective. The opportunistic maintenance policy is less effective than the others. The article also calculated the systemic reliability of the cell.


INTRODUCTION
The classical competitive dimensions in manufacturing are cost, dependability, quality, and flexibility [1]. Other dimensions already cited in the literature can be considered as derivations or variations of the four classical dimensions. Objectives such as spare parts and service, lead-time and on-time deliveries, environmental concerns and reliability, and innovation and design, can be considered sub-dimensions of cost, dependability, quality, and flexibility, respectively. Such sub-dimensions are less relevant in studies that do not directly focus on the process of strategy formulation but are important in analyzing trade-offs in manufacturing strategies [33].
One of the most important trade-offs in manufacturing is the relationship between cost and dependability. Usually, manufacturing achieves cost reduction and increased dependability at the same time by increasing the availability of the most critical resources, that is, those that limit the global throughput or have no substitute in the case of breakdown. Availability is the likelihood that a single machine or an entire production system is available for service when required by a production schedule. In numerical terms, the availability Av is the ratio between the time the equipment or system is available to operate and the total scheduled time. A low availability of critical equipment can lead to delays in deliveries and an increase in costs, hampering the simultaneous achievement of low cost and high dependability [2].
Instant availability Av(t) is the relationship between two functions of the machine's operating time: the reliability and the maintainability functions. Reliability R(t) is the probability that a piece of equipment or a system will work under the design conditions, free from failure, until time t. Maintainability M(t) is the probability that, given a failure, the repair is completed by time t. Av(t) increases if R(t) increases or if M(t) increases. In short, Av(t) improves by increasing the mean time between failures (MTBF), decreasing the mean time to repair (MTTR), or both [3].
Various maintenance policies can contribute to the achievement of manufacturing objectives. Usually, trade-offs arise when a company aims at cost reduction and increased dependability at the same time [4]. The classical strategy for cost reduction in maintenance is to adopt emergency or breakdown maintenance. No maintenance procedure occurs unless a breakdown happens, which implies a production stoppage. This policy reduces the cost of maintenance but also reduces dependability. It is no longer possible to predict when the production machinery will be available, with a significant risk of delayed deliveries caused by unexpected breakdowns.
One variant of this policy is the so-called corrective maintenance. The breakdown does not necessarily imply a stoppage because there is redundant equipment or enough inventory to load the next machine during the stoppage. Repair does not have to be immediate, and the maintenance crew can postpone it until a more favorable situation [5]. Another variant is the so-called partial corrective maintenance, in which the crew inspects and repairs only the subsystems in a failure state, not all the subsystems as in corrective maintenance.
The classical strategy for increased dependability is to adopt preventive maintenance. Maintenance interventions occur only on fixed dates, unconditionally, even if no failure occurred in the period and no machine is in a failure state. This policy improves dependability but also increases cost. Dependability increases because the crew repairs failures in progress before they cause a breakdown, avoiding stoppages and assuring scheduled deliveries. However, the intervention may be useless. As failures are random, there is a significant probability that the crew finds no failure in progress to repair. Preventive maintenance requires far more part replacements and hiring of labor than the emergency maintenance policy, with a substantial increase in cost [6].
One variant of this policy is the so-called predictive maintenance. The intervention is conditional and occurs only if a diagnosis detects a fault condition in progress. Another variant is the so-called opportunistic maintenance. Given any kind of stoppage, the maintenance crew inspects all the machines, regardless of the previous failure states [7].
The execution of previously defined maintenance policies alone does not assure the achievement of the other two objectives of a manufacturing strategy: improvement of quality and increase of flexibility. Both objectives rely more on technological implementations than on management policies. One kind of technological implementation that improves quality and at the same time increases flexibility is the adoption of an FMS (Flexible Manufacturing System) [8].
An FMS is a manufacturing system that can react quickly to unexpected changes in the production context. Additionally, FMSs provide a substantial reduction of variability, because the main processing operations no longer depend on human labor, but on advanced technology, namely robotic arms, CNC machines, automatic tool changing facilities, and automated materials handling units. Due to those facilities, FMSs are efficient technological solutions to achieve strategic results in quality and flexibility requirements at the same time [9].
It is possible to conclude that the combination of an FMS implementation and a set of suitable maintenance policies can produce global strategic results in modern manufacturing.
With this conclusion in mind, the purpose of this study is to analyze policies for the maintenance of an FMS installed in a manufacturing plant of the automotive industry. The policies are opportunistic maintenance, corrective maintenance, and partial corrective maintenance. The outcomes are the MTBF produced and the number of maintenance procedures required by the policies. As further research, the study suggests other policies, like preventive and predictive maintenance, and other outcomes, like mean lead-time of orders. The research method is simulation.

THEORETICAL ANALYSIS
Maintenance policies are expected to reduce the effects of wear-outs in equipment. However, random failures and breakdowns are still expected to occur. Breakdowns imply stoppages of production and loss of throughput in the manufacturing activity. Failures imply a loss of some function in equipment but do not imply an immediate loss of throughput. Every breakdown comes from a failure, but a failure does not necessarily turn into a breakdown [10].
Given the differences between failure and breakdown, maintenance policies fall into two main types: planned intervention and random intervention.
Planned interventions require a fixed schedule that sets dates and time duration for the intervention. During the procedure, the machine is unavailable. Random (or unplanned) interventions occur at any time, depending on the failures or breakdowns, which occur randomly. As in the first case, the machine is unavailable during the intervention [11].
Intermediate situations between planned and random interventions can also occur. One of these intermediate situations is the so-called corrective maintenance policy, which applies when manufacturing has some kind of redundancy. In corrective maintenance, although the failure occurs randomly, the situation does not require immediate intervention, meaning that the maintenance crew can schedule and plan the intervention. Usually, redundancy allows convenient scheduling of the maintenance intervention within a certain time horizon [12].
Another intermediate situation is the so-called opportunistic maintenance, in which the intervention occurs not according to a schedule but according to opportunities that arise from any kind of stoppage of the manufacturing. Once the machine stops for any reason, the crew inspects and, if necessary, repairs all the equipment, even those in an operational state. Usually, the outcome of this policy is "as-good-as-new" equipment. The intervention resets the random time-to-failure process due to wear-out, and a new failure cycle begins [13]. Finally, in the partial corrective maintenance policy, a failure does not imply an automatic repair. Instead, the crew inspects and repairs only the subsystem that caused the interruption. The crew will not inspect a machine of a subsystem that did not cause the interruption, even if this machine is in a failure state. The machine will be serviced only when the last machine of its subsystem turns into a failure state.
One important tool to help define a maintenance strategy is lifetime data analysis. Applied to industrial equipment, the term lifetime data refers to the duration of failure-free operation under defined operating conditions, usually set in the design phase of the system. Lifetime can be measured in hours, days, kilometers, cycles, or any kind of metric that successfully measures the period in which a piece of equipment or a system produces free of stoppages caused by breakdowns [14].
Time to failure can be censored or complete. Censored time occurs when the equipment successfully operates until the time t at which the analysis ended. Therefore, the exact time of the failure ttf remains unknown, because the failure has not happened yet. The only information is that [ttf > t]. Complete time occurs when the exact time ttf of the failure is available [14]. This study handles only complete data.
Lifetime data analysis strongly relies on the Weibull distribution, a flexible probability distribution that is able to fit lifetime data originated from various types of random time-to-failure processes in materials and equipment. Various distributions can be considered particular cases or good approximations of the Weibull distribution. Suitable shape parameters of the Weibull distribution can satisfactorily approximate the normal and the lognormal distributions. When the shape factor equals 2, the Weibull distribution reduces to the Rayleigh distribution, important to handle fatigue failures caused by wear-out processes. In the particular case in which the shape factor equals 1, the Weibull distribution reduces to the exponential distribution, simpler to handle and used to fit data originated from random failure processes. Therefore, the literature also refers to this kind of analysis as the Weibull analysis [2]. The specific case of the exponential distribution is of interest for this study.
The Weibull analysis attempts to make predictions about the time-to-failure process of a piece of equipment by fitting a distribution to a sample of time-to-failure data. Usually, the analysis estimates parameters by the Maximum Likelihood Estimation (MLE) method. The main goal of the analysis is to define a reliability function R(t), representing the probability that the equipment will operate free of failure until the time t [14]. The three-parameter Weibull distribution has the general form of equation 1.
In which: t0 = time free of failure; b = shape factor; h = scale factor.
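Equation 1 is not reproduced in this text. As a reference only, the standard reliability function of the three-parameter Weibull distribution, which is presumably the form intended, is given below. The Greek symbols are a notational assumption: β stands for the shape factor b and η for the scale factor h.

```latex
R(t) = \exp\left[-\left(\frac{t - t_0}{\eta}\right)^{\beta}\right], \qquad t \ge t_0
```

With t0 = 0 and β = 1 this expression reduces to the exponential model R(t) = exp(-t/η), the particular case mentioned earlier.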
The Weibull distribution is also helpful in using the bathtub curve, an abstract construction that plots the failure rate function along the life cycle of the equipment [15]. According to the shape factor of the distribution, a piece of equipment can be found in the infant mortality (or early failures) phase (b < 1), the random failure (or useful life) phase (b = 1), or the wear-out phase (b > 1) [2].
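The link between the shape factor and the three phases follows from the failure rate function of the Weibull model. A short derivation, assuming the standard three-parameter form and writing β for the shape factor b, η for the scale factor h, and λ(t) for the failure rate (a symbol chosen here for illustration):

```latex
\lambda(t) = \frac{f(t)}{R(t)} = \frac{\beta}{\eta}\left(\frac{t - t_0}{\eta}\right)^{\beta - 1},
\qquad
\frac{d\lambda}{dt}
\begin{cases}
< 0 & \text{if } \beta < 1 \quad \text{(early failures)} \\
= 0 & \text{if } \beta = 1 \quad \text{(random failures, } \lambda = 1/\eta\text{)} \\
> 0 & \text{if } \beta > 1 \quad \text{(wear-out)}
\end{cases}
```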
Weibull analysis consists of (i) gathering lifetime data for the equipment, usually from maintenance information and management systems; (ii) fitting a lifetime distribution and estimating the parameters by MLE, usually supported by a suitable software package; (iii) operating with the distribution to estimate the required lifetime characteristics of the equipment [14,2].
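Step (ii) can be illustrated with a minimal sketch. It is not the software package used in the study: it assumes complete data, a two-parameter Weibull model (t0 = 0), and a plain bisection search for the shape factor. The function name and the sample parameters are hypothetical.

```python
import math
import random


def fit_weibull_mle(times, lo=0.05, hi=50.0, tol=1e-9):
    """Two-parameter Weibull MLE (t0 = 0 assumed) for complete, positive data.

    Returns (shape, scale). The shape solves the profile-likelihood
    equation by bisection; the scale then follows in closed form.
    """
    n = len(times)
    logs = [math.log(t) for t in times]
    mean_log = sum(logs) / n

    def score(b):
        # Increasing in b and equal to zero at the MLE of the shape factor.
        powers = [t ** b for t in times]
        weighted = sum(p * lt for p, lt in zip(powers, logs))
        return weighted / sum(powers) - 1.0 / b - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = (sum(t ** shape for t in times) / n) ** (1.0 / shape)
    return shape, scale


if __name__ == "__main__":
    # Hypothetical lifetimes (hours) drawn from a Weibull with shape 2, scale 30.
    rng = random.Random(1)
    sample = [rng.weibullvariate(30.0, 2.0) for _ in range(20)]
    print(fit_weibull_mle(sample))
```

With only twenty observations, as in the study, the estimates carry considerable sampling error, which is one reason the fitted models are later checked against the simulated MTBF.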
The study requires a connection between maintenance policies and FMS's throughput. An FMS is usually composed of CNC machines, automatic tool storage, changing facilities, robots, and automated materials handling units, controlled by computers and interconnected by local industrial networks. In short, FMS is an arrangement of automated machines interconnected by an automated material conveying, handling, and inspection system, controlled by a central computer [16][17]. An FMS can also be described as an automated manufacturing cell, which includes processing workstations interconnected by an automated material handling and storage system [18].
Maintenance policies in FMSs should take into account some features of the equipment. Usually, the potential benefits of an FMS are sensitive to maintenance policies [19]. FMSs typically operate at 70-80% utilization, much higher than that of other machines, which operate at substantially lower rates. Such high utilization causes FMSs to suffer more wear-out than other types of production equipment. In short, FMSs usually operate with an increasing failure rate due to accelerated usage and more intense wear-out of equipment [20]. During the useful life, FMSs experience more wear-out than other machines working over the same period. Therefore, the maintenance policy of an FMS has a greater impact on production than that of other types of equipment [21].

METHODOLOGY
The research object is a flexible manufacturing cell (FMC) installed in a Brazilian company of the automotive industry. An FMC is a subset of an FMS installation. An FMC cannot fully meet the manufacturing needs by itself but takes part in the process. Given the high investment in the facility, FMCs often turn into production bottlenecks, which increases the need for a maintenance policy aligned with manufacturing goals. The studied FMC comprises two robotic arms (RB1 and RB2) for loading and unloading materials, and a processing CNC machine. Either robotic arm can load or unload the CNC. Therefore, RB1 and RB2 form a parallel subsystem, in series with the subsystem CNC.
The research method is: (i) a previous lifetime data analysis provided individual reliability functions for the three pieces of equipment; (ii) supported by a random generator, applications of the Monte Carlo method provided sets of random lifetimes for the three pieces of equipment; (iii) starting with the simulated lifetime data, simulation of three maintenance policies: opportunistic maintenance, in which, given a stoppage, all the equipment is repaired; corrective maintenance, in which, given a stoppage, only the failed equipment is repaired; and partial corrective maintenance, in which, given a stoppage, only the subsystem that failed is repaired; and (iv) comparative analysis of results by the number of interventions and the resulting MTBF for each policy. Figure 1 shows the reliability block diagram (RBD) of the FMC and the two subsystems RBx and CNC. In Figure 1, it is possible to observe the parallel configuration of RB1 and RB2, forming the subsystem RBx, in series with the subsystem CNC, composed of a single machine.

RESULTS
A previous lifetime data analysis based on historical data used MLE (Maximum Likelihood Estimation) to estimate the parameters of the functions. RB1 and RB2 both have increasing failure rates. A Weibull model fitted the two sets of lifetime data, which indicates wear-out for both robots. A Weibull model, as well as an exponential model, fitted the CNC lifetime data, which indicates a random failure pattern. For simplicity, we chose the exponential model.
The Weibull analysis used data from twenty consecutive times to failure processes. The presentation of these data and the maximum likelihood modeling process that produced the models of the individual reliability functions are of less importance for the purpose of this study. Therefore, the study omits this part, focusing more attention on the following analysis. The analysis starts with the achieved models for the individual reliability functions of the equipment. Equations (2), (3) and (4) give the reliability functions of RB1, RB2, and CNC, respectively.
Considering the bathtub curve [15], RB1 and RB2 have an increasing failure rate, which implies that both are in the wear-out failures zone [22]. CNC has a constant failure rate, which suggests that it is still in the random failures phase [23]. Equation 5 gives the systemic reliability of the FMC. Equation 6 gives the MTBF for the systemic reliability function R(t).
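Equations 5 and 6 are not reproduced in this text. For the block diagram described in the methodology (RB1 and RB2 in parallel, in series with CNC), the standard reliability relations would read as follows, assuming statistically independent failures. The subscripted symbols are notation introduced here for illustration.

```latex
R(t) = \left[\,1 - \bigl(1 - R_{RB1}(t)\bigr)\bigl(1 - R_{RB2}(t)\bigr)\right] R_{CNC}(t),
\qquad
\mathrm{MTBF} = \int_{0}^{\infty} R(t)\,dt
```

The bracketed term is the reliability of the parallel subsystem RBx, which fails only when both robots have failed.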
Considering equation (5), Figure 2 shows the systemic reliability function of the FMC. Applying equation (6) to the Weibull and exponential models results in equations (7) and (8) for the MTBF.
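Equations (7) and (8) are not reproduced in this text. The sketch below uses the standard closed forms for a single piece of equipment, MTBF = t0 + h Γ(1 + 1/b) for the Weibull model and MTBF = 1/l for the exponential model, and integrates the systemic reliability numerically. All parameter values are hypothetical, since the fitted functions of RB1, RB2, and CNC are not given here, and independence between the machines is assumed.

```python
import math


def weibull_reliability(t, shape, scale, t0=0.0):
    """R(t) of a three-parameter Weibull model; equals 1 up to t0."""
    if t <= t0:
        return 1.0
    return math.exp(-(((t - t0) / scale) ** shape))


def weibull_mtbf(shape, scale, t0=0.0):
    """Closed-form mean of a Weibull lifetime: t0 + scale * Gamma(1 + 1/shape)."""
    return t0 + scale * math.gamma(1.0 + 1.0 / shape)


def exponential_mtbf(rate):
    """Closed-form mean of an exponential lifetime: 1 / rate."""
    return 1.0 / rate


def system_reliability(t, rb1, rb2, cnc_rate):
    """Two robots in parallel, in series with the CNC (independence assumed).

    rb1 and rb2 are (shape, scale) tuples; cnc_rate is the exponential rate.
    """
    r1 = weibull_reliability(t, *rb1)
    r2 = weibull_reliability(t, *rb2)
    return (1.0 - (1.0 - r1) * (1.0 - r2)) * math.exp(-cnc_rate * t)


def system_mtbf(rb1, rb2, cnc_rate, horizon=1000.0, steps=100_000):
    """Trapezoidal integration of the systemic R(t) from 0 to `horizon`."""
    dt = horizon / steps
    total = 0.5 * (system_reliability(0.0, rb1, rb2, cnc_rate)
                   + system_reliability(horizon, rb1, rb2, cnc_rate))
    for i in range(1, steps):
        total += system_reliability(i * dt, rb1, rb2, cnc_rate)
    return total * dt


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    print(weibull_mtbf(2.0, 30.0), exponential_mtbf(0.02))
    print(system_mtbf((2.0, 30.0), (2.0, 35.0), 0.02))
```

The integration horizon must be long enough for R(t) to become negligible; otherwise the numerical MTBF is underestimated.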
The study uses simulated data and Monte Carlo method (MCM) for the analysis. MCM relies on random streams to heuristically calculate the probabilities by simulated successive repetitions [24]. This type of method is helpful in dealing with problems related to maintenance policies [25].
Although essential in certain areas such as cryptology, truly random numbers are difficult to obtain. Furthermore, truly random numbers are not reproducible. In areas such as engineering applications, pseudo-random numbers, produced by deterministic algorithms, are sufficient [26]. A pseudo-random number generator (PRNG) should produce samples that are uniformly distributed and statistically independent. The generator shall not degenerate with repetition and shall be able to repeat sequences [27].
The study uses the PRNG of Excel 2013. Despite the previous criticisms [28], since the 2010 version, the PRNG is much more satisfactory than in the earlier versions [29][30]. It proved to be at least similar to Gnumeric, which is recognized as appropriate for scientific applications [31]. The study successfully performed standard tests on the random streams to assure uniform distribution (goodness-of-fit) and statistical independence (autocorrelation) in the employed time series.
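The two checks mentioned above can be sketched as follows. This is only an illustration of the kind of test involved, not the procedure of the study: it uses Python's own generator instead of Excel's RAND(), a chi-square statistic with ten equal bins for uniformity, and the lag-1 autocorrelation coefficient for independence. The function names and the seed are hypothetical.

```python
import random


def chi_square_uniform(sample, bins=10):
    """Chi-square statistic of `sample` against a uniform law on [0, 1)."""
    counts = [0] * bins
    for u in sample:
        counts[min(int(u * bins), bins - 1)] += 1
    expected = len(sample) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


def lag_autocorrelation(sample, lag=1):
    """Sample autocorrelation coefficient at the given lag."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample)
    cov = sum((sample[i] - mean) * (sample[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


if __name__ == "__main__":
    rng = random.Random(2013)  # fixed seed so the stream can be repeated
    stream = [rng.random() for _ in range(10_000)]
    # With 10 bins (9 degrees of freedom), the 5% critical value is about 16.92.
    print(chi_square_uniform(stream), lag_autocorrelation(stream))
```

A stream passes the uniformity check when the statistic stays below the critical value, and the independence check when the autocorrelation coefficient is close to zero.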
As suggested in [31], the simulation uses the RAND() function, instead of the Data Analysis Toolpak PRNG. To assure repeatability, we chose a stream whose mean value does not differ from the MTBF calculated from the previously known reliability functions. Equations (7) and (8) are helpful to check the accuracy of the simulated lifetime data. Table 1 compares the MTBF calculated by the models and the MTBF of the simulated lifetime data. The next step is to apply the three maintenance policies and observe the simulated results. The application needs three simulated sets of TBF (time between failures), one for each piece of equipment. With a random generator, and by the Monte Carlo method, we sampled from the inverse of the given reliability functions and obtained three series of twenty random TBF. To define the number of replications, we used a pilot case and an iterative procedure [31], starting with five replications and recording the resulting 95% confidence interval. Each subsequent iteration increases the number of replications by five until the confidence interval does not vary by more than 5%. A pilot simulation with twenty replications differed by 1% from the previous one. Therefore, the study adopted twenty replications for the simulated streams.
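The sampling step and the stopping rule for the number of replications can be sketched as below. The sketch assumes inverse-transform sampling of the Weibull and exponential reliability functions and reads "does not vary by more than 5%" as the relative change of the confidence interval half-width between consecutive iterations. That reading, the function names, and the parameter values are assumptions made here.

```python
import math
import random
import statistics

# Student t quantiles (97.5%) for n - 1 degrees of freedom, n = 5, 10, 15, 20.
T_CRIT_95 = {5: 2.776, 10: 2.262, 15: 2.145, 20: 2.093}


def weibull_tbf(u, shape, scale, t0=0.0):
    """Invert R(t) = exp(-((t - t0) / scale) ** shape) at R(t) = u."""
    return t0 + scale * (-math.log(u)) ** (1.0 / shape)


def exponential_tbf(u, rate):
    """Invert R(t) = exp(-rate * t) at R(t) = u."""
    return -math.log(u) / rate


def simulate_tbf_series(rng, n, sampler, **params):
    """Draw n times between failures; 1 - random() keeps u inside (0, 1]."""
    return [sampler(1.0 - rng.random(), **params) for _ in range(n)]


def ci_half_width(values, t_crit):
    """Half-width of the confidence interval of the mean."""
    return t_crit * statistics.stdev(values) / math.sqrt(len(values))


def replications_needed(series, tolerance=0.05):
    """Grow the sample in steps of five until the half-width changes by at
    most `tolerance` (relative); returns None if that never happens."""
    previous = None
    for n in sorted(T_CRIT_95):
        current = ci_half_width(series[:n], T_CRIT_95[n])
        if previous and abs(current - previous) / previous <= tolerance:
            return n
        previous = current
    return None


if __name__ == "__main__":
    rng = random.Random(42)  # hypothetical seed and parameters
    rb1 = simulate_tbf_series(rng, 20, weibull_tbf, shape=2.0, scale=30.0)
    cnc = simulate_tbf_series(rng, 20, exponential_tbf, rate=0.02)
    print(statistics.mean(rb1), statistics.mean(cnc), replications_needed(rb1))
```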
The first policy is corrective maintenance. Table 2 shows the simulated result of the corrective maintenance policy.
In the corrective maintenance policy, every time production of the cell stops, the crew inspects and repairs only the machines that failed. Table 2 assigns the repairs by the TBF in bold. For example, in the first row, the general TBF is 33.09, which belongs to RB1. Since RB1 and RB2 operate in parallel, the TBF of RB2 is necessarily smaller than that of RB1, which means that RB2 was already in the failure state when RB1 failed. Therefore, after the repair, the random failure processes of RB1 and RB2 are reset and start counting new TBFs. The crew exchanges the broken pieces and reinforces the weak points.
Over the twenty runs of the random process to failure, n = thirty-five repairs were performed.
The next policy is opportunistic maintenance. Table 3 shows the simulated result of the opportunistic maintenance policy.
In the opportunistic maintenance policy, every time the production stops, the crew inspects and, if necessary, repairs all the machines. The systemic TBF S is the minimum between the maximum of [TBF RB1; TBF RB2] and [TBF CNC]. The crew exchanges any broken pieces and reinforces the weak points of the machines. The failure process resets, and a new run begins for all the machines.
Over the twenty runs of the random process to failure, n = sixty repairs were performed. The resulting MTBF was 26.57 hours, which is close to the calculated MTBF of 27.15 hours. The standard deviation of the twenty TBF was 20.73 hours. The cumulative total time of operation for the twenty executions was Tacc = 531.37 hours. The twenty TBF fit an exponential model with l = 0.037. For the comparison with other policies, the outcome of the opportunistic maintenance policy is: after sixty machine repairs, MTBF = 26.57 hours.
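The rule for the systemic TBF under the opportunistic policy (the minimum between the maximum of the two robots' TBF and the TBF of the CNC) lends itself to a short sketch. Because every stoppage renews all three machines, each run is independent of the previous one and every run counts three repairs. The function names and the input values are hypothetical.

```python
def systemic_tbf(tbf_rb1, tbf_rb2, tbf_cnc):
    """The cell stops when the CNC fails or when both robots have failed."""
    return min(max(tbf_rb1, tbf_rb2), tbf_cnc)


def opportunistic_policy(runs):
    """Apply the opportunistic policy to a list of simulated runs.

    `runs` holds one (tbf_rb1, tbf_rb2, tbf_cnc) tuple per run.
    Returns (number of machine repairs, systemic TBFs, MTBF).
    """
    tbfs = [systemic_tbf(*run) for run in runs]
    repairs = 3 * len(runs)  # all three machines are serviced at each stop
    return repairs, tbfs, sum(tbfs) / len(tbfs)


if __name__ == "__main__":
    # Hypothetical TBF values in hours, not the data of the study.
    runs = [(33.0, 20.0, 50.0), (10.0, 12.0, 8.0)]
    print(opportunistic_policy(runs))
```

Twenty runs under this rule give sixty repairs, which agrees with the count reported for the opportunistic policy.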
The last policy is the partial corrective maintenance. Table 4 shows the simulated result of the partial corrective maintenance policy.
In the partial corrective maintenance policy, every time the production stops, the crew inspects and repairs only the subsystem that failed. Table 4 assigns the repairs by the TBF in bold. Machines in a failure state that belong to a subsystem that has not failed yet remain in the failure state, without repair. Table 4 assigns FS (failure state) to this situation.
The repair will occur only when the entire subsystem collapses. For example, in the fifth row, the general TBF is 51.37, which belongs to CNC. Although RB2 had already failed (TBF = 46.35), the subsystem formed by RB1 and RB2 had not failed yet. Therefore, the repair of RB2 occurs only when both RB1 and RB2 fail, i.e., when the subsystem collapses.
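The difference between the three policies at a single stoppage can be made explicit with a small sketch. It only decides which machines the crew repairs at the first stop, given the three times to failure, following the descriptions in the text. The function name is hypothetical, and the TBF of RB1 in the example (60.0) is an assumed value, since the text does not report it.

```python
def repairs_at_stop(policy, tbf_rb1, tbf_rb2, tbf_cnc):
    """Return the set of machines repaired when the cell stops."""
    rbx = max(tbf_rb1, tbf_rb2)   # the parallel subsystem fails with its last robot
    stop = min(rbx, tbf_cnc)      # the cell stops at the first subsystem failure
    machines = {"RB1": tbf_rb1, "RB2": tbf_rb2, "CNC": tbf_cnc}
    if policy == "opportunistic":
        return set(machines)      # everything is serviced
    if policy == "corrective":
        # every machine already in a failure state at the stop
        return {name for name, t in machines.items() if t <= stop}
    if policy == "partial":
        repaired = set()
        if tbf_cnc <= stop:       # the CNC subsystem caused the stop
            repaired.add("CNC")
        if rbx <= stop:           # the robot subsystem collapsed
            repaired.update({"RB1", "RB2"})
        return repaired
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    # Fifth-row example: CNC fails at 51.37, RB2 failed earlier at 46.35;
    # 60.0 for RB1 is an assumed value.
    for policy in ("opportunistic", "corrective", "partial"):
        print(policy, sorted(repairs_at_stop(policy, 60.0, 46.35, 51.37)))
```

In the example, the partial corrective policy repairs only the CNC and leaves RB2 in the failure state (FS), whereas the corrective policy repairs both the CNC and RB2.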
Over the twenty runs of the random process to failure, n = thirty repairs were performed, resulting in an MTBF of 21.49 hours. The standard deviation of the twenty TBF was 17.32 hours. The cumulative total time of operation for the twenty executions was Tacc = 429.88 hours. The twenty TBF fit an exponential model with l = 0.049. For the comparison with other policies, the outcome of the partial corrective maintenance policy is: after thirty machine repairs, MTBF = 21.49 hours.

CONCLUSION
None of the three policies aims to modify technological aspects of the equipment or the manufacturing process. Therefore, in the long run, the type of failure and the absolute number of failures are the same in all scenarios, implying that the cost of labor and materials is the same for the three policies. The change that matters is the downtime that each policy causes, which influences the cost and the dependability of the manufacturing. This study focuses on these competitive dimensions. Moreover, the previous argument implies that the cost of the production lost by a stoppage is the same, regardless of the subsystem that provoked the stoppage. With this argument in mind, the study also assumes that the mean time to restore (MTTR), i.e., the average time that the system requires to resume the full rate of production, is the same for all kinds of intervention and equals one hour.
The implication of the rationale is that the outcomes more suitable to evaluate the performance of the policies are the number of interruptions n (number of times that the system stops) and the mean availability Av of the entire system. The quotient between MTBF and (MTBF + MTTR) provides the mean availability Av [2]. Both variables appraise the capacity of the policy to influence the production cost: the higher the number of interventions and the lower the availability, the lower the machine efficiency and the higher the production cost. Table 5 shows the outcomes of the simulation. The Table also indicates the unitary amount of MTBF for each repair.
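The mean availability defined above, the quotient between MTBF and (MTBF + MTTR), can be checked with a few lines. The MTBF values are those reported for the opportunistic and partial corrective policies, and MTTR = 1 hour follows the assumption of the study. The function name is hypothetical.

```python
def mean_availability(mtbf, mttr=1.0):
    """Mean availability Av = MTBF / (MTBF + MTTR), both in hours."""
    return mtbf / (mtbf + mttr)


if __name__ == "__main__":
    for policy, mtbf in (("opportunistic", 26.57), ("partial corrective", 21.49)):
        print(f"{policy}: Av = {100 * mean_availability(mtbf):.2f}%")
```

For an MTBF of 21.49 hours, the function returns about 95.55%, which matches the availability reported for the partial corrective policy.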
It is worthwhile to observe that the three maintenance policies establish a non-linear relationship between the number of interruptions n and the resultant MTBF. No policy has the best performance in both criteria (n and Av). However, the opportunistic policy can be dropped from the analysis, as it performs worse than the corrective policy in both criteria. Therefore, the final decision involves only the partial corrective and corrective policies.
In the partial corrective maintenance (PCM) policy, n = 30 and Av = 95.55%, whereas in the corrective policy (CP) n = 35 and Av = 96.59%. Converting the differences into downtime hours (MTTR = 1 hour), the smaller number of interruptions benefits PCM by five hours (35 - 30 interruptions), but the lower availability penalizes PCM by about 1 percentage point (96.6% - 95.6%), which represents 4.3 hours (0.01 x 429.88 hours). The balance is 0.7 hours in favor of PCM, which is an almost irrelevant difference. Although increased MTBF is a desirable effect, it is an intermediate, not an ultimate, objective and is less important than the number of interruptions and the availability.
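The balance can be written out step by step, using the rounded availability difference of 1 percentage point adopted in the text. The symbols ΔT are introduced here only to label the two downtime effects.

```latex
\begin{aligned}
\Delta T_{n} &= (35 - 30) \times \mathrm{MTTR} = 5 \times 1\ \text{h} = 5\ \text{h} \\
\Delta T_{Av} &= (0.966 - 0.956) \times T_{acc} = 0.01 \times 429.88\ \text{h} \approx 4.3\ \text{h} \\
\Delta T_{n} - \Delta T_{Av} &\approx 5 - 4.3 = 0.7\ \text{h in favor of PCM}
\end{aligned}
```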
Therefore, under the assumed circumstances, the only conclusive assertion is that the opportunistic maintenance policy is not suitable for the FMC. The other policies, corrective and partial corrective maintenance, produce results that are essentially equivalent in cost reduction for the manufacturing, which usually is an objective of a manufacturing strategy. The corrective and partial corrective maintenance policies can also influence the other competitive dimensions. Both can improve the quality of the final product, as partial or total correction can prevent deviations in crucial parameters and therefore reduce variability. The policies also implement flexibility aspects in the equipment by corrective actions. However, neither policy contributes to dependability, as the policies do not anticipate, but only manage, the failures and their effects. If reliability in deliveries is important in the industry, the company must handle a trade-off. The company may wish to anticipate the production to fully meet the deadlines but must handle an increment in the operational cost [32].
The study opens room for further research that should include the specific cost of intervention and stoppage, the efficiency of a predictive policy in preventing failures, and the calculation of the optimum interval between preventive interventions. With these elements, a study can simulate the five known maintenance policies for the FMC and choose the best one, under the assumed premises. Further research should also include the effect of technological improvements on the other competitive dimensions of manufacturing, flexibility and quality, in order to embrace the most important issues involved in the production strategy of a company based on manufacturing activities.