Delivering Uptime: Eliminate Variability, high MTTR and Rework with TEAMS

March 29, 2013

by SD


As products evolve from mechanical to electronic, they are becoming more reliable but harder to fix when problems arise. Customers are demanding faster problem resolution (smaller mean times to repair), self-service capabilities, seamless onboard-off-board diagnosis, and transparent escalation of jobs from help desk to tech assistance centers to field support to engineering. Indeed, variability in service and production, which causes congestion (i.e., work-in-process and cycle time inflation), grows with the mean time to repair (MTTR). In addition, revisits and rework increase workload and congestion. The TEAMS guided troubleshooting solution helps fix problems right the first time and minimizes MTTR with smart troubleshooting logic, thereby reducing variability. Reduced variability leads to highly capable processes, guaranteed lead times and high levels of consistent service.

Mean Time to Repair and Variability: Variability is anything that causes the system to depart from regular, predictable behavior[1]. Typical sources of variability include machine breakdown, material/parts shortages, rework/revisits, technician skill levels, operator unavailability, transportation, changes in demand, and so on.

Variability increases the mean and variance of the effective time to process a task. The coefficient of variation is a normalized measure of variability. Formally, if t_n is the mean and σ_n is the standard deviation of the nominal processing time of a task at a machine (i.e., without any sources of variability), both measured in hours, then the coefficient of variation is c_n = σ_n/t_n. Evidently, the coefficient of variation is a dimensionless quantity; it is a measure of the noise-to-signal ratio.

What are the mean and coefficient of variation of the effective process time in the presence of machine failures and repairs? Suppose the mean time to failure of a machine is MTTF and its mean time to repair is MTTR. Its availability is then A = MTTF/(MTTF+MTTR), and the mean of the effective process time is t_e = t_n/A = t_n (MTTF+MTTR)/MTTF. Let σ_r² be the variance of the time to repair, so that its coefficient of variation is c_r = σ_r/MTTR. The squared coefficient of variation of the effective process time, c_e², is given by

c_e² = σ_e²/t_e² = c_n² + (1 + c_r²) A (1 − A) MTTR/t_n    (1)

This equation has interesting implications for maintenance service. For a given availability, the larger the MTTR, the larger the squared coefficient of variation. Likewise, the larger the variability in repair time, as reflected in c_r², the larger the squared coefficient of variation. For a given MTTF, the mean effective process time also increases with MTTR. Thus, for a given availability, the smaller the MTTR (i.e., the shorter the disruptions), the smaller the variability.
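To make Eq. (1) concrete, the following minimal Python sketch evaluates it for two hypothetical machines that share the same availability but differ in how their downtime is distributed. The function name and all numeric values are illustrative assumptions, not figures from the article or from TEAMS.

```python
def effective_process_stats(t_n, c_n, mttf, mttr, c_r):
    """Return (availability, mean effective process time, squared CV) per Eq. (1).

    t_n  : mean nominal process time (hours)
    c_n  : coefficient of variation of nominal process time
    mttf : mean time to failure (hours)
    mttr : mean time to repair (hours)
    c_r  : coefficient of variation of repair time
    """
    availability = mttf / (mttf + mttr)
    t_e = t_n / availability
    c_e_sq = c_n**2 + (1 + c_r**2) * availability * (1 - availability) * mttr / t_n
    return availability, t_e, c_e_sq


# Hypothetical machines, both with availability 0.9 (illustrative numbers only).
frequent_short = effective_process_stats(t_n=1.0, c_n=0.5, mttf=9.0, mttr=1.0, c_r=1.0)
rare_long = effective_process_stats(t_n=1.0, c_n=0.5, mttf=90.0, mttr=10.0, c_r=1.0)

for label, (a, t_e, c_e_sq) in [("frequent, short outages", frequent_short),
                                ("rare, long outages", rare_long)]:
    print(f"{label}: A = {a:.2f}, t_e = {t_e:.3f} h, c_e^2 = {c_e_sq:.2f}")
# frequent, short outages: A = 0.90, t_e = 1.111 h, c_e^2 = 0.43
# rare, long outages: A = 0.90, t_e = 1.111 h, c_e^2 = 2.05
```

Both machines have the same mean effective process time, yet under these assumed numbers the one with tenfold longer repairs has a squared coefficient of variation nearly five times larger.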

Rework/Revisits and Variability: Rework and revisits are significant sources of variability and lost capacity in manufacturing and service operations. If u is the system utilization without rework and p is the fraction of time rework occurs, then the effective utilization of the system with rework is u_r = u + p. The effects of rework are amplified in a multi-stage manufacturing/service line, especially if the rework necessitates going back to the beginning of the line. In addition, the squared coefficient of variation of the effective processing time becomes c_e² = c_n² + p (1 − c_n²). The increased variability degrades performance by increasing congestion effects (e.g., increased work-in-process and response time), and these effects increase nonlinearly with system utilization u_r (∝ 1/(1 − u_r)). Evidently, rework magnifies congestion effects by decreasing capacity and substantially increasing WIP and response time.
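The rework relations above can be sketched in a few lines of Python. This is only an illustration: the function name and the example values (80% utilization, 10% rework) are assumptions, and the congestion factor 1/(1 − u_r) is the proportionality quoted in the text, not a full queueing model.

```python
def rework_effects(u, p, c_n):
    """Return (u_r, squared CV with rework, congestion factor 1/(1 - u_r)).

    u   : utilization without rework
    p   : fraction of time rework occurs (as defined in the text)
    c_n : coefficient of variation of nominal process time
    """
    u_r = u + p
    if not 0 <= u_r < 1:
        raise ValueError("effective utilization must stay below 1 for a stable system")
    c_e_sq = c_n**2 + p * (1 - c_n**2)
    congestion = 1 / (1 - u_r)
    return u_r, c_e_sq, congestion


# Hypothetical line at 80% utilization, with and without 10% rework.
no_rework = rework_effects(u=0.8, p=0.0, c_n=0.5)
with_rework = rework_effects(u=0.8, p=0.1, c_n=0.5)

for label, (u_r, c_e_sq, congestion) in [("no rework", no_rework),
                                         ("10% rework", with_rework)]:
    print(f"{label}: u_r = {u_r:.2f}, c_e^2 = {c_e_sq:.3f}, 1/(1-u_r) = {congestion:.1f}")
# no rework: u_r = 0.80, c_e^2 = 0.250, 1/(1-u_r) = 5.0
# 10% rework: u_r = 0.90, c_e^2 = 0.325, 1/(1-u_r) = 10.0
```

With these assumed values, a 10% rework rate doubles the congestion factor, which illustrates why the effect is described as nonlinear in utilization.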

In simple English, incidents of long downtime, even if they are infrequent, are more detrimental to the consistent, predictable performance of a production line than more frequent, shorter disruptions. As field service organizations move from the traditional break-fix model to one of delivering performance and uptime, they must minimize the high repair times associated with such incidents.

What causes the variability in MTTR? For most service organizations, the biggest unknown in the MTTR is the time it takes to identify exactly what has gone wrong. This time, called the troubleshooting time or Mean Time to Diagnose (MTTD), is often dependent on the ability of the service agent to think and infer the root cause. This usually leads to large variation in performance across service agents.

TEAMS and MTTR: Humans are local optimizers and are biased towards their more recent experiences. Consequently, manual troubleshooting, especially of infrequent and atypical problems, takes substantially longer and is error-prone.

The reasoner in TEAMS guides the service agent with an optimized sequence of troubleshooting steps. The reasoner considers all the possible causes of failure, weighs them by probability of occurrence, and takes into account available skills, materials and knowledge, the observed failure symptoms, and user complaints when optimizing the troubleshooting strategy. This enables all service agents to troubleshoot like an expert, delivering consistent performance and the lowest possible MTTD.
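The article does not describe the reasoner's algorithm, so the sketch below is not TEAMS. It is a textbook heuristic that illustrates the general idea of weighing causes by probability and effort: assuming exactly one cause is present and each check conclusively confirms or rules out one cause, checking causes in decreasing order of probability per unit of check time minimizes the expected time to diagnose. The cause names, probabilities, and check times are hypothetical.

```python
from itertools import permutations


def expected_cost(order, prob, cost):
    """Expected total check time when causes are checked in `order`
    until the single faulty one is found."""
    total = 0.0
    spent = 0.0
    for cause in order:
        spent += cost[cause]          # time spent up to and including this check
        total += prob[cause] * spent  # weighted by the chance this cause is the fault
    return total


def ratio_order(prob, cost):
    """Order causes by probability per unit of check time, highest first."""
    return sorted(prob, key=lambda c: prob[c] / cost[c], reverse=True)


# Hypothetical causes: probabilities sum to 1, check times in minutes.
prob = {"sensor": 0.5, "cable": 0.3, "controller": 0.2}
cost = {"sensor": 30.0, "cable": 5.0, "controller": 20.0}

by_ratio = ratio_order(prob, cost)
by_probability = sorted(prob, key=prob.get, reverse=True)
best_by_brute_force = min(permutations(prob), key=lambda o: expected_cost(o, prob, cost))

print(by_ratio, f"{expected_cost(by_ratio, prob, cost):.1f} min")
print(by_probability, f"{expected_cost(by_probability, prob, cost):.1f} min")
# ['cable', 'sensor', 'controller'] 30.0 min
# ['sensor', 'cable', 'controller'] 36.5 min
```

In this toy case, checking the most likely cause first is not the best strategy, because the cheap cable check is worth doing before the expensive sensor check. A real reasoner would also have to account for the skills, materials, symptoms, and complaints mentioned above, which this sketch ignores.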

Moreover, the reasoner guides the service agent through a systematic process of diagnosing the problem, correcting the problem, and then verifying the correctness of the fix, thereby ensuring that the problem is fixed right the first time.

Typically, troubleshooting time is reduced by 75-80% compared with manual troubleshooting methods, and a first-time success rate of 90% or more is achieved. Progressive service organizations can leverage the Remote Diagnosis and Diagnose Before Dispatch capabilities of the TEAMS solution to further reduce unproductive service calls, while the increased first-time fix rates reduce rework and revisits.

As seen from Eq. (1), a reduction in MTTR has a direct impact on reducing variability and, consequently, on cycle time and work-in-process. As stated earlier, reduction in variability leads to highly capable processes, guaranteed lead times and high levels of consistent service, all attributes essential to delivering performance and uptime.

[1] Wallace J. Hopp and Mark J. Spearman, Factory Physics, 3rd edition, McGraw-Hill, NY, 2008.

About the Author

Somnath Deb, Ph.D., is founder, President and CTO of QSI, and a recognized expert in the field of diagnostics and reasoner technology. His passion is to help field service organizations of high-tech equipment manufacturers improve their Quality of Service (QoS) while lowering their Cost of Service (CoS) using QSI's product portfolio.
