Folks, I want to share with you this interesting article titled “10 Field Service Predictions for 2014” by Joyce Tam of Trimble’s Field Service Management. All I can say is: it's about time!
Talk to us, we can help you with most of these things.
When you are putting together your field service strategy, ask yourself this — does your expert technician use luck or logic to solve the most challenging troubleshooting problems? If it is logic, shouldn’t a computer be able to do it better?
When you are building your remote monitoring and M2M strategy, ask yourself this — are you creating a “needle in a haystack” problem by spending all those big bucks collecting tons of data in the hope that it may be useful to somebody somewhere? Who is going to ever analyze it? Don’t collect a haystack – collect only the needles, and deliver smarter service! Check out our M2M demo for more information!
As products evolve from mechanical to electronic, they are becoming more reliable, but harder to fix when problems arise. Customers are demanding faster problem resolution (smaller mean times to repair), self-service capabilities, seamless onboard/off-board diagnosis, and transparent escalation of jobs from the help desk to technical assistance centers to field support to engineering. Indeed, it turns out that variability in service and production, which causes congestion (i.e., work-in-process and cycle time inflation), is directly proportional to mean time to repair (MTTR). In addition, revisits and rework increase workload and congestion. The TEAMS guided troubleshooting solution helps fix problems right the first time and minimizes MTTR with smart troubleshooting logic, thereby reducing variability. Reduced variability leads to highly capable processes, guaranteed lead times, and consistently high levels of service.
Mean Time to Repair and Variability: Variability is anything that causes the system to depart from regular, predictable behavior[1]. Typical sources of variability include machine breakdown, material/parts shortages, rework/revisits, technician skill levels, operator unavailability, transportation, changes in demand, and so on.
Variability increases the mean and variance of the effective time to process a task. The coefficient of variation is a normalized measure of variability. Formally, if t_n is the mean nominal processing time of a task at a machine (absent sources of variability) and σ_n is its standard deviation, both measured in hours, then the coefficient of variation is c_n = σ_n/t_n. Evidently, the coefficient of variation is a dimensionless quantity; it is a measure of the noise-to-signal ratio.
What are the mean and coefficient of variation of the effective process time in the presence of machine failures and repairs? Suppose the mean time to failure of a machine is MTTF and its mean time to repair is MTTR. Its availability is A = MTTF/(MTTF + MTTR), and the mean effective process time is t_e = t_n/A = t_n(MTTF + MTTR)/MTTF. Let σ_r² be the variance of the time to repair, so that its coefficient of variation is c_r = σ_r/MTTR. The squared coefficient of variation of effective process time, c_e², is given by
c_e² = σ_e²/t_e² = c_n² + (1 + c_r²) A (1 − A) MTTR/t_n    (1)
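To make Eq. (1) concrete, here is a minimal sketch in Python; the function name and all numbers (a 1-hour nominal task, MTTF = 100 h, MTTR = 4 h, c_r = 1) are hypothetical illustrations, not values from the text:

```python
def effective_process_stats(t_n, c_n, mttf, mttr, c_r):
    """Mean and squared coefficient of variation of the effective
    process time under machine failures and repairs, per Eq. (1)."""
    a = mttf / (mttf + mttr)          # availability A
    t_e = t_n / a                     # mean effective process time
    ce2 = c_n**2 + (1 + c_r**2) * a * (1 - a) * mttr / t_n
    return t_e, ce2

# Hypothetical machine: 1 h nominal task (c_n = 0.25), MTTF = 100 h,
# MTTR = 4 h, moderately variable repairs (c_r = 1).
t_e, ce2 = effective_process_stats(1.0, 0.25, 100.0, 4.0, 1.0)

# Same availability, but repairs half as long (MTTF = 50 h, MTTR = 2 h):
_, ce2_short = effective_process_stats(1.0, 0.25, 50.0, 2.0, 1.0)
```

With availability held fixed, halving MTTR cuts c_e² from roughly 0.36 to roughly 0.21, illustrating that shorter disruptions mean less variability.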
This equation has interesting maintenance service implications. The larger the MTTR, the larger the squared coefficient of variation; likewise, the larger the variability in repair times (as reflected in c_r²), the larger c_e². The mean effective process time also grows with MTTR. Thus, for a given availability, the smaller the MTTR (i.e., the shorter the disruptions), the smaller the variability.
Rework/Revisits and Variability: Rework and revisits are significant sources of variability and lost capacity in manufacturing and service operations. If u is the system utilization without rework and p is the fraction of time rework occurs, then the effective utilization of the system with rework is u_r = u + p. The effects of rework are amplified in a multi-stage manufacturing/service line, especially if the rework requires going back to the beginning of the line. In addition, the squared coefficient of variation of effective processing time becomes c_e² = c_n² + p(1 − c_n²). The increased variability degrades performance by increasing congestion effects (e.g., increased work-in-process and response time), and these effects grow nonlinearly with the effective utilization u_r (∝ 1/(1 − u_r)). Evidently, rework magnifies congestion effects by decreasing capacity and substantially increasing WIP and response time.
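A quick sketch of the rework arithmetic above; the utilization and rework numbers are hypothetical:

```python
def rework_effects(u, p, c_n):
    """Effective utilization, squared CV of effective processing time,
    and a 1/(1 - u_r) congestion factor, using the formulas stated above."""
    u_r = u + p                        # effective utilization with rework
    ce2 = c_n**2 + p * (1 - c_n**2)    # squared CV with rework
    congestion = 1.0 / (1.0 - u_r)     # congestion grows nonlinearly in u_r
    return u_r, ce2, congestion

# Hypothetical line running at 85% utilization with 5% rework:
u_r, ce2, cong = rework_effects(u=0.85, p=0.05, c_n=0.5)
cong_no_rework = 1.0 / (1.0 - 0.85)    # congestion factor without rework
```

A seemingly small 5% rework fraction pushes utilization from 0.85 to 0.90 and raises the congestion factor from about 6.7 to 10, a 50% jump.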
What causes the variability in MTTR? For most service organizations, the biggest unknown in the MTTR is the time it takes to identify exactly what has gone wrong. This time, called the troubleshooting time or Mean Time to Diagnose (MTTD), is often dependent on the ability of the service agent to think and infer the root cause. This usually leads to large variation in performance across service agents.
TEAMS and MTTR: Humans are local optimizers, biased toward their most recent experiences. Consequently, manual troubleshooting, especially of infrequent and atypical problems, takes substantially longer and is error prone.
The reasoner in TEAMS guides the service agent with an optimized sequence of troubleshooting steps. It considers all possible causes of failure, weighs them by probability of occurrence, and takes into account available skills, materials, and knowledge, the observed failure symptoms, and user complaints when optimizing the troubleshooting strategy. This enables every service agent to troubleshoot like an expert, delivering consistent performance and the lowest possible MTTD.
Moreover, the reasoner guides the service agent through a systematic process of diagnosing the problem, correcting the problem, and then verifying the correctness of the fix, thereby ensuring that the problem is fixed right the first time.
Typically, troubleshooting time is reduced by 75-80% over manual troubleshooting methods, and a first-time success rate of 90% or more is achieved. Progressive service organizations can leverage the Remote Diagnosis and Diagnose-Before-Dispatch capabilities of the TEAMS solution to further reduce unproductive service calls, while the increased first-time fix rate reduces rework and revisits.
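To see how a cut in troubleshooting time flows through to MTTR, here is a back-of-the-envelope sketch; the split between diagnosis time and hands-on fix time is a hypothetical assumption:

```python
def mttr_after_mttd_cut(mttd, fix_time, reduction):
    """MTTR before/after cutting diagnosis time (MTTD) by a given
    fraction, assuming MTTR = MTTD + hands-on fix time."""
    before = mttd + fix_time
    after = mttd * (1 - reduction) + fix_time
    return before, after

# Hypothetical job: 2 h to diagnose, 1 h to fix; diagnosis time cut by 80%.
before, after = mttr_after_mttd_cut(mttd=2.0, fix_time=1.0, reduction=0.80)
```

Under these assumptions, an 80% reduction in diagnosis time shrinks MTTR from 3 h to 1.4 h; by Eq. (1), the smaller MTTR also reduces variability.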
As seen from Eq. (1), reducing MTTR has a direct impact on variability, and consequently on cycle time and work-in-process. As stated earlier, reduced variability leads to highly capable processes, guaranteed lead times, and consistently high levels of service: attributes essential to delivering performance and uptime.
[1] Wallace J. Hopp and Mark J. Spearman, Factory Physics, McGraw-Hill, NY, 3rd edition, 2008.
Quality is the key to economic success of an enterprise, because it increases productivity at little cost and is vital for business growth and enhanced competitive position. Quality is achieved by reducing variability (the Six Sigma principle) and eliminating waste due to defects, waiting, inventory, unnecessary motion, transportation, overproduction, and over-processing (the so-called Lean principle). Cost of fixing quality problems in the field increases exponentially, as evidenced by the recent Boeing 787’s Li-ion battery fire-hazard problems and high recall rates by automotive manufacturers, medical equipment makers, and laptop producers. As faulty design and manufacturing are behind most such problems, early failure analysis and mitigation can substantially reduce warranty costs caused by recalls as well as consequent loss in reputation.
What is Lean: Lean focuses on response time (cycle time) reduction, where cycle time is the value-added and non-value-added time in manufacturing a product or providing a service. Lean identifies sources of waste in order to reduce the non-value-added elements. The sources of waste fall into seven categories: defects, waiting, inventory, unnecessary motion, transportation, overproduction, and over-processing.
Little’s theorem links work-in-process (WIP or queue length) with cycle time and throughput. Indeed, larger WIP implies larger cycle times and saturated throughputs (overworked personnel and systems). Lean improves throughput by creating and optimizing smooth operational flows by level loading, reducing setups, linking suppliers, and reducing time and waste.
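Little's theorem itself is a one-liner; a minimal sketch with hypothetical service-desk numbers:

```python
def littles_law_wip(throughput, cycle_time):
    """Little's theorem: average WIP = throughput x average cycle time."""
    return throughput * cycle_time

# Hypothetical service desk closing 12 jobs/day with a 2.5-day cycle time:
wip = littles_law_wip(throughput=12, cycle_time=2.5)   # 30 jobs in process on average
```

Read in reverse, reducing cycle time at the same throughput directly reduces WIP, which is why Lean attacks response time first.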
What is Six Sigma: Six Sigma focuses on reducing variability, thereby improving product/process quality. Variability stems from failures, setups, long infrequent disruptions, synchronization requirements, and many others. Variability causes congestion (i.e., WIP/cycle time inflation), propagates through the system and inflates the seven sources of waste. Reduced variability leads to highly capable processes, guaranteed lead times and high levels of service.
TEAMS Toolset: A number of quality and inflated-cycle-time problems can be alleviated by verifying design attributes related to fault detectability and diagnosability, as well as system reliability, availability, and life-cycle cost. Design for Testability (DFT) facilitates such verification, and thereby reduces unexpected downtime, maintenance, warranty, and logistics costs, leading to a higher degree of customer satisfaction. The diagnostic modeling and analysis capabilities of QSI’s TEAMS toolset enable designers to perform DFT optimization to remedy deficiencies in the design phase, and service engineers to arrive at rapid operational fixes for deployed systems.
QSI’s TEAMS Toolset features a common model-based ‘systems engineering’ methodology, as well as off-line design and on-line implementation processes, to build highly reliable, dependable, and serviceable systems. The highly acclaimed integrated diagnostic modeling methodology embedded in the TEAMS toolset helps design engineers verify fault detectability and diagnosability, and remedy deficiencies early in the design phase.
Use of the same models and algorithms throughout a system life-cycle ensures consistent specification and analysis of system requirements, rigorous evaluation of design-for-service trade-offs of system architectures, selection and implementation of the best design, easy verification of the design implementation, and post-implementation assessment of how well the product meets the specified requirements. This “build a model once and use it many times” approach enhances communication and coordination among complex-system stakeholders, and reduces development risk (cost and schedule) by improving productivity and quality.
A key benefit of the TEAMS solution is that it helps Field Service Organizations improve the Quality of Service (QoS) while lowering the Cost of Service (CoS).
Without TEAMS, organizations rely on the intelligence of the individual Field Service Engineers (FSEs) to solve complex troubleshooting problems. However, not all the FSEs are equally smart, leading to inconsistent performance that no amount of training can overcome.
The consistent QoS delivered by the TEAMS solution is therefore a key motivation for adopting it. But how do you measure and demonstrate consistent performance when so many variables affect the field service process? More to the point, while some variability is to be expected, how much variability is cause for concern?
Fortunately, quality measurement is a mature science, and we need look no further than “Control Charts for Attributes,” originally conceptualized in 1924 by Walter Shewhart and extended by W. Edwards Deming.
The underlying theory is very simple.
Suppose you want a consistent success rate of S%, where success means the Guided Troubleshooting Solution enabled the FSE to identify the root cause of the failure and apply the appropriate corrective action. How would you periodically verify that you are still achieving an S% success rate?
If you periodically sample N cases, the standard deviation, s, of the estimate of S from those N measurements is inversely proportional to sqrt(N); specifically, for S expressed as a fraction, s = sqrt(S(1 − S)/N). This makes intuitive sense: the more data you have, the more precise your estimate of S will be.
Assuming the cases you sample are independent of each other, the expected number of successes from N trials will be between (S − 3s) and (S + 3s) with 99.7% confidence. This is a basic property of the normal distribution, which is a good approximation here as long as N·S and N·(1 − S) are both reasonably large.
So, what does this all mean? Let's plug in some values to make sense of it.
Let’s assume your goal is to have a success rate S = 80%. So, when you have a large enough data set, say thousands of troubleshooting test cases, you will get approximately 80% successful troubleshooting outcomes.
But how many successes should you expect if you sampled just 10, 30, 100, 300, or 1000 test cases? The following table gives you the answer:
| Number of Cases | Minimum Number of Successes | Expected Number of Successes | Maximum Number of Successes |
|---|---|---|---|
| 10 | 4 | 8 | 10 |
| 30 | 16 | 24 | 30 |
| 100 | 67 | 80 | 93 |
| 300 | 217 | 240 | 263 |
| 1000 | 758 | 800 | 842 |
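The bands in the table follow from the normal approximation to the binomial; a sketch is below. Exact endpoints depend on the rounding convention used, so they may differ by a count or two from the table above.

```python
import math

def success_band(s_rate, n, k=3.0):
    """k-sigma band on the number of successes in n trials, assuming a
    target success rate s_rate (expressed as a fraction)."""
    mean = s_rate * n
    sd = math.sqrt(n * s_rate * (1 - s_rate))   # std dev of the success count
    lo = max(0.0, mean - k * sd)                # clamp to the feasible range
    hi = min(float(n), mean + k * sd)
    return lo, mean, hi

# Target success rate S = 80%; bands for the sample sizes used above.
bands = {n: success_band(0.80, n) for n in (10, 30, 100, 300, 1000)}
```

For N = 100 this gives a band of roughly 68 to 92 successes around the expected 80, close to the table's 67 to 93.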
You could monitor the accuracy of your Guided Troubleshooting Solution by periodically sampling N test cases and counting how many of them are successful. How should you interpret the results? If the count falls within the minimum-maximum band for your sample size, the process is consistent with your target success rate; if it falls outside the band, the success rate has very likely shifted and the process deserves investigation.
Hope this helps you in measuring success in your Field Service Organization. Let me know how it works out.