Folks, I want to share with you this interesting article titled “10 Field Service Predictions for 2014” by Joyce Tam of Trimble’s Field Service Management. All I can say is: it’s about time!
Talk to us, we can help you with most of these things.
When you are putting together your field service strategy, ask yourself this — does your expert technician use luck or logic to solve the most challenging troubleshooting problems? If it is logic, shouldn’t a computer be able to do it better?
When you are building your remote monitoring and M2M strategy, ask yourself this — are you creating a “needle in a haystack” problem by spending all those big bucks collecting tons of data in the hope that it may be useful to somebody somewhere? Who is going to ever analyze it? Don’t collect a haystack – collect only the needles, and deliver smarter service! Check out our M2M demo for more information!
I am continually amazed at how modeling is made into a task much more complex than it needs to be! Modelers often get enamored by the power of the TEAMS analysis tool, or, obsessed with the intricacies of their equipment, and get lost in the details. You do not need to be Einstein to model, just follow his words — “Make things as simple as possible, but not simpler.”
In this blog, I want to walk you through a simple modeling approach, where you match the depth of the model to the quality and quantity of information available, and the results you want to get out of it.
This model was being developed for the field service organization entrusted with the support and maintenance of a sophisticated Bio-Medical instrument. So, the purpose of this model is to capture enough information to enable the reasoner to effectively utilize all available observables (error codes, status LEDs) to narrow down the possible list of suspects, and then add just enough manual troubleshooting steps to determine the root cause so that service agents can fix the problem right the first time.
We start by identifying the error codes from the service manuals of the system. Why start with these? Because the instrument monitors for them continuously and can detect malfunctions automatically. Many instruments can upload these error codes to the field service center, or the operator will often call in with the error codes displayed on the instrument, so the error codes are usually the first information available from the instrument.
For the reasoner to be able to interpret the error code data, we need to provide it with the list of possible causes (or faults) that would trigger each of these error codes. Many OEMs document these relationships rather well in their service manuals; while others may need to dig deeper into engineering knowledge to identify them. Once the list is available, it is a simple task in TEAMS to lay in the FRUs and their interconnections, identify the Error Codes, and capture their relationships.
Most error codes are broad functional tests, and therefore, each error code can be triggered by multiple faults in multiple FRUs. Traditionally, field service organizations have disregarded such Error codes as they do not provide a specific root cause. However, the reasoner in TEAMS can look across all Error Codes, and use the presence and absence of Error Codes to rule in or rule out possible faults, to significantly reduce the number of possible causes. In fact, we can run an analysis in TEAMS to quantify how often the reasoner will be able to identify the root cause using just the Error codes! For those situations, no further manual troubleshooting will be necessary – the reasoner will be able to pin-point the root cause — and therefore no further manual test procedures need to be modeled.
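The rule-in/rule-out idea can be illustrated with a few lines of Python. This is only a minimal sketch, not the TEAMS reasoner: the fault names, error codes and their relationships are hypothetical, and it assumes a single fault that always triggers every error code it is linked to.

```python
# Hypothetical error-code-to-fault relationships (illustrative only,
# not taken from any real service manual or from TEAMS).
CAUSES = {
    "E01": {"pump", "valve", "controller"},
    "E02": {"valve", "sensor"},
    "E03": {"controller", "power_supply"},
}


def suspects(observed, causes=CAUSES):
    """Return the faults consistent with the observed error codes.

    Assumes a single fault, and that a fault always triggers
    every error code it is listed under.
    """
    remaining = set().union(*causes.values())
    for code, faults in causes.items():
        if code in observed:
            # The code fired: the culprit must be one of its possible causes.
            remaining &= faults
        else:
            # The code did not fire: rule out every fault that would have raised it.
            remaining -= faults
    return remaining


print(suspects({"E01"}))          # E01 alone points to the pump
print(suspects({"E01", "E02"}))   # E01 together with E02 points to the valve
```

Note that E01 by itself has three possible causes, yet the absence of E02 and E03 is enough to narrow it to one. That is the information a person reading a single error code in isolation tends to miss.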
However, more than likely, there will be a significant percentage of cases where additional information is required to identify the root cause. So now we look for additional observables that are also low effort and readily available. An example would be the status LEDs which may provide additional information not provided by the Error Codes. These status LEDs can also be modeled similar to the Error Codes, and the model updated with these new sources of observation.
Next, we run the TEAMS analysis again, and identify the remaining faults that cannot be isolated uniquely. TEAMS identifies these groups of faults, called ambiguity groups, where each member of the group causes exactly the same combination of Error Codes and status LED states, so the members are indistinguishable from each other. A field service agent would need to troubleshoot further to identify the root cause from among this ambiguity group. The modeling effort, from this point on, only needs to identify the manual procedures that will help differentiate between the faults of an ambiguity group: if a group has N members and each step has a pass/fail outcome, one would need to model anywhere from about log2(N) to (N – 1) troubleshooting steps to achieve full fault isolation. This focused approach captures only the manual procedures that are necessary after the reasoner has used all available information to reduce the set of suspects to a minimum. It not only reduces the modeling effort, but also helps the subject matter experts assisting with the modeling to focus on only what is important to improve the serviceability of the instrument.
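To make the notion of an ambiguity group concrete, here is a minimal Python sketch. It is not how TEAMS computes them; the fault names and signatures are hypothetical, and the step count assumes each manual step has a simple pass/fail outcome, so the best case is about log2(N) steps and the worst case is N – 1.

```python
from collections import defaultdict
from math import ceil, log2

# Hypothetical signatures: the observables (error codes, LED states)
# that each fault triggers. Illustrative only.
SIGNATURES = {
    "pump": {"E01"},
    "tubing": {"E01"},
    "valve": {"E01", "E02"},
    "sensor": {"E02", "LED_RED"},
    "cable": {"E02", "LED_RED"},
    "connector": {"E02", "LED_RED"},
}


def ambiguity_groups(signatures):
    """Group faults that produce exactly the same set of observables."""
    groups = defaultdict(list)
    for fault, observables in signatures.items():
        groups[frozenset(observables)].append(fault)
    # Only groups with more than one member need manual troubleshooting.
    return [sorted(members) for members in groups.values() if len(members) > 1]


def manual_step_range(n):
    """Fewest and most pass/fail steps needed to separate n faults."""
    return ceil(log2(n)), n - 1


for group in ambiguity_groups(SIGNATURES):
    fewest, most = manual_step_range(len(group))
    print(group, "needs between", fewest, "and", most, "manual steps")
```

In this made-up example the valve is already isolated by its unique signature, so no manual procedure needs to be modeled for it at all.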
Once the desired level of fault isolation capability is achieved (remember, sometimes it is cheaper to replace a couple of FRUs than to research which might be the real culprit), we can augment the model with better instructions and illustrations on some of the more complex troubleshooting steps. We can link reference material, images, repair instructions, etc. to the manual procedures to help less experienced service agents perform complex tasks. We can identify the skill or certification required for some of the steps, and the reasoner can determine to what degree a problem can be resolved based on the service agent’s skills, and when to transfer the problem to an agent with a higher level of certification. The important thing is to do all of this only for the procedures that are actually necessary for troubleshooting, and not get bogged down in the long and tedious troubleshooting sequences service agents have to undertake because they lack the ability to extract the most information from Error codes and status LEDs.
The best model for the job is the simplest model that gets the job done!
If you have any questions or need further clarification, please send me your comments and I will be happy to respond.
As products evolve from mechanical to electronic, they are becoming more reliable, but harder to fix when problems arise. Customers are demanding faster problem resolution times (smaller mean times to repair), self-service capabilities, seamless onboard-off-board diagnosis, and transparent escalation of jobs from help desk to tech assistance centers to field support to engineering. Indeed, it turns out that variability in service and production, which causes congestion (i.e., work-in-process and cycle time inflation), increases with mean time to repair (MTTR). In addition, revisits and rework increase workload and congestion. The TEAMS guided troubleshooting solution helps fix problems right the first time, and minimizes the MTTR with smart troubleshooting logic, thereby reducing variability. Reduction in variability leads to highly capable processes, guaranteed lead times and high levels of consistent service.
Mean Time to Repair and Variability: Variability is anything that causes the system to depart from regular, predictable behavior. Typical sources of variability include machine breakdown, material/parts shortages, rework/revisits, technician skill levels, operator unavailability, transportation, changes in demand, and so on.
Variability increases the mean and variance of effective time to process a task. Coefficient of variation is a normalized measure of variability. Formally, if $t_n$ is the mean and $\sigma_n$ is the standard deviation of the nominal processing time of a task at a machine (sans sources of variability), both measured in hours, then the coefficient of variation is $c_n = \sigma_n / t_n$. Evidently, the coefficient of variation is a dimensionless quantity; it is a measure of noise-to-signal ratio.
What are the mean, and coefficient of variation, of effective process time in the presence of machine failures and repairs? Suppose the mean time to failure of a machine is MTTF and its mean time to repair is MTTR. Then, its availability is $A = \mathrm{MTTF}/(\mathrm{MTTF}+\mathrm{MTTR})$, and the mean of effective process time is $t_e = t_n/A = t_n(\mathrm{MTTF}+\mathrm{MTTR})/\mathrm{MTTF}$. Let $\sigma_r^2$ be the variance of the time to repair so that its coefficient of variation is $c_r = \sigma_r/\mathrm{MTTR}$. The squared coefficient of variation of effective process time, $c_e^2$, is given by
$$c_e^2 = \frac{\sigma_e^2}{t_e^2} = c_n^2 + (1 + c_r^2)\, A\,(1 - A)\, \frac{\mathrm{MTTR}}{t_n} \qquad (1)$$
This equation has interesting maintenance service implications. The larger the MTTR, the larger the squared coefficient of variation. In addition, the larger the variability in MTTR, as reflected in $c_r^2$, the larger the squared coefficient of variation. The effective process time also increases with MTTR. Thus, for a given availability, the smaller the MTTR (i.e., the shorter the disruptions), the smaller the variability.
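A quick numerical check of Eq. (1) makes the last point tangible. The sketch below simply evaluates the formula; the function name and the sample numbers are made up for illustration.

```python
def effective_process_stats(t_n, c_n, mttf, mttr, c_r):
    """Evaluate availability, mean effective process time and Eq. (1).

    t_n  : mean nominal processing time (hours)
    c_n  : coefficient of variation of nominal processing time
    mttf : mean time to failure (hours)
    mttr : mean time to repair (hours)
    c_r  : coefficient of variation of the repair time
    """
    availability = mttf / (mttf + mttr)
    t_e = t_n / availability
    c_e2 = c_n**2 + (1 + c_r**2) * availability * (1 - availability) * mttr / t_n
    return availability, t_e, c_e2


# Two hypothetical machines with the same 90% availability:
# one fails rarely but takes long to fix, the other fails often but is fixed quickly.
long_repairs = effective_process_stats(t_n=1.0, c_n=0.5, mttf=90.0, mttr=10.0, c_r=1.0)
short_repairs = effective_process_stats(t_n=1.0, c_n=0.5, mttf=9.0, mttr=1.0, c_r=1.0)
print(long_repairs)
print(short_repairs)
```

With these sample values both machines have the same availability and the same mean effective process time, but the squared coefficient of variation is about 2.05 with 10-hour repairs and about 0.43 with 1-hour repairs.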
Rework/Revisits and Variability: Rework/revisits are significant sources of variability and lost capacity in manufacturing and service operations. If $u$ is the system utilization without rework and $p$ is the fraction of time rework occurs, then the effective utilization of the system with rework is $u_r = u + p$. The effects of rework are amplified if it involves a multi-stage manufacturing/service line, and especially if the rework necessitates going back to the beginning of the line. In addition, the squared coefficient of variation of effective processing time becomes $c_e^2 = c_n^2 + p\,(1 - c_n^2)$. The increased variability degrades performance by increasing congestion effects (e.g., increased work-in-process and response time), and these effects increase nonlinearly with system utilization $u_r$ ($\propto 1/(1 - u_r)$). Evidently, rework magnifies congestion effects by decreasing capacity and substantially increasing WIP and response time.
What causes the variability in MTTR? For most service organizations, the biggest unknown in the MTTR is the time it takes to identify exactly what has gone wrong. This time, called the troubleshooting time or Mean Time to Diagnose (MTTD), is often dependent on the ability of the service agent to think and infer the root cause. This usually leads to large variation in performance across service agents.
TEAMS and MTTR: Humans are local optimizers, and are biased towards their more recent experiences. Consequently, manual troubleshooting, especially on infrequent and atypical problems, takes substantially longer time and is error prone.
The reasoner in TEAMS guides the service agent with an optimized sequence of troubleshooting steps. The reasoner considers all the possible causes of failure, weighs them by probability of occurrence, and takes into account available skills, materials and knowledge, the observed failure symptoms, and user complaints, when optimizing the troubleshooting strategy. This enables all service agents to troubleshoot like an expert, delivering consistent performance and the lowest possible MTTD.
Moreover, the reasoner guides the service agent through a systematic process of diagnosing the problem, correcting the problem, and then verifying the correctness of the fix, thereby ensuring that the problem is fixed right the first time.
Typically, the troubleshooting time is reduced by 75-80% over manual troubleshooting methods, and first time success rate of 90% or more is achieved. Progressive service organizations can leverage Remote Diagnosis and Diagnose before dispatch capabilities of the TEAMS solution to further reduce the unproductive service calls, and increased first-time fix rates reduce rework and revisits.
As seen from Eq. (1), the reduction in MTTR has direct impact on reducing variability, and consequently, on cycle time and work-in-process. As stated earlier, reduction in variability leads to highly capable processes, guaranteed lead times and high levels of consistent service; attributes essential to delivering performance and uptime.
Quality is the key to economic success of an enterprise, because it increases productivity at little cost and is vital for business growth and enhanced competitive position. Quality is achieved by reducing variability (the Six Sigma principle) and eliminating waste due to defects, waiting, inventory, unnecessary motion, transportation, overproduction, and over-processing (the so-called Lean principle). Cost of fixing quality problems in the field increases exponentially, as evidenced by the recent Boeing 787’s Li-ion battery fire-hazard problems and high recall rates by automotive manufacturers, medical equipment makers, and laptop producers. As faulty design and manufacturing are behind most such problems, early failure analysis and mitigation can substantially reduce warranty costs caused by recalls as well as consequent loss in reputation.
What is Lean: Lean focuses on response time (cycle time) reduction, where cycle time is the value-added and non-value added time in manufacturing a product or in providing services. Lean identifies sources of waste to reduce the non-value added elements. The sources of waste can be classified into the following seven categories: defects, waiting, inventory, unnecessary motion, transportation, overproduction, and over-processing.
Little’s theorem links work-in-process (WIP or queue length) with cycle time and throughput. Indeed, larger WIP implies larger cycle times and saturated throughputs (overworked personnel and systems). Lean improves throughput by creating and optimizing smooth operational flows by level loading, reducing setups, linking suppliers, and reducing time and waste.
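Little’s theorem is usually written as WIP = throughput × cycle time, for long-run averages. A minimal sketch with made-up numbers for a service queue:

```python
def average_cycle_time(wip, throughput):
    """Little's law rearranged: cycle time = WIP / throughput (long-run averages)."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput


# Hypothetical service desk: 12 open jobs on average, 4 jobs closed per day.
print(average_cycle_time(wip=12, throughput=4))   # average days a job spends in the system

# Same throughput, but twice the work-in-process: jobs wait twice as long.
print(average_cycle_time(wip=24, throughput=4))
```

The first call gives 3 days and the second 6 days, which is the sense in which larger WIP implies larger cycle times when throughput is already saturated.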
What is Six Sigma: Six Sigma focuses on reducing variability, thereby improving product/process quality. Variability stems from failures, setups, long infrequent disruptions, synchronization requirements, and many others. Variability causes congestion (i.e., WIP/cycle time inflation), propagates through the system and inflates the seven sources of waste. Reduced variability leads to highly capable processes, guaranteed lead times and high levels of service.
TEAMS Toolset: A number of quality and inflated cycle time problems can be alleviated by verifying design attributes related to fault detectability and diagnosability, and system reliability, availability and life cycle cost. Design for Testability (DFT) facilitates such verification capability, and thereby reduces unexpected downtime, maintenance, warranty and logistic costs – leading to a higher degree of customer satisfaction. Diagnostic modeling and analysis capabilities of QSI’s TEAMS toolset enable the designers to perform DFT optimization for remedying deficiencies in the design phase, and service engineers to arrive at rapid operational fixes for deployed systems.
QSI’s TEAMS Toolset features a common model-based ‘systems engineering’ methodology, as well as off-line design and on-line implementation processes, to build highly reliable, dependable, and serviceable systems. The highly acclaimed integrated diagnostic modeling methodology embedded in the TEAMS toolset helps design engineers to
Use of the same models and algorithms throughout a system life-cycle ensures consistent specification and analysis of system requirements, rigorous evaluation of design for service trade-offs of system architectures, selection and implementation of the best design, easy verification of design implementation, and post-implementation assessment of how well the product meets the specified requirements. This “build a model once and use it many times” approach enhances communication and coordination among complex system stakeholders, and reduces development risk (cost and schedule) by improving productivity and quality.
Lean times warrant more cost effective solutions. Over the past few years government and private agencies grappling with the looming budget cuts have begun charting plans to align spending with the reduced funding. These cost-cutting measures are impacting government and private businesses in the form of scale-backs on the number of new contracts, program stretch-outs, and cuts in funding levels available for procurement of new equipment.
The fact that funding vehicles are being restructured and re-purposed towards maximizing the utilization of current capabilities has led the industry to put increased focus on extending the life of its current equipment. Several defense journals and newsweeklies cite the increased emphasis on maintenance and sustainment of the existing fleet. An Aviation Week article (dt. Sept 25, 2012) cites the U.S. Air Force as “… pushing to more than double the life of its stalwart F-15 Eagles…” and “…delay fleet retirements…” while Defense News (dt. Aug 31, 2012) mentions the same military arm is planning F-16 modifications to “…extend the life and upgrade more than 300 jets in the coming years…”. Stripes (dt. Apr 26, 2012) talks about Congressional momentum to “…extend the service life of the Navy’s nuclear ballistic missile submarines…”. Defense News (dt. May 31, 2011) also says this about the US Navy and the service lives of its ships: “…Revised U.S. Fleet Plan Extends Some Ships…”.
This has intensified the spotlight on MRO operations. Over the recent years the MRO markets have scaled up and the MRO landscape continues to expand ever so rapidly.
IT has evolved into a vital part of fleet operations in the MRO sector. Many operators have begun upgrading their IT infrastructure software into a more capable and powerful tool for managing maintenance costs. Currently proposed solutions include a “…software upgrade that is focused on the health management…” (Aviation Week, dt. Nov 5, 2012) and “…integrating (more storage-capable) software into existing hardware on newer airplanes…” (Defense News, dt. Dec 31, 2012).
QSI has long been a provider of niche software solutions that are a perfect fit for MRO IT infrastructure. QSI’s software interfaces are designed to work with existing legacy architecture, and at the same time can be integrated within enterprise systems. For the past two decades the TEAMS Tool-set has been an integral part of industry forecasting, maintenance-planning and scheduling processes, thereby improving fleet reliability. QSI provides a suite of reliability-centered maintenance management products designed to eliminate mechanic research time, minimize excess inventory on hand, and increase service levels.
Learn more about QSI’s Integrated Diagnostics philosophy and the legacy of 20 years of providing cutting-edge fleet health management solutions:
We believe that a partnership with QSI will go a long way in reducing supplier-side working capital costs and creating customizable service packages driven by innovative solutions. Let us be a principal factor driving your business strategies in this rapidly evolving world.
We have been building TEAMS-Designer and TEAMS-RDS in both 32bit and 64bit versions. However, for most users, we recommend the 32bit version. This is based on the following considerations:
See the following table for our performance test results for different models of different sizes.
|Model Size|Testability Analysis Memory Usage|GUI/Reachability Memory Usage|Testability Analysis Runtime|
|10,000|293 MB|288 MB|1 min 44 s|
|14,000|280 MB|301 MB|4 min 4 s|
|20,000|1.88 GB|311 MB|8 min 11 s|
|39,000|1.79 GB|1.29 GB|19 min 1 s|
A key benefit of the TEAMS solution is that it helps Field Service Organizations improve the Quality of Service (QoS) while lowering the Cost of Service (CoS).
Without TEAMS, organizations rely on the intelligence of the individual Field Service Engineers (FSEs) to solve complex troubleshooting problems. However, not all the FSEs are equally smart, leading to inconsistent performance that no amount of training can overcome.
The consistent QoS delivered by the TEAMS solution is therefore a key motivation for adopting the TEAMS solution. But, how do you measure and demonstrate a consistent performance when there are so many variables that affect the field service process? More to the point, while variability is to be expected, how much variability is cause for concern?
Fortunately, quality measurement is a mature science, and we need to look no further than “Control Charts for Attributes”, originally conceptualized in 1924 by Walter Shewhart and extended by W. Edwards Deming.
The underlying theory is very simple.
Suppose you want a consistent success rate of S%, where success means the Guided Troubleshooting Solution enabled the FSE to identify the root cause of the failure and apply the appropriate corrective action. How would you periodically verify that you are still achieving an S% success rate?
If you periodically sample N cases, the standard deviation, s, associated with the estimate of S from N measurements is inversely proportional to sqrt(N); for a success/failure outcome, with S expressed as a fraction, s = sqrt(S(1 – S)/N). This makes intuitive sense; the more data you have, the more precise your estimate of S will be.
Assuming the cases you sampled are independent of each other, the success rate observed in N trials will be between (S – 3s) and (S + 3s) with about 99.7% confidence. This is a basic property of the normal distribution, and we will assume that it is applicable to S.
So, what does this all mean? Let’s plug in some values to make sense of it.
Let’s assume your goal is to have a success rate S = 80%. So, when you have a large enough data set, say thousands of troubleshooting test cases, you will get approximately 80% successful troubleshooting outcomes.
But how many successes should you expect if you just sampled 30, 100, 300 or 1000 test cases? The following table gives you the answer:
|Number of Cases|Minimum Number of Successes|Expected Number of Successes|Maximum Number of Successes|
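The three-sigma limits for any sample size can be computed directly from the normal approximation described above. The following is a minimal sketch under those same assumptions (independent cases, target success rate of 80%); the function name is made up, and the limits are clipped so that they never fall below zero or exceed the number of cases sampled.

```python
from math import sqrt


def control_limits(n, success_rate=0.80, sigmas=3.0):
    """Lower limit, expected value and upper limit for the number of successes in n cases."""
    expected = n * success_rate
    spread = sigmas * sqrt(n * success_rate * (1 - success_rate))
    lower = max(0.0, expected - spread)
    upper = min(float(n), expected + spread)
    return lower, expected, upper


for n in (30, 100, 300, 1000):
    lower, expected, upper = control_limits(n)
    print(f"{n:5d} cases: {lower:6.1f} to {upper:6.1f} successes (expected {expected:.0f})")
```

For example, with 100 sampled cases the expected count is 80 and the three-sigma band runs from 68 to 92 successes. The band is proportionally much wider for 30 cases than for 1000, which is why small samples should not trigger alarm on their own.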
You could monitor the accuracy of your Guided Troubleshooting Solution by periodically sampling N test cases and counting how many of those are successful. How should you interpret these results?
Hope this helps you in measuring success in your Field Service Organization. Let me know how it works out.
I am often asked by a prospective customer this simple question – do you do Prognosis?
If the customer is in DoD/Aerospace world, or has an established R&D and Health Management program, the short answer is yes!
However, customers asking this question often are field service organizations that maintain expensive assets and are looking for an alternative to the traditional break-fix service model. The term Prognosis has become quite popular over the years, thanks to millions of dollars invested by the US Department of Defense in programs such as the Joint Strike Fighter. The promise of Prognostics is simple — wouldn’t it be nice if you could predict how much life each component has left, so that you could replace them just before they failed? But is Prognosis the right tool for you?
First, how good is your Diagnosis? If you are struggling with faults that have already happened, chances are you won’t do any better with faults that have not happened yet! Think of Prognosis as something that involves Diagnosing impending failures and predicting when they will develop into full blown faults. So, Diagnosis is the foundation for Prognosis, and the need for predictive capability makes Prognosis a powerful but expensive technique that should be used wisely only where it is necessary.
This brings us to the second question: do you really need Prognosis? For example, your car has two headlights. While it is no fun driving in the rain with one headlight (have you noticed how headlights always seem to fail on rainy days?), having two headlights means that you can still get back home when one has failed. So, redundancy and fault-management are effective ways of reducing unscheduled downtime. For your critical components, evaluate what is the most cost-effective method to avoid disruption in service, and choose wisely.
And now, the all important third question: why do you want Prognosis?
Some people will answer, “I want to do less maintenance.” For example, you may be used to changing the oil in your car every 3 months (schedule-based maintenance) or 3000 miles (usage-based maintenance), but by monitoring the condition of the oil and the filter, you could change the oil only when needed (condition-based maintenance or CBM). This is a valid application of condition monitoring, although strictly speaking, this is not Prognosis. Also, keep in mind that you may not be able to extend the maintenance interval for safety critical components without exposing yourself to more liability.
However, most of our customers answer they need prognosis to reduce unscheduled downtime by doing preemptive repairs. Here too, Prognosis is not the only answer.
Let’s take an example. Suppose you want to avoid being stranded on the highway due to tire failures. You could add sophisticated techniques that monitor the tread of the tires and how it is wearing out, how the underlying structure of the tire is holding up, the stress on the tire, etc., and predict when failure is imminent. You could develop such Prognosis at significant R&D expense, or you could simply replace the tires when they look worn (CBM) (e.g., cracks on the sidewall and/or tread depth of 1/32nd inch or less). The second method may cause you to use at most one extra set of tires over the life of the car, since you will throw away tires with some useful life still left in them, but newer tires also improve your safety and performance, which is its own reward. Best of all, you can use the second technique on your current installed base without having to develop new technology.
Let’s also not forget, Prognosis or CBM does not completely prevent unscheduled downtime. You could still hit a pothole and get a flat tire – no matter how new your tire is!
To sum up, there is more than one way to reduce unscheduled downtime: Prognosis, Condition Based Maintenance, redundancy and fault tolerance and fault management. QSI can help you implement all of the techniques discussed here in a balanced health management solution. Health Management is not just Diagnosis or Prognosis, but an effectively engineered delivery of uptime at a reasonable cost.
Let us help you find the right balance of techniques to achieve your objectives.
Are you finding it cumbersome to create a test that only detects a single failure mode or failure modes on the same hierarchy level? Try using a Direct Test. A Direct Test has all the functionality of a regular test without the complexity. The complexity is caused by having to create a new unique function and attach it to the tests and applicable failure modes. With Direct Tests all you need to do is specify which failure modes are detected by the test, eliminating the need to create and attach a new function to the failure modes and test. Try it!