Control Chart to Analyze Customer Satisfaction Data

Control chart, data, analysis

Q: Let’s assume we have a process that is under control and we want to monitor a number of key quality characteristics expressed through small subjective scales, such as: excellent, very good, good, acceptable, poor and awful. This kind of data is typically available from customer satisfaction surveys, peer reviews, or similar sources.

In my situation, I have full historical data available and the process volume average is approximately 200 deliveries per month, giving me enough data and plenty of freedom to design the control chart I want.

What control chart would you recommend?

I don’t want to reduce my small-scale data to pass/fail, since I would lose insight into the underlying data. Ideally, I’d like a chart that both provides control limits for process monitoring and gives insight into the distribution of scale items (i.e., “poor,” “good,” “excellent”).

A: You can handle this analysis a couple of ways. The most obvious choice, and probably the one that would give you the most information, is a Q-chart. This chart is sometimes called a quality score chart.

The Q-chart assigns a weight to each category. Using the criteria presented, values would be:

  • excellent = 6
  • very good = 5
  • good = 4
  • acceptable = 3
  • poor = 2
  • awful = 1

You calculate the subgroup score by multiplying each category’s weight by its count, then summing the products.

If 100 surveys were returned with results of 20 excellent, 25 very good, 25 good, 15 acceptable, 12 poor, and 3 awful, the calculation is:

6(20) + 5(25) + 4(25) + 3(15) + 2(12) + 1(3) = 417

This is your score for this subgroup. If you have more subgroups, you can calculate a grand mean by adding all the subgroup scores and dividing by the number of subgroups.

If you had 10 subgroup scores of 417, 520, 395, 470, 250, 389, 530, 440, 420, and 405, the grand mean is simply:

(417 + 520 + 395 + 470 + 250 + 389 + 530 + 440 + 420 + 405)/10 = 4236/10 = 423.6

The control limits would be the grand mean +/- 3√(grand mean). In this example, 423.6 +/- 3√423.6 = 423.6 +/- 3(20.58). The lower limit is 361.86 and the upper limit is 485.34. This gives you a chance to see whether the process is stable. If there is an out-of-control situation, you need to investigate further to find the cause.
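If it helps, here is a minimal Python sketch of this calculation; the weights and counts are the ones from the example above, and the function and variable names are just illustrative:

    import math

    # Weights assigned to each category (6 = excellent ... 1 = awful).
    WEIGHTS = {"excellent": 6, "very good": 5, "good": 4,
               "acceptable": 3, "poor": 2, "awful": 1}

    def subgroup_score(counts, weights=WEIGHTS):
        """Sum of weight x count across all categories."""
        return sum(weights[cat] * n for cat, n in counts.items())

    # The 100-survey subgroup from the example.
    counts = {"excellent": 20, "very good": 25, "good": 25,
              "acceptable": 15, "poor": 12, "awful": 3}
    print(subgroup_score(counts))  # 417

    # Ten subgroup scores from the example.
    scores = [417, 520, 395, 470, 250, 389, 530, 440, 420, 405]
    grand_mean = sum(scores) / len(scores)   # 423.6
    half_width = 3 * math.sqrt(grand_mean)   # about 61.74
    print(grand_mean - half_width, grand_mean + half_width)  # about 361.86 and 485.34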

The other choice is similar, but the weights have to total to 1. Using the criteria presented, the values would be:

  • excellent = .3
  • very good = .28
  • good = .25
  • acceptable = .1
  • poor = .05
  • awful = .02

You would calculate the numbers the same way for each subgroup:

.3(20) + .28(25) + .25(25) + .1(15) + .05(12) + .02(3) = 6 + 7 + 6.25 + 1.5 + .6 + .06 = 21.41

If you had 10 subgroup scores of 21.41, 19.3, 20.22, 25.7, 21.3, 17.2, 23.3, 22, 19.23, and 22.45, the grand mean is simply (21.41 + 19.3 + 20.22 + 25.7 + 21.3 + 17.2 + 23.3 + 22 + 19.23 + 22.45)/10 = 212.11/10 = 21.211.

The control limits would be the grand mean +/- 3√(grand mean). Therefore, the limits would be 21.211 +/- 3√21.211 = 21.211 +/- 3(4.605). The lower limit is 7.40 and the upper limit is 35.03.
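With the subgroup_score sketch above, only the weight table changes for this second method:

    # Fractional weights from the second method (they total 1).
    frac_weights = {"excellent": .3, "very good": .28, "good": .25,
                    "acceptable": .1, "poor": .05, "awful": .02}
    print(subgroup_score(counts, frac_weights))  # 21.41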

The choice of method is up to you. The weights I used were simply arbitrary for this example; you would have to create your own weights for this analysis to be meaningful in your situation. In the first example, the weighting is roughly equal across categories; in the second, it is biased toward the high side.

I hope this helps.

Jim Bossert
SVP Process Design Manager, Process Optimization
Bank of America
ASQ Fellow, CQE, CQA, CMQ/OE, CSSBB, CSSMBB
Fort Worth, TX

For more on this topic, please visit ASQ’s website.

Sampling in a Call Center

Q: I work as a quality assessor (QA) and I am assisting with a number of analyses in a call center. I need a little help with sampling. My questions are as follows:

1. How do I sample calls taken by an agent if there are six assessors and 20 call center agents that each make 100 calls per day?

2. I am assessing claims paid and I want to determine the error rate and the root cause. How many of those claims would have to be assessed by the same number of QAs if claims per day, per agent, exceed 100?

3. If there are 35 interventions made by an agent per day, with two QAs assessing 20 agents in this environment, then the total completed would amount to between 300 and 500 per month. What would the sample size be in this situation?

A: I may be able to provide some ideas to help solve your problem.

The first question is about sampling calls per day by you and your fellow assessors. It is clear that the six assessors are not able to cover all of the calls handled by the 20 call center agents.

What is missing from the question is what you are measuring: customer satisfaction, correct resolution of issues, whether agents are appropriately following call protocols, or something else. Be very clear on what you are measuring.

For the sake of providing a response, let’s say you are able to judge whether or not the agents are appropriately addressing callers’ issues: a binary response, where a call is considered either good or bad (pass/fail). While this may oversimplify your situation, it is instructive for sampling.

Recalling some basic terms from statistics, remember that a sample is taken from a defined population in order to characterize or understand that population. Here, a sample of calls is assessed, and you are interested in what proportion of the calls are handled adequately (pass). If you could measure all calls, that would provide the answer. However, limited resources require that we use sampling to estimate the population proportion of adequate calls.

Next, consider how sure you want to be that the results of the sample reflect the true and unknown population results. For example, if you don’t assess any calls and simply guess at the result, there would be little confidence in that result.

Confidence, in one sense, represents the likelihood that the true population value lies within a range about the sample’s result. A 90 percent confidence means that if we repeatedly draw samples from the population, the sample result would fall within the confidence bound (close to the actual, unknown result) 90 percent of the time. That also means the estimate will be wrong 10 percent of the time due to sampling error: the finite chance that the sample happens to include disproportionately many calls that “pass” or “fail,” so that the sample does not accurately reflect the true population.

Setting the confidence is a reflection on how much risk one is willing to take related to the sample providing an inaccurate result. A higher confidence requires more samples.

Here is a simple sample size formula that may be useful in some situations:

n = ln(1 - C) / ln(pi)

where:

n is the sample size

C is the confidence, where 90% would be expressed as 0.9

pi is the proportion considered passing, in this case good calls

ln is the natural logarithm

If we want 90 percent confidence that at least 90 percent of all calls are judged good (pass), then we need at least 22 monitored calls.

This formula is a special case of the binomial sample size calculation and assumes that there are no failed calls among the calls monitored. If we assess 22 calls and none fail, we have at least 90 percent confidence that the population has at least 90 percent good calls. If there is a failed call among the 22 assessments, we have evidence that we cannot claim, with 90 percent confidence, at least 90 percent good calls. This doesn’t provide information to estimate the actual proportion, yet it is a way to detect whether the proportion falls below a set level.
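Here is a minimal Python sketch of this zero-failure calculation, using the formula and definitions above (the function name is just illustrative):

    import math

    def zero_failure_sample_size(confidence, proportion):
        """n = ln(1 - C) / ln(pi), rounded up: how many consecutive passing
        calls are needed to claim, with the stated confidence, that at least
        the stated proportion of all calls pass."""
        return math.ceil(math.log(1 - confidence) / math.log(proportion))

    print(zero_failure_sample_size(0.90, 0.90))  # 22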

If the intention is to estimate the population proportion of good vs. bad calls, then we use a slightly more complex formula:

n = z² × pi(1 - pi) / E²

where:

pi is the same, the proportion of good calls vs. bad calls

z is the standard normal critical value corresponding to alpha/2. For 90 percent confidence, we have 90% = 100%(1 - alpha), thus alpha is 0.1 and the corresponding z value is 1.645.

E is related to the accuracy of the result. It defines a range about the resulting estimate within which the true population value should reside. A higher value of E reduces the number of samples needed, yet the result may be further from the true value than desired.

The value of E depends on the standard deviation of the population. If that is not known, use an estimate from previous measurements or run a short experiment to determine a reasonable estimate. If the proportion of bad calls is the same from day to day and from agent to agent, then the standard deviation may be relatively small. If, on the other hand, there is agent-to-agent and day-to-day variation, the standard deviation may be relatively large and should be carefully estimated.

The z value is directly related to the confidence and affects the sample size as discussed above.

Notice that pi, the proportion of good calls, is in the formula. Thus, if you are taking the sample in order to estimate an unknown pi, assume pi is 0.5 when determining the sample size. This generates the largest possible sample size and permits an estimate of pi with 100(1 - alpha) percent confidence and accuracy of E or better. If you know pi from previous estimates, use it to reduce the sample size slightly.

Let’s do an example and say we want 90 percent confidence. The alpha is 0.1 and z(alpha/2) is 1.645. Let’s assume we do not have an estimate for pi, so we will use 0.5 for pi in the equation. Lastly, we want the final estimate based on the sample to be within 0.1 (estimate of pi +/- 0.1), so E is 0.1.

Running the calculation, we find n = 1.645² × 0.5(1 - 0.5)/0.1² = 67.65, so we round up and need to sample 68 calls to meet the constraints of confidence and accuracy. If the required sample size is impractical, increasing the allowable error or the sampling risk (higher E or lower C) reduces it.
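A minimal Python sketch of the same calculation, using the standard library’s NormalDist to look up the z value (the function name is just illustrative):

    import math
    from statistics import NormalDist

    def proportion_sample_size(confidence, pi, E):
        """n = z^2 * pi * (1 - pi) / E^2, rounded up. Using pi = 0.5 gives
        the most conservative (largest) sample size."""
        alpha = 1 - confidence
        z = NormalDist().inv_cdf(1 - alpha / 2)  # 1.645 for 90% confidence
        return math.ceil(z**2 * pi * (1 - pi) / E**2)

    print(proportion_sample_size(0.90, 0.5, 0.1))  # 68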

It may occur that obtaining a daily sample rate with an acceptable confidence and accuracy is not possible. In that case, sample as many as you can. The results over a few days may provide enough of a sample to provide an estimate.

One consideration with the normal approximation of the binomial distribution used in the second sample size formula: it breaks down when either n×pi or n(1 - pi) is less than five. If either value is less than five, the confidence interval becomes too wide to be of much value. If you are in this situation, use the binomial distribution directly rather than the normal approximation.
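One way to use the binomial directly is an exact (Clopper-Pearson) interval; the sketch below uses scipy, and the Clopper-Pearson choice is my own suggestion here rather than something specified in the answer above:

    from scipy.stats import beta

    def exact_binomial_interval(k, n, confidence=0.90):
        """Exact (Clopper-Pearson) confidence interval for a binomial
        proportion, given k passing calls out of n assessed."""
        alpha = 1 - confidence
        lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
        upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
        return lower, upper

    # Example: 18 good calls out of 20 assessed, 90% confidence.
    print(exact_binomial_interval(18, 20))  # roughly (0.72, 0.98)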

One last note. In most sampling cases, the overall size of the population doesn’t really matter too much. A population of about 100 is close enough to infinite that we really do not consider the population size. A small population and a need to sample may require special treatment of sampling with or without replacement, plus adjustments to the basic sample size formulas.

Choosing the right sample size depends to a large degree on what you want to know about the population. In part, you need to know the final result to calculate the “right” sample size, so it is often just an estimate. By using the above equations and concepts, you can minimize the risk of an inconclusive result, yet it will always be an evolving process to determine the right sample size for each situation.

Fred Schenkelberg
Voting member of U.S. TAG to ISO/TC 56
Voting member of U.S. TAG to ISO/TC 69
Reliability Engineering and Management Consultant
FMS Reliability
http://www.fmsreliability.com

Related Content:

Find more information about sampling on ASQ’s website.