Sampling in a Call Center

Q: I work as a quality assessor (QA) and I am assisting with a number of analyses in a call center. I need a little help with sampling. My questions are as follows:

1. How do I sample calls taken by an agent if there are six assessors and 20 call center agents that each make 100 calls per day?

2. I am assessing claims paid and I want to determine the error rate and the root cause. How many of those claims would have to be assessed by the same number of QAs if claims per day, per agent, exceed 100?

3. If there are 35 interventions made by an agent per day, with two QAs assessing 20 agents in this environment, then the total completed would amount to between 300 and 500 per month. What would the sample size be in this situation?

A: I may be able to provide some ideas to help solve your problem.

The first question is about sampling calls per day by you and your fellow assessors. It is clear that the six assessors are not able to cover all of the calls handled by the 20 call center agents.

What is missing from the question is what are you measuring — customer satisfaction, correct resolution of issues, whether agents are appropriately following call protocols, or something else? Be very clear on what you are measuring.

For the sake of providing a response, let’s say you are able to judge whether or not the agents are appropriately addressing callers’ issues. This is a binary response: a call is either considered good or not (pass/fail). While this may oversimplify your situation, it may be instructive on sampling.

Recalling some basic terms from statistics, remember that a sample is taken from some defined population in order to characterize or understand that population. Here, a sample of calls is assessed and you are interested in what portion of the calls are handled adequately (pass). If you could measure all calls, that would provide the answer. However, a limit on resources requires that we use sampling to estimate the population proportion of adequate calls.

Next, consider how sure you want the results of the sample to reflect the true and unknown population results. For example, if you don’t assess any calls and simply guess at the result, there would be little confidence in that result.

Confidence in sampling represents the likelihood that the sample result lies within a stated range about the true population value. A 90 percent confidence means that if we repeatedly draw samples from the population, the result from each sample would be within the confidence bound (close to the actual and unknown result) 90 percent of the time. That also means the estimate will be wrong 10 percent of the time due to sampling error: there is a finite chance that the sample happens to draw disproportionately many calls that “pass” or “fail,” so the sample does not accurately reflect the true population.

Setting the confidence is a reflection on how much risk one is willing to take related to the sample providing an inaccurate result. A higher confidence requires more samples.
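To make the idea of confidence concrete, here is a small simulation, a sketch only, assuming a hypothetical true pass rate of 0.6 and repeated samples of 68 calls. It counts how often a 90 percent confidence interval built from each sample actually contains the true proportion.

```python
import math
import random

def coverage_demo(true_p=0.6, n=68, trials=1000, z=1.645, seed=1):
    """Draw many samples from a population with pass rate true_p and
    count how often the 90% confidence interval around each sample
    proportion contains the true value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        passes = sum(rng.random() < true_p for _ in range(n))
        p_hat = passes / n
        half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width <= true_p <= p_hat + half_width:
            hits += 1
    return hits / trials

print(coverage_demo())  # close to 0.9, as the confidence level promises
```

Roughly 90 percent of the intervals capture the true proportion; the remaining intervals are the sampling error that the chosen confidence level accepts.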

Here is a simple sample size formula that may be useful in some situations:

n = ln(1 - C) / ln(pi)

where:

n is the sample size

C is the confidence, where 90% would be expressed as 0.9

pi is the proportion considered passing, in this case good calls

ln is the natural logarithm

If we want 90 percent confidence that at least 90 percent of all calls are judged good (pass), then we need at least 22 monitored calls.

This formula is a special case of the binomial sample size calculation and assumes that there are no failed calls among the calls monitored. That is, if we assess 22 calls and none fail, we have at least 90 percent confidence that the population has at least 90 percent good calls. If there is a failed call among the 22 assessments, we no longer have 90 percent confidence of at least 90 percent good calls. This doesn’t provide information to estimate the actual proportion, yet it is a way to detect whether the proportion falls below a set level.
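A minimal sketch of this success-run calculation in Python:

```python
import math

def success_run_sample_size(confidence, proportion_good):
    """Smallest n such that n assessed calls with zero failures gives
    the stated confidence that at least proportion_good of all calls
    are good: n = ln(1 - C) / ln(pi), rounded up."""
    return math.ceil(math.log(1 - confidence) / math.log(proportion_good))

print(success_run_sample_size(0.90, 0.90))  # 22 monitored calls
```

Raising the confidence to 95 percent with the same 90 percent pass requirement pushes the answer up to 29 calls, which shows how confidence drives the sample size.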

If the intention is to estimate the population proportion of good vs. bad calls, then we use a slightly more complex formula:

n = z^2 * pi * (1 - pi) / E^2

where:

pi is the same, the proportion of good calls vs. bad calls

z is the standard normal critical value corresponding to alpha/2. For 90 percent confidence, we have 90 percent = 100 percent * (1 - alpha); thus, in this case, alpha is 0.1 and z is 1.645.

E is related to the accuracy of the result. It defines a range about the resulting estimate within which the true population value should reside. A higher value of E reduces the number of samples needed, yet the result may be further from the true value than desired.

The value of E depends on the standard deviation of the population. If that is not known, use an estimate from previous measurements or run a short experiment to determine a reasonable estimate. If the proportion of bad calls is the same from day to day and from agent to agent, then the standard deviation may be relatively small. If, on the other hand, there is agent-to-agent and day-to-day variation, the standard deviation may be relatively large and should be carefully estimated.

The z value is directly related to the confidence and affects the sample size as discussed above.

Notice that pi, the proportion of good calls, is in the formula. Thus, if you are taking the sample in order to estimate an unknown pi, then to determine sample size, assume pi is 0.5. This will generate the largest possible sample size and permit an estimate of pi with confidence of 100 percent * (1 - alpha) and accuracy of E or better. If you know pi from previous estimates, then use it to help reduce the sample size slightly.

Let’s do an example and say we want 90 percent confidence. The alpha is 0.1 and the z alpha/2 is 1.645. Let’s assume we do not have an estimate for pi, so we will use 0.5 for pi in the equation. Lastly, we want the final estimate based on the sample to be within 0.1 (estimate of pi +/- 0.1), so E is 0.1.

Running the calculation (1.645^2 * 0.5 * 0.5 / 0.1^2 = 67.7, rounded up), we find that we need to sample 68 calls to meet the constraints of confidence and accuracy. If that is more than resources allow, increasing the allowable error or increasing the sampling risk (higher E or lower C) will reduce the required sample size.
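A minimal sketch of this sample size calculation, using the example’s inputs (z = 1.645, pi = 0.5, E = 0.1):

```python
import math

def proportion_sample_size(z, pi, e):
    """Sample size to estimate a proportion:
    n = z^2 * pi * (1 - pi) / E^2, rounded up to a whole call."""
    return math.ceil(z ** 2 * pi * (1 - pi) / e ** 2)

print(proportion_sample_size(1.645, 0.5, 0.1))  # 68 calls
```

Tightening E to 0.05 roughly quadruples the sample size (to 271 calls), which shows how quickly accuracy demands grow.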

It may occur that obtaining a daily sample rate with an acceptable confidence and accuracy is not possible. In that case, sample as many as you can. The results over a few days may provide enough of a sample to provide an estimate.

One consideration with the normal approximation of the binomial distribution used in the second sample size formula is that it breaks down when either n*pi or n*(1 - pi) is less than five. If either value is less than five, then the confidence interval is large enough to be of little value. If you are in this situation, use the binomial distribution directly rather than the normal approximation.

One last note. In most sampling cases, the overall size of the population doesn’t really matter too much. A population of about 100 is close enough to infinite that we really do not consider the population size. A small population and a need to sample may require special treatment of sampling with or without replacement, plus adjustments to the basic sample size formulas.

Creating the right sample size depends to a large degree on what you want to know about the population. In part, you need to know the final result to calculate the “right” sample size, so it is often just an estimate. By using the above equations and concepts you can minimize the risk of an inconclusive result, yet determining the right sample size will always be an evolving process for each situation.

Fred Schenkelberg
Voting member of U.S. TAG to ISO/TC 56
Voting member of U.S. TAG to ISO/TC 69
Reliability Engineering and Management Consultant
FMS Reliability
http://www.fmsreliability.com

Related Content:

Find more information about sampling on ASQ’s website.

Z1.4 and Z1.9 in Micro Testing and API Chemical Analysis


Q: I work at a cosmetics manufacturing company that produces sunscreen in bulk amounts. When we make 3,000 kg of sunscreen, we will use that in 10,000 units of final sunscreen products which will weigh 300 g each.

How many samples do I need to collect from the 10,000 units to pass the qualification?

The products need to pass both attribute and variable sampling tests such as container damage, coding error, micro testing, and Active Pharmaceutical Ingredient (API) failure. Almost 100 percent of final products were inspected for appearance errors, but a small number of them should be measured for micro testing and API chemical analysis.

For Z1.4-2008: Sampling Procedures and Tables for Inspection by Attributes, we have to collect a sample of 200 (lot size of 3,201-10,000; general inspection level II;  acceptable quality level 4.0 L), and more than 179 should pass for qualification.

For Z1.9-2008: Sampling Procedures and Tables for Inspection by Variables for Percent Nonconforming, we have to collect a sample of 25 (lot size of 3,201-10,000; general inspection level II; acceptable quality level 4.0, L), to meet the requirement of 1.12 percent of nonconformance.

Which sampling plan should we follow for micro testing and API chemical analysis?

A: If the micro test is pass/fail, then you should use Z1.4. The API chemical test probably yields a numerical result for which you can calculate the average and standard deviation; then, the proper standard to use is Z1.9. If the micro test gives you a numerical result, then you can use Z1.9 for it as well.

One thing to consider is the fact that the materials are from a batch. If the batch can be assumed to be completely mixed, without settling or separation prior to loading into final packaging, then the API chemical test may only need to be done on the batch, not on the final product. Micro testing, which can be affected by the cleanliness of the packaging equipment, probably needs to be done on the final product.

Brenda Bishop
U.S. Liaison to TC 69/WG3
ASQ CQE, CQA, CMQ/OE, CRE, SSBB, CQIA
Belleville, Illinois

For more on this topic, please visit ASQ’s website

Z1.4:2008 Inspection Levels

Q: I am reading ANSI/ASQ Z1.4-2008: Sampling procedures and tables for inspection by attributes, and there is a small section regarding inspection level (clause 9.2). Can I get further explanation of how one would justify that less discrimination is needed?

For example, my lot size is 720 which means, under general inspection level II, the sample size would be 80 (code J). However, we run a variety of tests, including microbial and heavy metal testing. These tests are very costly. We would like to justify that we can abide by level I or even lower if possible. Do you have any advice?

The product is a liquid dietary supplement.

A: Justification of a specific inspection level is the responsibility of the “responsible party.” Rationale for using one of the special levels (S-1, S-2, S-3, S-4) could be based on the cost or time to perform a test. Less discrimination means that the actual Acceptable Quality Level (AQL) on the table underestimates the true AQL, as the sample size has been reduced from the table-suggested sample size (e.g., under General Inspection Level II, Table II-A gives sample size code letter G, or 32 samples, for a lot size of 151 to 280, while General Inspection Level I would require letter E, or 13 samples, for the same lot size).

Justification of a sampling plan is based on risk, and a plan can be justified based on the cost of the test, assuming you are willing to take larger sampling risks. If you use one of the special sampling plans based on the cost of the test, it is helpful to calculate the actual AQL and Limiting Quality (LQ) from the binomial probability of acceptance:

Pa = sum from d = 0 to x of [n! / (d!(n - d)!)] p^d (1 - p)^(n - d)

You solve the equation for the fraction nonconforming p at a given sample size (n) and defects allowed (x): the AQL is conventionally the p giving Pa = 0.95, and the LQ is the p giving Pa = 0.10.
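As a sketch of that calculation, the acceptance probability can be computed from the binomial distribution and inverted numerically. The plan below (n = 80, accept on 3 or fewer defects) is a hypothetical example, not a plan taken from the standard:

```python
from math import comb

def prob_accept(p, n, c):
    """Probability of accepting a lot with fraction defective p under a
    single sampling plan: sample n units, accept if defects found <= c."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

def solve_p(target_pa, n, c, tol=1e-9):
    """Find the fraction defective p at which the probability of
    acceptance equals target_pa (Pa decreases as p grows), by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_accept(mid, n, c) > target_pa:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, c = 80, 3  # hypothetical plan
aql = solve_p(0.95, n, c)  # quality accepted 95% of the time
lq = solve_p(0.10, n, c)   # quality accepted only 10% of the time
print(f"AQL = {aql:.4f}, LQ = {lq:.4f}")
```

Comparing the AQL and LQ for a reduced sample size against the table-suggested plan makes the extra risk of the smaller sample explicit.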

Steven Walfish
Secretary, U.S. TAG to ISO/TC 69
ASQ CQE
Principal Statistician, BD
http://statisticaloutsourcingservices.com

For more on this topic, please visit ASQ’s website.

AQL for Electricity Meter Testing


Q: We have implemented a program to test electricity meters that are already in use. This would target approximately 28,000 electricity meters that have been in operation for more than 15 years. Under this program, we plan to test a sample of meters and come to a conclusion about the whole batch  —  whether replacement is required or not. As per ANSI/ISO/ASQ 2859-1:1999: Sampling procedures for inspection by attributes — Part 1: Sampling schemes indexed by acceptance quality limit (AQL) for lot-by-lot inspection, we have selected a sample of 315 to be in line with the total number of electricity meters in the batch.

Please advise us on how to select an appropriate acceptable quality level (AQL) value to accurately reflect the requirements of our survey and come to a decision on whether the whole batch should be rejected and replaced. Thank you.

A: One of the least liked phrases uttered by statisticians is “it depends.” Unfortunately, in response to your question, the selection of the AQL depends on a number of factors and considerations.

If one didn’t have to sample from a population to make a decision, meaning we could perform 100% inspection accurately and economically, we wouldn’t need to set an AQL. Likewise, if we were not able to test any units from the population at all, we wouldn’t need the AQL. It’s the sampling, and the uncertainty it introduces, that requires some thought in setting an AQL value.

As you may notice, the lower the AQL the more samples are required. Think of it as reflecting the size of a needle. A very large needle (say, the size of a telephone pole) is very easy to find in a haystack. An ordinary needle is proverbially impossible to find. If you desire to determine if all the units are faulty or not (100% would fail the testing if the hypothesis is true), that would be a large needle and only one sample would be necessary. If, on the other hand, you wanted to find if only one unit of the entire population is faulty, that would be a relatively small needle and 100% sampling may be required, as the testing has the possibility of finding all are good except for the very last unit tested in the population.

AQL is not the needle or, in your case, the proportion of faulty fielded units. It is the acceptance quality limit, which is related to the proportion of bad units. The AQL works through the probability that a random sample, drawn from a population whose actual failure rate equals the AQL (say, 0.5%), yields a sample failure rate of 0.5% or less. We set the probability of acceptance relatively high, often 95%. This means that if the population is actually as good as or better than our AQL, we have a 95% chance of drawing a sample that will result in accepting the batch as good.

The probability of acceptance is built into the sampling plan. Drafting an operating characteristic curve of your sampling plan is helpful in understanding the relationship between AQL, probability of acceptance, and other sampling related values.
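A sketch of how such an operating characteristic (OC) curve can be computed. The sample size of 315 comes from the question; the acceptance number c = 5 is assumed here purely for illustration:

```python
from math import comb

def prob_accept(p, n, c):
    """Probability a lot with fraction defective p passes a single
    sampling plan (sample n units, accept on c or fewer failures)."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

n, c = 315, 5  # n from the question; c is an assumed acceptance number
for p in (0.002, 0.005, 0.010, 0.020, 0.040):
    print(f"fraction defective {p:.3f} -> "
          f"probability of acceptance {prob_accept(p, n, c):.3f}")
```

Plotting Pa against p traces the OC curve and shows how sharply the plan discriminates between good and bad batches of meters.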

Now back to the comment of “it depends.” The AQL is the statement that basically says the population is good enough – an acceptably low failure rate. For an electrical meter, the number allowed out of specification may be defined by contract or agreement with the utility or regulatory body. As an end customer, I would enjoy a meter that under-reports my electricity use, as I would pay for less than I received. The utility company would not enjoy this situation, as it would be providing its service at a discount. And you can imagine the reverse situation and its consequences. Some calculations and assumptions would permit you to determine the cost to the consumers or to the utility for various proportions of units out of specification, either over- or under-reporting. Balance the cost of testing against the cost of meter errors and you can find a reasonable sampling plan.

Besides the regulatory or contract requirements for acceptable percent defective, or the balance between costs, you should also consider the legal and publicity ramifications. If you accept 0.5% as the AQL, and there are one million end customers, that is 5,000 customers with possibly faulty meters. What is the cost of bad publicity or legal action? While not likely if the total number of faulty units is small, there does exist the possibility of a very expensive consequence.

Another consideration is the measurement error of the testing of the sampled units. If the measurement is not perfect, which is a reasonable assumption in most cases, then the results of the testing may have some finite possibilities to not represent the actual performance of the units. If the testing itself has repeatability and reproducibility issues, then setting a lower AQL may help to provide a margin to guard from this uncertainty. A good test (accurate, repeatable, reproducible, etc.) should have less of an effect on the AQL setting.

In summary, if the decision based on the sample results is important (major expensive recall, safety or loss of account, for example), then use a relatively lower AQL. If the test result is for an information gathering purpose which is not used for any major decisions, then setting a relatively higher AQL is fine.

If my meter is in the population under consideration, I am not sure I want my meter evaluated. There are three outcomes:

  • The meter is fine and in specification, which is to be expected and nothing changes.
  • The meter is overcharging me and is replaced with a new meter and my utility bill is reduced going forward. I may then pursue the return of past overcharging if the amount is worth the effort.
  • The meter is undercharging me, in which case I wouldn’t want the meter changed nor the back charging bill from the utility (which I doubt they would do unless they found evidence of tampering).

As an engineer and good customer, I would want to be sure my meter is accurate, of course.

Fred Schenkelberg
Voting member of U.S. TAG to ISO/TC 56
Voting member of U.S. TAG to ISO/TC 69
Reliability Engineering and Management Consultant
FMS Reliability
http://www.fmsreliability.com

For more on this topic, please visit ASQ’s website

Sampling Plan for Pharmaceuticals


Q: We are a U.S. dietary supplements manufacturer operating under c-GMP conditions set by the U.S. Food & Drug Administration (FDA).

As such, we perform analyses of incoming raw materials (finished product ingredients), intermediate products (during manufacturing), and finished products. Analyses include identity testing (incoming raw materials), and other types of analysis (e.g. microbiological, heavy metals, some quantitative assays on specific compounds). These tests would be the attributes we wish to assess.

Basically, we are refining our sampling procedures and need to ascertain an acceptable number of samples to be taken for the various testing purposes outlined above.

The World Health Organization’s (WHO) Technical Report Series No. 929,  Annex 4, “WHO Guidelines for sampling of pharmaceutical products and related materials” references ANSI/ISO/ASQ 2859-1:1999 Sampling procedures for inspection of attributes – Part 1: Sampling schemes indexed by acceptance quality limit (AQL) for lot-by-lot inspection in reference to the selection of a statistically-valid number of samples for testing purposes.

I note from your website that there are a number of other sampling standards available. I am seeking some guidance as to the most appropriate standard(s) for our particular purposes.

Any assistance you can offer would be much appreciated.

A: Though many of the sampling plans are similar, many standards organizations have published different interpretations of sampling schemes.  Since WHO recommends using ISO 2859-1 as the guidance document, I suggest selecting that plan.

There are similar documents that could be used as an alternative, if necessary:

1. ANSI/ASQ Z1.4-2003 (R2018): Sampling Procedures and tables for inspection by attributes

2. BS 6001-1:1999/ISO 2859-1:1999+A1:2011 Sampling procedures for inspection by attributes. Sampling schemes indexed by acceptance quality limit (AQL) for lot-by-lot inspection

3. MIL-STD-105E – Sampling Procedures and Tables for Inspection by Attributes*

4. JIS Z9015-0-1999 Sampling procedures for inspection by attributes — Part 0 Introduction to the JIS Z 9015 attribute sampling system

A few points to consider:

  • Usually for FDA-regulated products, a c=0 sampling plan is appropriate. See H1331 Zero Acceptance Number Sampling Plans, Fifth Edition, by Nicholas L. Squeglia
  • Based on risk, an Acceptable Quality Level (AQL) should be selected
  • Your sample size is usually set to be proportional to lot size.  If you are doing testing on bulk raw materials, the sample size will be set based on the variability of the lot as well as the variability of the method.

Steven Walfish
Secretary, U.S. TAG to ISO/TC 69
ASQ CQE
Principal Statistician, BD
http://statisticaloutsourcingservices.com/

Note:

*Military standard, cancelled and superseded by MIL-STD-1916, “DoD Preferred Methods for Acceptance of Product,” or ANSI/ASQ Z1.4:2008, according to the Notice of Cancellation

Explore more about this topic.

Sampling Employee Tasks


Question

We are collecting data on what tasks our employees in various departments do each day. We hope to eventually get a representation of what each employee does all year long.  Randomly, throughout the day, employees record the tasks they are doing.  We are not sure how to calculate an appropriate sample size and we are not sure how many data points to collect.

Answer

I wish there were a simple answer. We need to consider:

  • Does how long an employee has been performing a job make a difference?
  • Are the departments equivalent in terms of what they are doing?
  • What is the difference that you want to detect?

The simple rule is that the smaller the difference to be detected, the larger the sample size. By smaller, we mean a difference of less than one standard deviation of the data collected.

Random records are O.K., but really, shouldn’t you want a record for everyone for at least a week? That would give you an idea of what is done across the board and then, if you are trying to readjust the workloads, you have some basis for it in the logs. My concern with the current method is that you may have a lot of extra paperwork to account for everyone for a certain time.

Additional information provided by the questioner:

The goal of this project is to establish a baseline of activities that occur in the department and to answer the question “What does the department do all day?”

The amount of time an employee has been performing a job does not make a difference. The tasks performed in each department are considered equivalent.  We are not accounting for the amount of time it takes to complete a task — we are more interested in how frequently that task is required/requested.

The results will be used to identify enhancement opportunities to our database and identifying improvements to the current (and more frequent) processes.  The team will use a system (form in Metastorm) to capture activities throughout the day.  Frequency is approximately 5 entries an hour at random times of the hour.

I have worked with the department’s manager to capture content for the following fields using the form:

  1. Department (network management or dealer relation)
  2. Task (tier 1)
  3. Why (tier 2 – dependent on selection of task)
  4. Lessee/client name
  5. Application
  6. Country
  7. Source of request (department)

We are looking for a reasonable approach to calculate the sample size required for a 90 – 95% confidence level.  The frequency of hourly entries and length of period to capture the data can be adjusted to accommodate the resulting sample size.

Answer

The additional information helps. Since you have no previous data and you are getting 5 samples an hour from each employee (assuming a 7-hour workday, taking out lunch and two breaks), that will give you approximately 35 samples a day. Assuming a five-day week, that gives you approximately 175 data points per employee. This should give you enough information to get an estimate of what is done for a week.

Now, you will probably want to extend this out another three weeks so that you have an idea of what happens over a month.  If you can assume that the data collected is representative of all months, then you should be O.K.  If you feel that some months are different, then you may want to look at taking another sample during the months where you anticipate different volumes from the one you have. You can use the sample size calculation for discrete data using the information that you have already collected and not look at all employees, but target your average performers.
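Once the log data is in hand, estimating how often a given task occurs, with a margin of error, is straightforward. The counts below are hypothetical, and the interval uses the normal approximation to the binomial:

```python
import math

def task_share_ci(task_count, total, z=1.645):
    """Estimate a task's share of all logged entries, with a 90%
    confidence interval from the normal approximation."""
    p_hat = task_count / total
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / total)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# e.g., a task logged 42 times in one employee's 175 weekly entries:
share, low, high = task_share_ci(42, 175)
print(f"share = {share:.2f}, 90% CI = ({low:.2f}, {high:.2f})")
```

If the interval is too wide to act on, extending the collection period (more weeks of entries) narrows it in proportion to the square root of the total count.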

Jim Bossert
SVP Process Design Manager, Process Optimization
Bank of America
ASQ Fellow, CQE, CQA, CMQ/OE, CSSBB, CMBB
Fort Worth, TX

For more on this topic, please visit ASQ’s website.

ISO 2859-3 Skip-lot Sampling 5.1.1, 5.2.1


Q: Our quality team is trying to improve inspection efficiency and enhance supplier management by employing ISO 2859-3:2005 Sampling procedures for inspection by attributes — Part 3: Skip-lot sampling procedures.

Here are two questions on product qualification related to clause 5.2.1 Generic requirements for product qualification.

1. The standard requires that:

b) The product shall not have any critical classes of nonconforming items or nonconformities.

First, my understanding is that the risk level of any potential failure or nonconformity of the product should be low to the customer — is this correct? Second, if a candidate product carries some critical features (e.g., a dimension of a mechanical product), but also carries a number of low-risk features, can we apply the skip-lot concept only to the non-critical features and continue to perform lot-by-lot inspection of the critical features? We are concerned that “product” in the standard is a generic term and could be interpreted as a feature of a physical product.

2. The standard requires that:

c) The specified AQL(s) shall be at least 0,025 %.

Does this mean the AQL value should be less than or greater than 0.025%? I assume “greater.” In our company, the most often used are AQL 1.0 and AQL 2.5, which I think meet the requirement.

We would greatly appreciate your help.

A: My name is Dean Neubauer and I am the U.S. Lead Delegate to Subcommittee 5 on Acceptance Sampling and Quality Press author. I hope I can help you.

Let’s start with question 1.

The general idea of skip-lot sampling is to reduce the number of times incoming lots are inspected, due to exceptional quality on the part of the supplier. ISO 2859-3 states this in the beginning as:

The purpose of these procedures is to provide a way of reducing the inspection effort on products of high quality submitted by a supplier who has a satisfactory quality assurance system and effective quality controls.

The reduction in inspection effort is achieved by determining at random, with a specified probability, whether a lot presented for inspection will be accepted without inspection.

A skip-lot sampling plan is also known as a cumulative results plan.  In general, such plans require certain assumptions to be met regarding the nature of the inspection process:

  • The lot should be one of a continuing series of lots
  • We expect these lots to be of the same quality
  • The consumer should not expect that any lot is any worse than any of the immediately preceding lots
  • The consumer must have confidence in the supplier not to pass a substandard lot even though other lots are of acceptable quality

Under these conditions, we can use the record of previous inspections as a means of reducing the amount of inspection on any given lot.  ISO 2859-3 states the above in 5.1.1 Requirements for supplier qualification:

The requirements for supplier qualification are as follows.

a) The supplier shall have implemented and maintained a documented system for controlling product quality and design changes. It is assumed that the system includes inspection by the supplier of each lot produced and the recording of inspection results.

b) The supplier shall have instituted a system that is capable of detecting and correcting shifts in quality levels and monitoring process changes that may adversely affect quality. The supplier’s personnel responsible for the application of the system shall demonstrate a clear understanding of the applicable standards, systems and procedures to be followed.

c) The supplier shall not have experienced any change that might adversely affect quality.

The underlying assumption here is that the supplier’s quality is exceptional (a low nonconforming level). The skip-lot plan is applied to each characteristic, or feature, separately. If several characteristics are present, then try to test for at least one of them. In your situation, if you have critical characteristics, you should not be doing skip-lot inspection, due to the risk of ignoring (skipping) a potentially dangerous lot. On the other hand, you can use skip-lot plans for non-critical (major and minor) nonconformities, and they will have different AQL levels associated with them. Your listed AQLs of 1.0% and 2.5% are typical for major and minor nonconformities. Critical defects will typically have an AQL less than 1.0%, such as 0.25% to 0.65% (0% is preferred but theoretically unattainable, as you would have to do a perfect 100% inspection, i.e., with no inspection error).

The subclause referenced in question 2 states that the AQL level must be greater than or equal to 0.025%. Your levels of 1.0% and 2.5% can be used.

Dean Neubauer

U.S. Lead delegate for Subcommittee 5 on Acceptance Sampling on ISO Technical Committee 69 on Applications of Statistical Methods.

For more information on this topic, visit ASQ’s website.

ANOVA for Tailgate Samples


Q: I have a question that is related to comparison studies done on incoming inspections.

My organization has a process for which it receives a “tailgate” sample from a supplier and then compares that data with three samples of the next three shipments to “qualify” them. The reason behind this comparison is to determine if the production process of the vendor has changed significantly from the “tailgate” sample, or if they picked the best of the best for the “tailgate.”

It seems a Student’s t-test for comparing two means might be a simple and quick evaluation, but I believe an ANOVA might be in order for the various characteristics measured (there are multiple).

Can an expert provide some statistical advice to help me move forward in determining an effective solution?

A: Assuming the data is continuous, ANOVA (or MANOVA for multiple responses) should be employed. Since the tailgate sample is a control, Dunnett’s multiple comparison test should be used if the p-value from the ANOVA is less than 0.05. If the data is discrete (pass/fail), then comparing the lots would require the use of a chi-square test.
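As a sketch (with hypothetical measurement data), the one-way ANOVA F statistic for the tailgate sample versus the three shipments can be computed directly; the p-value and Dunnett’s comparison-to-control test would then come from a statistics package (e.g., scipy.stats.f_oneway and scipy.stats.dunnett in recent SciPy releases):

```python
def one_way_anova_F(groups):
    """One-way ANOVA F statistic: between-group mean square divided
    by within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical data: the control (tailgate) sample plus three shipments
tailgate = [10.1, 9.9, 10.0, 10.2]
shipments = [[10.0, 10.3, 10.1, 9.8],
             [10.4, 10.2, 10.5, 10.3],
             [9.9, 10.1, 10.0, 10.2]]
print(round(one_way_anova_F([tailgate] + shipments), 2))
```

A large F (small p-value) signals that at least one shipment differs; Dunnett’s test then identifies which shipments differ from the tailgate control specifically.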

Steven Walfish
Secretary, U.S. TAG to ISO/TC 69
ASQ CQE
Principal Statistician, BD
http://statisticaloutsourcingservices.com/

For more information on this topic, please visit ASQ’s website.

Guidance on Z1.4 Levels


Q: My company is using ANSI/ASQ Z1.4-2008 Sampling Procedures and Tables for Inspection by Attributes, and we need some clarification on the levels and the sampling plans.

We are specifically looking at Acceptable Quality Limits (AQLs) 1.5, 2.5, 4.0, and 6.5 for post manufacturing of apparel, footwear, home products, and jewelry.

Do you have any guidelines to determine when and where to use levels I, II, and III? I understand that level II is the norm and used most of the time. However, we are not clear on levels I and III versus normal, tightened, and reduced.

Are there any recommended guidelines that correlate between levels I, II, III and single sampling plans, normal, tightened, and reduced?

The tables referenced in the standard show single sampling plans for normal, tightened, and reduced inspection. Can you confirm that these are for level II (pages 11, 12, and 13)?

Do you have any tables showing the levels I and III for normal, tightened, and reduced?

A: Level I is used when you need less discrimination or when you are not as critical on the acceptance criteria. It is usually used for cosmetic defects, such as color differences that are not noticeable in a single unit. Level III is used when you want to be very picky. It is a more difficult level to gain acceptance with, so it should be used sparingly or it can cost you a lot of money.
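The difference in discrimination can be seen by comparing operating characteristic (OC) curves: under a single sampling plan, the probability of accepting a lot is a binomial sum. The plan parameters below are illustrative only, not taken from the Z1.4 tables, but they show how the larger sample a higher level demands rejects bad lots far more often:

```python
from math import comb

def accept_prob(n, c, p):
    """Probability a lot with true defective fraction p is accepted
    under a single sampling plan: sample n, accept if <= c defectives."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

# Illustrative plans (NOT from the Z1.4 tables): a small sample, as a
# lower level would give, vs. a larger sample for a higher level.
small_plan = (13, 1)   # less discriminating
large_plan = (80, 5)   # more discriminating

for p in (0.01, 0.05, 0.10):
    print(f"p={p:.2f}  Pa(small)={accept_prob(*small_plan, p):.3f}  "
          f"Pa(large)={accept_prob(*large_plan, p):.3f}")
```

Both plans accept nearly all lots at 1% defective, but at 10% defective the small-sample plan still accepts a majority of lots while the large-sample plan rejects most of them; that extra discrimination is exactly what the extra inspection cost buys.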

Each level has a normal, tightened, and reduced scheme. I am not sure what you are asking with respect to the correlation between levels I, II, and III and normal, tightened, and reduced. The goal is simply to inspect the minimum amount needed to reach an accept or reject decision. Since inspection costs money, we do not want to do too much. Likewise, we do not want to reject too much, since that also costs money, both in product availability and in extra shipping.

Yes, the tables on pages 11, 12, and 13 are for normal, tightened, and reduced inspection, but if you look at the sample size code letters, you will note that in most cases there are different letters for levels I, II, and III. Accept and reject numbers are based on the defect level and the sample size. The switching rules tell you when you can switch to either reduced or tightened inspection. The tables handle not just levels I, II, and III, but also the special levels.
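The core of the switching logic can be sketched as a small state machine. This is a simplified sketch following the pattern of the standard’s main rules (tightened when two of the last five consecutive lots are rejected; back to normal after five consecutive acceptances); the additional conditions for switching to reduced inspection, such as limit numbers, steady production, and approval by the responsible authority, are omitted:

```python
def next_state(state, history):
    """Simplified Z1.4-style switching between normal and tightened
    inspection. state: 'normal' or 'tightened'. history: list of bools,
    True = lot accepted, most recent lot last."""
    if state == "normal":
        # Normal -> tightened: 2 of the last 5 (or fewer)
        # consecutive lots rejected
        recent = history[-5:]
        if sum(1 for accepted in recent if not accepted) >= 2:
            return "tightened"
    elif state == "tightened":
        # Tightened -> normal: 5 consecutive lots accepted
        if len(history) >= 5 and all(history[-5:]):
            return "normal"
    return state
```

For example, two rejections within five consecutive lots move a normal scheme to tightened, and only a clean run of five acceptances restores normal inspection.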

Jim Bossert
SVP Process Design Manager, Process Optimization
Bank of America
ASQ Fellow, CQE, CQA, CMQ/OE, CSSBB, CMBB
Fort Worth, TX

Operational Qualification (OQ) Challenges; Cpk vs. AQL

Q: We’re completing a validation of a plastic extrusion process, which has raised a few questions with me.

This validation exercise encompasses the installation qualification (IQ), operational qualification (OQ), and performance qualification (PQ). The IQ is self-explanatory, but the OQ is challenging. The process depends on the batch resin properties, which vary enough that the extrusion processing parameters cannot be set up so that good parts are always produced. One resin batch can use processing parameters that will not work with the next batch. A justification will be written and included in the documentation package to explain this. Does the inability to define an operating window void or limit the validation?

My second question has to do with PQ acceptance criteria. The PQ will be three production runs using at least two different material resins (the largest source of variation). While production acceptance will be on an AQL=1.0, C=0 basis, these initial validation lots will be accepted on a process capability index (Cpk) level. While on the surface the acceptance difference may seem benign, it is causing some changes. The tolerance is such that the process routinely passes the Acceptable Quality Limit (AQL) test criteria but fails a Cpk requirement. Is it possible to accept PQ runs as they would be accepted in production?

A related question concerns the power of a Cpk requirement vs. an AQL sampling plan. A Cpk value can be calculated from the same number of samples on a 100-foot run as on a 10,000-foot run, while an AQL sampling plan is lot-size dependent. Is there a criterion on sample size, or a rule of thumb, as to when one plan should be used over the other?

A: First, the plastic extrusion process is always a tricky one to qualify, simply because each new batch of resin requires adjustments no matter how controlled the storage conditions are. So yes, you will have to define what adjustments your organization has to make and how large an operating window you need to transition from batch to batch. If you can demonstrate that the transition can be completed within a certain time (say, 15 to 30 minutes), then it should be acceptable for validation. This assumes the customer is in agreement with what your company is doing.

The second question is a bit more difficult, in that Cpk assumes the process is in control and performing at a steady rate. Cpk is a long-term measure and requires the use of control charts to really control the process. You may be able to work with your customer to get validated to the Cpk requirement, but you have to show the plan to get there. In the past, some customers have been willing to provide an extended period to attain validation. You may want to talk to your customer representative to find out what help they can provide.

The third question gets to the heart of the situation: Cpk vs. AQL. Cpk is a measure of process capability, and AQL is a measure of long-term outgoing quality. Are they the same? In some studies I did early on with Cpk and specifications, it was not always clear. I have not seen any criterion on sample size for when to use Cpk vs. AQL.
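To make the contrast concrete, here is a minimal sketch, using made-up process parameters and assuming a normally distributed characteristic, of how a process can comfortably pass an AQL = 1.0% attribute criterion while failing a Cpk ≥ 1.33 requirement:

```python
import math

def cpk(mu, sigma, lsl, usl):
    """Process capability index: distance from the mean to the nearer
    spec limit, in units of three standard deviations."""
    return min(usl - mu, mu - lsl) / (3 * sigma)

def fraction_nonconforming(mu, sigma, lsl, usl):
    """Expected fraction outside the spec limits for a normal process."""
    upper = 0.5 * math.erfc((usl - mu) / (sigma * math.sqrt(2)))
    lower = 0.5 * math.erfc((mu - lsl) / (sigma * math.sqrt(2)))
    return upper + lower

# Made-up extrusion example: mean on target, nearer limit 3 sigma away
mu, sigma = 10.0, 0.5
lsl, usl = 8.5, 11.5

print("Cpk =", cpk(mu, sigma, lsl, usl))                             # 1.0
print("fraction NC =", fraction_nonconforming(mu, sigma, lsl, usl))  # ~0.27%
```

Roughly 0.27% nonconforming clears an AQL of 1.0% with room to spare, yet a Cpk of 1.0 fails a 1.33 requirement, which matches the situation described in the question.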

Jim Bossert
SVP Process Design Manager, Process Optimization
Bank of America
ASQ Fellow, CQE, CQA, CMQ/OE, CSSBB, CMBB
Fort Worth, TX
