Range Statistics and the d2 Constant Used in Statistical Process Control Charts
Range statistics are often used in statistical process control (SPC) charting. One type of SPC chart is the average and range chart. Another is the individual and moving range chart. Calculating control limits for either chart requires an estimate of the standard deviation, and this estimate depends on the sampling program. Click on the link to learn more about statistical process control limits and how they’re derived.
Range Statistics for the Average and Range Charts
When using an average and range chart, a typical sample size is n=5 consecutive specimens, which we call a subgroup. Each specimen in a subgroup should be produced under homogeneous, or like, conditions. When specimens are produced under like conditions, we say they came from a system of common causes. The time between subgroups should be constant, if possible. In case of a special cause event, more subgroups may be needed to confirm the event is real.
Range Statistics for the Individuals and Moving Range Charts
When using an individual and moving range chart there is n=1 data point per sampling period, that is, a subgroup of n=1. The time between data points should be constant, when possible. Since the change between two consecutive data points captures the smallest time frame, we use a moving range of n=2 consecutive data points. For example, the first moving range is the absolute value of the difference between the second and first data points. Likewise, the second moving range is the absolute value of the difference between the third and second data points, and so on.
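The moving range calculation described above can be sketched in a few lines of Python. The data values and the function name `moving_ranges` are hypothetical, chosen only for illustration.

```python
def moving_ranges(values):
    """Return the n=2 moving ranges |x[i] - x[i-1]| for consecutive data points."""
    return [abs(b - a) for a, b in zip(values, values[1:])]

# Hypothetical individual measurements, one per sampling period.
data = [10.2, 9.8, 10.5, 10.1, 9.9]

mr = moving_ranges(data)          # one fewer moving range than data points
mr_bar = sum(mr) / len(mr)        # average moving range
print([round(v, 2) for v in mr])
print(round(mr_bar, 3))
```

Five data points give four moving ranges, which is why an individual and moving range chart always has one fewer range than it has individual values.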
How Do We Calculate the Standard Deviation From these Two Range Statistics?
To calculate the standard deviation for these two range statistics we use the following expressions. The first is for the average and range charts and the second is for the individual and moving range charts.
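A sketch of the two expressions in their usual textbook form, written in LaTeX. The notation is assumed rather than taken from the post: R-bar is the average subgroup range, MR-bar is the average moving range, and d2 is the tabulated constant for the stated sample size.

```latex
% Average and range chart: d_2 taken for the subgroup size n (for example n = 5)
\hat{\sigma} = \frac{\bar{R}}{d_2}

% Individual and moving range chart: d_2 taken for n = 2
\hat{\sigma} = \frac{\overline{MR}}{d_2}
```

Both expressions have the same form. Only the range being averaged, and the sample size used to look up d2, differ.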
Range Statistics and the d2 Constant
Where did the d2 constants come from? They’re found in statistical process control tables. But how were they derived? It’s best explained by numerical example, and it’s something you can figure out on your own using Excel or Minitab.
In Minitab, I created a column of 100,000 normally distributed numbers having a mean of 0 and standard deviation of 1. In the next column I generated another 100,000 normally distributed numbers with the same mean and standard deviation. In the third column I computed the range statistics by subtracting the smallest value from the largest value in each row. This yields a column of 100,000 range values. Shown in the figure below is a histogram for the range statistics for n=2.
When I compute the average of the range statistics for n=2, I get d2=1.13. Notice this matches the value found in any statistical process control table (1.128) to two decimal places. Let’s see if we also get the tabulated value of d2 for n=5.
When I compute the average for the histogram of range statistics for n=5 we get d2=2.33. Notice this value is the same value reported in any statistical process control table. Note the d2 constant is reported to two decimal places.
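The same experiment can be sketched outside Minitab. The Python below is an illustrative equivalent, not the procedure used for the figures: it draws 100,000 rows of n standard normal values, takes the range of each row, and averages the ranges. The function name `simulate_d2` and the seed are assumptions made for this example.

```python
import random
import statistics

def simulate_d2(n, rows=100_000, seed=1):
    """Estimate d2 as the mean range of `rows` subgroups of n standard normal values."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(rows):
        subgroup = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ranges.append(max(subgroup) - min(subgroup))  # largest minus smallest
    return statistics.fmean(ranges)

d2_n2 = simulate_d2(2)
d2_n5 = simulate_d2(5)
print(round(d2_n2, 3), round(d2_n5, 3))
```

Because this is a simulation, the third decimal place changes from seed to seed. The results should land close to the tabulated 1.128 and 2.326.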
Things to Keep in Mind about Range Statistics
Notice that I used normally distributed values to compute the range statistics in Minitab. Also notice the distribution of the range becomes more symmetric, and closer to a normal distribution, as the sample size increases from n=2 to n=5. This is one reason why n=5 is a common subgroup sample size. Also notice we used histograms of simulated values to approximate a continuous curve. If we knew the probability density function of the range for n=2 and n=5 we could compute the d2 constants precisely using calculus. That I will leave for another post.
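For readers who want a preview of the calculus route, here is a minimal sketch. It assumes the standard order-statistics result that the expected range of n standard normal values equals the integral of 1 - Φ(x)^n - (1 - Φ(x))^n over the real line, where Φ is the standard normal cumulative distribution function. The integration limits, step count, and function names are choices made for this example.

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d2_exact(n, lo=-10.0, hi=10.0, steps=20_000):
    """Trapezoidal integration of 1 - Phi(x)^n - (1 - Phi(x))^n over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        p = phi(lo + i * h)
        f = 1.0 - p**n - (1.0 - p)**n
        total += (0.5 if i in (0, steps) else 1.0) * f  # half weight at the endpoints
    return total * h

d2_2 = d2_exact(2)
d2_5 = d2_exact(5)
print(round(d2_2, 3), round(d2_5, 3))
```

For n=2 the integral has the closed form 2/sqrt(pi), about 1.128, which gives a quick check on the numerical result.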
The d2 Constant Explained!
So there you have it. If you ever wondered where these d2 values came from – well now you know! Now I need you to let me know if you appreciate this post by leaving your comments below. I look forward to hearing from you.
If you are interested in seeing how you can visualize and estimate the d2 and d3 constants then watch the video below!
Range and d2 constant: how to calculate SD is very nicely explained. Thanks & regards.
You showed the calculation of d2 using Minitab with normal data sets. But in the past, when these constants were derived, no software of this sort was available. How did they establish d2?
The d2 constant was derived mathematically using order statistics. If I can find the manuscript I will share it.
Could you please give the origin of the array of d2 constants cited in the R&R study calculation formulas?
A good place to find a table of d2 values is the 4th Edition of the Measurement Systems Analysis Manual by AIAG.
Thank you for your excellent blog.
Can you tell how to calculate or point me to a reference for d2* for 1 subgroup and subgroup sizes from 21 to 30?
Thanks
Hello Michael
Please refer to the fourth edition of the MSA Manual. On page 203 (Appendix C) you will find Table C1. It shows the values associated with the Distribution of the Average Range. In the first row (g=1 subgroup) you will find the d2-star values for subgroup sizes m=2 through m=20. Unfortunately, it doesn’t list the values of d2-star for m=21 through m=30. But this table does mention that it was taken from Acheson Duncan, Quality Control and Industrial Statistics, 5th edition, McGraw-Hill, 1986. I trust this text will be able to assist you further. If you have success locating this text, and it proves useful, I trust you will write me back and let me know.
Best regards,
Andrew
I ordered the book. I will let you know if it contains the values.
Mike
Hello Michael – I hope the book provides you with the answers you are looking for!
All the best
Andrew
Hi, I am asking how to calculate the value of d2 depending on the skewness of a Weibull distribution.
Thank you.
Hello Kawa.
I suggest that you conduct a simulation using the distribution of interest and sample from that distribution for various subgroup sample sizes. You can then compute the d2 values from such a simulation. If you do so, I would be interested in knowing whether, as the subgroup sample size increases, the d2 constant approaches the values already reported. The following link discusses how I computed the d2 values using simulation.
https://andrewmilivojevich.com/range-statistics/
I would suggest that you could determine the d2 value via simulation. Begin by constructing the Weibull Distribution using Minitab (or other software) and create sampling distributions based on how many samples per subgroup you wish to study. Please refer to the Minitab video where I show how to perform this simulation for a Normal Distribution in Minitab
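A minimal sketch of such a simulation in Python rather than Minitab. It treats the "d2" of a Weibull distribution as the mean subgroup range divided by that distribution's theoretical standard deviation, which is an assumption about how the constant should be defined for non-normal data. The shape values and function names are hypothetical.

```python
import math
import random
import statistics

def weibull_sd(scale, shape):
    """Theoretical standard deviation of a Weibull(scale, shape) distribution."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return scale * math.sqrt(g2 - g1 * g1)

def simulate_weibull_d2(n, shape, scale=1.0, rows=50_000, seed=1):
    """Mean range of subgroups of size n, in units of the Weibull standard deviation."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(rows):
        subgroup = [rng.weibullvariate(scale, shape) for _ in range(n)]
        ranges.append(max(subgroup) - min(subgroup))
    return statistics.fmean(ranges) / weibull_sd(scale, shape)

# Hypothetical shape parameters; smaller shapes give more skewed distributions.
for shape in (1.0, 2.0, 3.5):
    print(shape, [round(simulate_weibull_d2(n, shape), 3) for n in (2, 5)])
```

Comparing each row of output with the normal-theory values (1.128 for n=2, 2.326 for n=5) shows how much the constant moves as the skewness changes.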
Hi, Andrew,
Thanks for the clear explanation. I have a question below.
Rbar/d2 is used to estimate the standard deviation within subgroups. Does it apply only to normally distributed processes? For a non-normal distribution, can we still use Rbar/d2 to calculate the standard deviation and then calculate Cp?
Shewhart based the Rbar/d2 estimate on the normal distribution after comparing several distributions. He observed that the normal distribution provided the most robust dispersion in data.
Because the samples within a subgroup should be collected under like conditions the values from these samples should be normally distributed. As such, Rbar/d2 can be used to compute the within subgroup standard deviation and used to compute Cp. However, if the subgroup averages differ significantly and behave in a manner that suggests special cause variation, then compute the total standard deviation across all the data. Then compare the total standard deviation to the Rbar/d2 value. You will notice that they differ.
Good Afternoon,
Sigma = Rbar/d2
Is it the standard deviation of all measurements
OR
is it the standard deviation of the means (which will be smaller)?
It is the standard deviation within subgroups not the total standard deviation within and between subgroups.
The average range is a value that represents the mean difference within a subgroup. If the samples within that subgroup are collected under like conditions then it estimates the variation due to common causes. Dividing R-bar by d2 then provides an estimate of the standard deviation across the samples within the subgroups.
This is why we plot the range on a Range chart. As long as the plot does not show any patterns or trends then R-bar is a good estimate of the range we observe, on average, within those subgroups.
Note that the average difference between the subgroups is not captured in the R-bar by d2 estimate. Think of the Xbar and R chart as a OneWay ANOVA where each subgroup represents a treatment. When we decompose the total variation we compute within and between treatment variation. The Control Limits are then based on the variability within treatment and when a treatment exceeds a limit we then have some evidence to suggest an effect occurred between the treatments.
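The within versus total distinction can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are hypothetical normal subgroups of n=5, d2 is taken as the tabulated 2.326, and the second data set has its mean shifted by 3 halfway through to mimic a special cause.

```python
import random
import statistics

D2_N5 = 2.326  # tabulated d2 for subgroup size n=5

def sigma_within(subgroups, d2=D2_N5):
    """Rbar/d2: estimate of the within-subgroup standard deviation."""
    ranges = [max(s) - min(s) for s in subgroups]
    return statistics.fmean(ranges) / d2

def sigma_total(subgroups):
    """Sample standard deviation of all measurements pooled together."""
    return statistics.stdev([x for s in subgroups for x in s])

rng = random.Random(1)
# Stable process: every subgroup has mean 0 and sigma 1.
stable = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
# Shifted process: same sigma within subgroups, but the mean moves from 0 to 3.
shifted = ([[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
           + [[rng.gauss(3, 1) for _ in range(5)] for _ in range(100)])

for name, data in (("stable", stable), ("shifted", shifted)):
    print(name, round(sigma_within(data), 2), round(sigma_total(data), 2))
```

For the stable data the two estimates should roughly agree. For the shifted data Rbar/d2 stays near 1 while the total standard deviation grows, because only the total picks up the between-subgroup difference.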
Best Regards,
Andrew
Hi Andrew, from what I can see the derivation of d3, D3 and D4 assume that the range statistics are normally distributed. But that is not the case for a range calculated from a small number of samples.
Especially for a range of 2 numbers (like in IMR chart), the statistics follow a chi-distribution of dof=1. So that the value of D3 and D4 contain about 97% instead of 99.73%. This makes the average run length a lot less for the IMR chart, about 50 instead of 370 for the X-bar chart. From my own calculations you need D4 to be around 4.3 to 4.6, instead of 3.3 depending on what percentile you pick.
Do you know why the values for D3 and D4 are based on this assumption instead of on the percentiles of the chi-distribution? Especially since range statistics are used with very low number of samples. I’ve not been able to find any answer anywhere.
Hello Ron.
Yes, as the subgroup sample size increases the range statistics do become more and more normally distributed.
This is observed when you see the distribution of range values in one of my posts.
I think I recall seeing something from Wheeler on this topic (wrt IMR Charts and their constants). So please be patient as I try to search that out.
And many thanks for bringing this point to my attention as well as readers of this blog.
I trust what is discovered will lead to another blog post on this very topic.
Hi Andrew, thanks for the response (and luckily I checked my junk email box). In EMP III Wheeler does talk about the alpha error of a specific chart design, which is probably related. In SQC Montgomery does mention it in chapter 6.4 (but he seems to not be a fan of the IMR chart in that book), and also links an article, Crowder S. V. (1987b). “Computation of ARL for Combined Individual Measurement and Moving Range Charts,” Journal of Quality Technology, Vol. 19(1), pp. 98–102.
Hope that helps you out. Looking forward to what you find.
Hello Ron. I really appreciate your input. Thank you.
Andrew – I read your online article about calculating control limits, along with all the comments, but am still confused as to when to use d2 or stdev.
From what I could conclude, for Xbar and R charts, you use d2.
For I and MR charts you use stdev.
Is that correct?
Thanks
Mike
Hello Michael!
Please refer to the following post: https://andrewmilivojevich.com/control-chart-constants-how-to-derive-a2-and-e2/
This post will show you how the d2 constant is used for Individual and Moving Range charts.
Best Regards,
Andrew Milivojevich
Andrew,
thanks for this post. You make a complicated topic very easy to understand. Did you ever come to write the calculus-based computations for d2? If not, do you have a source I can look at?
Again, thanks.
Hello Alberto. I never did, but I think I did find a source that discussed this. Give me a chance to look for it and I will get back to you here.
Thank you
Dear Sir,
Many thanks. I’d like to understand the relation between the standard deviation and rbar/d2. The following may clarify the gap in my understanding:
For subgroup n=2:
standard deviation squared (s^2) = ((x1 - xbar)^2 + (x2 - xbar)^2)/(n - 1)
mathematically = ((x1 - x2)/sqrt(2))^2
i.e., standard deviation (s) = r1/sqrt(2)
not r1/1.128.
I am sorry that I could not align the columns in my previous reply.
Thanks and best regards.
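One way to reconcile the two results, sketched in LaTeX. It assumes normally distributed data and uses the bias-correction constant c4 from standard SPC tables, which the post itself does not discuss.

```latex
% For n = 2 the sample standard deviation is an exact multiple of the range:
s = \sqrt{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2} = \frac{|x_1 - x_2|}{\sqrt{2}} = \frac{R}{\sqrt{2}}

% But s is a biased estimate of sigma. For normal data E[s] = c_4 \sigma,
% with c_4 = \sqrt{2/\pi} \approx 0.7979 when n = 2, so the unbiased estimate is:
\hat{\sigma} = \frac{\bar{s}}{c_4} = \frac{\bar{R}}{\sqrt{2}\, c_4} = \frac{\bar{R}}{2/\sqrt{\pi}} \approx \frac{\bar{R}}{1.128}
```

Read this way, R/sqrt(2) is the subgroup's own sample standard deviation, while R-bar/1.128 is an unbiased estimate of the process sigma. For n=2 the two differ exactly by the factor c4.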
Dear Sir,
Many thanks. I’d like to understand the relation between the standard deviation and rbar/d2. The following may clarify the gap in my understanding:
n=2
subgroup   x1     x2     r      std dev.
1          1      1.2    0.2    0.141421
2          0.7    0.99   0.31   0.205061
Average Sr: 0.173241
rbar/d2:    0.226064
Note for n=2: s^2 = ((x1 - xbar)^2 + (x2 - xbar)^2)/(n - 1), mathematically = ((x1 - x2)/sqrt(2))^2, i.e. s = r1/sqrt(2), not r1/1.128
n=3
subgroup   x1     x2     x3     r      std dev.
1          1      1.1    1.2    0.2    0.1
2          0.7    1      0.99   0.3    0.170392
Average Sr: 0.135196
rbar/d2:    0.147929
n=5
subgroup   x1     x2     x3     x4     x5     r      std dev.
1          1      1.1    1.2    0.98   1.3    0.32   0.135204
2          1.11   1      0.99   1.3    0.96   0.36   0.139535
Average Sr: 0.137369
rbar/d2:    0.145923
Many thanks and best regards.
Many thanks for the valuable explanation of where d2 comes from.
I have 2 questions:
1: I computed Sr mathematically; it does not equal average R / d2 except for subgroup n=2.
2: What is the mathematical relation between the standard deviation and average r/d2? Same question for reproducibility, which is R(xbar)/d2.
Thanks and best regards.
Hello Adel.
With respect to question number 1. Can you clarify ‘Sr’?
Hello Andrew, thanks a lot, your explanation is quite clear
Hello, Andrew, appreciated your explanation, it’s quite clear
Very informative. I really appreciate the way the things are explained in a very comprehensive way. Thanks
Thank you!
Thank you very much for your help!
Andrew!
You figured it out! Yes, the average was 28. I obviously was staying up too late when I typed my initial question to you. I appreciate the help! So, did you know that s=6 just because this is a Six Sigma question? Or how did you know that s=6? That’s the only thing I’m not very clear on now.
Thank you!
Hello Riley – I am glad that we got to the bottom of this question!
We often say the spread in a Normal Distribution is +/-3s (where “s” is the standard deviation) and this equals a total spread of 6s. As such, the process spread is typically referred to as 6s. So when the question said the spread was 10 I expressed it as: 10 / 6 = 1.67 to figure out what one standard deviation equals.
Now, the formula for Cpk is: (USL – Xbar) / 3s OR (Xbar – LSL) / 3s. Notice the denominator in both expressions is 3s. In this case, the denominator is 1/2 the process spread. Now, the reason why it is 3s is because we are only looking at that side (tail) of the distribution that is closest to a specification limit. In our case, the process average is 28, and since it falls above the nominal of 25 the process will tend to have more out-of-specification units that fall above the USL.
Going forward, always remember that the process spread equals 6s and you should be fine.
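The arithmetic in this reply can be checked with a short Python sketch. The function name `cpk` is hypothetical. The inputs are the ones from the question: process spread 10, USL 35, LSL 15, and a process average of 28.

```python
def cpk(usl, lsl, mean, sigma):
    """Distance from the mean to the nearest specification limit, divided by 3 sigma."""
    return min(usl - mean, mean - lsl) / (3.0 * sigma)

spread = 10.0          # process spread, taken as 6 standard deviations
sigma = spread / 6.0   # one standard deviation, about 1.67

value = cpk(usl=35.0, lsl=15.0, mean=28.0, sigma=sigma)
print(round(value, 2))  # 1.4
```

A process average of 22 gives the same result, since it sits 7 units from the LSL just as 28 sits 7 units from the USL.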
Best Regards,
Andrew
Hello Andrew,
I’m studying for the CSSBB exam and found a practice question that didn’t make sense to me. It says the spread is 10, USL = 15, LSL = 15 and the average 10 and they want me to calculate the Cpk. They get 1.4 and I don’t see how they calculated the sigma. I’m assuming they used an r bar and d2 equation but I don’t know how they did it. Can you explain this to me, please?
Hello Riley, in your comment above USL = LSL = 15. This would result in a spread of zero – which is unusual. This would mean that the tolerance equals the nominal value and no variation is tolerated.
Can you clarify if the spread = 10 means the process spread or the tolerance spread?
Best Regards,
Andrew Milivojevich
Hello Andrew,
Sorry! The USL is 35. All the question says is that the spread is 10. There is no further explanation of the spread.
Hello Riley. Thanks for the update!
Let’s see if we can solve your problem!
With a USL = 35 and LSL = 15 the tolerance spread is 20. Since the question mentions a Spread = 10 and given it does not equal the tolerance spread it must be referring to the process spread which is equal to 6s where “s” is the standard deviation. In this case, s = 10/6 = 1.67 (rounded to 2 decimal places).
But we have another problem. Cpk is a measure of the distance to the nearest tolerance limit divided by 1/2 the process spread (which would be 3 x 1.67 = 5.0 or 1/2 the spread of 10). Since the average is 10 and the nearest tolerance limit is 15, the average is less than the LSL. As such, the Cpk would be negative. So I can’t see how they would get a result of 1.4 unless the process average was another value that fell inside the tolerance limits.
Let’s assume the process average was on nominal (target = 25). Under this assumption the Cpk would be 10 / 5 = 2.0. For the Cpk to equal 1.4 the process average would need to be either 22 (15+7) or 28 (35-7). Is it possible that the process average was 22 or 28?
Please let me know.
I hope this helps.
Andrew
Pls explain on which condition we can use
1) Rbar/D2
2) sqrt of variance
for calculating sd?
The Rbar/d2 is typically used in the calculation of control limits for the Xbar and Range control charts. The Rbar/d2 is also used to compute Process Capability indices Cp and Cpk.
The square root of the variance is used when computing the total dispersion in an entire data set. In statistical process control applications it is used to compute the Process Performance indices Pp and Ppk.
Yes, thanks for the explanation. It helped me understand where the d2 control chart constant comes from.
Hi Andrew,
I was reading your excellent article “Range Statistics and d2 Constant – How to Calculate Standard Deviation” on the d2 constant and I would like to ask you for some clarification of technique for my own understanding, please. You simulated 100,000 normal data points in Minitab, then a second column of 100,000 normal data points, and then subtracted the min from the max of each row to get 100,000 range values in C3. I followed up to that step just fine, but I would like some help to understand exactly what the histogram of n=2 (and similarly n=5) actually is, in terms of how to manipulate the data to create the histogram. So, for example, if I plot C3 (ranges) I get what looks like a normal curve. How did you plot subgroups of n=2? Is it the average of rows 1 and 2, 3 and 4, and so on, to give 50,000 values?
Any help you can give will be greatly appreciated.
Hello Andy.
I trust the following will answer your question.
The histogram is simply the plot of the range values.
The n=2 and n=5 represent the subgroup sample size.
For n=2, I computed the Range based on 2 columns of normally distributed values generated by Minitab (100,000 values with a mean of zero and standard deviation of 1). I subtracted the smallest value from the largest value for each row to compute 100,000 range values. This becomes the histogram based on a subgroup size of n=2.
For n=5, I computed the Range based on 5 columns of normally distributed values generated by Minitab (100,000 values with a mean of zero and standard deviation of 1). I subtracted the smallest value from the largest value across 5 columns, for each row, to compute 100,000 range values. This becomes the histogram based on a subgroup of n=5.
I trust this explanation brings clarity.
Many thanks for your inquiry.
Andrew Milivojevich
thank you very much; you don’t know how much I appreciate finally finding a clear explanation for the source of d2 values after getting frustrated going into many references without having a clue about its origin.
Hello Christian.
Thank you for your kind words. I am glad to hear this blog post brought you some clarity.
In the event you’d like to learn where the d3 constant comes from, I would refer you to the following blog post: https://andrewmilivojevich.com/xbar-and-r-chart/
Best regards,
Andrew
Hello Mr Andrew,
I have one doubt.
There are two formulas for calculating the sigma value (standard deviation):
Rbar/d2 and another one.
Under what conditions should we use Rbar/d2, and when the other formula?
Regards
Baswaraj patil
Hello Baswaraj.
Thank you for your question.
For clarification, there are 3 types of Control Charts used for Variable data.
1. X-bar & Range
2. Individual & Moving Range, and
3. X-bar & S chart.
Are you asking what formula is used to estimate the short-term standard deviation for each of these control charts?
Great explanation! Thank you Andrew
Thanks!
The chart shown for N=2 is incorrect. You discuss getting d2 for n=2 of 1.13 but the chart shows 1.33. Not sure what went askew there but that is an issue. I am in need of the equation to calculate the d2 constant. Do you know it and if so can you share it?
Hello Paul, thanks for letting me know about the typo.
I have since corrected it and updated the histogram. It now shows a mean of d2=1.13 for the distribution of range values for n=2.
As for the formula, I recall seeing it some time ago. I will need to check my library of textbooks. If I find it I will post a reply here.
Best regards,
Andrew
I am in need of the equation to calculate the d2 constant. Do you know it and if so can you share it?
I found it in the following paper, though I am not quite sure.
http://www.okstate.edu/sas/v8/saspdf/qc/chapc.pdf
Can you please have a look?
Hello Nadeeka,
I have computed the d2 constant through simulation for various n. In all cases I used the data from a standard normal distribution.
The reference you speak of appears correct. Note that the standard normal cumulative distribution function is used in the integral.
The reference to Tippett refers to a paper he published in 1925: Tippett, L. H. C. (1925). “On the Extreme Individuals and the Range of Samples Taken from a Normal Population,” Biometrika, 17, 364–387.
Here is another reference: Newman, D. (1939). “The Distribution of Range in Samples from a Normal Population, Expressed in Terms of an Independent Estimate of Standard Deviation,” Biometrika, 31, 20–30.
I trust these references will be useful to you.