## Range Statistics and the d2 Constant Used in Statistical Process Control Charts

Range statistics are often used in statistical process control (SPC) charting. One type of SPC chart is the **average and range chart**. Another is the individuals and moving range chart. To **calculate control limits** for either chart, we need an estimate of the standard deviation, and how we estimate it depends on the sampling program. Click on **link** to learn more about **statistical process control limits and how they’re derived.**

#### Range Statistics for the Average and Range Charts

When using an average and range chart, a typical sample size is n=5 consecutive specimens, which we call a subgroup. Each specimen in a subgroup should be produced under homogeneous, or like, conditions. When they are, we say the specimens came from a system of common causes. The time between subgroups should be constant, if possible. In case of a special cause event, more subgroups may be needed to confirm the event is real.

#### Range Statistics for the Individuals and Moving Range Charts

When using an individuals and moving range chart, there is one data point per sampling period, that is, a subgroup of size n=1. The time between data points should be constant when possible. Because the change between two consecutive data points captures the shortest time frame, we use a moving range of n=2 consecutive data points. For example, the first moving range is the absolute value of the difference between the second and first data points. Likewise, the second moving range is the absolute value of the difference between the third and second data points, and so on.
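The moving range calculation described above can be sketched in a few lines of Python. This is only an illustration; the function name `moving_ranges` and the sample data are hypothetical and not part of the original post.

```python
def moving_ranges(values):
    """Return the n=2 moving ranges: |x[i] - x[i-1]| for each consecutive pair."""
    return [abs(later - earlier) for earlier, later in zip(values, values[1:])]


if __name__ == "__main__":
    # Hypothetical individual measurements, one per sampling period.
    data = [3, 5, 4, 8]
    # First moving range is |5 - 3|, second is |4 - 5|, third is |8 - 4|.
    print(moving_ranges(data))
```

Note that k data points always yield k-1 moving ranges, since the first point has no predecessor.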

#### How Do We Calculate the Standard Deviation From these Two Range Statistics?

To calculate the standard deviation for these two range statistics we use the following expressions. The first is for the average and range charts and the second is for the individual and moving range charts.
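In standard SPC notation, these two estimates are usually written as shown here. The symbols are assumed notation rather than the post's own: $\hat{\sigma}$ is the estimated standard deviation, $\bar{R}$ is the average subgroup range, and $\overline{MR}$ is the average moving range.

```latex
% Average and range chart (subgroups of size n, e.g. n = 5):
\hat{\sigma} = \frac{\bar{R}}{d_2}

% Individuals and moving range chart (moving range of n = 2 points):
\hat{\sigma} = \frac{\overline{MR}}{d_2}
```

The value of d2 depends on the subgroup size: tables list d2 = 1.128 for n=2 and d2 = 2.326 for n=5.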

#### Range Statistics and the d2 Constant

Where did the d2 constant come from? Its values are found in statistical process control tables. But how were they derived? It’s best explained by numerical example, and it’s something you can work out on your own using Excel or Minitab.

In Minitab, I created a column of 100,000 normally distributed numbers having a mean of 0 and standard deviation of 1. In the next column I generated another 100,000 normally distributed numbers with the same mean and standard deviation. In the third column I computed the range for each row by subtracting the smaller value from the larger. This yields a column of 100,000 range values. Shown in the figure below is a histogram of the range statistics for n=2.

When I compute the average of the range statistics for n=2, I get d2=1.13. Notice this is the same value found in any statistical process control table. Let’s see if we also get the tabulated value of d2 for n=5.

When I compute the average for the histogram of range statistics for n=5 we get d2=2.33. Notice this value is the same value reported in any statistical process control table. Note the d2 constant is reported to two decimal places.
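For readers without Minitab, the same experiment can be sketched in Python using only the standard library. This is an approximation of the procedure, not the original Minitab session; the function name `simulate_d2`, the seed, and the defaults are assumptions made for illustration.

```python
import random


def simulate_d2(n, rows=100_000, seed=1):
    """Estimate d2 as the mean range of `rows` subgroups of size n from N(0, 1)."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    total = 0.0
    for _ in range(rows):
        subgroup = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += max(subgroup) - min(subgroup)  # range of this row
    return total / rows


if __name__ == "__main__":
    for n in (2, 5):
        print(f"n={n}: simulated d2 = {simulate_d2(n):.3f}")
```

Because this is a simulation, the result varies slightly with the seed and the number of rows. With 100,000 rows it should land close to the tabulated 1.128 (n=2) and 2.326 (n=5).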

#### Things to keep in Mind about Range Statistics

Notice that I used normally distributed values to compute the range statistics in Minitab. Also notice that the distribution of the range becomes more symmetric, and closer to a normal shape, as the subgroup size increases from n=2 to n=5. This is one reason why n=5 is a common subgroup size. Also notice that we used histograms of simulated values to approximate the underlying continuous distribution. If we knew the probability distribution function of the range for n=2 and n=5, we could compute the d2 constants precisely using calculus. That I will leave for another post.
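As a sketch of the calculus route, a standard result gives the expected range of n independent standard normal values as the integral of 1 - Φ(x)^n - (1 - Φ(x))^n over all x, where Φ is the standard normal cumulative distribution function. The code below approximates that integral with the trapezoid rule. The function names, integration limits, and step count are assumptions chosen for illustration.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def d2_by_integration(n, lo=-10.0, hi=10.0, steps=20_000):
    """Approximate d2 = integral of 1 - F(x)^n - (1 - F(x))^n using the trapezoid rule."""
    def integrand(x):
        p = normal_cdf(x)
        return 1.0 - p ** n - (1.0 - p) ** n

    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, steps):
        total += integrand(lo + i * h)
    return total * h


if __name__ == "__main__":
    for n in (2, 5):
        print(f"n={n}: d2 = {d2_by_integration(n):.4f}")
```

For n=2 the integral has the closed form 2/sqrt(pi), about 1.128, which gives a convenient check on the numerical result.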

#### The d2 Constant Explained!

So there you have it. If you ever wondered where these d2 values came from – well now you know! Now I need you to let me know if you appreciate this post by leaving your comments below. I look forward to hearing from you.

Paul Ramey says

The chart shown for N=2 is incorrect. You discuss getting d2 for n=2 of 1.13 but the chart shows 1.33. Not sure what went askew there but that is an issue. I am in need of the equation to calculate the d2 constant. Do you know it and if so can you share it?

Andrew Milivojevich says

Hello Paul, thanks for letting me know about the typo.

I have since corrected it and updated the Histogram. It now shows the mean for d2=1.13 for the distribution of range values for n=2.

As for the formula, I recall seeing it some time ago. I will need to check my library of textbooks. If I find it I will post a reply here.

Best regards,

Andrew

Nadeeka says

I am in need of the equation to calculate the d2 constant. Do you know it and if so can you share it?

I found it in the following paper, though I am not quite sure.

http://www.okstate.edu/sas/v8/saspdf/qc/chapc.pdf

Can you please have a look?

Andrew Milivojevich says

Hello Nadeeka,

I have computed the d2 constant through simulation for various n. In all cases I used the data from a standard normal distribution.

The reference you speak of appears correct. Note that the standard normal cumulative distribution function is used in the integral.

The reference to Tippett refers to a paper he published in 1925: Tippett, L. On the Extreme Individuals and the Range of Samples Taken from a Normal Population. Biometrika, 1925, 17, 364-387.

Here is another reference: Newman, D. The Distribution of Range in Samples from a Normal Population, Expressed in Terms of an Independent Estimate of Standard Deviation. Biometrika, 1939, 31, 20-30.

I trust these references will be useful to you.

Supat says

Great explanation! Thank you Andrew

Andrew Milivojevich says

Thanks!

Baswaraj Patil says

Hello Mr Andrew,

I have one doubt

There are two formulas for calculating the sigma value (standard deviation):

R-bar/d2 and another one.

Under what conditions should we use R-bar/d2, and when the other formula?

Regards

Baswaraj patil

Andrew Milivojevich says

Hello Baswaraj.

Thank you for your question.

For clarification, there are 3 types of Control Charts used for Variable data.

1. X-bar & Range

2. Individual & Moving Range, and

3. X-bar & S chart.

Are you asking what formula is used to estimate the short-term standard deviation for each of these control charts?

christian khoury says

thank you very much; you don’t know how much I appreciate finally finding a clear explanation for the source of d2 values after getting frustrated going into many references without having a clue about its origin.

Andrew Milivojevich says

Hello Christian.

Thank you for your kind words. I am glad to hear this blog post brought you some clarity.

In the event, you’d like to learn where the d3 constant comes from, I would refer you to the following blog post: https://andrewmilivojevich.com/xbar-and-r-chart/

Best regards,

Andrew

Andy says

Hi Andrew,

I was reading your excellent article “Range Statistics and d2 Constant – How to Calculate Standard Deviation” on the d2 constant, and I would like to ask for some clarification of technique for my own understanding. You simulated 100,000 normal data points in Minitab, then a second column of 100,000 normal data points, and then subtracted the min from the max of each row to get 100,000 range values in C3. I followed up to that step just fine, but I would like some help understanding exactly what the histogram of n=2 (and similarly n=5) actually is, in terms of how to manipulate the data to create it. For example, if I plot C3 (ranges) I get what looks like a normal curve. How did you plot subgroups of n=2? Is it the average of rows 1 and 2, 3 and 4, and so on, to give 50,000 values?

Any help you can give will be greatly appreciated.

Andrew Milivojevich says

Hello Andy.

I trust the following will answer your question.

The histogram is simply the plot of the range values.

The n=2 and n=5 represent the subgroup sample size.

For n=2, I computed the Range based on 2 columns of normally distributed values generated by Minitab (100,000 values with a mean of zero and standard deviation of 1). I subtracted the smallest value from the largest value in each row to compute 100,000 range values. This becomes the histogram based on a subgroup size of n=2.

For n=5, I computed the Range based on 5 columns of normally distributed values generated by Minitab (100,000 values with a mean of zero and standard deviation of 1). I subtracted the smallest value from the largest value across the 5 columns, for each row, to compute 100,000 range values. This becomes the histogram based on a subgroup size of n=5.

I trust this explanation brings clarity.

Many thanks for your inquiry.

Andrew Milivojevich

Ai Ling Teoh says

Yes, thanks for the explanation. It helped me understand where the d2 constants for control charts come from.

kaushik says

Please explain under which conditions we can use

1) Rbar/d2

2) the square root of the variance

for calculating the standard deviation.

Andrew Milivojevich says

The Rbar/d2 is typically used in the calculation of control limits for the Xbar and Range control charts. The Rbar/d2 is also used to compute Process Capability indices Cp and Cpk.

The square root of the variance is used when computing the total dispersion in an entire data set. In statistical process control applications it is used to compute the Process Performance indices Pp and Ppk.
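To make the distinction concrete, here is a minimal Python sketch that computes both estimates from the same data. The function names and the data are hypothetical; the d2 values are the standard tabulated constants for n=2 through n=5.

```python
import statistics

# Standard tabulated d2 constants, keyed by subgroup size n.
d2_values = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def sigma_within(subgroups):
    """Short-term sigma: average subgroup range divided by d2 (used for Cp, Cpk)."""
    n = len(subgroups[0])  # assumes every subgroup has the same size
    ranges = [max(group) - min(group) for group in subgroups]
    return statistics.mean(ranges) / d2_values[n]


def sigma_overall(subgroups):
    """Total sigma: sample standard deviation of all values pooled (used for Pp, Ppk)."""
    pooled = [x for group in subgroups for x in group]
    return statistics.stdev(pooled)


if __name__ == "__main__":
    # Hypothetical data: two subgroups of n=5 with a shift in level between them.
    subgroups = [[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]]
    print("Rbar/d2      :", sigma_within(subgroups))
    print("sqrt(variance):", sigma_overall(subgroups))
```

In this made-up example the level shifts between subgroups, so the overall standard deviation is much larger than Rbar/d2, which only reflects variation within each subgroup.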

Riley McCleary says

Hello Andrew,

I’m studying for the CSSBB exam and found a practice question that didn’t make sense to me. It says the spread is 10, USL = 15, LSL = 15 and the average 10 and they want me to calculate the Cpk. They get 1.4 and I don’t see how they calculated the sigma. I’m assuming they used an r bar and d2 equation but I don’t know how they did it. Can you explain this to me, please?

Andrew Milivojevich says

Hello Riley, in your comment above USL = LSL = 15. This would result in a spread of zero – which is unusual. This would mean that the tolerance equals the nominal value and no variation is tolerated.

Can you clarify if the spread = 10 means the process spread or the tolerance spread?

Best Regards,

Andrew Milivojevich

Riley McCleary says

Hello Andrew,

Sorry! The USL is 35. All the question says is that the spread is 10. There is no further explanation of the spread.

Andrew Milivojevich says

Hello Riley. Thanks for the update!

Let’s see if we can solve your problem!

With a USL = 35 and LSL = 15 the tolerance spread is 20. Since the question mentions a Spread = 10 and given it does not equal the tolerance spread it must be referring to the process spread which is equal to 6s where “s” is the standard deviation. In this case, s = 10/6 = 1.67 (rounded to 2 decimal places).

But we have another problem. Cpk is the distance from the process average to the nearest tolerance limit divided by 1/2 the process spread (which would be 3 x 1.67 = 5.0, or 1/2 the spread of 10). Since the average is 10 and the nearest tolerance limit is 15, the average is less than the LSL. As such, the Cpk would be negative. So I can’t see how they would get a result of 1.4 unless the process average was another value that fell inside the tolerance limits.

Let’s assume the process average was on nominal (target = 25). Under this assumption the Cpk would be 10 / 5 = 2.0. For the Cpk to equal 1.4 the process average would need to be either 22 (15+7) or 28 (35-7). Is it possible that the process average was 22 or 28?

Please let me know.

I hope this helps.

Andrew

Riley McCleary says

Andrew!

You figured it out! Yes, the average was 28. I obviously was staying up too late when I typed my initial question to you. I appreciate the help! So, did you know that s=6 just because this is a Six Sigma question? Or how did you know that s=6? That’s the only thing I’m not very clear on now.

Thank you!

Andrew Milivojevich says

Hello Riley – I am glad that we got to the bottom of this question!

We often say the spread in a Normal Distribution is +/-3s (where “s” is the standard deviation) and this equals a total spread of 6s. As such, the process spread is typically referred to as 6s. So when the question said the spread was 10 I expressed it as: 10 / 6 = 1.67 to figure out what one standard deviation equals.

Now, the formula for Cpk is: (USL – Xbar) / 3s OR (Xbar – LSL) / 3s. Notice the denominator in both expressions is 3s. In this case, the denominator is 1/2 the process spread. Now, the reason why it is 3s is because we are only looking at that side (tail) of the distribution that is closest to a specification limit. In our case, the process average is 28, and since it falls above the nominal of 25 the process will tend to have more out-of-specification units that fall above the USL.

Going forward, always remember that the process spread equals 6s and you should be fine.
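As a quick check of the arithmetic in this thread, the Cpk formula can be sketched in Python. The function name `cpk` is hypothetical; the inputs are the values from the practice question (USL = 35, LSL = 15, process spread of 10).

```python
def cpk(usl, lsl, mean, sigma):
    """Cpk: distance from the mean to the nearest spec limit, divided by 3 sigma."""
    return min(usl - mean, mean - lsl) / (3.0 * sigma)


if __name__ == "__main__":
    sigma = 10 / 6  # process spread of 10 taken as 6 standard deviations
    for mean in (10, 25, 28):
        print(f"mean={mean}: Cpk = {cpk(35, 15, mean, sigma):.2f}")
```

With an average of 28 this gives a Cpk of about 1.4. An average of 25 (on nominal) gives 2.0, and an average of 10 (below the LSL) gives a negative value.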

Best Regards,

Andrew

Riley McCleary says

Thank you very much for your help!

Hari says

Very informative. I really appreciate the way the things are explained in a very comprehensive way. Thanks

Andrew Milivojevich says

Thank you!

Mike Liang says

Hello, Andrew, appreciated your explanation, it’s quite clear

Mike Liang says

Hello Andrew, thanks a lot, your explanation is quite clear

Adel says

Many thanks for the valuable explanation of where d2 comes from.

I have 2 questions:

1: I computed Sr mathematically; it does not equal average R / d2, except for subgroup size n=2.

2: What is the mathematical relation between the standard deviation and average R / d2? Same question for reproducibility, which is R(xbar)/d2.

Thanks and best regards.

Andrew Milivojevich says

Hello Adel.

With respect to question number 1. Can you clarify ‘Sr’?

Adel says

Dear Sir.,

Many thanks. I’d like to understand the relation between the standard deviation and rbar/d2. The following may clarify the gap in my understanding:

n=2

| subgroup | x1 | x2 | r | std dev. |
|---|---|---|---|---|
| 1 | 1 | 1.2 | 0.2 | 0.141421 |
| 2 | 0.7 | 0.99 | 0.31 | 0.205061 |

Average Sr = 0.173241

rbar/d2 = 0.226064

For n=2: s² = ((x1 − x̄)² + (x2 − x̄)²)/(n − 1), where x̄ is the subgroup mean. Mathematically this equals ((x1 − x2)/sqrt2)², i.e. s = r1/sqrt2, not r1/1.128.

n=3

| subgroup | x1 | x2 | x3 | r | std dev. |
|---|---|---|---|---|---|
| 1 | 1 | 1.1 | 1.2 | 0.2 | 0.1 |
| 2 | 0.7 | 1 | 0.99 | 0.3 | 0.170392 |

Average Sr = 0.135196

rbar/d2 = 0.147929

n=5

| subgroup | x1 | x2 | x3 | x4 | x5 | r | std dev. |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1.1 | 1.2 | 0.98 | 1.3 | 0.32 | 0.135204 |
| 2 | 1.11 | 1 | 0.99 | 1.3 | 0.96 | 0.36 | 0.139535 |

Average Sr = 0.137369

rbar/d2 = 0.145923

Many thanks and best regards.

Adel says

Dear Sir.,

Many thanks. I’d like to understand the relation between the standard deviation and rbar/d2. The following may clarify the gap in my understanding:

For subgroup size n=2:

standard deviation squared (s²) = ((x1 − x̄)² + (x2 − x̄)²)/(n − 1), where x̄ is the subgroup mean

mathematically = ((x1 − x2)/sqrt2)²

i.e., standard deviation (s) = r1/sqrt2

not r1/1.128.

I am sorry that I could not align the columns in my previous reply.

Thanks and best regards.