## Small Sample Confidence Intervals

### Upper confidence limits

The manager of a milk factory finds that the first batch of 10 milk containers produced that day contains 3 with spoilt milk. He is concerned that the other batches might also have spoilt containers. Like the first batch, they might have 30% of their containers spoilt, but he is concerned the percentage could be larger. He wants to know the highest percentage of spoilt containers that is still plausible given the first batch.

Solution

The approach to this problem is to assume that any one container has a probability ‘s’ of being spoilt, and a probability f = 1 − s of being fine. We want to find the highest value ‘s’ can take while staying consistent with the data from the first batch. Obviously s = 0.3 is consistent with the first batch; what about s = 0.4? We need to pick a significance level, so let’s use the standard 5%. We look up the binomial table for n = 10:

s = 0.4, f = 0.6

| X | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P(X) | 0.006 | 0.0403 | 0.1209 | 0.215 | 0.2508 | 0.2007 | 0.1115 | 0.0425 | 0.0106 | 0.0016 | 0.0001 |
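The table entries are binomial probabilities, P(X = x) = C(10, x) · s^x · f^(10 − x), so they can be reproduced rather than looked up. A minimal Python sketch using only the standard library (the helper name `binomial_pmf` is our own, not from any table or package):

```python
from math import comb

def binomial_pmf(k: int, n: int, s: float) -> float:
    """P(X = k): k spoilt containers out of n, each spoilt with probability s."""
    f = 1 - s  # probability that a single container is NOT spoilt
    return comb(n, k) * s**k * f**(n - k)

# Rebuild the n = 10, s = 0.4 row of the table, rounded to 4 decimal places.
row = [round(binomial_pmf(k, 10, 0.4), 4) for k in range(11)]
print(row)
```

Changing the 0.4 reproduces the other rows used in this section.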

Now, starting from X = 0, we add up the probabilities until the running total is more than 5% (a ‘success’ here is a spoilt container):

P(0) = 0.6% – too small

P(0) + P(1) = 0.6% + 4.03% = 4.63% – too small

P(0) + P(1) + P(2) = 0.6% + 4.03% + 12.09% = 16.72% – large enough (larger than our 5% significance level).

So at s = 0.4, only 0 or 1 spoilt containers would be too unlikely (4.63%, below our 5% level), and a batch with 2 or more spoilt containers is consistent with s = 0.4. The first batch had 3, so it is consistent.
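The running-total search above can be sketched as a small function. The name `min_consistent_count` is hypothetical; it returns the smallest count whose cumulative probability exceeds the significance level, so any observed count at or above it is treated as consistent with s.

```python
from math import comb

def binomial_pmf(k: int, n: int, s: float) -> float:
    return comb(n, k) * s**k * (1 - s)**(n - k)

def min_consistent_count(n: int, s: float, alpha: float = 0.05) -> int:
    """Smallest k with P(0) + ... + P(k) > alpha.

    Counts below k make up a lower tail with total probability <= alpha,
    which we treat as too unlikely; a count of k or more is consistent with s.
    """
    total = 0.0
    for k in range(n + 1):
        total += binomial_pmf(k, n, s)
        if total > alpha:
            return k
    return n  # safety net; not reached for an ordinary alpha such as 0.05

observed = 3
cutoff = min_consistent_count(10, 0.4)
print(cutoff, observed >= cutoff)
```

Calling it with s = 0.5, 0.6 and 0.7 gives the cut-offs used in the next steps.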

Let’s try s = 0.5:

s = 0.5, f = 0.5

| X | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P(X) | 0.001 | 0.0098 | 0.0439 | 0.1172 | 0.2051 | 0.2461 | 0.2051 | 0.1172 | 0.0439 | 0.0098 | 0.001 |

Again, add up the probabilities from X = 0 until the running total is more than 5%:

P(0) = 0.1% – too small

P(0) + P(1) = 0.1% + 0.98% = 1.08% – too small

P(0) + P(1) + P(2) = 0.1% + 0.98% + 4.39% = 5.47% – large enough.

So once again, a batch with 2 or more spoilt containers is consistent with s = 0.5. The first batch had 3, so it is consistent.

Try:

s = 0.6, f = 0.4

| X | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P(X) | 0.0001 | 0.0016 | 0.0106 | 0.0425 | 0.1115 | 0.2007 | 0.2508 | 0.215 | 0.1209 | 0.0403 | 0.006 |

Again, add up the probabilities from X = 0 until the running total is more than 5%:

P(0) = 0.01% – too small.

P(0) + P(1) = 0.01% + 0.16% = 0.17% – too small.

P(0) + P(1) + P(2) = 0.01% + 0.16% + 1.06% = 1.23% – too small.

P(0) + P(1) + P(2) + P(3) = 0.01% + 0.16% + 1.06% + 4.25% = 5.48% – large enough.

So if there are 3 or more spoilt containers in the first batch, it is consistent with s = 0.6. There are 3, so it is consistent, although only just, since the running total barely exceeds 5%.
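Because the s = 0.6 case only just clears the 5% level, it is worth checking the running totals directly. A minimal sketch using `itertools.accumulate` from the standard library (the variable names are illustrative):

```python
from itertools import accumulate
from math import comb

n, s = 10, 0.6
pmf = [comb(n, k) * s**k * (1 - s)**(n - k) for k in range(n + 1)]

# Running totals P(0), P(0)+P(1), ... rounded to 4 d.p. (0.0548 means 5.48%)
running = [round(t, 4) for t in accumulate(pmf)]
print(running[:4])  # [0.0001, 0.0017, 0.0123, 0.0548]
```

The third total is still below 0.05 and the fourth is just above it, which is why 3 spoilt containers is the smallest count consistent with s = 0.6.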

Try:

s = 0.7, f = 0.3

| X | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P(X) | 0 | 0.0001 | 0.0014 | 0.009 | 0.0368 | 0.1029 | 0.2001 | 0.2668 | 0.2335 | 0.1211 | 0.0282 |

P(0) = 0 – too small.

P(0) + P(1) = 0 + 0.01% = 0.01% – too small.

P(0) + P(1) + P(2) = 0 + 0.01% + 0.14% = 0.15% – too small.

P(0) + P(1) + P(2) + P(3) = 0 + 0.01% + 0.14% + 0.9% = 1.05% – too small.

P(0) + P(1) + P(2) + P(3) + P(4) = 0 + 0.01% + 0.14% + 0.9% + 3.68% = 4.73% – too small.

P(0) + P(1) + P(2) + P(3) + P(4) + P(5) = 0 + 0.01% + 0.14% + 0.9% + 3.68% + 10.29% = 15.02% – large enough.

So if there are 5 or more spoilt containers in the first batch, it will be consistent with an ‘s’ value of 0.7. There are only 3, so the highest value of ‘s’ that is still consistent with the first batch lies between 0.6 and 0.7.
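The trial-and-error search over s can be automated. This sketch assumes an equivalent shortcut: s is consistent exactly when the probability of 3 or fewer spoilt containers, P(X ≤ 3), exceeds 5%. It gives the same verdicts as the cut-off counts above because the running total only grows as X increases. The helper name `binom_cdf` is our own.

```python
from math import comb

def binom_cdf(x: int, n: int, s: float) -> float:
    """P(X <= x) for X ~ Binomial(n, s)."""
    return sum(comb(n, k) * s**k * (1 - s)**(n - k) for k in range(x + 1))

n, observed, alpha = 10, 3, 0.05
verdicts = {}
for s in (0.3, 0.4, 0.5, 0.6, 0.7):
    tail = binom_cdf(observed, n, s)  # chance of 3 or fewer spoilt if s were true
    verdicts[s] = tail > alpha
    print(f"s = {s}: P(X <= 3) = {tail:.4f}, consistent: {verdicts[s]}")
```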

So the manager can say that an average spoilage rate of somewhat over 60% is still consistent with the first batch, but a rate of 70% or higher is unlikely. Since we were trying to find the largest value of ‘s’, we call this the upper 95% confidence limit (95% because we used a 5% level of significance: 100% − 5% = 95%). If he wanted even more assurance, he could use a smaller level of significance, such as 1% instead of 5%; this would push the upper limit higher.
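The tables only bracket the limit between 0.6 and 0.7. If a sharper value is wanted, the same criterion can be solved numerically by bisection, because P(X ≤ 3) falls steadily as s rises. This is a sketch that goes beyond the table method; the exact tail-based limit it computes is usually called the Clopper-Pearson limit, and `upper_limit` is a hypothetical helper name.

```python
from math import comb

def binom_cdf(x: int, n: int, s: float) -> float:
    """P(X <= x) for X ~ Binomial(n, s)."""
    return sum(comb(n, k) * s**k * (1 - s)**(n - k) for k in range(x + 1))

def upper_limit(x: int, n: int, alpha: float, tol: float = 1e-9) -> float:
    """Largest s still consistent with x spoilt out of n at significance alpha."""
    if x == n:
        return 1.0  # every s is consistent when all containers are spoilt
    lo, hi = 0.0, 1.0  # s = 0 is consistent, s = 1 is not (since x < n)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid  # still consistent: the limit lies higher
        else:
            hi = mid  # inconsistent: the limit lies lower
    return lo

print(round(upper_limit(3, 10, 0.05), 3))  # 5% level
print(round(upper_limit(3, 10, 0.01), 3))  # 1% level
```

At the 5% level this comes out at about 0.607, inside the 0.6 to 0.7 bracket; at the 1% level the limit moves up to just above 0.7, as expected for a smaller significance level.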

What we have just done, in effect, is assume higher and higher average probabilities of a container being spoilt, and check each time whether as few as 3 spoilt containers could still have occurred through random variation.

### Lower confidence limits

In the example above, we found the highest value of ‘s’ that would be consistent with the information from the first batch of containers. This was done to find the maximum plausible percentage of containers that might be spoilt in other batches.

One can also find the lowest possible value of ‘s’ that would be consistent with the information from the first batch of containers.  This might be done to find out the minimum possible percentage of containers that might be spoilt on average.

So now we are trying to find the lowest value of ‘s’ that will be consistent with the first batch sample.  Obviously s = 0.3 is consistent, so let’s try s = 0.2.

s = 0.2, f = 0.8

| X | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P(X) | 0.1074 | 0.2684 | 0.302 | 0.2013 | 0.0881 | 0.0264 | 0.0055 | 0.0008 | 0.0001 | 0 | 0 |

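When you are working out the smallest value of ‘s’ that is still consistent with the first sample, your running total of ‘P’s has to reach (100% − level of significance). This is because the outcomes that now count against ‘s’ are unusually large numbers of spoilt containers, at the upper end of the table.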

For our example, this means 95%, since we are using a 5% level of significance.

P(0) = 0.1074 – too small.

P(0) + P(1) = 0.1074 + 0.2684 = 0.3758 – too small.

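P(0) + P(1) + P(2) = 0.3758 + 0.302 = 0.6778 – too small.

P(0) + P(1) + P(2) + P(3) = 0.6778 + 0.2013 = 0.8791 – too small.

P(0) + P(1) + P(2) + P(3) + P(4) = 0.8791 + 0.0881 = 0.9672 = 96.72% – large enough.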

So for s = 0.2, a sample with 4 or fewer spoilt containers is consistent. Ours has 3, so let’s try a smaller ‘s’. Notice that the condition is now ‘4 or fewer’ rather than ‘or more’, because we are trying to find the smallest value of ‘s’.
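For the lower limit the running total must reach 1 − (significance level) instead of just exceeding it. A sketch of that rule, again with a hypothetical helper name, `max_consistent_count`:

```python
from math import comb

def binomial_pmf(k: int, n: int, s: float) -> float:
    return comb(n, k) * s**k * (1 - s)**(n - k)

def max_consistent_count(n: int, s: float, alpha: float = 0.05) -> int:
    """Smallest k with P(0) + ... + P(k) >= 1 - alpha.

    Counts above k make up an upper tail with total probability <= alpha,
    which we treat as too unlikely; a count of k or fewer is consistent with s.
    """
    total = 0.0
    for k in range(n + 1):
        total += binomial_pmf(k, n, s)
        if total >= 1 - alpha:
            return k
    return n  # safety net against rounding; not reached for alpha = 0.05

observed = 3
cutoff = max_consistent_count(10, 0.2)
print(cutoff, observed <= cutoff)
```

The same call with s = 0.1 and s = 0.05 gives the cut-offs used in the next two steps.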

s = 0.1, f = 0.9

| X | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P(X) | 0.3487 | 0.3874 | 0.1937 | 0.0574 | 0.0112 | 0.0015 | 0.0001 | 0 | 0 | 0 | 0 |

P(0) = 0.3487 – too small.

P(0) + P(1) = 0.3487 + 0.3874 = 0.7361 – too small.

P(0) + P(1) + P(2) = 0.7361 + 0.1937 = 0.9298 – too small.

P(0) + P(1) + P(2) + P(3) = 0.9298 + 0.0574 = 0.9872 = 98.72% – large enough.

So if the sample has 3 or fewer spoilt containers, it is consistent. Ours has 3, so let’s try a lower ‘s’.

s = 0.05, f = 0.95

| X | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P(X) | 0.5987 | 0.3151 | 0.0746 | 0.0105 | 0.001 | 0.0001 | 0 | 0 | 0 | 0 | 0 |

P(0) = 0.5987 – too small.

P(0) + P(1) = 0.5987 + 0.3151 = 0.9138 – too small.

P(0) + P(1) + P(2) = 0.9138 + 0.0746 = 0.9884 = 98.84% – large enough.

So at s = 0.05, only a sample with 2 or fewer spoilt containers is consistent. Ours has 3, so s = 0.05 is not consistent with the sample.

So the lowest value of ‘s’ that is consistent with the sample lies between 0.05 and 0.1, that is, between 5% and 10%.
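The downward search can be automated in the same way. This sketch assumes the equivalent shortcut that s is consistent exactly when the probability of 3 or more spoilt containers, P(X ≥ 3), exceeds 5%; the helper name `binom_sf` is our own.

```python
from math import comb

def binom_sf(x: int, n: int, s: float) -> float:
    """P(X >= x) for X ~ Binomial(n, s)."""
    return sum(comb(n, k) * s**k * (1 - s)**(n - k) for k in range(x, n + 1))

n, observed, alpha = 10, 3, 0.05
verdicts = {}
for s in (0.3, 0.2, 0.1, 0.05):
    tail = binom_sf(observed, n, s)  # chance of 3 or more spoilt if s were true
    verdicts[s] = tail > alpha
    print(f"s = {s}: P(X >= 3) = {tail:.4f}, consistent: {verdicts[s]}")
```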

So the manager can say that, with luck, the average spoilage rate may be somewhat below 10%, but it is unlikely to be below 5%.

What we have just found is the lower 95% confidence limit.

The upper 95% confidence limit and the lower 95% confidence limit together form what is called the 90% confidence interval.

It is 90% because you have allowed an upper margin of 5% and a lower margin of 5%. Together they give 10%, which you take from 100% to give an overall confidence level of 90%.

The milk problem’s 90% confidence interval (conservatively, taking the outer end of each bracket) is:

0.05 < s < 0.7

Or:

5% < s < 70%
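The conservative interval uses the outer ends of the table brackets. As a rough numerical cross-check, this sketch solves both tail conditions by bisection. The function names are our own, and the result is the exact (Clopper-Pearson) interval computed directly from the binomial formula rather than read from a table.

```python
from math import comb

def binom_cdf(x: int, n: int, s: float) -> float:
    """P(X <= x) for X ~ Binomial(n, s)."""
    return sum(comb(n, k) * s**k * (1 - s)**(n - k) for k in range(x + 1))

def find_boundary(pred, tol: float = 1e-9) -> float:
    """Bisect [0, 1] for the point where pred switches from True to False."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pred(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_interval(x: int, n: int, alpha_each_side: float = 0.05):
    """Lower and upper limits; overall confidence is 1 - 2 * alpha_each_side."""
    # Lower limit: small s is inconsistent while P(X >= x) = 1 - P(X <= x - 1) <= alpha.
    lower = 0.0 if x == 0 else find_boundary(
        lambda s: binom_cdf(x - 1, n, s) >= 1 - alpha_each_side)
    # Upper limit: s stays consistent while P(X <= x) > alpha.
    upper = 1.0 if x == n else find_boundary(
        lambda s: binom_cdf(x, n, s) > alpha_each_side)
    return lower, upper

lower, upper = confidence_interval(3, 10)
print(f"90% interval: {lower:.3f} < s < {upper:.3f}")
```

For 3 spoilt containers out of 10 this gives roughly 0.087 < s < 0.607, which sits inside the conservative range.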