99% confidence that the true mean is within the interval -0.23 to 0.27. Actually the true mean for this dataset is zero.

Table 1. Values of the t distribution for 95% and 99% confidence limits (two-tailed) and for different sample sizes [Fisher and Yates, 1963].
N: 2 3 4 5 6 7 8 9 10 11
t95: 12.71 4.303 3.182 2.776 2.571 2.447 2.365 2.306 2.262 2.228
t99: 63.66 9.925 5.841 4.604 4.032 3.707 3.499 3.355 3.250 3.169
 
N: 12 13 14 15 16 17 18 19 20 21
t95: 2.201 2.179 2.160 2.145 2.131 2.120 2.110 2.101 2.093 2.086
t99: 3.106 3.055 3.012 2.977 2.947 2.921 2.898 2.878 2.861 2.845
 
N: 22 23 24 25 30 40 60 80 100 ∞
t95: 2.080 2.074 2.069 2.064 2.045 2.023 2.001 1.990 1.984 1.960
t99: 2.831 2.819 2.807 2.797 2.756 2.713 2.662 2.640 2.627 2.576
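
As a rough illustration of how the tabulated values are used, the sketch below computes 95% and 99% confidence limits for a small sample. It assumes the usual relation α = t·σ/N^0.5 between the confidence limit, the sample standard deviation, and the sample size, and that σ is the sample (N-1) standard deviation. The function name `confidence_limits` and the five data values are hypothetical, chosen only for illustration; only the t values come from Table 1.

```python
import math
import statistics

# Two-tailed t values copied from Table 1, keyed by sample size N
# (the table is indexed by N, not by degrees of freedom).
T95 = {2: 12.71, 3: 4.303, 4: 3.182, 5: 2.776, 6: 2.571,
       7: 2.447, 8: 2.365, 9: 2.306, 10: 2.262, 11: 2.228}
T99 = {2: 63.66, 3: 9.925, 4: 5.841, 5: 4.604, 6: 4.032,
       7: 3.707, 8: 3.499, 9: 3.355, 10: 3.250, 11: 3.169}


def confidence_limits(values, t_table):
    """Return (lower, upper) confidence limits for the mean of `values`.

    Assumes alpha = t * sigma / sqrt(N), with sigma the sample
    standard deviation and t looked up by sample size N.
    """
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    half_width = t_table[n] * std_err
    return mean - half_width, mean + half_width


# Hypothetical measurements, for illustration only.
sample = [9.8, 10.4, 10.1, 9.6, 10.3]

low95, high95 = confidence_limits(sample, T95)
low99, high99 = confidence_limits(sample, T99)
print(f"95% limits: {low95:.2f} to {high95:.2f}")
print(f"99% limits: {low99:.2f} to {high99:.2f}")
```

The 99% interval is always the wider of the two, because t99 exceeds t95 at every sample size in the table.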

Selection of a confidence level ($\alpha_{95}$, $\alpha_{99}$, etc.) usually depends on one’s evaluation of which risk is worse: the risk of incorrectly identifying a variable or effect as significant, or the risk of missing a real effect. Is the penalty for error as minor as having a subsequent researcher correct the error, or could it cause a disaster such as an airplane crash? If prior knowledge suggests one outcome for an experiment, then rejection of that outcome needs a higher than ordinary confidence level. For example, no one would take seriously a claim that an experiment demonstrates test-tube cold fusion at the 95% confidence level; a much higher confidence level plus replication were demanded. Most experimenters use either a 95% or 99% confidence level. Tables for calculation of confidence limits other than 95% or 99%, called tables of the t distribution, can be found in any statistics book.

How Many Measurements are Needed?

The standard error of the mean $\sigma_{\bar{X}}$ is also the key to estimating how many measurements to make. The definition $\sigma_{\bar{X}} = \sigma N^{-0.5}$ can be recast as $N = \sigma^2/\sigma_{\bar{X}}^2$. Suppose we want to make enough measurements to obtain a final mean that is within 2 units of the true mean (i.e., $\sigma_{\bar{X}} \le 2$), and a small pilot study permits us to calculate that our measurement scatter is $\sigma \approx 10$. Then our experimental series will need $N \ge 10^2/2^2$, or $N \ge 25$, measurements to obtain the desired accuracy at the 68% confidence level (or $1\sigma_{\bar{X}}$). For about 95% confidence, we recall that about 95% of points are within $2\sigma$ of the mean and conclude that we would need $2\sigma_{\bar{X}} \le 2$, so $N \ge 10^2/1^2$, or $N \ge 100$, measurements. Alternatively and more accurately, we can use the t table above to determine how many measurements will be needed to assure that our mean is within 2 units of the true mean at the 95% confidence level ($\alpha_{95} \le 2$): we need the ratio $\alpha_{95}/\sigma_{\bar{X}} = \alpha_{95}N^{0.5}/\sigma = 2N^{0.5}/10 = 0.2N^{0.5}$ to be greater than the $t_{95}$ in the table above for that $N$. By trying a few values of $N$, we see that $N \ge 100$ is needed.
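
The two estimates in the preceding paragraph can be checked with a short sketch. The function names `n_for_standard_error` and `smallest_tabulated_n` are hypothetical, and the t95 values are copied from the finite-N columns of Table 1. Because the search only tests sample sizes that appear in the table, it returns the smallest tabulated N that satisfies the condition; the exact minimum may lie between two tabulated sizes.

```python
import math

# Two-tailed 95% t values from Table 1, keyed by sample size N.
T95 = {2: 12.71, 3: 4.303, 4: 3.182, 5: 2.776, 6: 2.571, 7: 2.447,
       8: 2.365, 9: 2.306, 10: 2.262, 11: 2.228, 12: 2.201, 13: 2.179,
       14: 2.160, 15: 2.145, 16: 2.131, 17: 2.120, 18: 2.110, 19: 2.101,
       20: 2.093, 21: 2.086, 22: 2.080, 23: 2.074, 24: 2.069, 25: 2.064,
       30: 2.045, 40: 2.023, 60: 2.001, 80: 1.990, 100: 1.984}


def n_for_standard_error(sigma, max_std_err):
    """Smallest N with sigma / sqrt(N) <= max_std_err, i.e. N >= sigma^2 / max_std_err^2."""
    return math.ceil((sigma / max_std_err) ** 2)


def smallest_tabulated_n(sigma, alpha, t_table):
    """Smallest tabulated N for which alpha * sqrt(N) / sigma exceeds the tabulated t."""
    for n in sorted(t_table):
        if alpha * math.sqrt(n) / sigma > t_table[n]:
            return n
    return None  # no tabulated sample size is large enough


sigma = 10   # measurement scatter from the pilot study
target = 2   # desired closeness to the true mean

print(n_for_standard_error(sigma, target))        # 68% level: sigma_X <= 2
print(n_for_standard_error(sigma, target / 2))    # ~95% level: 2 * sigma_X <= 2
print(smallest_tabulated_n(sigma, target, T95))   # t-table approach
```

With σ = 10 and a target of 2 units, the first call gives 25 and the other two give 100, matching the values worked out in the text.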

As a rule of thumb, one must quadruple the number of measurements in order to double the precision of the result. This generalization is based on the $N^{0.5}$ relationship of standard deviation to standard error and is strictly true only if our measure of precision is the standard error. If, as is of-