

shape. Although the regression-equations obtained would not accurately fit the original material, yet they would have a certain correspondence thereto. What sort of correspondence may be illustrated by an example in games of chance, which Professor Weldon kindly supplied. Three half-dozens of dice having been thrown, the number of dice with more than three points in the dozen made up of the first and the second half-dozens is taken for y; the number of sixes in the dozen made up of the first and the third half-dozens is taken for x.

    x =    0       1       2
 y = 2   10/72    7/72    1/72
 y = 1   25/72   10/72    1/72
 y = 0   15/72    3/72      0

Fig. 13.

Thus each twofold observation (xy) is the sum of six twofold elements, each of which is subject to a law of frequency represented in fig. 13; where[1] the figures outside denote the number of successes of each kind, for the ordinate the number of dice with more than three points (out of a cast of two dice), for the abscissa the number of sixes (out of a cast of two dice, one of which is common to the aforesaid cast); and the figures inside denote the comparative probabilities of each twofold value (e.g. the probability of obtaining in the first cast two dice each with more than three points, and in the second cast two sixes, is 1/72). Treating this law of frequency according to the rule which is proper to the normal law, we have (for the element), if the side of each compartment = i,

σ1² = (5/18)i²; σ2² = (1/2)i²; r = 1/(2√5).

Whence for the regression-equation which gives the value of the ordinate most probably associated with an assigned value of the abscissa we have y = x × rσ2/σ1 = 0.3x; and for the other regression-equation, x = y/6.
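These constants may be verified by brute enumeration. The following sketch in Python (a modern convenience, not part of the original apparatus) runs over the 216 equally likely casts of the three dice concerned, the first die being common to both pairs, and recovers the variances, the coefficient r and the two regression-coefficients stated above.

    from collections import Counter
    from itertools import product
    from math import sqrt

    # Die a is common to both casts: y counts dice with more than three
    # points in the cast (a, b); x counts sixes in the cast (a, c).
    counts = Counter()
    for a, b, c in product(range(1, 7), repeat=3):
        counts[((a == 6) + (c == 6), (a > 3) + (b > 3))] += 1

    n = 6 ** 3
    mx = sum(x * k for (x, y), k in counts.items()) / n
    my = sum(y * k for (x, y), k in counts.items()) / n
    vx = sum((x - mx) ** 2 * k for (x, y), k in counts.items()) / n
    vy = sum((y - my) ** 2 * k for (x, y), k in counts.items()) / n
    cov = sum((x - mx) * (y - my) * k for (x, y), k in counts.items()) / n

    print(vx, vy)               # 5/18 and 1/2, the variances given above
    print(cov / sqrt(vx * vy))  # r = 1/(2*sqrt(5)) = 0.2236...
    print(cov / vx, cov / vy)   # regression-coefficients: 0.3 and 1/6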

(Professor Weldon's statistics: x = number of sixes in the dozen made up of the first and third half-dozens; y = number of dice with more than three points in the dozen made up of the first and second half-dozens.)

    x =    0    1    2    3    4    5    6    7    8
 y = 12    1
     11    4    3    3    3    1
     10    3   17   15   13   10    4    3    1
      9   12   51   59   61   36   14    5    3
      8   36  135  154  150   64   21    5    2    1
      7   74  195  260  179  112   35    5    1
      6   90  248  254  170   75   26    3
      5   93  220  230  124   51    8    2
      4   86  162  127   75   19    4    1
      3   37   86   56   17    6    2
      2   14   23   23    4    3
      1    2    4
      0

Accordingly, in Professor Weldon's statistics, which are reproduced in the annexed table, when x = 3 the most probable value of y ought to be 1. And in fact this expectation is verified, x and y being measured along lines drawn through the centre of the compartment which ought to have the maximum of content, representing the concurrence of one dozen with two sixes and another dozen with six dice having each more than three points, the compartment which in fact contains 254 (almost the maximum content). In the absence of observations at x = −3i or y = ±6i, the regression-equations cannot be further verified. At least they have begun to be verified by batches composed of six elements, whereas they are not verifiable at all for the simple elements. The normal formula describes the given statistics as they behave, not when by themselves, but when massed in crowds; the regression-equation does not tell us that if x′ is the magnitude of one member the most probable magnitude of the other member associated therewith is rx′, but that if x′ is the average of several samples of the first member, then rx′ is the most probable average for the specimens of the other member associated with those samples. Mr Yule's proposal to construct regression-equations according to the normal rule “without troubling to investigate the normality of the distribution”[2] admits of this among other explanations.[3] Mr Yule's own view of the subject is well worthy of attention.
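As a further check, the mean of y in each column may be computed from the table itself and set beside the regression line through the centre (2, 6) with coefficient 0.3. The following sketch in Python assumes, as in the arrangement above, that each row's entries begin at the column x = 0.

    # Professor Weldon's table: rows are y (12 down to 0), entries from x = 0.
    table = {
        12: [1],
        11: [4, 3, 3, 3, 1],
        10: [3, 17, 15, 13, 10, 4, 3, 1],
        9: [12, 51, 59, 61, 36, 14, 5, 3],
        8: [36, 135, 154, 150, 64, 21, 5, 2, 1],
        7: [74, 195, 260, 179, 112, 35, 5, 1],
        6: [90, 248, 254, 170, 75, 26, 3],
        5: [93, 220, 230, 124, 51, 8, 2],
        4: [86, 162, 127, 75, 19, 4, 1],
        3: [37, 86, 56, 17, 6, 2],
        2: [14, 23, 23, 4, 3],
        1: [2, 4],
        0: [],
    }

    for x in range(9):
        total = weight = 0
        for y, row in table.items():
            if x < len(row):
                total += y * row[x]
                weight += row[x]
        if weight:
            # observed mean of y at this x, against 6 + 0.3(x - 2)
            print(x, round(total / weight, 2), 6 + 0.3 * (x - 2))

At x = 5, a deviation of 3 from the mean 2, the column mean comes out very nearly 7, that is a deviation of about 1 from the mean 6, as the text asserts.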

154. Sheppard's Corrections. In the determination of the standard deviation proper to the law of error (and of other constants proper to other laws of frequency) it commonly happens that, besides the inaccuracy which has been estimated, due to the paucity of the data, there is an inaccuracy due to their discrete character: the circumstance that measurements, e.g. of human heights, are given in comparatively large units, e.g. inches, while the real objects are more perfectly graduated. Mr Sheppard has prescribed a remedy for this imperfection. For the standard deviation let μ2 be the rough value obtained on the supposition that the observations are massed at intervals of unit length (not spread out continuously, as ideal measurements would be); then the proper value, the mean integral of deviation squared, is (μ2) = μ2 − h²/12, where h is the size of a unit, e.g. an inch. It is not to be objected to this correction that it becomes nugatory when it is less than the probable error to which the measurement is liable on account of the paucity of observations. For, as the correction is always in one direction, that of subtraction, it tends in the long run to be advantageous even though masked in particular instances by larger fluctuating errors.[4]
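The effect of the correction may be seen on invented data; in the sketch below (Python) the mean of 68 inches, the dispersion and the sample size are merely illustrative. Continuous "heights" are rounded to the nearest inch, and the rough second moment of the grouped values is then diminished by h²/12.

    import random
    import statistics

    random.seed(2)
    h = 1.0                                    # size of the unit, e.g. an inch
    true = [random.gauss(68.0, 2.5) for _ in range(100_000)]
    binned = [round(v / h) * h for v in true]  # observations massed at unit intervals

    mu2 = statistics.pvariance(binned)         # rough value from the grouped data
    corrected = mu2 - h * h / 12               # Sheppard's correction
    print(statistics.pvariance(true), mu2, corrected)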

155. Pearson's Criterion of Empirical Verification. Professor Pearson has given a beautiful application of the theory of correlation to test the empirical evidence that a given group conforms to a proposed formula, e.g. the normal law of error.[5]

Supposing the constants of the proposed function to be known—in the case of the normal law the arithmetic mean and modulus—we could determine the position of any percentile, e.g. the median, say a. Now the probability that if any sample numbering n were taken at random from the complete group, the median of the sample, a′, would lie at such a distance from a that there should be r observations between a and a′ is

(2/√π) ∫_τ^∞ e^(−t²) dt, where τ = r/√(½n).[6]
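The integral may be checked by simulation. In the sketch below (Python; the sample size n = 999 and the excess r = 20 are arbitrary choices) samples are drawn from a normal group with known median a = 0, the observations falling between a and the sample median a′ are counted, and the frequency of an excess of r or more is set against the tail integral, here evaluated through the complementary error function.

    import random
    import statistics
    from math import erfc, sqrt

    random.seed(3)
    n, r, trials = 999, 20, 5_000
    hits = 0
    for _ in range(trials):
        sample = [random.gauss(0.0, 1.0) for _ in range(n)]
        a1 = statistics.median(sample)      # the sample median a'
        lo, hi = min(0.0, a1), max(0.0, a1)
        between = sum(1 for v in sample if lo < v < hi)
        hits += between >= r

    # (2/sqrt(pi)) * integral from tau to infinity of e^(-t^2) dt = erfc(tau)
    print(hits / trials, erfc(r / sqrt(n / 2)))  # the two should roughly agree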

If, then, any observed set has an excess which makes the above written integral very small, the set has probably not been formed by a random selection from the supposed given complete group. To extend this method to the case of two, or generally n, percentiles, forming (n + 1) compartments, it must be observed that the excesses, say e and e′, are not independent but correlated. To measure the probability of obtaining a pair of excesses respectively as large as e and e′, we have now (corresponding to the extremity of the probability-curve in the simple case) the solid content of a certain probability-surface outside the curve of equal probability which passes through the points on the plane xy assigned by e, e′ (and the other data). This double, or in general multiple, integral, say P, is expressed by Professor Pearson with great elegance in terms of the quadratic factor, called by him χ², which forms the exponent of the expression for the probability that a particular system of the values of the correlated e, e′, &c., should concur—

P = √(2/π) ∫_χ^∞ e^(−½χ²) dχ + √(2/π) e^(−½χ²) {χ + χ³/(1·3) + χ⁵/(1·3·5) + … + χ^(n−2)/(1·3·5 … (n−2))},

when n is odd; with an expression different in form, but nearly coincident in result, when n is even. The practical rule derived from this general theorem may thus be stated. Find from the given observations the probable values of the coefficients pertaining to the formula which is supposed to represent the observations. Calculate from the coefficients a certain number, say n, of percentiles; thereby dividing the given set into n + 1 sections, any of which, according to calculation, ought to contain say m of the observations, while in fact it contains m′. Put e for m′ − m; then χ² = ∑e²/m. Professor Pearson has given in an appended table the values of P corresponding to values of n + 1 up to 20, and values of χ² up to 70. He does not conceal that there is some laxity involved in the circumstance that the coefficients employed are not known exactly, only inferred with probability.[7]
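The rule may be put into a few lines of Python; the six compartment counts below are invented for illustration (so that n = 5, an odd number, and the series quoted above applies).

    from math import erfc, exp, pi, sqrt

    m = [40, 110, 150, 150, 110, 40]      # contents according to calculation
    m_obs = [31, 119, 162, 135, 114, 39]  # contents in fact observed

    chi_sq = sum((o - e) ** 2 / e for o, e in zip(m_obs, m))  # sum of e^2/m

    def tail_odd(chi, n):
        """Pearson's series for P when n is odd, as written above."""
        s, term = 0.0, chi
        for j in range(1, (n - 1) // 2 + 1):
            s += term
            term *= chi * chi / (2 * j + 1)
        return erfc(chi / sqrt(2.0)) + sqrt(2.0 / pi) * exp(-chi * chi / 2.0) * s

    print(chi_sq, tail_odd(sqrt(chi_sq), 5))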

156. Here is one of Professor Pearson's illustrations. The table on the next page gives the distribution of 1000 shots fired at a line in a target, the hits being arranged in belts drawn on the target parallel to the line. The “normal distribution” is obtained from a normal curve, of which the coefficients are determined from the observations. From the value of χ², viz. 45.8, and of (n + 1), viz. 11, we deduce, with sufficient accuracy from Professor Pearson's table, or more exactly from the formula on which the table is based, that P = .000,001,5. “In other words, if shots are distributed on a target according to the normal law, then such a distribution as that cited could only be expected to occur on an average some 15 or 16 times in 10,000,000 times.”
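The quoted P can be re-computed from the counterpart expression for even n, which the article mentions but does not print; it is the standard closed form e^(−½χ²) multiplied by a polynomial in ½χ². A minimal sketch:

    from math import exp

    def tail_even(chi_sq, n):
        # P = e^(-x/2) * (1 + x/2 + (x/2)^2/2! + ... + (x/2)^(n/2 - 1)/(n/2 - 1)!)
        half = chi_sq / 2.0
        term = total = 1.0
        for k in range(1, n // 2):
            term *= half / k
            total += term
        return exp(-half) * total

    # 11 compartments, so n = 10; reproduces P = .000,001,5 very nearly
    print(tail_even(45.8, 10))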

157. The Criterion Criticized. “Such a distribution” in this argument must be interpreted as a distribution for which it is claimed that the observations are all independent of each other. Suppose that there were only 500 independent observations, the remainder being merely duplicates of these 500. Then in the above


  1. Cf. above, par. 115.
  2. Proc. Roy. Soc., vol. 60, p. 477.
  3. Below, par. 168.
  4. Just as the removal of a tax tends to be in the long run beneficial to the consumer, though the benefit on any particular occasion may be masked by fluctuations of price due to other causes.
  5. Phil. Mag. (July, 1900).
  6. As shown above, par. 103.
  7. Loc. cit. p. 166.