+1.25, +0.75, −1, −1, +5.5, −2.75, +0.75, −2, +1.75, +3.25, +0.25, −2.75, −2.25, −0.5, +4.75, +0.25.

If, instead of sixteen, a million digits went to each batch, the general character of the series would be much the same; the aggregate figures would continue to hover about zero with a standard deviation of √8.25, that is about 2.87, and a probable error of nearly 2. Here for instance are seven aggregates formed by recombining 252 out of the 256 digits above utilized into batches of 36 according to the prescribed rule: viz. subtracting 36 × 4.5 from the sum of each batch of 36 and dividing the remainder by √36:—

−0.5, +3.3, +2.6, −0.6, +1.5, −2, +1.
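
The arithmetic of these aggregates can be checked by simulation. The following sketch (Python; not part of the original article, and the batch sizes are merely illustrative) draws batches of random decimal digits and forms the prescribed aggregates; whatever the size of the batch, their standard deviation should come out near √8.25, about 2.87, and their probable error near 0.6745 × 2.87, or about 1.94.

```python
# A minimal sketch of the digit experiment described above: subtract n * 4.5
# from the sum of each batch of n random digits and divide by sqrt(n).
import random
import statistics

def aggregate(batch):
    """(sum of digits - n * 4.5) / sqrt(n) for one batch of digits."""
    n = len(batch)
    return (sum(batch) - n * 4.5) / n ** 0.5

def simulate(batch_size, batches, seed=0):
    rng = random.Random(seed)
    return [aggregate([rng.randrange(10) for _ in range(batch_size)])
            for _ in range(batches)]

for n in (16, 36, 400):           # illustrative batch sizes
    agg = simulate(n, 5000)
    sd = statistics.pstdev(agg)
    print(f"batch size {n:3d}: mean {statistics.mean(agg):+.3f}, "
          f"s.d. {sd:.3f}, probable error {0.6745 * sd:.3f}")
```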

The illustration brings into view the circumstance that though the system of molecules may start with a distribution of velocities other than the normal, yet by repeated collisions the normal distribution will be superinduced. If both the velocities u and v are distributed according to the law of error for one dimension, we may presume that the joint values of u and v conform to the normal surface. Or we may reason directly that as the pair of velocities u and v is made up of a great number of elementary pairs (the co-ordinates in each of which need not, initially at least, be supposed uncorrelated) the law of frequency for concurrent values of u and v must be of the normal form which may be written[1]

z = [1/(2π√(km(1 − r²)))] exp −[x²/k − 2rxy/√(km) + y²/m]/[2(1 − r²)].

It may be presumed that r, the coefficient of correlation, is zero, for, owing to the symmetry of the influences by which the molecular chaos is brought about, it is not to be supposed that there is any connexion or repugnance between one direction of u, say south to north, and one direction of v, say west to east. For a like reason k must be supposed equal to m. Thus the mean square of the velocity = 2k; which, multiplied by half the mass of a sphere, is to be equated to the average energy T/N. The reasoning may be extended with confidence to three dimensions, and with caution to contiguous molecules.
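
As a rough numerical check on the statement just made (a sketch only, on the assumption that u and v are independent normal components of equal variance k; the figure k = 3 is arbitrary), one may sample such velocities and verify that the mean of u² + v² approaches 2k while the sample correlation of u and v stays near zero.

```python
# Minimal sketch: independent normal velocity components u, v of equal
# variance k (so r = 0).  The mean of u*u + v*v should approach 2k.
import random
import statistics

k = 3.0                                   # assumed common variance of u and v
rng = random.Random(1)
us = [rng.gauss(0.0, k ** 0.5) for _ in range(100_000)]
vs = [rng.gauss(0.0, k ** 0.5) for _ in range(100_000)]

mean_sq_speed = statistics.fmean(u * u + v * v for u, v in zip(us, vs))
r_sample = statistics.correlation(us, vs)
print(f"mean of u^2 + v^2 = {mean_sq_speed:.3f}   (2k = {2 * k})")
print(f"sample correlation  = {r_sample:+.4f}   (expected near 0)")
```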

126. Normal Correlation in Biology.—Correlation cannot be ignored in another application of the many-dimensioned law of error: its use in biological inquiries to investigate the relations between different generations. It was found by Galton that the heights and other measurable attributes of children of the same parents range about a mean which is not that of the parental heights, but nearer the average of the general population. The amount of this “regression” is simply proportional to the distance of the “mid-parent's” height from the general average. This is a case of a very general law which governs the relations not only between members of the same family, but also between members of the same organism, and generally between two (or more) coexistent or in any way co-ordinated observations, each belonging to a normal group. Let x and y be the measurements of a pair thus constituted. Then[2] it may be expected that the conjunction of particular values for x and y will approximately obey the two-dimensioned normal law which has been already exhibited (see par. 114).

127. Regression-lines.—In the expression above given, put x − a for x and y − b for y (a and b being the respective means), and the equation for the frequency of pairs having assigned values of the attributes under measurement becomes

z = [1/(2π√(km(1 − r²)))] exp −[(x − a)²/k − 2r(x − a)(y − b)/√(km) + (y − b)²/m]/[2(1 − r²)].

This formula is of very general application.[3] If two sets of measurements were made on the height, or other measurable feature, of the proverbial “Goodwin Sands” and “Tenterden Steeple,” and the first measurement of one set was coupled with the first of the other set, the second with the second, and so on, the pairs of magnitudes thus presented would doubtless vary according to the above-written law, only in that case r would presumably be zero; the expression for z would reduce to the product of the two independent probabilities that particular values of x and y should concur. But slight interdependences between things supposed to be totally unconnected would often be discovered by this law of error in two or more dimensions.[4] It may be put in a more convenient form by substituting ξ for (x − a)/√k and η for (y − b)/√m. The equation of the surface then becomes z = [1/(2π√(1 − r²))] exp −[ξ² − 2rξη + η²]/[2(1 − r²)]. If the frequency of observations in the vicinity of a point is represented by the number of dots in a small increment of area, when r = 0 the dots will be distributed uniformly about the origin, and the curves of equal probability will be circles. When r is different from zero the dots will be distributed so that the majority will be massed in two quadrants: in those for which ξ and η are both positive or both negative when r is positive, in those for which ξ and η have opposite signs when r is negative. In the limiting case, when r = 1 the whole host will be massed along the line η = ξ, every deviation ξ being attended with an equal deviation η. In general, to any deviation ξ′ of one of the variables there corresponds a set or “array” (Pearson) of values of the other variable, for which the frequency is given by substituting ξ′ for ξ in the general equation. The section thus obtained proves to be a normal probability-curve with standard deviation √(1 − r²). The most probable value of η corresponding to the assigned value ξ′ is rξ′. The equation η = rξ, or rather what it becomes when translated back to our original co-ordinates, (y − b)/σ₂ = r(x − a)/σ₁, where σ₁, σ₂ are our √k, √m respectively,[5] is often called a regression-equation. A verification is to hand in the above-cited statistics, which Weldon obtained by casting batches of dice. If the dice were perfect, r ( = l/√(km)) would equal ½, and as the dice proved not to be very imperfect, the coefficient is doubtless approximately = ½. Accordingly, we may expect that, if axes x and y are drawn through the point of maximum frequency at the centre of the compartment containing 244 observations, then, corresponding to any value of x, say 2νi (where i is the side of each square compartment), the most probable value of y should be νi, and corresponding to y = 2νi the most probable value of x should be νi. And in fact these regression-equations are fairly well fulfilled for the integer values of ν (more than which could not be expected from discrete observations): e.g. when x = +4i, the value of y for which the frequency (25) is a maximum is, as it ought to be, +2i; when x = −2i the maximum (119) is at y = −i; when x = −4i the maximum (16) is at y = −2i; when y is +2i the maximum (138) is at x = +i; when y is −2i the maximum (117) is at x = −i; and in the two cases (x = +2i and y = +4i) where the fulfilment is not exact, the failure is not very serious.
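
The property of the “arrays” just described can likewise be illustrated numerically. The sketch below (Python; the value r = ½ echoes the dice example, but the data are simulated, not Weldon's) generates standardized correlated pairs and shows that, within a narrow band of ξ, the mean of η is close to rξ and its standard deviation close to √(1 − r²).

```python
# Minimal sketch: standardized normal pairs (xi, eta) with correlation r.
# Within each "array" (a narrow band of xi) the mean of eta should be near
# r * xi and the standard deviation of eta near sqrt(1 - r^2).
import random
import statistics

r = 0.5
rng = random.Random(2)
pairs = []
for _ in range(200_000):
    xi = rng.gauss(0.0, 1.0)
    eta = r * xi + (1 - r * r) ** 0.5 * rng.gauss(0.0, 1.0)
    pairs.append((xi, eta))

for centre in (-2.0, -1.0, 0.0, 1.0, 2.0):
    band = [eta for xi, eta in pairs if abs(xi - centre) < 0.1]
    print(f"xi near {centre:+.1f}: mean eta = {statistics.fmean(band):+.3f} "
          f"(r*xi = {r * centre:+.2f}), s.d. = {statistics.pstdev(band):.3f} "
          f"(sqrt(1 - r^2) = {(1 - r * r) ** 0.5:.3f})")
```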

128. Analogous statements hold good for the case of three or more dimensions of error.[6] The normal law of error for any number of variables, x₁, x₂, x₃, . . ., may be put in the form z = [1/((2π)^(n/2)√∆)] exp −[R₁₁x₁² + R₂₂x₂² + &c. + 2R₁₂x₁x₂ + &c.]/2∆, where ∆ is the determinant:—

 1    r₁₂   r₁₃   ⋅⋅
r₂₁    1    r₂₃   ⋅⋅
r₃₁   r₃₂    1    ⋅⋅
 ⋅     ⋅     ⋅

each r, e.g. r₂₃ ( = r₃₂), is the coefficient of correlation between two of the variables, e.g. x₂, x₃; R₁₁ is the first minor of the determinant formed by omitting the first row and first column; R₂₂ is the first minor formed by omitting the second row and the second column, and so on; R₁₂ ( = R₂₁) is the first minor formed by omitting the first column and second row (or vice versa). The principle of correlation plays an important rôle in natural history. It has replaced the notion that there is a simple proportion between the size of organs by the appropriate conception that there are simple proportions existing between the deviation from the average of one organ and the most probable value for the coexistent deviation of the other organ from its average.[7] Attributes favoured by “natural” or other selection are found to be correlated with other attributes which are not directly selected. The extent to which the attributes of an individual depend upon those of his ancestors is measured by correlation.[8] The principle is instrumental to most of the important “mathematical contributions” which Professor Pearson has made to the theory of evolution.[9] In social inquiries, also, the principle promises a rich harvest. Where numerous fluctuating causes go to produce a result like pauperism or immunity from small-pox, the ideal method of eliminating chance would be to construct “regression-equations” of the following type: “Change % in pauperism [in the decade 1871–1881] in rural districts = −27.07% + 0.299 (change % in out-relief ratio) + 0.271 (change % in proportion of old) + 0.064 (change % in population).”[10]
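
The many-dimensioned formula of par. 128 can be evaluated directly from the correlation determinant. The following sketch (Python; the three-variable correlation matrix is an arbitrary illustration, not data from the text) takes the R's as the signed minors, i.e. the cofactors of the corresponding r's, which is what the reduction to the two-dimensioned case requires, and checks the constant against 1/(2π√(1 − r²)) at the origin for n = 2.

```python
# Minimal sketch: the normal law for n standardized variables,
#   z = (2*pi)^(-n/2) * Delta^(-1/2) * exp(-sum_ij R_ij x_i x_j / (2*Delta)),
# with Delta the correlation determinant and R_ij the cofactor of r_ij.
import math

def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def cofactor(m, i, j):
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)

def normal_density(x, corr):
    n = len(x)
    delta = det(corr)
    quad = sum(cofactor(corr, i, j) * x[i] * x[j]
               for i in range(n) for j in range(n))
    return math.exp(-quad / (2.0 * delta)) / ((2 * math.pi) ** (n / 2) * delta ** 0.5)

# two variables: the value at the origin should equal 1 / (2*pi*sqrt(1 - r^2))
r = 0.5
print(normal_density([0.0, 0.0], [[1.0, r], [r, 1.0]]),
      1 / (2 * math.pi * (1 - r * r) ** 0.5))

# three variables with a hypothetical correlation matrix
corr3 = [[1.0, 0.5, 0.3],
         [0.5, 1.0, 0.2],
         [0.3, 0.2, 1.0]]
print(normal_density([0.5, -1.0, 0.25], corr3))
```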

129. Determination of Constants by the Inverse Method.—In order to determine the best values of the coefficients involved in the law of error, and to test the worth of the results obtained by using any values, recourse must be had to inverse probability.

130. The simplest problem under this head is where the quaesitum is a single real object and the data consist of a large number of observations, x₁, x₂, . . . xₙ, such that if the number were indefinitely increased, the completed series would form a normal probability-curve with the true point as its centre, and having a given modulus c. It is as if we had observed the position of the dints made by the fragments


  1. Above, par. 114, and below, par. 127.
  2. Some plurality of independent causes is presumable.
  3. Herschel's a priori proposition concerning the law of error in two dimensions (above, par. 99) might still be defended either as generally true, so many phenomena showing no trace of interdependence, or on the principle which justifies our putting ½ for a probability that is unknown (above, par. 6), or 5 for a decimal place that is neglected; correlation being equally likely to be positive or negative. The latter sort of explanation may be offered for the less serious contrast between the a priori and the empirical proof of the law of error in one dimension (below, par. 158).
  4. Cf. above, par. 115.
  5. Cf. note to par. 98, above.
  6. Phil. Mag. (1892), p. 200 seq.; (1896), p. 211; Pearson, Trans. Roy. Soc. (1896), 187, p. 302; Burbury, Phil. Mag. (1894), p. 145.
  7. Pearson, “On the Reconstruction of Prehistoric Races,” Trans. Roy. Soc. (1898), A, p. 174 seq.; Proc. Roy. Soc. (1898), p. 418.
  8. Pearson, “The Law of Ancestral Heredity,” Trans. Roy. Soc.; Proc. Roy. Soc. (1898).
  9. Papers in the Royal Society since 1895.
  10. An example instructively discussed by Yule, Journ. Stat. Soc. (1899).