Page:EB1911 - Volume 22.djvu/413

LAWS OF ERROR]

PROBABILITY

399

if not known beforehand, may be inferred, as in the simpler case, from a set of observations. Similar statements holding for the other equations, the probability P that the given set of observations f₁, f₂, &c., should have resulted from a particular system of values for x, y, . . . is J exp −[(a₁x + b₁y − f₁)²/c₁² + (a₂x + b₂y − f₂)²/c₂² + &c.], where J is a constant determined on the same principle as in the analogous simpler cases.^[1] The condition that P should be a maximum gives as many linear equations for the determination of x′, y′, . . . as there are unknown quantities.
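Making P a maximum is the same as making the bracketed sum of weighted squares a minimum. For two unknowns this yields two linear "normal equations." The following is a minimal sketch under stated assumptions. The equations aᵢx + bᵢy = fᵢ and their moduli cᵢ are assumed to be supplied as plain Python lists. The function name `most_probable_xy` and the sample equations are hypothetical.

```python
def most_probable_xy(a, b, f, c):
    """Return the (x, y) maximising P = J exp -sum((a_i x + b_i y - f_i)^2 / c_i^2).

    Maximising P is the same as minimising the weighted sum of squares; its
    two normal equations are solved here by Cramer's rule.
    """
    w = [1.0 / ci ** 2 for ci in c]  # weight 1/c_i^2 of each equation
    saa = sum(wi * ai * ai for wi, ai in zip(w, a))
    sab = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
    sbb = sum(wi * bi * bi for wi, bi in zip(w, b))
    saf = sum(wi * ai * fi for wi, ai, fi in zip(w, a, f))
    sbf = sum(wi * bi * fi for wi, bi, fi in zip(w, b, f))
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("the equations do not determine x and y")
    # Normal equations: saa*x + sab*y = saf and sab*x + sbb*y = sbf.
    x = (saf * sbb - sab * sbf) / det
    y = (saa * sbf - sab * saf) / det
    return x, y


# Hypothetical, exactly consistent equations: x + y = 3, x - y = 1, x + 2y = 4.
print(most_probable_xy([1, 1, 1], [1, -1, 2], [3, 1, 4], [1, 1, 1]))  # (2.0, 1.0)
```

With discordant observations the same call returns the weighted compromise. Equations with a larger modulus cᵢ count for less.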

145. The solution proper to the case where the observations are known to range according to the normal law may be extended to numerous observations ranging under any law, on the principles which justify the use of the Method of Least Squares in the case of a single quaesitum.

146. As in that simple case, the principle of economy will here also justify the use of the median. In the case of two quaesita, for instance, we take for the true values of x and y that point for which the sum of the (properly weighted) perpendiculars let fall from it on each of a set of lines representing the given equations is a minimum.^[2]
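The two-dimensional median can be computed directly. The weighted sum of perpendicular distances is convex and piecewise linear in (x, y). Provided the weights are positive and the lines are not all parallel, a minimum is therefore attained where two of the lines cross, so testing every crossing suffices. The following is a minimal sketch under those assumptions. The function name `perpendicular_median`, the weights wᵢ and the sample equations are hypothetical.

```python
import itertools
import math


def perpendicular_median(a, b, f, weights):
    """Point minimising sum(w_i * |a_i x + b_i y - f_i| / hypot(a_i, b_i)).

    Assumes positive weights and at least two non-parallel lines; the
    objective is then convex and piecewise linear, so some crossing of two
    lines is a minimum, and every crossing is examined.
    """
    def cost(x, y):
        # Perpendicular distance from (x, y) to each line, weighted.
        return sum(w * abs(ai * x + bi * y - fi) / math.hypot(ai, bi)
                   for ai, bi, fi, w in zip(a, b, f, weights))

    best = None
    for i, j in itertools.combinations(range(len(a)), 2):
        det = a[i] * b[j] - a[j] * b[i]
        if det == 0:
            continue  # parallel lines never cross
        x = (f[i] * b[j] - f[j] * b[i]) / det
        y = (a[i] * f[j] - a[j] * f[i]) / det
        value = cost(x, y)
        if best is None or value < best[0]:
            best = (value, x, y)
    if best is None:
        raise ValueError("need at least two non-parallel lines")
    return best[1], best[2]


# Hypothetical equations x = 1, y = 1, x + y = 2 and a discordant x - y = 5.
print(perpendicular_median([1, 0, 1, 1], [0, 1, 1, -1], [1, 1, 2, 5], [1, 1, 1, 1]))  # (1.0, 1.0)
```

In this example the discordant fourth equation does not drag the result away from the point on which the other three agree. This is the robustness that motivates the median.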

147. Normal Correlation. The older writers have expressed the error in the determination of one of the variables without reference to the error in the other. But the error of one variable may be regarded as correlated with that of another. That is, if the system x′, y′, . . . forms the solution of the given equations, while x′ + ξ, y′ + η, . . . is the real system, then the (small) values of ξ, η, . . . which will concur in the long run of systems from which the given set of observations results are normally correlated. From this point of view Bravais, in 1846, was led to several theorems. These apply to the case of correlation, now the more important, in which ξ and η are given (not in general small) deviations from the means of two or more correlated members (organs or attributes) forming a normal group.
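The correlation of ξ and η can be read off the normal equations. Assume the errors of the equations aᵢx + bᵢy = fᵢ are independent and each follows the normal law with modulus cᵢ (variance cᵢ²/2). The covariance matrix of the least-squares values is then half the inverse of the weighted normal matrix. This gives a correlation of −Σwab/√(Σwa²·Σwb²) with w = 1/c². The following is a minimal sketch under that assumption. The function name `error_correlation` and the sample coefficients are hypothetical.

```python
import math


def error_correlation(a, b, c):
    """Standard deviations and correlation of the errors in x and y.

    Assumes independent normal errors in a_i x + b_i y = f_i with modulus
    c_i (variance c_i**2 / 2); the covariance of the least-squares values
    is then 0.5 * inverse([[saa, sab], [sab, sbb]]).
    """
    w = [1.0 / ci ** 2 for ci in c]
    saa = sum(wi * ai * ai for wi, ai in zip(w, a))
    sab = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
    sbb = sum(wi * bi * bi for wi, bi in zip(w, b))
    det = saa * sbb - sab * sab
    sd_x = math.sqrt(0.5 * sbb / det)
    sd_y = math.sqrt(0.5 * saa / det)
    r = -sab / math.sqrt(saa * sbb)  # normalised off-diagonal of the inverse
    return sd_x, sd_y, r


sd_x, sd_y, r = error_correlation([1, 1, 1], [1, -1, 2], [1, 1, 1])
print(round(r, 3))  # -0.471
```

The errors are uncorrelated only when Σwab vanishes. This is why treating each variable's error "without reference to the error in the other" is in general incomplete.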

148. To determine the frequency-constants of such a group it is proper to proceed on the analogy of the simple case of one-dimensional error. In the case of two dimensions, for instance, consider the probability p₁ that a given pair of observations (x₁, y₁) should have resulted from a normal group with means x′, y′, standard deviations σ₁ and σ₂, and coefficient of correlation r. It may be written:

$\Delta x\,\Delta y\,\Delta\sigma_1\,\Delta\sigma_2\,\Delta r\,\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}\exp\left[-\frac{\text{E}^2}{2(1-r^2)}\right]$,

where E² = (x′ − x₁)²/σ₁² − 2r(x′ − x₁)(y′ − y₁)/σ₁σ₂ + (y′ − y₁)²/σ₂². A similar statement holds for each other pair of observations (x₂, y₂), (x₃, y₃), . . ., with analogous expressions for p₂, p₃, . . . Whence, as in the simpler case, we have p₁ × p₂ × . . . × pₙ/J (J a constant) for P, the a posteriori probability that the given observations should have resulted from an assigned system of the frequency-constants. The most probable system is determined by making P a maximum, and accordingly equating to zero each of the following expressions:

∂P/∂x′, ∂P/∂y′, ∂P/∂σ₁, ∂P/∂σ₂, ∂P/∂r.

The values of the arithmetic mean and of the standard deviation for each variable are the same as those obtained in the simple case of one dimension. The value of r is ∑(x′ − xᵢ)(y′ − yᵢ)/(nσ₁σ₂), the sum extending over the n pairs of observations.^[3] The probable error of the determination is assigned on the assumption that the errors to which it is liable are small.^[4] Such coefficients have already been calculated for a great number of interesting cases. For instance, the coefficient of correlation between human stature and femur length is 0.8. Between the right and left femur it is 0.96, and between the statures of husbands and wives it is 0.28.^[5]
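Solving the five equations above gives the means and the standard deviations (with divisor n) of each variable, together with r as stated. The following is a minimal sketch under stated assumptions. It attaches the usual large-sample probable error of r, 0.6745(1 − r²)/√n. The function name `fit_normal_correlation` and the data are hypothetical, and the pairs are illustrative only.

```python
import math


def fit_normal_correlation(xs, ys):
    """Most probable frequency-constants of a two-dimensional normal group.

    Returns the means, the standard deviations (divisor n), the coefficient
    of correlation r, and its probable error 0.6745 * (1 - r**2) / sqrt(n).
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    s1 = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    s2 = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * s1 * s2)
    probable_error = 0.6745 * (1 - r * r) / math.sqrt(n)
    return mx, my, s1, s2, r, probable_error


# Hypothetical pairs, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(fit_normal_correlation(xs, ys))
```

The probable-error formula presupposes, as the text says, that the errors of determination are small. With so few pairs it is only indicative.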

149. This application of inverse probability to determine correlation-coefficients and the error to which the determination is liable has been largely employed by Professor Pearson^[6] and other recent writers. The use of the normal formula to measure the probable—and improbable—errors incident to such determinations is justified by reasoning akin to that which has been employed in the general proof of the law of error.^[7] Professor Pearson has pointed out a circumstance which seems to be of great importance in the theory of evolution: that the errors incident to the determination of different frequency-coefficients are apt to be mutually correlated. Thus if a random selection be made from a certain population, the correlation-coefficient which fits the organs of that set is apt to differ from the coefficient proper to the complete group in the same sense as some other frequency-coefficients.
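The mutual correlation of errors in different frequency-constants can be seen by simulation. Take many random selections from one normal group and compare the two standard deviations found in each selection. For a normal group the large-sample correlation between the errors of σ₁ and σ₂ is r², so with r = 0.8 the printed value should lie near 0.64. This sketch is an illustration of the general remark, not Professor Pearson's own computation. All names, the seed and the sample sizes are hypothetical choices.

```python
import math
import random


def sd(v):
    """Standard deviation with divisor n."""
    m = sum(v) / len(v)
    return math.sqrt(sum((t - m) ** 2 for t in v) / len(v))


def corr(u, v):
    """Product-moment correlation of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    return cov / math.sqrt(sum((p - mu) ** 2 for p in u) * sum((q - mv) ** 2 for q in v))


def correlation_of_sd_errors(r=0.8, n=100, trials=2000, seed=1):
    """Correlation between the sampling errors of sigma_1 and sigma_2.

    Draws `trials` random selections of n pairs from a normal group with
    unit standard deviations and correlation r, finds both standard
    deviations in each selection, and correlates the two lists.
    """
    rng = random.Random(seed)
    s1s, s2s = [], []
    for _ in range(trials):
        xs, ys = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            xs.append(z1)
            ys.append(r * z1 + math.sqrt(1 - r * r) * z2)
        s1s.append(sd(xs))
        s2s.append(sd(ys))
    return corr(s1s, s2s)


print(round(correlation_of_sd_errors(), 2))
```

A selection that happens to overstate σ₁ tends also to overstate σ₂. The errors of the two constants are therefore correlated, as the text describes.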

150. The last remark applies also to the determination of the coefficients, in particular those of correlation, by abridged methods, on principles explained with reference to the simple case. One such method is the formula r = ∑η/∑ξ. Here ∑ξ is the sum of (some or all of) the positive (or the negative) deviations of the values for one organ or attribute, each measured by the modulus pertaining to that member. ∑η is the sum of the associated deviations of the other member, each measured by its own modulus. This method is certainly much less troublesome, and is perhaps not much less accurate, than the method prescribed by genuine inversion.
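The abridged formula works because, in a normal group, the average η associated with a given ξ is rξ when both are measured in units of their own modulus. Summing over the chosen ξ therefore gives ∑η ≈ r∑ξ. The following is a minimal sketch under stated assumptions. It uses all positive deviations and divides by the standard deviation rather than the modulus; the factor √2 cancels in the ratio. The function name `abridged_r`, the seed and the simulated data are hypothetical.

```python
import math
import random


def abridged_r(xs, ys):
    """Abridged coefficient r = sum(eta) / sum(xi) over the positive xi.

    xi and eta are deviations from the means divided by each member's own
    standard deviation (dividing by the modulus instead cancels in the ratio).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    sum_xi = sum_eta = 0.0
    for x, y in zip(xs, ys):
        xi = (x - mx) / sx
        if xi > 0:  # keep only the positive deviations of the first member
            sum_xi += xi
            sum_eta += (y - my) / sy
    return sum_eta / sum_xi


# Hypothetical test data: a simulated normal group with r = 0.6.
rng = random.Random(2)
xs, ys = [], []
for _ in range(20000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xs.append(z1)
    ys.append(0.6 * z1 + 0.8 * z2)
print(round(abridged_r(xs, ys), 2))
```

Only one pass and no products of deviations are needed. That economy is the point of the abridged method.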

151. A method of rejecting data analogous to the use of percentiles in one dimension is practised when, given the frequency of observations for each increment of area, e.g. each ∆x ∆y, we utilize only the frequencies for integral areas. Mr Sheppard has given an elegant solution of the following problem.^[8] The task is to find the correlation between two attributes, given the medians L and M of a normal group for the respective attributes and the distribution of the total group among the four quadrants which they define, thus:

             Below L    Above L
Below M         P          R
Above M         R          P

Fig. 12.

If cos D is put for r, the coefficient of correlation, it is found that D = πR/(P + R). For example, take the group of statistics relating to dice already cited^[9] from Professor Weldon. Let it be arranged in four quadrants by a horizontal and a vertical line, each of which divides the total group into two halves; the equations of these lines prove to be y = 6.11 and x = 6.156 respectively. For R we have 1360.5, and for P roughly 687.5. Whence D = π × 0.66, and r = cos(0.66π) = −½ nearly, as it ought to be. The negative sign is required because the lower part of Mr Sheppard's diagram shown in fig. 12 corresponds to the upper part of Professor Weldon's diagram shown in par. 115.
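Mr Sheppard's rule rests on a property of the normal group with correlation r. The proportion of the whole lying in the two discordant quadrants (above one median and below the other) is D/π, where cos D = r. With 2R discordant out of 2(P + R) observations this gives D = πR/(P + R). The following minimal sketch applies the rule to the figures quoted above. The function name `sheppard_r` is hypothetical.

```python
import math


def sheppard_r(P, R):
    """Coefficient of correlation from the four median quadrants.

    P is the frequency in each concordant quadrant (below both medians, or
    above both) and R that in each discordant one; for a normal group
    r = cos D with D = pi * R / (P + R).
    """
    D = math.pi * R / (P + R)
    return math.cos(D)


# Professor Weldon's dice, as quoted in the text: P = 687.5, R = 1360.5.
print(round(sheppard_r(687.5, 1360.5), 3))  # -0.494
```

The result, about −0.49, agrees with the "−½ nearly" of the text. Equal P and R give r = 0, and R = 0 gives perfect correlation.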

152. Necessity rather than convenience is sometimes the motive for resort to percentiles. Professor Pearson has applied the median method to determine the correlation between husbands and wives in respect of the darkness of eye-colour. This character does not admit of exact graduation: “our numbers merely refer to certain groupings, arranged, it is true, in increasing darkness of colour, but in no way corresponding to equal increases in colour-intensity.”^[10] From data of this sort he ascertained the number of husbands with eye-colour above the median tint who marry wives with eye-colour above the median tint. From this he finds +0.1 for r, the coefficient of correlation. A general method for determining the frequency-constants when the data are, or are taken to be, of the integral sort has been given by Professor Pearson.^[11] Attention should also be called to Mr Yule's treatment of the problem by a sort of logical calculus on the lines of Boole and Jevons.^[12]

153. Abnormal Correlation. In the cases of correlation so far considered, it has been presupposed that the things correlated range according to the normal law of error. But now suppose the law of distribution to be no longer normal. For instance, the dots on the plane of xy,^[13] each representing a pair of members, may no longer be grouped in elliptic (or circular) rings of equal frequency, and the locus of the maximum y deviation corresponding to an assigned x deviation may no longer be a right line. How is the interdependence of these deviations to be formulated? It is submitted that such data may be treated as if they were normal, by an extension of the Method of Least Squares to two or more dimensions.^[14] Thus when the amount of pauperism together with the amount of outdoor relief is plotted for several unions, the resulting distribution is far from normal. Nevertheless, suppose the average pauperism and average outdoor relief are taken for aggregates of unions chosen at random, say quintettes or decades. It may be expected that these means will conform to the normal law, with coefficients obtained from the original data according to the rule which is proper to the case of the normal law.^[15] By obtaining averages conforming to the normal law, as in the simple application of the Method of Least Squares, we shall not indeed have utilized the whole of our data, but we shall put a part of it in a very useful

[1] Above, par. 130.

[2] See Phil. Mag. (1888), “On a New Method of Reducing Observations”; where a comparison in respect of convenience and accuracy with the received method is attempted.

[3] Corresponding to the $k/{\sqrt {lm}}$ of pars. 14, 127 above.

[4] Pearson, Trans. Roy. Soc., A, 191, p. 234.

[5] Pearson, Grammar of Science, 2nd ed., pp. 402, 431.

[6] Trans. Roy. Soc. (1898), A, vol. 191; Biometrika, ii. 273.

[7] Above, par. 107. Compare the proof of the “Subsidiary Law of Error,” as the law in this connexion may be called, in the paper on “Probable Errors,” Journ. Stat. Soc. (June 1908).

[8] Trans. Roy. Soc. (1899), A, 192, p. 141.

[9] Above, par. 115.

[10] Grammar of Science, p. 432.

[11] Trans. Roy. Soc., A, vol. 195. In this connexion reference should also be made to Pearson's theory of “Contingency” in his thirteenth contribution to the “Mathematical Theory of Evolution” (Drapers' Company Research Memoirs).

[12] Trans. Roy. Soc. (1900), A, 194, p. 257; (1901), A, 197, p. 91.

[13] Above, par. 127.

[14] Above, par. 116.

[15] Suppose a set of n/s observations is derived from the given set of n observations (each corresponding to a point on the plane xy), each derived observation being the average of a batch of s of the original observations. Then the coefficient of correlation for the derived system is the same as that which pertains to the original system. As to the standard deviation for the new system see note to par. 135.
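The statement in note [15] can be checked by a small simulation. Averaging s independent pairs divides both variances and the covariance by s, so the coefficient of correlation is unchanged while the distribution of the means moves toward the normal law. The following is a minimal sketch under stated assumptions. The skewed test data (an exponential x, with y equal to x plus an independent exponential), the batch size of ten, the seed and all function names are hypothetical.

```python
import math
import random


def corr(u, v):
    """Product-moment correlation of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    return cov / math.sqrt(sum((p - mu) ** 2 for p in u) * sum((q - mv) ** 2 for q in v))


def batch_means(xs, ys, s, rng):
    """Average the pairs in random batches of s (any remainder is dropped)."""
    order = list(range(len(xs)))
    rng.shuffle(order)
    mx, my = [], []
    for k in range(0, len(order) - s + 1, s):
        batch = order[k:k + s]
        mx.append(sum(xs[i] for i in batch) / s)
        my.append(sum(ys[i] for i in batch) / s)
    return mx, my


rng = random.Random(3)
# Hypothetical, strongly skewed data; the population correlation is 1/sqrt(2).
xs = [rng.expovariate(1.0) for _ in range(50000)]
ys = [x + rng.expovariate(1.0) for x in xs]
mx, my = batch_means(xs, ys, 10, rng)
print(round(corr(xs, ys), 3), round(corr(mx, my), 3))
```

Both printed coefficients should lie near 0.707. The batch means number n/s = 5000 and are much closer to normal than the original skewed pairs.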
