

putting r = OH, r′ = OK; as r = 2a sin θ, r′ = 2a sin θ′,

M1 = [equation not transcribed].

Professor Sylvester has remarked that this double integral, by means of the theorem

[equation not transcribed],

is easily shown to be identical with

[equation not transcribed].

∴ M1 = 35a²/36π; ∴ M = (35/48π²)πa².

From this mean value we pass to the probability that four points within a circle shall form a re-entrant figure: the chance that the fourth point falls within the triangle formed by the other three is M/πa², and any one of the four points may be the re-entrant one; whence

p = 4M/πa² = 35/12π².
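
As the integrals themselves have not been transcribed, both results may be checked numerically. The following is a minimal Monte Carlo sketch (in Python; the function names, the trial count and the unit radius are our own assumptions, not part of the original derivation), estimating the mean triangle area and the frequency of re-entrant figures:

```python
import math
import random

def random_point_in_circle(a=1.0):
    # Rejection sampling: a uniform point within a circle of radius a.
    while True:
        x, y = random.uniform(-a, a), random.uniform(-a, a)
        if x * x + y * y <= a * a:
            return (x, y)

def triangle_area(p, q, r):
    # Half the absolute value of the cross product of two edge vectors.
    return abs((q[0] - p[0]) * (r[1] - p[1])
               - (r[0] - p[0]) * (q[1] - p[1])) / 2.0

def is_reentrant(pts):
    # Four points form a re-entrant figure iff one of them lies inside
    # the triangle of the other three; tested here by area additivity.
    for i in range(4):
        tri = [pts[j] for j in range(4) if j != i]
        parts = sum(triangle_area(pts[i], tri[k], tri[(k + 1) % 3])
                    for k in range(3))
        if math.isclose(parts, triangle_area(*tri), rel_tol=1e-9):
            return True
    return False

trials, a = 200_000, 1.0
area_sum, reentrant = 0.0, 0
for _ in range(trials):
    pts = [random_point_in_circle(a) for _ in range(4)]
    area_sum += triangle_area(*pts[:3])
    reentrant += is_reentrant(pts)

print("mean triangle area:", area_sum / trials,
      "theory:", 35 * a * a / (48 * math.pi))      # ≈ 0.2321
print("re-entrant fraction:", reentrant / trials,
      "theory:", 35 / (12 * math.pi ** 2))         # ≈ 0.2955
```

With a couple of hundred thousand trials the two estimates should agree with 35a²/48π ≈ 0.2321 and 35/12π² ≈ 0.2955 to about two decimal places.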

94. The function of expectation in this class of problem appears to afford an additional justification of the position here assigned to this conception[1] as distinguished from an average in the more general sense which is proper to the following Part.

Part II.—Averages and Laws of Error

95. Averages.—An average may be defined as a quantity derived from a given set of quantities by a process such that, if the constituents become all equal, the average will coincide with the constituents, and, the constituents not being all equal, the average is greater than the least and less than the greatest of the constituents. For example, if x1, x2, . . . xn are the constituents, the following expressions form averages (called respectively the arithmetic, geometric and harmonic means):—

(x1 + x2 + . . . + xn)/n.

(x1 × x2 × . . . × xn)^(1/n).

1 / [(1/n)(1/x1 + 1/x2 + . . . + 1/xn)].

The conditions of an average are likewise satisfied by innumerable other symmetrical functions, for example:—

[(x1² + x2² + . . . + xn²)/n]^½
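
By way of illustration, the following minimal sketch (in Python; the constituents chosen are our own, arbitrary and positive) computes the four means just instanced and verifies that each lies between the least and greatest constituent:

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    return math.prod(xs) ** (1.0 / len(xs))

def harmonic_mean(xs):
    return 1.0 / ((1.0 / len(xs)) * sum(1.0 / x for x in xs))

def quadratic_mean(xs):
    # The root-mean-square instanced in the text.
    return math.sqrt(sum(x * x for x in xs) / len(xs))

xs = [2.0, 3.0, 6.0, 9.0]  # arbitrary positive constituents
for mean in (harmonic_mean, geometric_mean, arithmetic_mean, quadratic_mean):
    m = mean(xs)
    assert min(xs) <= m <= max(xs)  # defining property of an average
    print(mean.__name__, round(m, 4))
```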

The conception may be extended from symmetrical to unsymmetrical functions by supposing any one or more of the constituents in the former to be repeated several times. Thus if in the first of the averages above instanced (the arithmetic mean) the constituent xr occurs l times, the expression is to be modified by putting lxr for xr in the numerator, and in the denominator, for n, n + l − 1. The definition of an average covers a still wider field. The process employed need not be a function.[2] One of the most important averages is formed by arranging the constituents in the order of magnitude and taking for the average a value which has as many constituents above it as below it, the median. The designation is also extended to that value about which the greatest number of the constituents cluster most closely, the “centre of greatest density,” or (with reference to the geometrical representation of the grouping of the constituents) the greatest ordinate, or, as recurring most frequently, the mode.[3] But to comply with the definition there must be added the condition that the mode does not occur at either extremity of the range between the greatest and the least of the constituents. There should also in general be added a definition of the process by which the mode is derived from the given constituents.[4] Perhaps this specification may be dispensed with when the number of the constituents is indefinitely large. For then it may be presumed that any method of determining the mode will lead to the same result. This presumption presupposes that the constituents are quantities of the kind which form the sort of “series” which is proper to Probabilities.[5] A similar presupposition is to be made with respect to the constituents of the other averages, so far as they are objects of probabilities.
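
A short sketch (in Python; the constituents are arbitrary, and for continuous data the smoothing process of note 4 would have to be specified) illustrates the median and a crude mode for discrete constituents:

```python
import statistics
from collections import Counter

xs = [1, 2, 2, 3, 3, 3, 4, 7, 9]  # arbitrary constituents

# Median: as many constituents above it as below it.
print("median:", statistics.median(xs))

# A crude mode for discrete data: the most frequently occurring value.
# For continuous data some binning or smoothing rule must be supplied.
value, freq = Counter(xs).most_common(1)[0]
print("mode:", value, "(occurs", freq, "times)")
```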

96. The Law of Error.—Of the propositions respecting average with which Probabilities is concerned the most important are those which deal with the relation of the average to its constituents, and are commonly called “laws of error.” Error is defined in popular dictionaries as “deviation from truth”; and since truth commonly lies in a mean, while measurements are some too large and some too small, the term in scientific diction is extended to deviations of statistics from their average, even when that average—like the mean of human or barometric heights—does not stand for any real objective thing. A “law of error” is a relation between the extent of a deviation and the frequency with which it occurs: for instance, the proposition that if a digit is taken at random from mathematical tables, the difference between that figure and the mean of the whole series (indefinitely prolonged) of figures so obtained, namely, 4.5, will in the long run prove to be equally often ±0.5, ±1.5, ±2.5, ±3.5, ±4.5.[6] The assignment of frequency to discrete values—as 0, 1, 2, &c., in the preceding example—is often replaced by a continuous curve with a corresponding equation. The distinction of being the law of error is bestowed on a function which is applicable not merely to one sort of statistics—such as the digits above instanced—but to the great variety of miscellaneous groups, generally at least, if not universally. What form is most deserving of this distinction is not decided by uniform usage; different authorities do not attach the same weight to the different grounds on which the claim is based, namely the extent of cases to which the law may be applicable, the closeness of the application, and the presumption prior to specific experience in favour of the law. The term “the law of error” is here employed to denote (1) a species to which the title belongs by universal usage, (2) a wider class in favour of which there is the same sort of a priori presumption as that which is held to justify the more familiar species. The law of error thus understood forms the subject of the first section below.
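
The random-digit example admits of direct verification. The following sketch (in Python; the number of trials is arbitrary) tabulates the long-run frequency of each deviation from the mean 4.5, each signed deviation approaching the frequency 1/10:

```python
import random
from collections import Counter

# Draw digits at random and tabulate deviations from the mean 4.5.
trials = 100_000
deviations = Counter(random.randrange(10) - 4.5 for _ in range(trials))

# Each signed deviation ±0.5, ±1.5, ..., ±4.5 should occur with
# frequency approaching 1/10 in the long run.
for d in sorted(deviations):
    print(f"{d:+.1f}: {deviations[d] / trials:.4f}")
```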

97. Laws of Frequency.—What other laws of error may require notice are included in the wider genus “laws of frequency,” which forms the subject of the second section. Laws of frequency, so far as they belong to the domain of Probabilities, relate to much the same sort of grouped statistics as laws of error, but do not, like them, connote an explicit reference to an average. Thus the sequence of random digits above instanced as affording a law of error, considered without reference to the mean value, presents the law of frequency that one digit occurs as often as another (in the long run). Every law of error is a law of frequency; but the converse is not true. For example, it is a law of frequency—discovered by Professor Pareto[7]—that the number of incomes of different size (above a certain size) is approximately represented by the equation y = A/x^a, where x denotes the size of an income, y the number of incomes of that size. But whether this generalization can be construed as a law of error (in the sense here defined) depends on the nice inquiry whether the point from which the frequency diminishes as the income x increases can be regarded as a “mode,” y diminishing as x decreases from that point.
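
Pareto's law may be illustrated by simulation. In the sketch below (in Python; the exponent and the other parameters are arbitrary assumptions, not Pareto's own figures) incomes are drawn by the inverse-transform method, and the number exceeding each size x is compared with the theoretical count A/x^a:

```python
import random

# Assumed parameters (not Pareto's own figures).
alpha, x_min, n = 1.5, 1.0, 100_000

# Inverse-transform sampling: if u is uniform on (0, 1], then
# x_min / u**(1/alpha) yields exceedance counts N(x) ≈ n*(x_min/x)**alpha.
incomes = [x_min / (1.0 - random.random()) ** (1.0 / alpha) for _ in range(n)]

for x in (1, 2, 4, 8, 16):
    observed = sum(income > x for income in incomes)
    expected = n * (x_min / x) ** alpha
    print(f"x = {x:>2}: observed {observed:>6}, expected {expected:8.0f}")
```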

  1. See introductory remarks and note to par. 95.
  2. A great variety of (functional) averages, including those which are best known, are comprehended in the following general form: φ⁻¹{M[φ(x1), φ(x2), . . . φ(xn)]}; where φ is an arbitrary function, φ⁻¹ its inverse (such that φ⁻¹(φ(x)) ≡ x), and M is any (functional) mean. When M denotes the arithmetic mean, if φ(x) ≡ log x (φ⁻¹(x) ≡ eˣ) we have the geometric mean; if φ(x) ≡ 1/x, we have the harmonic mean. Of this whole class of averages it is true that the average of several averages is equal to the average of all their constituents. (A numerical illustration of this general form is sketched below, after these notes.)
  3. This convenient term was introduced by Karl Pearson.
  4. E.g. some specified method of smoothing the given statistics.
  5. See above, pt. i., pars. 3 and 4. Accordingly the expected value of the sum of n (similar) constituents (x1 + x2 + . . . + xn) may be regarded as an average, the average value of nxr where xr is any one of the constituents.
  6. See as to the fact and the evidence for it, Venn, Logic of Chance, 3rd ed., pp. 111, 114. Cf. Ency. Brit., 8th ed., art. “Probability,” p. 592; Bertrand, op. cit., preface § ii.; above, par. 59.
  7. See his Cours d'économie politique, ii. 306. Cf. Bowley, Evidence before the Select Committee on Income Tax (1906, No. 365, Question 1163 seq.); Benini, Metodologia statistica, p. 324, referred to in the Journ. Stat. Soc. (March, 1909).
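
The general functional form of note 2 may be illustrated as follows (a minimal sketch in Python; the constituents are arbitrary), taking φ(x) = log x for the geometric mean and φ(x) = 1/x for the harmonic:

```python
import math

def phi_mean(xs, phi, phi_inv):
    # Generalized (functional) mean: phi_inv of the arithmetic mean
    # of phi applied to each constituent.
    return phi_inv(sum(phi(x) for x in xs) / len(xs))

xs = [2.0, 3.0, 6.0, 9.0]  # arbitrary positive constituents
print("geometric:", phi_mean(xs, math.log, math.exp))
print("harmonic: ", phi_mean(xs, lambda x: 1.0 / x, lambda x: 1.0 / x))
```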