Page:Popular Science Monthly Volume 65.djvu/143

permit me to make. Indeed, Dr. Mendenhall found himself in the same predicament, from which he was rescued by the generosity of a private citizen, who supplied the salaries of two assistants for several months during which the necessary data were collected.

Then it occurred to me that though one hundred thousand words may be necessary to yield an invariable curve, a much smaller number might suffice to establish the existence of such a curve within certain limits. If these limits for the curves of different forms of composition from the same author turn out to be mutually exclusive, our hypothesis would be established, though we had not examined a sufficiently large number of words to determine the locus of the curves with accuracy. Thus, possibly, the work necessary to test our hypothesis might reduce itself to manageable proportions.
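The test proposed here can be sketched in modern terms. The following is only an illustration under stated assumptions, not the author's own procedure: for a single word length, it takes the counts per thousand words observed in several samples of each form of composition and checks whether the two ranges fail to overlap. The function names and the sample figures are hypothetical.

```python
def limits(counts):
    """Return the (lowest, highest) count observed across the samples."""
    return min(counts), max(counts)


def mutually_exclusive(counts_a, counts_b):
    """True when the ranges spanned by the two sets of counts do not overlap."""
    low_a, high_a = limits(counts_a)
    low_b, high_b = limits(counts_b)
    return high_a < low_b or high_b < low_a


# Hypothetical counts of one word length per thousand words, five samples each.
form_a = [310, 322, 305, 318, 327]
form_b = [255, 262, 249, 270, 258]
print(mutually_exclusive(form_a, form_b))
```

A check of this kind only shows that the samples in hand are separated; it says nothing about where the true curves lie, which is the distinction the paragraph above draws.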

The first author examined was Goethe. To eliminate as far as possible the disturbing effect of unconscious bias, I decided to count in word-groups of consecutive thousands, always beginning with the first of the work. Quotations, footnotes, headings and, in the case of dramas, stage-directions, etc., were uniformly omitted. These rules were strictly adhered to in all the data which follow. Five groups of one thousand words each were taken from each of Goethe's 'Bürgergeneral,' and 'Literatur Recensionen,' (B). The results were tabulated as follows:

Table I.

[Table I: word analysis of the Goethe samples; image not reproduced.]
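The counting rule described before the table (consecutive groups of one thousand words, always starting from the first word of the work) might be sketched as follows. This is a rough modern illustration, not the method as actually carried out by hand. It assumes that quotations, footnotes, headings and stage-directions have already been removed from the text, and that a word is any unbroken run of letters; the article does not state how hyphens or apostrophes were treated. The function name `word_length_groups` and the sample string are hypothetical.

```python
import re
from collections import Counter

GROUP_SIZE = 1000  # words per group, as in the text


def word_length_groups(text, n_groups=5, group_size=GROUP_SIZE):
    """Tally word lengths for consecutive groups starting at the first word.

    Returns one Counter per complete group, mapping word length to the
    number of words of that length.
    """
    # Assumption: a word is a maximal run of letters (accented letters included).
    words = re.findall(r"[^\W\d_]+", text)
    groups = []
    for i in range(n_groups):
        chunk = words[i * group_size:(i + 1) * group_size]
        if len(chunk) < group_size:
            break  # ignore an incomplete final group
        groups.append(Counter(len(w) for w in chunk))
    return groups


# Tiny hypothetical sample, using groups of five words instead of a thousand.
sample = "Der Bürger ist ein Mann " * 2
groups = word_length_groups(sample, n_groups=2, group_size=5)
for number, tally in enumerate(groups, start=1):
    print(number, sorted(tally.items()))
```

Each tally corresponds to one row of a table such as Table I, and plotting count against word length for each group gives one curve per thousand words.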

Each thousand words was now plotted separately and the resulting two sets of five curves compared (Fig. 6 and Fig. 7). These results far exceeded my expectation. No curve of the one set could possibly be mistaken for any curve of the other set. Three-letter words, of which there were between 319 and 338 in each thousand of the first set, were reduced to 250 to 268 per thousand in the second set;