Page:Popular Science Monthly Volume 65.djvu/137

CHARACTERISTIC CURVES OF COMPOSITION.

words in each group is increased there is, of course, closer agreement of their diagrams, and this became so evident in the earlier stages of the investigation that the conclusion was soon reached that if a diagram be made representing a very large number of words from a given author, it will not differ sensibly from any other diagram representing an equally large number of words from the same author. Such a diagram would then reflect the persistent peculiarities of this author in the use of words of different lengths and might be called the characteristic curve of his composition. Curves similarly formed from anything that he had ever written could not differ materially from this." (The italics are mine.) After some preliminary work which seemed to bear out the conclusion ventured above, the writer states: "From the examination thus far made I am convinced that 100,000 words will be necessary and sufficient to furnish the characteristic curve of a writer—that is to say, if a curve is constructed from 100,000 words of a writer, taken from any one of his productions, then a second curve constructed from another 100,000 words would be practically identical with the first—and that this curve would, in general, differ from that formed in the same way from the composition of another writer, to such an extent that one could always be distinguished from the other."

Such is the author's own statement of his theory, which the facts adduced apparently support. The culminating test consisted in the examination of different groups of 100,000 or more words from each of several authors, and it was found that the corresponding graphs did actually coincide. This, in the words of the author, 'must be regarded as convincing evidence of the soundness of the original assumption.'
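The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Dr. Mendenhall's own procedure: the function names, the scaling to words per thousand, the cap on word length, and the use of the largest single difference as a measure of how far two curves lie apart are all assumptions introduced here. The original curves were compared by eye.

```python
import re
from collections import Counter


def characteristic_curve(text, max_len=15):
    """Return the number of words per thousand at each length 1..max_len.

    Assumption: a word is any unbroken run of letters, so hyphens and
    apostrophes split words. Words longer than max_len are counted
    together in the last position.
    """
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        raise ValueError("text contains no words")
    counts = Counter(min(len(w), max_len) for w in words)
    total = len(words)
    return [1000.0 * counts.get(n, 0) / total for n in range(1, max_len + 1)]


def curve_gap(curve_a, curve_b):
    """Largest difference between two curves at any single word length.

    Hypothetical measure, chosen only for simplicity.
    """
    return max(abs(a - b) for a, b in zip(curve_a, curve_b))
```

Applied to two samples of 100,000 words each, `curve_gap(characteristic_curve(sample_one), characteristic_curve(sample_two))` would be expected, on the theory quoted above, to be small for samples from one author and larger for samples from two different authors. The names `sample_one` and `sample_two` are placeholders for texts the reader supplies.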

The existence and uniqueness of characteristic curves being granted, their practical application as a test of disputed authorship is obvious. To quote again, "If it can be proved that the characteristic curve exhibited by an analysis of 'David Copperfield' is identical with that of 'Oliver Twist,' of 'Barnaby Rudge,' of 'Great Expectations,' etc., and that it differs sensibly from that of 'Vanity Fair,' or 'Eugene Aram,' or 'Robinson Crusoe,' or 'Don Quixote,' or anything else, in fact, then the conclusion will be tolerably certain that whenever it appears it means Dickens."

The title of Dr. Mendenhall's second paper, 'A Mechanical Solution of a Literary Problem,' refers to the application of this theory to the Bacon-Shakespeare controversy, which, we are told, formed the objective point of the whole investigation. The characteristic curves resulting from 400,000 words of the plays, and 200,000 words from Bacon's 'Henry VII.,' 'Advancement of Learning' and the 'Essays' were constructed and exhibited together as in Fig. 20. The con-