Examinations, Grades and Credits

Popular Science Monthly, Volume 66, February 1905

THE determination of individual differences, the improvement of useful traits and the assignment of men to the work for which they are fit are among the most important problems in the whole range of pure and applied science. The extraordinary growth of the material sciences with their applications during the nineteenth century requires as its complement a corresponding development of psychology. It would under existing conditions be intolerable to erect a building without regard to the quality and strength of materials, to use at random a wooden beam or a steel girder; yet we often do much the same thing in selecting men for their work and adjusting them to it.

In examinations and grades we attempt to determine individual differences and to select individuals for special purposes. It seems strange that no scientific study of any consequence has been made to determine the validity of our methods, to standardize and improve them. It is quite possible that the assigning of grades to school children and college students as a kind of reward or punishment is useless or worse; its value could and should be determined. But when students are excluded from college because they do not secure a certain grade in a written examination, or when candidates for positions in the government service are selected as the result of a written examination, we assume a serious responsibility. The least we can do is to make a scientific study of our methods and results.

Grades assigned to college students have some meaning, though just what this is remains to be determined. Dr. Wissler[1] has shown that there is a decided correlation in the standing in different subjects. A man who receives a high grade in Latin is likely to receive a high grade in Greek, and almost as likely to receive a high grade in mathematics or gymnastics. This seems to indicate that the grades are assigned for moral traits, or for the general impression made by the man, as much as for ability and performance in a given subject. Professor Thorndike and his students[2] have found a similar relationship in school grades and in the New York State Regents' examinations. Professor Dexter[3] has shown that a man who is given a high standing in college is more likely than others to find his name in 'Who's Who in America.' Phi Beta Kappa men (on the average the upper seventh of the class) are twice as likely to be there as others, and the first man in his class is five times as likely.[4]

It is evident that subjects differ greatly in examinability. The results of an examination in mathematics, for example, can be graded with considerable accuracy; they give fairly definite information as to the man's mathematical aptitudes, and mathematical ability is largely innate, so that here the boy is father to the man. The mathematical tripos at Cambridge is a real test. Of the fifty senior wranglers in the first half of the last century a very large number have attained eminence. For example, two of them, Sir George Gabriel Stokes and Dr. N. M. Ferrers, who died within a month preceding the writing of this paragraph, maintained both in mathematical performance and general efficiency the position of, say, first in a hundred given them as the result of a student examination. Two facts should, however, be borne in mind. The senior wrangler is given great opportunity by being made a fellow, and the examination is on three years of solid work. The results of examinations in scrappy courses lasting half a year are not nearly so valid.

Subjects such as literature and psychology do not lend themselves to written examinations so well as mathematics. I have had the same papers in psychology graded by different examiners and have found great variations in the results. There is some validity in the order of excellence, but scarcely any in the absolute grades, the variation of the grades for the same paper by different examiners being as large as the variation of different papers by the same examiners. I have not, however, confirmed this result by sufficient data. One of our courses in psychology is given by different instructors, each of whom sets and grades papers for the same student. The grades assigned are A, B, C, D and F (excellent, good, fair, poor and failure). Four instructors gave twenty-one men a total of 15 A's, 38 B's, 27 C's, 4 D's and 1 F. When, however, we average the grades of the four instructors, we get 3 B+, 17 C+ and 1 D+. All the averages are alike within the unit used, except four, and the probable errors of three of these four show that they are as likely as not to fall within this grade, while the probable error of the remaining grade gives it but moderate validity.

It seems scarcely possible to determine what students are fitted for a college course by means of a written examination; and I fear that the systematization of entrance examinations under the auspices of a board will be harmful to secondary education.[5] The German method, which has made some progress here, of leaving the decision to the school seems much better. If we can not accept the recommendation of the school, I should prefer to see the candidate passed upon by two psychological experts. If their independent judgment agreed, I should have more confidence in this than in the results of any written examination. In general, I should admit to college any students who were not pronounced unfit by expert opinion, dropping of course those who subsequently proved themselves unfit. Requiring all students to pass an examination in Latin composition and the like is as out of place in a modern university as an ichthyosaurus on Broadway[6].

Our college entrance requirements and examinations are a serious injury to secondary education, and they select very imperfectly the men who should have a college education. Of 262 students who entered Columbia College in 1900, only 50 completed the regular four-year course in the college. Civil service examinations often exclude the fit from the public service. In Great Britain the method is carried to an extreme, and the results depend as much on the coach as on the candidate. Almost anything is better than appointments for party service; but past performance, character, habits, heredity and physical health are much more important than the temporary information that can be but imperfectly tested by a written examination. I should not be willing to select a fellow or an assistant in psychology by such a method, and to select a professor would be nearly as absurd as to choose a wife as the result of a written examination on her duties. To devise and apply the best methods of determining fitness is the business of the psychological expert, who will probably represent at the close of this century as important a profession as medicine, law or the church.

I am at present working at the problem of assigning grades for moral, mental and physical traits[7], but shall here confine myself to a discussion of college grades. The literature is very scanty. I can only refer to two papers[8], both of which are slight.

Grades are usually assigned on a scale of 100, some institutions, as Harvard and Columbia, reporting only the five groups into which the men are divided. The starting point in all grades is the fact that the written papers or the results of the term's work can be arranged more or less accurately in the order of merit.[9] The assignment of quantitative grades to a qualitative series or its division into groups is usually done in an arbitrary manner, and, so far as I am aware, no attempt has hitherto been made to assign probable errors. It is obvious that our grades should be standardized. Our colleges are in the position of a grocer who should let each of his clerks give to customers without weighing and without knowledge of market prices what he believed to be a dollar's worth of tea.

The simplest method of assigning grades is to arrange a hundred papers as nearly as may be in the order of merit and to give the poorest paper the grade 1, the next poorest the grade 2, and so on, until the best paper receives the grade 100. The 100 cases would not be exactly representative of the entire group with which we are concerned; but if we had 100,000 cases, the error from this source in giving the poorest 1,000 the grade of 1, etc. would be entirely negligible. It is possible to calculate how likely it is that in a random group of 100 cases we should find two, three or more men to whom the lowest or any other grade should be assigned. Each instructor forms a rough estimate of the group of students with which he is concerned, and can with a probable error that might be determined assign its place in the series to each case.
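The calculation mentioned at the end of this paragraph can be sketched in Python under one simplifying assumption: that each of the 100 men is drawn independently from the large population, so that the number falling in any one centile (for example the lowest) follows a binomial distribution with p = 1/100. The function name `prob_in_centile` is illustrative only.

```python
from math import comb


def prob_in_centile(k: int, n: int = 100, p: float = 0.01) -> float:
    """Probability that exactly k of n randomly drawn men fall in one given
    centile of the whole population, assuming independent draws (binomial)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


for k in range(4):
    print(f"{k} men in the lowest centile: {prob_in_centile(k):.3f}")

# Chance that two or more of the 100 men deserve the lowest grade.
two_or_more = 1 - prob_in_centile(0) - prob_in_centile(1)
print(f"two or more: {two_or_more:.3f}")
```

Under this model the lowest centile of a random hundred is empty about 37 per cent. of the time, and holds two or more men about 26 per cent. of the time.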

If men are arranged in this way in the order of merit and each is assigned his position in the series from 1 to 100, the differences between them will not be equal. If a hundred men are placed in a row according to height, the line passing along the tops of their heads will not be a straight line. The men in the middle of the row will differ but little from one another, and the differences will become continually greater towards the ends. Fig. 1 shows the approximate distribution in stature of 1,052 English women, measured for Professor Karl Pearson. Their average height was about 5 feet 2½ inches; 18.3 per cent. of the whole number were between 62 and 63 inches, and one half of them were within about 1½ inches of the average, this distance being the probable error. The ordinates or vertical lines are proportional to the number of women falling within the limits of an inch. Thus 16.3 per cent. were between 63 and 64 inches; 11.5 per cent. between 64 and 65 inches, etc., only two falling between 70 and 71 inches. The women near the average tend to differ in height by about 1/200th of an inch, while the tallest or shortest of the thousand tend to differ by half an inch or more. This curve, showing the distribution in height, corresponds closely with the fainter and more regular curve on the figure which represents the distribution of events due to a large number of small causes equally likely to affect them in one of two ways, the curve of error of the exponential equation whose properties have been discussed by Gauss, Laplace and other mathematicians.

Fig. 1. Distribution of Stature of Women in Inches.
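The claim about the spacing near the middle and at the ends of the row can be checked roughly against a normal curve. This sketch assumes the curve of error with a mean of 62.5 inches and a probable error of 1.5 inches (so σ = PE / 0.6745), and uses the standard approximation that neighbours near a height x are about 1 / (N·f(x)) apart, where f is the density. It is an illustration under those assumptions, not a reworking of Pearson's data.

```python
from statistics import NormalDist

N = 1052     # number of women measured (from the text)
MEAN = 62.5  # average height in inches (5 feet 2½ inches)
PE = 1.5     # probable error in inches (from the text)

# On the curve of error the probable error is about 0.6745 standard
# deviations, so half of all cases lie within MEAN ± PE.
SIGMA = PE / NormalDist().inv_cdf(0.75)
heights = NormalDist(MEAN, SIGMA)


def expected_gap(quantile: float) -> float:
    """Rough spacing, in inches, between neighbouring women at a quantile.

    Near height x about N * f(x) women fall within each inch, so
    neighbours are roughly 1 / (N * f(x)) inches apart.
    """
    x = heights.inv_cdf(quantile)
    return 1.0 / (N * heights.pdf(x))


share_within_pe = heights.cdf(MEAN + PE) - heights.cdf(MEAN - PE)
gap_middle = expected_gap(0.5)
gap_extreme = expected_gap(1 - 1 / N)

print(f"sigma = {SIGMA:.2f} in; share within ±PE = {share_within_pe:.3f}")
print(f"gap near the average: about 1/{1 / gap_middle:.0f} of an inch")
print(f"gap near the tallest: about {gap_extreme:.2f} of an inch")
```

Under these assumptions the spacing near the average comes out at roughly 1/190 of an inch and near the tallest woman at roughly two thirds of an inch, in line with the figures quoted above.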

If the performances of students in examinations are assumed to vary in the same way as their height, then we can if we like place them in classes which represent equal differences. Thus by the Harvard-Columbia method of grouping into five classes, if we put half the men into the middle class, C, and let B and D represent an equal range, we should give about 23 per cent. of both B's and D's and about 2 per cent. of both A's and F's. This, however, gives too few men in the A and F classes for our purposes. If we make the range of the unit 20 per cent. smaller, we obtain the distribution shown in Fig. 2, according to which of ten men four would receive C, two B, two D, one A and one F. It departs slightly from the theoretical distribution, but certainly not so much as the theoretical distribution departs from the actual distribution. It appears to be the most convenient classification when five grades are used: giving honors to one man in ten and requiring one in ten to repeat the course corresponds fairly well with average practise and makes a convenient standard.
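The percentages in this paragraph follow from the normal curve. The sketch below assumes a standard normal distribution with grade C centred on the mean and B, C and D each spanning the same `unit` (in standard deviations); it reproduces the roughly 2/23/50/23/2 split and shows what the 20 per cent. narrower unit gives. The helper name `five_grade_shares` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()             # standard normal curve of error
PE_Z = Z.inv_cdf(0.75)       # probable error in standard deviations (about 0.6745)


def five_grade_shares(unit: float) -> dict[str, float]:
    """Share of a normal population in each of the grades A-F.

    C is centred on the mean; B, C and D each span `unit` standard
    deviations; A and F take everything beyond B and D respectively.
    """
    c_hi = unit / 2                  # upper edge of C
    b_hi = c_hi + unit               # upper edge of B
    c = Z.cdf(c_hi) - Z.cdf(-c_hi)
    b = Z.cdf(b_hi) - Z.cdf(c_hi)
    a = 1 - Z.cdf(b_hi)
    # The curve is symmetric, so D mirrors B and F mirrors A.
    return {"A": a, "B": b, "C": c, "D": b, "F": a}


for label, unit in [("half the men in C", 2 * PE_Z),
                    ("unit 20 per cent. smaller", 0.8 * 2 * PE_Z)]:
    shares = five_grade_shares(unit)
    print(label, {g: round(100 * s, 1) for g, s in shares.items()})
```

With the narrower unit the theoretical split is about 5, 24, 41, 24 and 5 per cent., which the text rounds to the more convenient 10, 20, 40, 20 and 10.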

It is maintained by Dr. Galton, Professor Pearson and others that ability and performance are distributed in accordance with the curve of error. It does not seem to me that this is the case. If ability for scholastic work were distributed in this way at birth, it would not remain so among college students, who are a selected group. Those unfit are less likely to be found in college and those particularly competent are more likely to be there. This would tend to give us for college students a skew curve in the negative direction. In spite of this factor, I believe that the main skew is in the opposite direction, and that ability is distributed somewhat like wages, which are roughly proportional to it. If the average earnings of men in this country are $600 a year, it is clear that the positive deviations from the average are many times the negative deviations. There may be a certain minimal ability necessary for survival, and variations and sports may occur to an extent in the positive direction not possible in the negative direction. There are certain 'constant errors,' such as a college education, which divide men into different 'species.' In so far as students are graded on the lines of the probability curve, this may measure the attitude of the examiner rather than the distribution of the men in merit.

Fig. 2. The Upper Surface shows the Theoretical Distribution of Grades, the Lower that most convenient in Practise.

But we do not need theorizing so much as facts, which should be secured without delay. In the papers quoted above I have shown that it is possible to transform a qualitative series into one giving measures of differences. If the same thousand examination papers were read and graded independently by ten examiners, the variation in the grades of the same paper by different examiners would give us a measure of the differences between the papers, which would be inversely as the variation of the grades. I have in this way made a curve for the distribution of scientific performance in a selected group, and the same methods should be applied to merit in examinations.
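The method proposed here, in which the differences between papers are taken as inversely proportional to the variation of their grades among examiners, can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's own procedure: ranks stand in for grades, the spread of a paper's ranks is measured by the population standard deviation, the disagreement between two neighbouring papers is the mean of their two spreads, and the floor `min_spread` (a hypothetical parameter) keeps the gap finite when all examiners agree.

```python
from statistics import mean, pstdev


def merit_scale(rankings: list[list[int]], min_spread: float = 0.25) -> list[tuple[int, float]]:
    """Place papers on a scale of merit from several examiners' rankings.

    rankings[e][p] is the rank examiner e gave paper p (1 = poorest).
    Papers are ordered by mean rank; the gap between neighbours is taken as
    inversely proportional to how much the examiners disagree about them.
    """
    n_papers = len(rankings[0])
    ranks = [[r[p] for r in rankings] for p in range(n_papers)]
    order = sorted(range(n_papers), key=lambda p: mean(ranks[p]))
    spread = [pstdev(ranks[p]) for p in range(n_papers)]

    positions = [(order[0], 0.0)]
    for prev, cur in zip(order, order[1:]):
        disagreement = max((spread[prev] + spread[cur]) / 2, min_spread)
        positions.append((cur, positions[-1][1] + 1.0 / disagreement))
    return positions


# Three hypothetical examiners: they disagree about papers 0 and 1 only.
print(merit_scale([[1, 2, 3, 4], [2, 1, 3, 4], [1, 2, 3, 4]]))
```

Papers 0 and 1, on which the examiners disagree, end up closer together on the scale than papers 2 and 3, on which they agree.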

In the meanwhile I am able to give the grades actually assigned in several cases. The accompanying table shows the grades given to 200 students in each of five courses in Columbia College, and the figure shows the averages and the grades in English A and Mathematics A. The average grade is a little above C, the median grade is nearly midway between C and B, and more than two thirds of all the grades are either C or B. Eight per cent. of the grades are A and ten per cent. are F, which approximates closely to the standard recommended above. The average of the grades assigned in these courses does not vary considerably, but the distribution is different. In the courses in English the distribution tends to follow the normal curve of error, with the failures as a separate group or species. In the courses in mathematics and history the groups are more nearly equal in size, except in the case of 'excellent.' Here the range of ability is presumably greater in D and F than in B and C. The distribution in economics is intermediate. The fact that the courses in English, though given by different instructors, correspond closely shows that within a department certain standards may be followed; and this would be possible for the whole college or for the educational system of the country. It is only necessary to adopt the standards and then to teach people how to apply them.

Percentage of Students receiving each Grade

Course     A       B       C       D       F
Eng. A     4.5     41.5    44.5    4.5     5
Eng. B     4       40      39      6.5     10.5
Math. A    11      24      24      22      19
Hist. A    10.5    28      28.5    20      13
Econ. A    9       36      33      17.5    4.5
Average    8       33.9    33.8    14.1    10.4

Fig. 3. The Distribution of the Average Grades assigned in Five Courses, with the Details for Introductory Courses in English and Mathematics.
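The summary figures in this paragraph can be checked against the table. The numeric scale A = 4 down to F = 0 used for the mean grade is an assumption made only for this sketch; the article does not assign numbers to the letters.

```python
# Percentage of students receiving A, B, C, D, F, copied from the table above.
TABLE = {
    "Eng. A": [4.5, 41.5, 44.5, 4.5, 5.0],
    "Eng. B": [4.0, 40.0, 39.0, 6.5, 10.5],
    "Math. A": [11.0, 24.0, 24.0, 22.0, 19.0],
    "Hist. A": [10.5, 28.0, 28.5, 20.0, 13.0],
    "Econ. A": [9.0, 36.0, 33.0, 17.5, 4.5],
}
POINTS = [4, 3, 2, 1, 0]  # assumed numeric scale for A..F (not in the article)


def column_average(table: dict[str, list[float]]) -> list[float]:
    """Unweighted mean percentage for each grade across the courses."""
    return [sum(row[i] for row in table.values()) / len(table) for i in range(5)]


def mean_grade(shares: list[float]) -> float:
    """Average grade on the assumed 4-0 scale, weighting by percentages."""
    return sum(p * s for p, s in zip(POINTS, shares)) / sum(shares)


avg = column_average(TABLE)
print("average row:", [round(x, 1) for x in avg])
print("B or C:", round(avg[1] + avg[2], 1), "per cent.")
print("mean grade:", round(mean_grade(avg), 2))
```

The column averages reproduce the 'Average' row (7.8 rounds to 8), B and C together come to 67.7 per cent., and on the assumed scale the mean grade is about 2.15, a little above C.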

I have also counted up the average grades assigned to 200 students in their first ten courses. In the table and curve, A represents the range between A and B + ½, B the range to C + ½, etc.:

Average grade           A       B       C       D       F
Per cent. of students   2.5     34      46.5    16.5    0.5

Here the grades tend to be bunched, the differences between the men being partly obliterated by the combination of the grades in different courses.

In the next table and in the figure are given the grades of 15,275 papers assigned by the examiners of the College Entrance Examination Board in 1904:

Rating                  100-90   89-75   74-60   59-50   49-40   39-0
Per cent. of papers     6.1      21.4    32.6    12.1    11.1    16.7

Fig. 4. Distribution of Grades of the College Entrance Examination Board.

The grades are in this case given on a centile scale. The curve is decidedly skewed in the negative direction, the most frequent grades being between 60 and 75. There is a considerable variation between the different subjects. Thus 10.6 per cent. of the candidates are given a grade above 90 in Greek and only 2.7 per cent. in history; 34.9 per cent. are given a grade below 50 in mathematics and only 19.1 per cent. in English. It is obvious that such grades should be standardized. It may be remarked incidentally that it is easy to select examiners by a competitive examination. If twenty candidates grade the same sets of papers, those whose grades are nearest the average of all the grades are likely to be the most competent examiners.
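The suggestion for choosing examiners can be sketched as a simple ranking: every candidate grades the same papers, the average of all candidates' grades on each paper serves as the consensus, and candidates are ordered by their mean absolute distance from it. The distance measure and the function name `rank_examiners` are assumptions for illustration; the sample grades are invented.

```python
from statistics import mean


def rank_examiners(grades: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Order candidate examiners by closeness to the consensus.

    grades[name][i] is the grade candidate `name` gave paper i. The
    consensus for a paper is the average of all candidates' grades for it;
    candidates are ranked by mean absolute deviation from that consensus.
    """
    names = list(grades)
    n_papers = len(grades[names[0]])
    consensus = [mean(grades[n][i] for n in names) for i in range(n_papers)]
    scores = [
        (n, mean(abs(g - c) for g, c in zip(grades[n], consensus)))
        for n in names
    ]
    return sorted(scores, key=lambda item: item[1])


# Invented grades from three candidates on the same four papers.
sample = {"X": [70, 55, 90, 40], "Y": [72, 50, 85, 45], "Z": [95, 30, 60, 70]}
for name, deviation in rank_examiners(sample):
    print(name, round(deviation, 2))
```

Here candidate Y, whose grades stay closest to the consensus, ranks first, and Z ranks last.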

In these cases, and in all grades with which I am acquainted, there is a tendency to grade students above the average. Professor Pearson finds that in estimating the health of English boys, teachers place twice as many above 'normally healthy' as below, and he seems to regard it as gratifying that English boys should be more than normally healthy. We look on our own students as better than the average and in any case give them the benefit of the doubt. We call things 'fair' that are only average, and then the word 'fair' comes to mean average. Then we assign the grade 'fair' to students who are below the average, and a 'fair' student comes to mean a poor student. In assigning grades such words should be avoided; we should learn to think in terms of the average and probable error.

If grades are given on a centile system, the grade should mean the position of the man in his group; thus 60 should mean that in the long run it is more likely than anything else that there would be forty men better and fifty-nine not so good. The average probable error should be determined and a probable error should be attached to the grades; thus the grade 60 ± 10 means that the chances are even that there are between thirty and fifty men in the group who are better. The probable error becomes smaller as we depart from the average man; I estimate on the basis of a few experiments that it is over 10 in the middle of the scale. If this proves to be correct on the basis of more extended data, it is needless to grade more closely than on a scale of 10, though the first decimal would have some meaning when the grades are combined. If a hundred men are divided into ten groups of 10 each, the men in the middle groups will differ less from each other than those towards the ends, and if we wish to let the groups represent approximately equal ranges of merit, we can, as explained above, make five groups, A, B, C, D and F, putting 40 men in C, 20 men in both B and D and 10 in both A and F.
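The centile conventions of this paragraph (a grade of 60 means forty men better; 60 ± 10 means an even chance that between thirty and fifty are better) and the proposed 10/20/40/20/10 division into letter grades can be written down directly. The helper names below are hypothetical.

```python
def letter_for_centile(centile: int) -> str:
    """Letter grade for a centile position (1 = poorest, 100 = best), using
    the proposed split of 10 F, 20 D, 40 C, 20 B and 10 A in every hundred."""
    if not 1 <= centile <= 100:
        raise ValueError("centile must lie between 1 and 100")
    for upper, letter in [(10, "F"), (30, "D"), (70, "C"), (90, "B")]:
        if centile <= upper:
            return letter
    return "A"


def men_better(centile: int) -> int:
    """Most likely number of men, in a group of 100, who rank above this grade."""
    return 100 - centile


def even_chance_range(centile: int, probable_error: int) -> tuple[int, int]:
    """Range within which the number of better men lies with even chances."""
    return (men_better(centile) - probable_error, men_better(centile) + probable_error)


print(men_better(60), even_chance_range(60, 10), letter_for_centile(60))
```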

The determination of the validity of the grades given to college students and their standardization appear to me to be important because I regard it as desirable that students should be credited for the work they do rather than for the number of hours that they attend courses. By our present method a student who fails gets no credit at all, while a student who is nearly as bad (and perhaps worse) gets as much credit towards his degree as the best student in the class. In our graduate faculties we credit men for work they do, and this principle is also adopted in the secondary schools that have broken the 'lock step.' Just now we hear much about the need of shortening the four-year college course. Men can not do the work of four years in three by attending more courses each year, but some men accomplish as much in three years as others do in four, and many men, if they had an adequate motive, would do as much in three years as they now do in four.

We find among our graduate students that the better men can obtain the doctor's degree in about half the time required by the poorer men, while in exceptional cases the range is greater. I have found in various fundamental traits that can be measured, such as accuracy of perception, reaction-time and memory, that ordinary individuals differ about as 2:1. It seems that the best men (say the first ten) in our classes differ from the poorest (say the last ten) in about this ratio. If, therefore, men are divided into five groups representing nearly equal ranges of ability and we give the C, or middle group, a credit of three points for a three hour course, it would be just to give the A group 4 points, the B group 3½ points, the D group 2½ points, and the F group 2 points or less.

In Columbia College sixty points are required for the bachelor's degree, a point being an hour's attendance at lectures or recitations, or two hours of laboratory work. Students are expected to attend classes for about 15 hours a week and usually receive the degree in four years; there are, however, some who attend 20 hours a week and receive the degree in three years. At Harvard College 54 points are required, and I understand that about half the students now accomplish the work in three years. When 60 points are required for the degree, and if credits as proposed above were assigned, the 200 students of Columbia College whose grades have been compiled on the basis of half the work for the degree would be required to attend a total number of hours, as follows:

Grades                  A        B        C        D        F
Per cent. of students   2.5      34       46.5     16.5     0.5
Hours for degree        40-45    45-55    55-65    65-75    75+

This would be a just assignment of credits to the best of our present knowledge. It would permit about one third of the best students to secure the degree by an attendance of from 15 to 18 hours a week for three years. If, however, it is thought that this gives too great a reward for good work and too great a penalty for deficiency, the credits and deductions could be halved. This would give for these students an attendance, as follows:

Grades                  A           B           C           D           F
Per cent. of students   2.5         34          46.5        16.5        0.5
Hours for degree        47.5-52.5   52.5-57.5   57.5-62.5   62.5-67.5   67.5+

Or, if the grades were standardized on the lines here proposed, the percentages would become:

Grades                  A           B           C           D           F
Per cent. of students   10          20          40          20          10
Hours for degree        47.5-52.5   52.5-57.5   57.5-62.5   62.5-67.5   no degree
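The hours in these tables follow from the credits proposed earlier: for a three-hour course, A earns 4 points, B 3½, C 3, D 2½ and F 2 (the text says "2 points or less"; this sketch takes 2). The sketch assumes a student whose grades all sit exactly at one letter, so it gives single values rather than the ranges in the tables; `scale=0.5` halves the bonuses and deductions as in the second table. The function names are illustrative.

```python
# Points per hour of attendance for each grade, from the proposal in the text:
# a three-hour course earns A 4, B 3½, C 3, D 2½ and F 2 points.
FULL_RATE = {"A": 4 / 3, "B": 3.5 / 3, "C": 1.0, "D": 2.5 / 3, "F": 2 / 3}
POINTS_FOR_DEGREE = 60  # Columbia College requirement quoted in the text


def rate(grade: str, scale: float = 1.0) -> float:
    """Points earned per hour; scale=0.5 halves every bonus and deduction."""
    return 1.0 + scale * (FULL_RATE[grade] - 1.0)


def hours_for_degree(grade: str, scale: float = 1.0) -> float:
    """Hours of attendance needed for the degree if every grade were `grade`."""
    return POINTS_FOR_DEGREE / rate(grade, scale)


for scale in (1.0, 0.5):
    print(scale, {g: round(hours_for_degree(g, scale), 1) for g in FULL_RATE})
```

With full credits this gives 45, about 51.4, 60, 72 and 90 hours for A to F; with halved credits, about 51.4, 55.4, 60, 65.5 and 72. Each value falls inside the corresponding range of the tables.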

It would also be possible to introduce the principle of giving extra credit for good work in a less radical manner, for example, by allowing a credit of three points to students who receive the highest grade in at least five courses. The application of the principle in any form would be an important educational advance, but a method such as this would not be nearly so fair and accurate as the plan here recommended. It would affect only a few men and would be more dependent on chance. The amount of credit in the plan recommended can be so adjusted that a given percentage of students can receive any credit desired; those receiving the highest grade (the first ten per cent. in the long run) could be awarded, on the average, an extra credit of 2, 3, 5, 10 or 20 points, as may be decided, and all others would receive credits in proportion.

I see no serious objection to the plan. The aberrancy of grades in different subjects would be a drawback, but not so serious as the existence of 'snap courses' under the present system. The adoption of the plan would tend to the standardization of grades, and the apparent objection might prove to be a real advantage. If it is objected that it would lead students to work too much for grades, this would simply mean, if grades are properly assigned, that it would lead them to do better work. The present method, where the grade is simply a kind of prize or punishment putting one man before another, seems to have objections; I have some sympathy with the students who call 'C' the 'gentleman's grade.' But if grades had some real meaning, they would be no more invidious than the payment of a salary of $3,000 to one man and of $5,000 to another. If it is said that the method is unfair because grades can not be given in accordance with exact deserts, it may be replied that this is true of all salaries and the like. Although a single grade is subject to a considerable probable error, the error of the average of a number of grades varies inversely as the square root of the number. Thus, if the probable error of a single grade is one place (that is, if a man receives C, the chances are even that he deserves a higher or a lower grade), the average of 25 grades (about the number of college courses taken for the degree) would be subject to a probable error of only one fifth of a place. Lastly it may be said that the bookkeeping is very simple: the credits for 400 students can be compiled by an ordinary clerk in one day.
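The square-root rule can be checked by simulation. This sketch assumes that the errors in separate grades are independent and normally distributed, with a probable error of one grade place for a single grade, and estimates the probable error of an average as the median absolute error over many simulated students. The function name and simulation sizes are illustrative choices.

```python
import random
from statistics import NormalDist, median

PE_SINGLE = 1.0  # assumed probable error of one grade, in grade places
SIGMA = PE_SINGLE / NormalDist().inv_cdf(0.75)  # matching standard deviation


def pe_of_average(n_grades: int, trials: int = 20_000, seed: int = 1) -> float:
    """Simulated probable error of the average of n independent grades.

    The probable error is estimated as the median absolute error, since half
    the errors fall inside it.
    """
    rng = random.Random(seed)
    errors = [
        abs(sum(rng.gauss(0.0, SIGMA) for _ in range(n_grades)) / n_grades)
        for _ in range(trials)
    ]
    return median(errors)


for n in (1, 4, 25):
    print(n, round(pe_of_average(n), 3), "theory:", round(PE_SINGLE / n**0.5, 3))
```

The simulated probable error for an average of 25 grades comes out close to 1/√25 = 0.2 of a place.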

The assignment of credits in accordance with the work done by the student rather than for the number of exercises he attends appears to be in accord with common sense and justice. If after four years' study one man has the qualifications for the B.A. and another for the M.A., each should be given his appropriate degree. It may be well for one student to attend exercises for twelve hours a week and for another to be eighteen hours in attendance, but if each accomplishes the same amount of work they should be given the same credit. The plan would prove an excellent stimulus to good work and would attract to the college that adopted it the best class of students.

I should myself not only like to give students credit for the degree in accordance with the work they do, but I should also like to see tuition fees charged in proportion. In this case conduct and character should be included as well as merit in class work. More of the endowment of the institution should be used for those whose education is the greater service to the community, while those whose presence in a college interferes with its work should not be supported at the public expense. If the tuition fee is $150, it should be apportioned as follows:

Grades                  A           B           C           D           F
Per cent. of students   10          20          40          20          10
Tuition fee             $100-120    $120-140    $140-160    $160-180    $180-200
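A quick check that this schedule still brings in $150 per student on average, assuming the standardized distribution in the table and taking the midpoint of each fee range as the fee actually charged (the text gives only ranges). The names below are illustrative.

```python
# Share of students (per cent.) and tuition range for each grade, from the table above.
SCHEDULE = {
    "A": (10, 100, 120),
    "B": (20, 120, 140),
    "C": (40, 140, 160),
    "D": (20, 160, 180),
    "F": (10, 180, 200),
}


def average_fee(schedule: dict[str, tuple[int, int, int]]) -> float:
    """Mean fee per student, taking the midpoint of each grade's fee range."""
    total_share = sum(share for share, _, _ in schedule.values())
    weighted = sum(share * (low + high) / 2 for share, low, high in schedule.values())
    return weighted / total_share


print(average_fee(SCHEDULE))
```

The result is $150.0, so under these assumptions the sliding scale redistributes the fee without changing the average income per student.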

But I fear that it will be even more difficult to convince trustees than faculties that psychology is becoming an exact science.

  1. 'The Correlation of Mental and Physical Tests,' Monograph Supplement to The Psychological Review, No. 16.
  2. Summarized in 'Educational Psychology,' Lemcke and Büchner, 1903.
  3. 'High Grade Men in College and Out,' Pop. Sci. Mon., March, 1903.
  4. It must, however, be remembered that the kind of people who are put in a book such as 'Who's Who' are largely those who talk about things rather than those who do things—the class that receives part payment for its services in notoriety.
  5. Since this was written Professor Thorndike has compiled statistics, not as yet published, which indicate that students who pass these examinations with the lowest grades are as likely to do well in college as those having much higher grades. Those rejected would probably do equally well.
  6. In the discussion now in progress at Cambridge concerning the requirement of Greek at entrance, Professor Jebb ridiculed New Zealand as a Greekless land, because one of its citizens is alleged to have called Andromache 'Andromach.' Professor Jebb in his speech called New Zealand a part of Australia; yet he does not regard himself as illiterate.
  7. Cf. articles in Science (N. S. 17: 561-570, 1903) and Am. Jour. of Psychol. (14: 310-328, 1903).
  8. 'American Titles and Distinctions,' W. Le Conte Stevens, The Popular Science Monthly, 63: 312-320, 1903. 'The Education of Examiners,' E. B. Sargent, Nature, 70: 63-65, 1904.
  9. Many instructors doubtless let the grade represent the percentage of questions correctly answered. This is a possible but fallacious method in a subject such as mathematics; in a subject such as psychology it is impossible.