This is an example of the proofreading of a single page using the ProofreadPage extension. It shows the initial state of the text, the final state of the text and notes on the process. This example is the first page of the preface from Picturesque New Guinea (Source: Page:Picturesque New Guinea.djvu/19), which was part of the Wikisource:Proofread of the Month for July 2011.
In many DjVu and PDF documents, a "text layer" exists for each page. This is the text pre-loaded into the editing panel on the left side of the screen when proofreading (with the page image on the right side of the screen). The text layer has been created by a computer program attempting to read the page image and recognise the letters and symbols therein, a process called Optical Character Recognition or OCR. Computer programs are not always very good at doing this. The number of errors depends on the clarity of the image and the quality of the software, but there is often something that needs to be corrected.
Unedited OCR text
This is the original state of the text for this page. If the page is saved in this state, it should be saved with the "Not proofread" (red) page status.
P R E F A C E. |,WA,^"^.;OR years past, when perusing the account of explorino- ^^^^ expeditions setting out for some country comparatively 'V^J iniknown, I always noticed with a pang of disappointment ^^ :^^J^'^Jil that, however carefully the scientific staff was chosen, it was, as a rule, considered sufficient to supply one of the members -with a mahogany camera, lens, and chemicals to take pictures, the dealer fui-- nishing these articles generally initiating the purchaser for a cou[)le or three hours' time into the secrets and tricks of the " dark art," or when funds were too limited to purchase instruments, it was taken for granted that enough talent existed among the members to make rough sketches, Avhicli would afterwards be " worked up " for the purpose of illustrating [)erhaps a very important report. Sir Samuel Baker remarks, in the Appendix to one of his "Works, tliat a photographer should accompany every exploring expedition. The only one I ever heard of being furnished with that connnodity wns 11. M.S. "Challenger," on her scientific cruise round the world, but 1 remember readhig in the " Photographic News " the complaint of a gentleman, that so many years luid already passed, and still there was no sign of the " Challenger " photographs ever becoming accessible to the ])ublic. llow this is, or why it should be so, is dilUcult to tell, but as yet no book of travel, entirely illustrated by artistic views and portraits taken
Corrections
There are many different types of corrections that may need to be made while proofreading a page. This section highlights the edits and corrections made in this example, grouping them by type of correction. Corrections do not need to be made in any particular order.
If the page is saved after these corrections have been made, it can be saved with the "Proofread" (yellow) page status. If some, but not all, of the corrections have been made, it should be saved as "Not proofread" (red). A page does not need to be saved as "Not proofread" before being upgraded to "Proofread"; if the corrections have all been made, it can be saved as "Proofread" from the start.
Gibberish and OCR errors
Note that the initial F has been read by the computer as gibberish at the start of the first few lines of the page. This needs to be cleaned out, without removing the words and letters that should be there.
Several OCR errors exist in this text. Some of these just have the wrong letters; for example, "dilUcult" should be corrected to "difficult" and "connnodity" should be corrected to "commodity". "Avhicli" is particularly bad, and should be corrected to "which". Some of the errors have not just the wrong letters but other characters entirely. For example, the computer appears to have had a problem recognising the letter P: "cou[)le", "[)erhaps" and "])ublic" should be corrected to "couple", "perhaps" and "public". Errors with numbers for letters can be quite common but are sometimes hard to spot. "11. M.S." instead of "H.M.S." is fairly obvious but "1 remember" instead of "I remember" can be easily missed.
Before | After |
---|---|
luid | had |
llow | How |
readhig | reading |
tliat | that |
iniknown | unknown |
wns | was |
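Once a proofreader has identified the misreadings on a page, applying them can be sketched as a small lookup table. The following is only an illustration in Python, not part of the ProofreadPage extension; the names `CORRECTIONS` and `apply_corrections` are hypothetical, and the entries are the ones listed in this section.

```python
import re

# OCR reading -> intended text, taken from the examples in this section.
# "Avhicli" is corrected in lower case, as it appears in the validated text.
CORRECTIONS = {
    "dilUcult": "difficult",
    "connnodity": "commodity",
    "Avhicli": "which",
    "cou[)le": "couple",
    "[)erhaps": "perhaps",
    "])ublic": "public",
    "11. M.S.": "H.M.S.",
    "1 remember": "I remember",
    "luid": "had",
    "llow": "How",
    "readhig": "reading",
    "tliat": "that",
    "iniknown": "unknown",
    "wns": "was",
}


def apply_corrections(text, corrections=CORRECTIONS):
    """Replace each known misreading, matching whole tokens only."""
    # Longest entries first, so multi-word entries are handled before short ones.
    for wrong in sorted(corrections, key=len, reverse=True):
        right = corrections[wrong]
        # The lookarounds stop "llow" from matching inside a word such as "allow".
        pattern = r"(?<!\w)" + re.escape(wrong) + r"(?!\w)"
        text = re.sub(pattern, lambda match, r=right: r, text)
    return text
```

A table like this is specific to one page and one scan. It does not replace reading the text against the page image, since errors such as "1" for "I" only count as errors in context.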
Line breaks
There are three paragraphs on this page, but the text does not show this at the moment. Without changes, this will be displayed as a single block of text. Using the scanned image as reference, extra line breaks need to be added after "important report" and "the ])ublic" (the scano can be corrected now, or later in the process). Note that, as well as using the scan to find the paragraphs, the short lines also hint at the end of a paragraph.
The text, as it is at the moment, has line breaks rather than text wrapping. This is not always a problem; web browsers usually wrap text when it is split like this. However, it is sometimes preferred to remove these line breaks to allow the text to flow more naturally. One problem to overcome here is a word that has been broken across a line break; for example, at the end of the sixth line, the word "furnishing" has been broken into "fui--" and "nishing". First, there is a scano in the first part; "fui--" should be "fur-". Second, the break itself is a problem that will not be solved automatically. The word needs to be rejoined and the spelling fixed.
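The rejoining and unwrapping described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the scano has already been fixed (so the line ends in "fur-") and that paragraphs are separated by a blank line; the function name `unwrap_lines` is hypothetical.

```python
import re


def unwrap_lines(text):
    """Join hard-wrapped lines into paragraphs; blank lines separate paragraphs."""
    # Rejoin a word split by a hyphen at the end of a line:
    # "fur-\nnishing" becomes "furnishing".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Split on blank lines so paragraph breaks survive the unwrapping.
    paragraphs = re.split(r"\n\s*\n", text)
    # Within each paragraph, collapse every run of whitespace to one space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)
```

This sketch removes every end-of-line hyphen, so a genuinely hyphenated compound that happens to break at the end of a line would lose its hyphen. Such cases still need to be checked against the scan.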
Punctuation
The spaces before punctuation should be removed. Examples are the spaces between the quotation marks and "Photographic News" or "Challenger" in the second paragraph.
Before | After |
---|---|
of the " Challenger " photographs | of the "Challenger" photographs |
in the " Photographic News " the complaint | in the "Photographic News" the complaint |
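The spacing fix shown in the table can be sketched as a single regular-expression substitution in Python. This is an illustration only; the function name `tighten_quotes` is hypothetical, and the sketch assumes straight double quotation marks that occur in balanced pairs.

```python
import re


def tighten_quotes(text):
    """Remove spaces just inside each pair of straight double quotes."""
    # Opening quote, optional spaces, the quoted text, optional spaces, closing quote.
    return re.sub(r'"\s*([^"]*?)\s*"', r'"\1"', text)
```

Because the pattern pairs quotation marks in order, one stray mark throws off every pair after it. The unedited OCR text on this page contains such a stray mark before "Works", so it would have to be removed first, or the function applied only to a selected phrase.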
OCR software does not distinguish between letters and any other symbols (or even dirt or random marks on the page). It is capable of inserting OCR errors with punctuation as well as letters or numbers. Examples here include an extra hyphen inserted before "with" in the first paragraph and the inability to process the letter P as a letter.
Images
Images (which include photographic plates, illustrations, illuminated text, decorations and diagrams) need to be saved as separate image files. The PNG file format is good for simple black and white images (such as line drawings and diagrams) while the JPEG format is for complicated images with lots of colour (such as photographs); in some cases vector graphics (the SVG file format) may be better. However, including the images in the work in the right place is the most important thing for proofreading; new versions of the images can be uploaded later.
In this case, there is the fleuron at the top of the page and the initial F; in both cases the PNG is the most suitable format. These images have been saved on Wikimedia Commons as File:Picturesque New Guinea - Preface banner.png and File:Picturesque New Guinea - Initial F.png.
The fleuron can be inserted with the standard code for images used on Wikimedia projects: [[File:Picturesque New Guinea - Preface banner.png|center|400px]]. The image is centred as it is in the original and the image size (here at 400 pixels in width) can be determined by the proofreader. This code goes at the top of the page as this is where the image should be based on the original.
The initial is slightly more complicated. Wikisource has the template {{dropinitial}} which will create a dropped initial (or "drop cap") without the decoration. In this case it would be {{dropinitial|F}}. This is enough as a temporary measure but the image should be added eventually. The same template can insert the image, once it is available on Wikimedia Commons or Wikisource's own File: namespace. In this case, the template would be {{dropinitial|[[File:Picturesque New Guinea - Initial F.png|100px|alt=F]]}}. Note the use of the alternative text, alt=F, which will allow software such as screenreaders to read the letter (most software will not otherwise be able to interpret the image as anything other than an image).
Formatting
As part of the proofreading of a page, the text needs to be formatted in a style similar to that of the original work. In this page, the formatting of the title needs to be changed from plain text to large centred text.
Before | After |
---|---|
P R E F A C E. | {{c|{{x-larger|PREFACE.}}}} |
This uses two templates. The first, {{c}}, is a short version of {{center}} which centres the title. The second template, {{x-larger}}, sets the size of the title; there are other versions of this template available if this text size is not appropriate. Note also that the spaces between letters have been removed in this case; this can be a personal choice on the part of the proofreader but it does allow the word to be read correctly by software such as screenreaders for the blind or partially sighted. The {{sp}} template will display the text spaced without breaking it up for screenreaders. Likewise, the {{uc}} template can display the text in uppercase while using title case in the plain text version, when the capitalization in the original is purely for stylistic reasons.
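The title change in the table can be sketched as a small Python helper that removes the letter spacing and wraps the result in the two templates. This is only an illustration; the function name `format_title` and its `size_template` parameter are hypothetical.

```python
def format_title(spaced_title, size_template="x-larger"):
    """Collapse a letter-spaced, single-word title and wrap it in {{c}} and a size template."""
    # "P R E F A C E." -> "PREFACE." (all whitespace between characters is removed).
    word = "".join(spaced_title.split())
    return "{{c|{{" + size_template + "|" + word + "}}}}"
```

Because every space is removed, the helper only suits single-word titles; a title of several letter-spaced words would need the word boundaries restored by hand.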
Wikilinks
Names of authors and other works can be wikilinked to take the reader to the appropriate pages on Wikisource. It is not necessary to insert this kind of wikilink for a page to be considered proofread (or even validated). They can be added either at this stage or later on.
Before | After |
---|---|
Sir Samuel Baker remarks | [[Author:Samuel White Baker|Sir Samuel Baker]] remarks |
Validated text
Here is the final, validated version of the text on this page, once all of the corrections have been made:
[[File:Picturesque New Guinea - Preface banner.png|center|400px]]
{{c|{{x-larger|PREFACE.}}}}

{{dropinitial|[[File:Picturesque New Guinea - Initial F.png|100px|alt=F]]}}OR years past, when perusing the account of exploring expeditions setting out for some country comparatively unknown, I always noticed with a pang of disappointment that, however carefully the scientific staff was chosen, it was, as a rule, considered sufficient to supply one of the members with a mahogany camera, lens, and chemicals to take pictures, the dealer furnishing these articles generally initiating the purchaser for a couple or three hours' time into the secrets and tricks of the "dark art," or when funds were too limited to purchase instruments, it was taken for granted that enough talent existed among the members to make rough sketches, which would afterwards be "worked up" for the purpose of illustrating perhaps a very important report.

[[Author:Samuel White Baker|Sir Samuel Baker]] remarks, in the Appendix to one of his Works, that a photographer should accompany every exploring expedition. The only one I ever heard of being furnished with that commodity was [[w:HMS Challenger (1858)|H.M.S. "Challenger,"]] on her scientific cruise round the world, but I remember reading in the "Photographic News" the complaint of a gentleman, that so many years had already passed, and still there was no sign of the "Challenger" photographs ever becoming accessible to the public.

How this is, or why it should be so, is difficult to tell, but as yet no book of travel, entirely illustrated by artistic views and portraits taken
A second person needs to read the page and approve the final version as correct. They do so by reading through for errors, comparing the final version against the original page image and, if everything is in order, saving the page again with the "Validated" (green) page status.