This project page is for reference on general cleanup of the pagespace bot postings of DNB volumes.

Progress and troubleshooting table

  1. Sandbox1.djvu [264 pages]
  2. Sandbox2.djvu [264 pages]
  3. Sandbox3.djvu [264 pages]
Vol. Index AL? % text[1] Best scan[2] Offset[3] Glitches[4] Comment[5]
1 index   100 [1] (poor) 14 Replaced with djvu from Google Books PDF. Good condition. Like the scan it replaced, without errata.
2 index   100 [2] (good) 12 New DjVu, 100 dpi grayscale. A stray note stuck to one of the pages is gone, but recorded (it may have contained copyrighted material). No errata. Index restored.
3 index   100 [3] (good) 6 New DjVu file; readable throughout, but contains isolated blurred text. On March 25 a third DjVu version replaced the previous one; assumed all correct. The second and third versions both incorporate the errata into the text, apparently by setting the affected pages in slightly smaller type so that no text is pushed forward onto the next page. Text and images aligned. Done. All text images present.
4 index   100 [4] (good) 4 Better scan, but the index is now gone.
5 index   100 [5] (good) 8 This 2011 text file includes the 1904 errata corrections, but needs index pages inserted into the final six DjVu pages of the file. Listing. Replace text. Done, without terminating indices.
6 index   100 [6] (good) 12 Deleted bot-applied pages; text and images align, so undeletion is possible if the image scan is poor. Better-quality version uploaded. Listing.
7 index   100 [7] (good) 6 The 30 Dec scan requires index pages; these are currently available on some pages from the previous file. Text replaced. Done. Needs index pages.
8 index   100 [8] (good) 4 Replaced with best scan.
9 index   100 [9] (poor) 6
10 index   100 [10] (fair) 8 Needs new text. Current text is the best available (30 Jan 11). Will keep looking… All 15 problematic pages are identified on the index page. The text has been refreshed, but proofreading and validation will require an alternative source until a better one is located. May 1, 2014: smudged text replaced with Palo Alto scans. Text refreshed for all problematic pages.
11 index   100 [11] (good) 6 Text replaced with the good version identified. Most red pages deleted, though some may have been deleted that were not meant to be.
12 index   100 [12] (good) 6 Text images are reasonably good before page 368 (a few may have blurred sections). All replacement pages after p. 367 are marked and have had their text refreshed. Better text needed.
13 index   100 [13] (good) 6 Better text needed. Found (1-31-11). April 7, 2013: as a caution, neither scan contains errata, but someone added errata to a page I was proofreading. Replace text. Done
14 index   100 [14] (good) 6 Better text needed.
15 index   100 [15] (good) 6
16 index   100 [16] (good) 7
17 index   100 [17] (OK) 6
18 index   100 [18] (poor) 6
19 index   100 [19] (good) 6 Done: changed to this recommended volume, keeping the same pagination.
20 index   24 [20] (good) 6 There is duplication after this to 27; two pages missing after djvu.216; two pages missing after djvu.321; one missing after djvu.345. Better text found (all pages). (1-31-11) Will upload once pages 95-97 are validated with existing images. Adding templates. Replaced djvu file.   Done
21 index   7 [21] (good) 6 One or two pages are missing after each of djvu.231, 343, 377, 382, 386, 389, 392, 395, 396, 407, 414, 417, 427. This is a weird image, apparently mixing two pages. Better text found (all pages). (1-31-11) Text replaced.   Done
22 index   11 [22] (good) 6 The following five pages are torn and have missing characters adjacent to the column edge: pp. 51, 88-90, & 143. Text replaced.   Done
23 index   15 [23] (good) in place 8 Better text needed. Done. Awaiting validation.
24 index   [24] (good), metadata says wrong volume 14 Done: changed to the best-quality volume.
25 index   98 [25] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in late 2012. Google Books PDF (used as base to replace previously flawed DjVu source file).
26 index   98 [26] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in late 2012. Google Books PDF (used as base to replace previously flawed DjVu source file).
27 index   98 [27] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in late 2012. Google Books PDF (used as base to replace previously flawed DjVu source file).
28 index   4 [28] (good) 6 All text images replaced. Text replaced.   Done Index pages awaiting validation.
29 index   100 [29] (good, but breaks off p.279) 6 PDF of vol. 29: 452 pages and complete index. Vol. 29 DjVu: a few pages missing or with blurred images.
30 index   99 [30] (poor) 6 Two pages no longer missing after djvu/33 (09-21-2012).
31 index   99 [31] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in early 2013. Google Books PDF (used as base to replace previously flawed DjVu source file).
32 index   98 [32] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in late 2012. Google Books PDF (used as base to replace previously flawed DjVu source file).
33 index   25 [33] (OK) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in early 2013. Google Books PDF (used as base to replace previously flawed DjVu source file).
34 index   100 [34] (poor) 6
35 index   99 [35] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in late 2012. Google Books PDF (used as base to replace previously flawed DjVu source file).
36 index   21 [36] (good) 4 Two pages missing after djvu/255. Page missing after djvu/392.
37 index   100 [37] (good) 14
38 index   19 [38] (good) 6 Text layer present; complete pages.
39 index   13 [39] (good) 6 Pp. 275-6 have missing characters where ripped (see this); on pp. 301, 369, and 373 the text images are crowded against one margin, with missing characters. New text needed. Done
40 index   74 [40] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in early 2013. But the new version incorporates the errata, while the previous version did not. Google Books PDF (used as base to replace previously flawed DjVu source file).
41 index   17 [41] (good) 6 Duplicate pair: djvu/94 and 95 duplicate the two previous pages. Better text found (all pages) (1-31-11). AI vol 41: all pages are present; text near one margin on pages 175-78 will be challenging. Replaced file. Done
42 index   98 [42] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in late 2012. Google Books PDF (used as base to replace previously flawed DjVu source file).
43 index   100 [43] (good) 6 Poor scans throughout the work; marked as problematic. Replaced text with new AI source file. Done.
44 index   15 [44] (good) 12 12 "workable" problematic pages remain and are identified. Replace text.   Done
45 index   9 [45] (good) 8 OCR layer is best scan. Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
46 index   12 [46] (good) 6 Problematic scans identified in the latter half of the book; there may be more in the first half. Page 176 is almost blank. Pages 408 and 409 duplicate 406 and 407. Text layer present, latter pages realigned; complete pages. All bios transferred to Page: and converted to <pages>
47 index   100 [47] (good) 6 Numerous pages marked as problematic. Candidate for replacement. Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
48 index   11 [48] (good) 6 Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
49 index   11 [49] (good) 6 Numerous problematic pages; replace file. Updated source file.   Done
50 index   12 [50] (good) 12 (new scan 20091125) Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
51 index   6 [51] (OK) 8 Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
52 index   5 [52] (poor) 10 A number of illegible pages identified in the second half, presumably similar in the first half. Available copy is poor; rescue job performed. Will need review once the work is done to determine further needs. All bios transferred to Page: and converted to <pages>
53 index   98 [53] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in late 2012. Google Books PDF (used as base to replace previously flawed DjVu source file).
54 index   7 [54] (good) 7 Some scans may be indistinct.
55 index   13 [55] (good) 6 Replace file to fix text misalignment.
56 index   8 [56] (good), found to be missing pages 165-172 6 Generated a new DjVu version for Commons that mixes both files: based on the good version, with inserts from another available source. All bios transferred to Page: and converted to <pages>
57 index   15 [57] (good) 6 Existing text pages may need to be replaced (<20091107). All bios transferred to Page: and converted to <pages>. Scan replaced note.
58 index   9 [58] (good) 8 Better text found (all pages). (1-31-11) Uploaded new volume which was intact. Previously proofread text aligns with recently added text images. Volume replaced with complete version.   Done Completed page alignment.
59 index   2 [59] (good) 6 Text layer present; complete pages. Listing. Text layer is not best scan. All bios transferred to Page: and converted to <pages>
60 index   3 [60] (good) 6 DjVu source file replaced. The two pages missing after this are now present; pages 88 and 89 previously duplicated pages 86 and 87, and the two DjVu images that should have been there instead are no longer missing (08-30-2012). Text layer present; better text was found for all pages (1-31-11); all bios transferred to Page: and converted to <pages>.
61 index   100 [61] (good) 6 Templates added, complete pages. All bios transferred to Page: and converted to <pages>
62 index   14 [62] (OK) 6 Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
63 index   100 [63] (good) 24 Better text needed; complete pages, though p. xviii needs rescan. All bios transferred to Page: and converted to <pages>

Notes

  1. Percentage of text added. Apart from vol. 1, completed text generally ranges between 2% and 4%. Text that has been verified and marked is currently negligible. One point: the project page suggests ligatures should be added, and the æ ligature is very common in the DNB. Another point to revisit is the use of strike-through text to show a "diff" for a later edition.
  2. The code here is "good" for the Toronto scans, "poor" for the Google scans, and "OK" for the Hyderabad scans, which are of intermediate quality. This is a rule of thumb only: in some cases the Toronto scan of a page may be so corrupt that another scan works better.
  3. The offset is the difference between the DjVu file page number and the page number printed in the volume. It ought to be constant throughout the volume; where it is known not to be, the entry is "n/a" and the next column gives details.
  4. The bot-generated initial postings have imperfections, to be noted here.
  5. Points include: the bot may not have used the best scan ("better text needed"); and progress in formatting, at least for all the author templates.
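As a rough aid for checking the offset column, here is a minimal Python sketch. It assumes the offset is the DjVu page number minus the printed page number (positive when front matter precedes p. 1, as with the offsets in the table); that sign convention is an assumption. The helpers djvu_page, consistent_offset, and offset_breaks are hypothetical and not part of any Wikisource tool. Given a few (DjVu page, printed page) pairs checked by hand, the sketch reports either the single offset or None (the "n/a" case) and lists where the offset shifts.

```python
from typing import Iterable, List, Optional, Tuple


def djvu_page(printed_page: int, offset: int) -> int:
    """Map a printed page number to its DjVu file page number.

    Assumes offset = DjVu page number - printed page number (hypothetical convention).
    """
    return printed_page + offset


def consistent_offset(pairs: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the common offset for (djvu_page, printed_page) pairs.

    Returns None when the offsets disagree (the "n/a" case) or no pairs are given.
    """
    offsets = {djvu - printed for djvu, printed in pairs}
    return offsets.pop() if len(offsets) == 1 else None


def offset_breaks(pairs: Iterable[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """List (djvu_page, old_offset, new_offset) wherever the offset changes,
    scanning the pairs in DjVu page order."""
    breaks = []
    previous = None
    for djvu, printed in sorted(pairs):
        current = djvu - printed
        if previous is not None and current != previous:
            breaks.append((djvu, previous, current))
        previous = current
    return breaks


# Vol. 1 is listed with offset 14, so printed p. 1 would be djvu/15.
print(djvu_page(1, 14))                          # 15
print(consistent_offset([(15, 1), (114, 100)]))  # 14
print(consistent_offset([(15, 1), (118, 100)]))  # None
print(offset_breaks([(15, 1), (16, 2), (17, 5)]))  # [(17, 14, 12)]
```

Under this convention, a drop in the offset at some DjVu page suggests printed pages are missing from the scan just before it, and a rise suggests duplicated pages.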