Scan Lab
Shortcut: WS:LAB

A central resource for assistance with creation, downloading, uploading, processing and other operations on scans of texts.

Times have changed, but it still can be hard to put 600 pages in the right order!
Instructions

If you need help with a scan, add your request in the relevant section below as a new sub-section. If you can, include all the details someone will need to work on the request without further questioning. You can use {{ping project|Scan Lab}} to send an immediate notification to all subscribed Scan Lab members. Once you have been answered, ping only that user when you reply with {{re|Their username}} (do not ping the whole project on every comment).

If your request has been completed, you should acknowledge that your issue is resolved and close the section with {{section resolved|1=~~~~}}.

Participants

Add your name to Module:Mass notification/groups/Scan Lab to be notified via {{ping project|Scan Lab}}. Also add your name below with details of any particular tasks you can help with.

Participant Can help with Instructions
Inductiveload
  • General scan tasks: scraping/download, batch uploads, scan repair
  • Splitting/combining scan images/photos from a scanner or camera into scan file (with ScanTailor)
Xover
  • General scan tasks: scraping/download, scan repair, manipulating DjVu files (but not PDF)
Mpaa
  • General scan tasks: scraping/download, scan repair, manipulating DjVu files (but not PDF)

Requests for downloading scans

Instructions

If you would like scans that already exist online to be transferred to Wikisource, leave a message here. This includes batch transfers from the Internet or Hathi Trust for multi-volume works. Please include the necessary bibliographic information (author, country, and date of first publication) so that scans can be uploaded to Commons with proper information and license templates. A suggested file name on Commons can also be helpful.

Jane Austen Juvenilia Volume 2 and 3

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) The scans of the manuscripts of Austen's Juvenilia are available here and here. They're both in the PD, but I have absolutely no clue as to how to download them. The images are higher resolution than the ones on the BL website, but they're in the zoomify flash format. Languageseeker (talk) 02:58, 2 February 2022 (UTC)

  • Languageseeker: I know Volume the Second is in the public domain; it’s already been transcribed here. Are we sure that Volume the Third is in the public domain? It could easily fall into a copyright trap, so I just want to make sure. TE(æ)A,ea. (talk) 22:59, 8 February 2022 (UTC)
    • @TE(æ)A,ea. The British Library has it listed as "Public Domain in most countries other than the UK." Languageseeker (talk) 23:07, 8 February 2022 (UTC)
      So it looks like it was definitely published in 1951 (which would imply copyright expiry in 2001 in the UK as 50 years after publication), which makes the UK copyright claim weird. If true that would postdate the URAA date ... MarkLSteadman (talk) 00:06, 9 February 2022 (UTC)
      That is volume 3 (Evelyn and Kitty the Bower). Volume 1 was published in 1933 (so it was in the PD on the URAA date). MarkLSteadman (talk) 00:19, 9 February 2022 (UTC)

How Henry Ford is regarded in Brazil

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) The scan can be downloaded easily enough (here), but the pages need to be split. The translation was published in 1926 and the original author died in 1948 (Brazil is PMA 70), not sure of the translator’s years. TE(æ)A,ea. (talk) 01:06, 17 June 2023 (UTC)

@TE(æ)A,ea.: We're going to need the identity and vital years for Aubrey Stuart in any case, so you might as well start the research on that (Billinghurst might be able to help; they're an absolute genius at that kind of thing). I took a quick look at the scan and it looks like another one of those cases where it looks like it should be possible to automate it, but every time I think I've found ways to do that I've been disappointed and had to do it more or less manually (i.e. sloooooow and labour-intensive). I'll take a look at some point when I have the time available (IRL is keeping me busy just now), but you should probably be prepared for this taking some time (unless the tools actually come through for me this time). Xover (talk) 09:29, 17 June 2023 (UTC)
I wonder if this is Aubrey Newton Stuart d. 1964 in Rio, and whether this is the Aubrey N. Stuart, the chess columnist who emigrated from Georgetown, Guyana to Brazil. MarkLSteadman (talk) 12:48, 17 June 2023 (UTC)
@MarkLSteadman: Thanks. That does sound like a likely candidate. Xover (talk) 17:36, 21 June 2023 (UTC)
  • Xover: It was published in 1926, so it’s PD-US in any case. TE(æ)A,ea. (talk) 13:51, 17 June 2023 (UTC)
    @TE(æ)A,ea.: Sure. But we still need to document the actual copyright status to determine whether to host it here or on Commons (and if here, why we can't have it on Commons). Xover (talk) 17:36, 21 June 2023 (UTC)
    • Xover: I think Commons assumes 120 years after publication for authors with unknown dates of death, so I think PD-US plus do not move to Commons|reason=translator’s death year undetermined. TE(æ)A,ea. (talk) 17:45, 21 June 2023 (UTC)
      I believe they distinguish between a known person whose death date is unknown, and a person who is unknown (or anonymous). The laws are easier to interpret when the identity is unknown versus a known person with an unknown death year. --EncycloPetey (talk) 17:51, 21 June 2023 (UTC)

"Everywoman's World"

I would like to start scan-backing The Alpine Path, whose original publication was in this Canadian journal. There are scans available here for the installments containing this work:

  • June 1917 [1]
  • July 1917 [2]
  • August 1917 [3]
  • September 1917 [4]
  • October 1917 [5]
  • November 1917 [6] (the first page is listed as missing but the cover is there so it can be just ignored)

Happy to come back after I do each one if you prefer but just listed them all here in one go. They will need to be uploaded locally with PD-US|pubyear=1917 and move to commons|expiry=2037 as, being from 1917 and originally published in Canada, they are before the 120-year assumed date. MarkLSteadman (talk) 23:04, 16 July 2023 (UTC)
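
The expiry year in the request above is simple arithmetic on the publication year. A minimal sketch, assuming the 120-year figure quoted in the comment (the function name `assumed_commons_year` is hypothetical, and this is not a statement of copyright law):

```python
def assumed_commons_year(pub_year: int, assumed_term: int = 120) -> int:
    """Return the year used for the move-to-Commons expiry.

    Assumption: the expiry is publication year plus a fixed assumed term
    (120 years, as quoted in the request above).
    """
    return pub_year + assumed_term


# The 1917 issues from the request above:
print(assumed_commons_year(1917))  # 2037
```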

The Recluse

From here: files 4–83. U.S. publication from 1927, so acceptable at Wikimedia Commons. TE(æ)A,ea. (talk) 00:20, 27 August 2023 (UTC)

Mooresville, Indiana High School yearbooks, 1914–1930

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) These scans exist in the form of galleries on the Mooresville High School Alumni Association's Facebook page, and extracting them by hand is tedious enough that I'm hoping someone can do it with a bot. The procedure I have in mind is:

Thanks! —CalendulaAsteraceae (talk • contribs) 01:59, 22 September 2023 (UTC)

Finding scans

Instructions

Requests for locating scans for existing works at Wikisource, or works you wish to add yourself but cannot find scans for. For general text requests, see Wikisource:Requested texts.

The Criterion Volume 2 and 3

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) Would it be possible to locate Volumes 2 and 3 of The Criterion? I'm especially trying to complete The Woman Who Rode Away that began in Volume 3. Languageseeker (talk) 18:36, 23 December 2022 (UTC)

The History of the Decline and Fall of the Roman Empire volume 2

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) Is it possible to find a higher quality scan of this work than Index:Decline and Fall of the Roman Empire vol 2 (1897).djvu? This is the Revised edition, edited by J.B. Bury. It is too low quality to be proofread and has a number of cutoff pages. MER-C (talk) 16:28, 8 July 2023 (UTC)

Scan repair

Instructions

Request repair work on existing scans here.

When requesting page insertion, rearrangement or deletion, always include the page numbers (as marked on the pages) as well as the position of the page within the scan file. This makes it much easier for the repairing user to locate the defect in the file and fix it, as well as allowing a double-check against mistakes.

Please do not use this page to request repairs on works that you don’t really care about: Category:Index - File to fix is already a known backlog. If you want to help with those, you can add {{missing pages}} to those indexes if they do not already have it, along with details of the missing pages.

Index:Plato (IA platocollins00colliala).pdf

There seems to be a problem with page 154: Page:Plato (IA platocollins00colliala).pdf/166 which is near completion on the Monthly Challenge. A clean copy of the page can be found in this scan, but note that this scan contains two separate works, so the first page 154 (Plato) is the correct one. --EncycloPetey (talk) 20:31, 6 July 2023 (UTC)

@EncycloPetey: PDF files can't be repaired in place, so I'd have to make a new DjVu and migrate everything over. Are you sure we want to do that for a text that's already fully validated (even if we had to cheat on the broken page)? Xover (talk) 06:36, 31 July 2023 (UTC)
@EncycloPetey @Xover What is it that needs repairing? If I download the PDF from Commons and go to page 154 (the one that is not displaying properly in Wikisource) it displays properly in various PDF readers/editors and browsers. That being the case, isn't the problem somewhere else, since the file does not contain a page that looks anything like the one currently showing at Page:Plato (IA platocollins00colliala).pdf/166? Chrisguise (talk) 09:58, 31 July 2023 (UTC)
phab:T343145 filed. Probably a problem with MediaWiki not thumbnailing this PDF properly. MER-C 16:54, 31 July 2023 (UTC)
@Chrisguise: Right. It's probably a bug in MediaWiki-PDF (I've seen this happen from time to time). But it still prevents proper proofreading/validation even if it's the thumbnail and not the PDF. My inclination would be to just live with it, but I have no dog in this race. Xover (talk) 18:34, 31 July 2023 (UTC)

Index:Narrative of a four months' residence among the natives of a valley of the Marquesas Islands; or, a peep at Polynesian life (IA b22022430).pdf

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) Pages 234 and 235 of this otherwise proofread transcription are missing. MER-C 17:27, 5 August 2023 (UTC)

Pages are available at Google Books. MarkLSteadman (talk) 06:58, 14 August 2023 (UTC)
I have uploaded a fixed PDF. If someone could handle the move and renumbering that would be great! Starting with Page:Narrative of a four months' residence among the natives of a valley of the Marquesas Islands; or, a peep at Polynesian life (IA b22022430).pdf/256 shift +2. MarkLSteadman (talk) 19:01, 2 September 2023 (UTC)
And I have moved everything around so that we have all the pages in the scan and the proofread pages match the scans with the correct page numbers. There are still the duplicated pages 242 and 243 (264 to 267 of the scan) as I have left that intact. MarkLSteadman (talk) 00:34, 16 September 2023 (UTC)
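
The "shift +2" renumbering requested in this thread can be planned as a list of page moves. The following is only an illustrative sketch, not the procedure actually used here: the function name `plan_page_moves`, the short index title, and the last scan position are all hypothetical. Moves are listed from the highest position downward so that each target title is free by the time its move is carried out.

```python
def plan_page_moves(index_title, first_pos, last_pos, shift):
    """Return (old_title, new_title) pairs for shifting Page: subpages.

    For a positive shift the pairs run from the highest scan position
    down, so no move lands on a page that has not been moved yet.
    For a negative shift they run from the lowest position up.
    """
    if shift == 0:
        return []
    positions = list(range(first_pos, last_pos + 1))
    if shift > 0:
        positions.reverse()
    return [
        (f"Page:{index_title}/{pos}", f"Page:{index_title}/{pos + shift}")
        for pos in positions
    ]


# Hypothetical example: positions 256-258 of "Example.pdf" shift by +2.
for old, new in plan_page_moves("Example.pdf", 256, 258, 2):
    print(old, "->", new)
```

The list only describes the moves; carrying them out (by bot or by hand) is a separate step.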

Index:Brinkley - China - Volume 1.djvu

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) for inclusion in September's Monthly Challenge - the plate facing page 256 is missing. MER-C 11:11, 19 August 2023 (UTC)

Index:The history of medieval Europe.djvu

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) Book pages xvi-xvii are missing, that is...
Page:The history of medieval Europe.djvu/25 is source page xv;
Page:The history of medieval Europe.djvu/26 is source page xviii
snafu22q

Entire djvu text off by one page

I just uploaded Signs and Wonders God Wrought in the Ministry for Forty Years by Maria Woodworth-Etter, with the index page at Index:Signswondersgodw0000wood.djvu. Each page in the djvu has the OCR text for the following page. What is the best way to fix this? Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) Heyzeuss (talk) 21:13, 15 September 2023 (UTC)

EDIT: It looks like the whole djvu file could use a better OCR, assuming that technology has improved since the book was first scanned and had OCR applied to it. I tested the various OCR tools available for single pages, and Google OCR seemed to be the best, and definitely better than the text in the file at Internet Archive. Heyzeuss (talk) 11:52, 16 September 2023 (UTC)
EDIT 2: See sample page Page:Signswondersgodw0000wood.djvu/33. Heyzeuss (talk) 14:44, 16 September 2023 (UTC)
@Heyzeuss: Should be fixed now. Xover (talk) 14:12, 18 September 2023 (UTC)
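
The off-by-one problem described in this thread (each page carrying the OCR text of the following page) amounts to realigning a list of text layers. A minimal sketch of that realignment, assuming the per-page text has already been extracted into a Python list; the function name `realign_text_layers` is hypothetical, and this does not show how the file here was actually fixed or how to read or write DjVu text layers:

```python
def realign_text_layers(texts, offset=1):
    """Move each text layer to the page it actually belongs to.

    Assumption: texts[i] currently holds the text of page i + offset
    (offset=1 means every page carries the following page's text).
    Pages left without text after the shift get an empty string.
    """
    fixed = [""] * len(texts)
    for i, text in enumerate(texts):
        target = i + offset
        if 0 <= target < len(texts):
            fixed[target] = text
    return fixed


# Three-page example: page 1 carries page 2's text, and so on.
print(realign_text_layers(["text of p2", "text of p3", ""]))
# ['', 'text of p2', 'text of p3']
```

Pages left empty by the shift (here the first page) would still need fresh OCR.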

See also

  • Commons:Graphic Lab at Wikimedia Commons - they can help with general image problems
  • Image extraction - guidance for extracting images from scans
  • Requested texts - general text requests. Many of these also need scans to be located.
  • Category:Index - File to fix - contains indexes that have various defects. Please do add templates like {{missing pages}} if needed to indicate what the problems are, but please do not bring the files here unless you would like them fixed to allow work in the near future.