The Great Gatsby questions

I am proofreading and validating pages from the scanned file of The Great Gatsby (the images). Should the punctuation and spelling be altered to match the scanned images? Windywendi (talk) 00:31, 2 January 2021 (UTC)

@Windywendi: Yes, the text and punctuation should match the images of the scan. The text currently in the pages was pulled from an outside source, and needs to be corrected to match the scans. --EncycloPetey (talk) 01:52, 2 January 2021 (UTC)
@EncycloPetey: Thanks, the suggestion has been followed. By the way, while proofreading, is it recommended that the line breaks (which are equivalent to spaces in the rendered text) be placed where the lines break in the original scan, so as to reduce the workload of the validator? Windywendi (talk) 02:01, 2 January 2021 (UTC)
No, if you're proofreading, go ahead and collapse the line breaks into complete paragraphs. --EncycloPetey (talk) 02:02, 2 January 2021 (UTC)
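For anyone who prefers to collapse the line breaks in bulk rather than by hand, here is a minimal sketch in Python. It is only an illustration, not a site tool: the function name collapse_line_breaks is hypothetical, and it assumes that paragraphs are separated by blank lines and that words hyphenated across a line end are fixed separately by the proofreader.

```python
import re


def collapse_line_breaks(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumption: a blank line marks a paragraph boundary. Words that are
    hyphenated across a line end are NOT rejoined here.
    """
    # Split the page text into paragraphs at one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    collapsed = []
    for paragraph in paragraphs:
        # Replace each line break inside a paragraph with a single space.
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        collapsed.append(" ".join(lines))
    # Keep one blank line between paragraphs.
    return "\n\n".join(collapsed)


if __name__ == "__main__":
    sample = "First line of a paragraph\nsecond line of it\n\nNext paragraph\ncontinues here"
    print(collapse_line_breaks(sample))
```

The result still needs to be checked against the scan, since the script cannot tell a real paragraph break from a stray blank line left by the OCR.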

Index:Maria Edgeworth (Zimmern 1883).djvu

I see that this work is mostly transcribed, though it is missing transclusions. Before I wander over to look at it, I just wanted to check that it isn't something that you are still working on, and that it doesn't have special needs. — billinghurst sDrewth 21:45, 8 March 2021 (UTC)

Nope. Not planning any further work on this item. It was set up for others to complete. Thanks for asking, though. --EncycloPetey (talk) 22:42, 8 March 2021 (UTC)

Monteiro Lobato

Hi! I found a public domain translation of a Monteiro Lobato work, but it seems that only specific people can download it in full. Google Books digitized it, but didn't make it available there. Thanks, Erick Soares3 (talk) 21:27, 25 March 2021 (UTC)

I have no experience downloading from Hathi Trust. I would ask in the Scriptorium for help. --EncycloPetey (talk) 22:10, 25 March 2021 (UTC)
@Erick Soares3: Index created at Index:Brazilian_short_stories Languageseeker (talk) 22:46, 25 March 2021 (UTC)
@Languageseeker: That index page does not work because it is missing the file extension. The Index page must make use of the full filename, including the extension. Also, I do not see a text layer in the Index, so the file you created will need to be redone. It is not ready for use yet. --EncycloPetey (talk) 22:49, 25 March 2021 (UTC)
@EncycloPetey: There is no extension because the Index is comprised of individual image files for which you do not need an extension. The OCR will need to be generated with the OCR tool. Languageseeker (talk) 22:57, 25 March 2021 (UTC)
The OCR tool is not reliable, and produces a terrible OCR text. That is why the work should be in a PDF or DjVu format, with a text layer added. Also, work comprised of multiple individual images has two additional problems: (a) it makes the work susceptible to single-page vandalism, and (b) it makes it impossible to link to the scan from the Wikidata item with which the work will be associated. There are additional issues with preparing a work this way, which is why it is not recommended except perhaps for works that consist of only a few pages. --EncycloPetey (talk) 23:30, 25 March 2021 (UTC)
Also, take a look at what has happened at Commons for commons:Category:Pericles, Prince of Tyre, where all of the individual pages are swamping the category, and the file names tell the user nothing about their content or connection to other files. This includes File:Mdp.39015053493998 003.png which is essentially an image of a blank page. Creating and uploading vast numbers of unconnected pages to Commons makes it difficult for anyone to know what the pages are or what they are for. --EncycloPetey (talk) 23:35, 25 March 2021 (UTC)
Thanks everyone! Wouldn't it be a case of putting all the individual images into a PDF and then uploading it to Commons? Erick Soares3 (talk) 00:51, 26 March 2021 (UTC)
