Wikisource:Scriptorium/Help


The Scriptorium is Wikisource's community discussion page. This subpage is especially designated for requests for help from more experienced Wikisourcers. Feel free to ask questions or leave comments. You may join any current discussion or start a new one. Project members can often be found in the #wikisource IRC channel (a web client is available).

Have you seen our help pages and FAQs?



add 1939 report of the commission on the palestine disturbances of august 1929

I was wondering how best to start putting https://commons.wikimedia.org/w/index.php?title=File:Report_of_the_Commission_on_the_Palestine_Disturbances_of_August_1929_cmd_3530.djvu&page=2 , resp. https://unispal.un.org/pdfs/Cmd5479.pdf here. There are two aspects: first, how to add it at all, and second, how to handle the rather complex formatting. ThurnerRupert (talk) 00:55, 5 June 2024 (UTC)

To add the text here, begin by creating the Index:Report of the Commission on the Palestine Disturbances of August 1929 cmd 3530.djvu. Then proofread each page to match the original. If the formatting is complex, then this might not be a good choice for creating your first work here. You might try some of the community collaboration works listed through the main page before starting a challenging work. --EncycloPetey (talk) 00:59, 5 June 2024 (UTC)

Question about updating PDF later

So I am working on scanning a large U.S. government online document into a PDF. I currently have 47 pages. I know the Commons allows you to post a new version of an image/PDF. My question is: would that screw up stuff over here on Wikisource? So say I uploaded the PDF as-is right now with 47 pages to work on transcribing those (basically 3 chapters of it done). If I then later go back and scan in more images, should I upload that as an entirely separate thing on the Commons or just upload a new version of the PDF? Basically, what should I do in this circumstance? Note, there are probably 200-ish (or more) pages in the document. WeatherWriter (talk) 20:36, 7 June 2024 (UTC)

It's best to transcribe from a complete PDF. Yes, altering a PDF during transcription can create problems. This is one reason we have a setting option on the Index page to indicate a scan needs to be repaired before proofreading. --EncycloPetey (talk) 20:47, 7 June 2024 (UTC)

Deciphering bl capital

Sort of silly question, but there, I can't manage to understand what the first letter of the title is, it's in a {{bl}} variant I'm not familiar with. (there is no TOC in this book, so can't use that). It's not a common word either, ending in "idigeigi". Google OCR says it's an H, but it doesn't look like that. A G, maybe? — Alien333 (what I did & why I did it wrong) 15:23, 10 June 2024 (UTC)

Ha, that's a tricky one, but there's an identical character on page 26 which confirms that it's an H. —Beleg Tâl (talk) 15:29, 10 June 2024 (UTC)
Good find, thank you. — Alien333 (what I did & why I did it wrong) 15:49, 10 June 2024 (UTC)

Wrong text layer and OCR

On this page, the text layer is offset (for some reason). However, when I tried to generate OCR for the page, it generated OCR from the same (wrong) page! TE(æ)A,ea. (talk) 23:43, 12 June 2024 (UTC)

It's not offset for me, at least as far as I can tell. --EncycloPetey (talk) 23:55, 12 June 2024 (UTC)
I've created the text from what I see. If it is the correct page, then you may need to clear your browser cache to see the correct page. --EncycloPetey (talk) 23:58, 12 June 2024 (UTC)

Tilted manually scanned pages

[Image: Righted tilted page]

This book was manually scanned and the resulting text layer needs to be retyped manually.

I can straighten the page (one of the many), but then, so what? What options do I have? — ineuw (talk) 04:04, 16 June 2024 (UTC)

@Ineuw: I don't understand the question. Why do you want to straighten this page? If you're asking how to straighten all pages in this scan then I advise against it: it's a lot of manual work, and the benefits are limited. Xover (talk) 06:28, 16 June 2024 (UTC)
Thanks. That's what I thought, but needed an experienced opinion. I will retype it when needed. — ineuw (talk) 06:33, 16 June 2024 (UTC)

Help fixing Index:Tropical Cyclone Report – Hurricane Katrina.pdf

I am unsure what I did, but the pagelist for the PDF is not displaying. I tried to display commons:File:Tropical Cyclone Report – Hurricane Katrina.pdf. WeatherWriter (talk) 16:07, 18 June 2024 (UTC)

Fixed. In general, you can try purging the file page on enWS (e.g. File:Tropical Cyclone Report – Hurricane Katrina.pdf) by adding ?action=purge to the end of the URL, then purging the index page. —CalendulaAsteraceae (talkcontribs) 17:43, 18 June 2024 (UTC)
I think it works better to purge it at commons, and also the ?action=purge only works (I think) if you're already in index.php, so it's simpler to use one of the gadgets that do that. — Alien333 (what I did & why I did it wrong) 07:12, 19 June 2024 (UTC)
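
For anyone who prefers to build the purge links by hand, the advice above could be sketched roughly as follows in Python. This is only an illustration under some assumptions: the helper names purge_url and purge_sequence are made up for this example, /w/index.php is assumed to be the script path on both wikis, and the order (Commons file page, then the local file page, then the index page) simply combines the two suggestions in this thread. Opening such a link may still show a confirmation prompt before the purge happens.

```python
from urllib.parse import urlencode


def purge_url(title: str, host: str = "en.wikisource.org") -> str:
    """Build an index.php link that requests a purge of `title` (hypothetical helper)."""
    # Page titles use underscores where the displayed name has spaces.
    query = urlencode({"title": title.replace(" ", "_"), "action": "purge"})
    # Going through index.php means "action=purge" is an ordinary query parameter.
    return f"https://{host}/w/index.php?{query}"


def purge_sequence(file_name: str) -> list[str]:
    """Return the three purge links in the order suggested in the thread."""
    return [
        purge_url(f"File:{file_name}", host="commons.wikimedia.org"),
        purge_url(f"File:{file_name}"),
        purge_url(f"Index:{file_name}"),
    ]


if __name__ == "__main__":
    for link in purge_sequence("Tropical Cyclone Report – Hurricane Katrina.pdf"):
        print(link)
```

A purge gadget does the same thing with one click, so this is mainly useful for seeing what the links look like.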

Scan resolution (question for the technical people)

I'm getting frustrated with the poor quality of the scan image when proofreading A Dictionary of Hymnology. Have a look at Page:Dictionary of Hymnology 1908.pdf/44—the fine print is barely legible, even though I have increased the "Scan resolution in edit mode" to 2000. When viewing the PDF directly, the print is perfectly crisp.

I am guessing that the Wikimedia software takes the scan image at its default resolution, heavily JPG-compresses it, then increases the resolution of the compressed image, rather than scaling up before converting and compressing. This results in high-fidelity images of JPEG artefacts instead of actually usable scan images. I also have found a related task phab:T38597, to replace JPG with PNG in these images, which would presumably mitigate this issue—but this ticket is ten years old and hasn't been touched for years.

Anyway, my question is this: is there any way to improve the scan image inside ProofreadPage? Or do I just have to open the PDF in a separate window (which is what I have been doing)? —Beleg Tâl (talk) 18:06, 2 July 2024 (UTC)

Don't know much about it, but there was a discussion a few months ago about the same problem and there the answer given was to use DjVu, not PDF. — Alien333 (what I did & why I did it wrong) 18:36, 2 July 2024 (UTC)
Lol thanks, should have searched the archives first :D —Beleg Tâl (talk) 18:49, 2 July 2024 (UTC)
Taking a quick look at the code, the PdfHandler extension generates JPGs which are then retrieved by us. Which JPG is retrieved might vary, but it doesn't regenerate the images at a higher resolution if the original conversion is a poor representation. MarkLSteadman (talk) 21:31, 2 July 2024 (UTC)
It may be possible to regenerate the PDF outside and then upload it such that the conversion goes more smoothly. MarkLSteadman (talk) 21:36, 2 July 2024 (UTC)
I use User:Inductiveload/jump to file, which is a very useful workaround if the file is from one of the sources it supports, although it is a workaround rather than a proper fix. —CalendulaAsteraceae (talkcontribs) 02:00, 3 July 2024 (UTC)
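
To compare what the server renders with what the PDF itself shows, one option might be to ask the API for the page image at a chosen width and open the returned link directly. The following Python sketch only builds the query link and does not fetch anything. It rests on several assumptions: the helper name page_image_query is made up, the file is assumed to be hosted on Commons under the name used in the Page: link above, and the iiurlparam value follows the "pageN-WIDTHpx" form that the imageinfo API documentation gives for PDFs. It does not change how the image is generated, so it is a way to inspect the problem rather than fix it.

```python
from urllib.parse import urlencode


def page_image_query(file_title: str, page: int, width: int,
                     host: str = "commons.wikimedia.org") -> str:
    """Build an api.php link asking for one rendered page of a multi-page file.

    Hypothetical helper; the host and file title are assumptions.
    """
    params = {
        "action": "query",
        "format": "json",
        "titles": file_title,
        "prop": "imageinfo",
        "iiprop": "url",
        # iiurlwidth must agree with the width named in iiurlparam.
        "iiurlwidth": width,
        "iiurlparam": f"page{page}-{width}px",
    }
    return f"https://{host}/w/api.php?{urlencode(params)}"


if __name__ == "__main__":
    # Page 44 at 2000px, matching the example discussed in this thread.
    print(page_image_query("File:Dictionary of Hymnology 1908.pdf", 44, 2000))
```

The JSON response should include a thumburl field for the rendered page, which can be opened in a separate tab next to the original PDF.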