Wikisource:Scriptorium/Help

The Scriptorium is Wikisource's community discussion page. This subpage is especially designated for requests for help from more experienced Wikisourcers. Feel free to ask questions or leave comments. You may join any current discussion or start a new one. Project members can often be found in the #wikisource IRC channel (a web client is available).

Have you seen our help pages and FAQs?



Template:Symbol missing

This is supposed to add a specialized tracking category only if specifically mentioned, but it does so in every case. I’m not sure what the intended function is, though, so I hope someone else will make the appropriate change. TE(æ)A,ea. (talk) 16:52, 10 May 2025 (UTC)

I don't see anything wrong here. Maybe I'm missing something in the documentation, but the only if/then statement is "if you have a character from a specific language or script, then choose that" (e.g. Korean or Hebrew). If you insert {{symbol missing}} then I think it's supposed to have the tracking category, near as I can tell. Can you give an example of a page that shouldn't be tracked but is? —Justin (koavf)TCM 16:59, 10 May 2025 (UTC)

Colored Bulleted List

How do I make a bulleted list where the bullet points are colored? ToxicPea (talk) 16:58, 18 May 2025 (UTC)

CSS:
.whateverparentselector li::marker {
  color: whatevercolor;
}
Should do the trick. — Alien 333 17:39, 18 May 2025 (UTC)
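As a concrete instance of the rule above, here is a minimal sketch. The class name `colored-bullets` is made up for illustration; substitute whatever class is on the element that wraps your list. For a scan-backed work, one plausible home for the rule is the index stylesheet (Index:Name of work/styles.css).

```css
/* Hypothetical wrapper class: replace with the class on your list's container. */
.colored-bullets li::marker {
  color: crimson; /* colors the bullet only; the item text keeps its own color */
}
```

Because `::marker` styles only the marker box, the list text is untouched. If the text should change color too, set `color` on the `li` itself.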

multicol template current best practices

I am using multicol for strict parallel columns across many pages, paragraph by paragraph (showing a translation).

1. Is nop or nopt required when using this template? (It seems not, since every page ends with multicol-end.)

2. What is the best practice for paragraphs that span pages? I have tried putting codes in the footer/header within the split (like for tables), but it does not work. I have searched the Scriptorium archives and looked at a lot of pages/works that use multicol, trying to find examples. The best I have found so far appears to be copying the entire rest of the paragraph (from the next page) into the footer of the page on which the paragraph begins, with a multicol-end code also in the footer, then using noinclude (?) for the same text on the next page (I may have that wrong). Is this really the best way? Thanks for any suggestions, and especially any links to pages using a particular technique, so that I may copy/paste. Laura1822 (talk) 18:18, 25 May 2025 (UTC)

Hello @Laura1822,
Regarding point 1. Multicol breaks paragraphs between pages, whether you want it to or not (as you have seen). Thus, you do not need nop/nopt.
Regarding point 2. I am not sure of the best practices for multicol per se, but using noinclude and includeonly should work. However, neither needs to be placed in the footer. I have provided an example on pages Page:Blessedbegodcomp00call.pdf/98 and Page:Blessedbegodcomp00call.pdf/99. Note also that hws/hwe doesn't help with noinclude/includeonly, and the hyphenated word has to be handled manually (as in the example). I also recommend transcluding as you go, to make sure this is all working, and to identify other pages in the text where noinclude/includeonly should be added.
For reference, I believe the only way to avoid copy-pasting text from page to page would be to use a raw HTML table, rather than the multicol environment. If you would prefer to do this, I could help set something up, but otherwise, the above should work.
Regards,
TeysaKarlov (talk) 21:28, 25 May 2025 (UTC)
Thank you so much! I will study what you have done tomorrow. I thought perhaps the "answer" was tables, but I am not very good at tables. I have been transcluding at User:Laura1822/sandbox3. Laura1822 (talk) 00:43, 26 May 2025 (UTC)
P.S. @Laura1822 I did a little more testing, and you can probably save yourself some time by placing half the duplicate text in either the header/footer (as you said in your original post), to save you typing out the <noinclude> tags. Then you need only add the includeonly tags on the previous/next page. I have added another example on pages Page:Blessedbegodcomp00call.pdf/97 and Page:Blessedbegodcomp00call.pdf/98, placing text from djvu/97 into the footer, and using includeonly on the next page (djvu/98), noting in this case that spaces should appear after the last word inside the includeonly tags, not before the first word outside the includeonly tags. I also modified the example on djvu/98 and djvu/99, this time moving text from djvu/99 into the header, and using includeonly on the previous page (djvu/98). Hopefully this will be slightly faster. Regards, TeysaKarlov (talk) 20:37, 26 May 2025 (UTC)
Thank you. I am studying. I did experiment myself just a little in the paragraph from 98-99, because I saw that there is an extra blank line in the transcluded text. I wondered if it was because the multicol-related templates were outside the include-related tags, but when I enclosed them all within the tag, it made no difference that I could see. I still don't quite understand how it's all supposed to work, but I will study it some more. Thank you for helping me. Laura1822 (talk) 12:24, 27 May 2025 (UTC)
@Laura1822 I am not sure if there was an entire blank line, but I agree that there was a small gap in the 1px black line which runs along the center of the multicol environment. What I should have done (and which fixes the issue), is to start and end the multicolumn environment on the same page (so either duplicate all the content from the first page to the second, or duplicate all the content from the second page to the first; and if the multicol would have to span three pages, then you need a table, as far as I am aware). I have accordingly corrected the djvu 98-99 example (the djvu 97-98 example was already fine). Apologies for any past confusion, as I do not use multicol all that much. Note that you also want to place the line=1px solid arguments in the multicol-section template calls in most cases. Regards, TeysaKarlov (talk) 21:13, 27 May 2025 (UTC)
Thank you, I just figured out that last bit about the vertical line yesterday. I use inverted colors so I thought I just wasn't seeing it because it was black on black, but it turns out that I simply misunderstood what I saw on a page in another book that I was copying from. I will study what you changed. I will get it all figured out eventually! Laura1822 (talk) 14:35, 28 May 2025 (UTC)

include

Can anyone please direct me to a Help page that explains the tags includeonly, noinclude, and onlyinclude: what each is for, and how and why they are intended to be used? Laura1822 (talk) 14:39, 26 May 2025 (UTC)

We have no such page in WS, but w:WP:NOINCLUDE covers the three of them. — Alien 333 15:14, 26 May 2025 (UTC)
Thank you!!!! Laura1822 (talk) 16:29, 26 May 2025 (UTC)

Continuing a TOC across multiple pages with TOC row template

I had an issue with the table of contents formatting for pages 18, 19, 20, 191, 192, 193, 194, and 195 of my transcription for The Life, Studies, and Works of Benjamin West. I used the TOC row 1-dot-1 template, but couldn't figure out how to extend it over different pages when the table of contents continues. I'm sure there's a way to do this, but I don't have that knowledge. Any help would be appreciated. JohnSon12a (talk) 22:27, 26 May 2025 (UTC)

AFAIK there isn't really a clean way to make a single line continue across pages. The approach you took of moving the end of it to the previous page seems good to me. Though you should use a comment (<!-- text here -->) for such notes, rather than putting them visible in the content. — Alien 333 05:31, 27 May 2025 (UTC)
  • JohnSon12a: I’m not sure that it’s possible with the TOC row family of templates, but if you switch to dtpl you can! For the part on the bottom of the first page, put dotted TOC page listing/top in the text body and dotted TOC page listing/suspend in the footer, and then on the top of the second page, put dotted TOC page listing/resume in the header and dotted TOC page listing/bottom in the body. TE(æ)A,ea. (talk) 20:45, 28 May 2025 (UTC)
    {{dtpl}} really shouldn't be used, because it gives unreasonably large outputs. See [1] for size comparisons. Why? dtpl makes one separate table for each row, plus its dot leader hack is noticeably worse than others (putting {{gap}}s in there).
    IMO the best solution that keeps content on its own page would be to take the first half of the markup of {{TOC row 1-dot-1}} and put it at the end of the first page, then put the second half of the invocation at the top of the second page. I could try and make a TOC row template to do this, one of these days. — Alien 333 20:54, 28 May 2025 (UTC)

The Mythology of All Races

Dear Friends.

I just looked at The Mythology of All Races. It seems to me that all the volumes are out of copyright, yet many of them are still red links. As a person who likes mythologies, this makes me sad. Could I do something about this? Still, the books look rather long, and I believe I would need to check every page manually. I also have some other projects keeping me busy right now, so I cannot promise anything in my current situation.

As experienced users, how much time do you think would be needed to publish all the volumes of this collection? And how hard and time-consuming is it to follow all the steps for adding new content to Wikisource?

Best wishes!

-- Kaworu1992 (talk) 00:02, 27 May 2025 (UTC)

It depends on how much time you have daily, but I'd say one or two weeks per 400-page volume, so if one person were to focus only on this it would perhaps take somewhere between two and four months. (This is a very rough approximation; the actual time something takes also depends on IRL events, on complexity of formatting, on motivation, on OCR quality, and whatnot. The question of "how long will this take" is really hard to answer.) — Alien 333 20:43, 27 May 2025 (UTC)

Issue with imported file

I am attempting to create an index for The Making of Americans by Gertrude Stein. I used IA-upload to create a .djvu located at File:The_Making_of_Americans,_1925.djvu. The file currently is listed as having 1,000×1,500 px. dimensions on Wikimedia Commons but is listed as having 0×0 px. dimensions on Wikisource. I do not know why this happened or how to fix it, and there does not seem to be documentation about this. Please help.

Alef.person (talk) 00:15, 29 May 2025 (UTC)

You may need to clear your cache. It is displaying correctly for me. --EncycloPetey (talk) 02:49, 29 May 2025 (UTC)
The 0x0 bug that DJVUs get (as opposed to the PDF-specific one) disappears quite fast; probably the servers emptied their cache in those two-ish hours.
@Alef.person: in general the way to fix this is to purge the server's cache; for this you can use the "UTCLiveClock" gadget from c:Special:Preferences#mw-prefsection-gadgets. — Alien 333 06:34, 29 May 2025 (UTC)

Search *inside* all works by author?

Let's say I heard of a saying (e.g. "corsi e ricorsi storici") attributed to a particular author ("Giambattista Vico"), and I want to find out whether and where the author himself actually uses this expression. I would search for "ricorsi", but global full-text search returns far too much to sift through, and the only other option is manually searching all works, which is still very time-consuming.

Any ideas for searching all works all at once? AddyLockPool (talk) 10:35, 30 May 2025 (UTC)

I'd do insource:"corsi e ricorsi storici", and then turn on all namespaces. Ought to work. — Alien 333 10:55, 30 May 2025 (UTC)

I think I need two new TOC-row templates

Per my current sandbox example, I'm trying to format a TOC with the {{TOC row}} templates, and I can get really close to the source format, but I'm encountering a couple of hangups. Specifically, I'm using {{TOC row 1-1-1-1}}, and I don't know how to get the 2nd and 3rd columns center-aligned, like {{TOC row 1-c-c-1}}, which doesn't exist.

But that one's just the header row, so maybe there's another workaround that doesn't involve the creation of a 56th template.

The other hangup is the need for dots and left-alignment in the 2nd column, via something like {{TOC row 1-ldot-dot-1}}. The closest existing template I could find is {{TOC row 1-l-dot-1}}.

I'm open to any kind of suggestions. Thanks everybody! Xaxafrad (talk) 02:40, 5 June 2025 (UTC)

IMO, this doesn't warrant new templates. Just use CSS. For alignment, you can use index CSS to realign everything. For instance: .wst-toc-row-1-1-1-1 td:nth-child(2), .wst-toc-row-1-1-1-1 td:nth-child(3) { text-align:center }. — Alien 333 05:13, 5 June 2025 (UTC)
Is there any documentation here that helps with explaining how to do this? CSS is great for those who know how to parse and write it, as well as when there are good examples in the documentation (like adding AuxTOC coloring to a TOC-row template), but in other cases finding a template or combination of templates is more straightforward. —Tcr25 (talk) 14:12, 5 June 2025 (UTC)
Okay. How do I access the index.css for a page? Should I insert your code example into Template:TOC_templates/styles.css? Xaxafrad (talk) 02:24, 6 June 2025 (UTC)
It's at Index:Name of work/styles.css. The easiest way to get there is to edit any page, scroll to the bottom and you'll see the CSS linked. If there's nothing there (which is true for 90%+ of works), it will be a red link. —Justin (koavf)TCM 02:57, 6 June 2025 (UTC)
Even easier, when you're in the index page, at the top left, next to "index" and "discussion" there is "styles", which leads to the index css. — Alien 333 05:08, 6 June 2025 (UTC)
Those are incredibly straightforward instructions, thank you both so much! I tried reading through the page on Wikipedia about TemplateStyles[2], which "work on all types of pages, not just templates, despite the name." I couldn't understand anything there without studying a bunch of other material. This ought to work on a sandbox page, so I can tinker with it and learn by doing, now. Thanks again!! Xaxafrad (talk) 05:11, 6 June 2025 (UTC)
Happy to help. Not sure what your proficiency is with CSS, but if you need help with something, let me know: I'm pretty okay at tinkering with CSS, particularly prior to 4.0. —Justin (koavf)TCM 05:24, 6 June 2025 (UTC)
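Laid out as it might appear in the index stylesheet (Index:Name of work/styles.css), the alignment rule suggested above reads as follows. The selectors are taken from that suggestion as-is. One caveat, stated as an assumption: as written, the rule presumably matches every row built with {{TOC row 1-1-1-1}} in the work, not only the header row.

```css
/* Center the second and third cells of rows produced by {{TOC row 1-1-1-1}}. */
.wst-toc-row-1-1-1-1 td:nth-child(2),
.wst-toc-row-1-1-1-1 td:nth-child(3) {
  text-align: center;
}
```

Trying the rule on a sandbox index first, then checking a transcluded page, is a cheap way to confirm it only touches the cells intended.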

The first template has been handled. The second template has me badly stumped. I've looked at several templates, styles.css pages, and help pages, but I still have no clue how to add dots to .wst-toc-row-1-l-dot-1 td:nth-child(2). It looks like several wrapper classes were needed to implement the dots in the first place, so maybe it's not as easy to replicate them in another td element. Or if it is easy, I have no clue how to identify the parent class/div names, nor the inheritance syntax for applying it. Xaxafrad (talk) 00:46, 9 June 2025 (UTC)

Weird symbols

Right before the title of many poems of Index:XLI poems.djvu, there is a weird sort of symbol (e.g. Page:XLI poems.djvu/15). It's never the same. To me, it looks like a hastily hand-drawn circle. Given this, and that its position varies (horizontally close to the center but not centered), I think that it is an annotation of a librarian or whatnot, post-publication. And so I've concluded that these shouldn't be transcribed. Do you agree? — Alien 333 14:53, 6 June 2025 (UTC)

Given the inconsistent shapes, usage, and placement, I'd agree with the assumption that they aren't part of the work. —Tcr25 (talk) 16:24, 6 June 2025 (UTC)
Agreed that this seems to be someone scribbling in the book. I'd say assume that until you know otherwise (e.g. if you find another scan that has it). —Justin (koavf)TCM 16:46, 6 June 2025 (UTC)
https://babel.hathitrust.org/cgi/pt?id=mdp.39015059899487&seq=266&q1=+the+sky+was+can+dy is a reprint of the poem a couple years later, and shows no sign of any such mark.--Prosfilaes (talk) 02:37, 7 June 2025 (UTC)
Case closed. 🙅 —Justin (koavf)TCM 03:31, 7 June 2025 (UTC)
Thanks to everyone for the input! — Alien 333 06:49, 7 June 2025 (UTC)
You do a lot of great work around here and are very helpful. Let me thank you as well. —Justin (koavf)TCM 07:55, 7 June 2025 (UTC)

Problems with transcluded texts

Referring to: The Chinese language and how to learn it/The Written Language

There are a large number of errors: figures turned into random modern characters, spaces added into words, characters deleted from words and spaces inserted into their places, Chinese characters rendered as modern QWERTY symbols, etc.

In non-transcluded texts, I'd mark the page as problematic. What's the protocol for pointing out problems with a page (problems which the reader doesn't have the skill/knowledge to fix)? Grayautumnday (talk) 17:11, 7 June 2025 (UTC)

This chapter was transcluded too early, as some of the pages are not proofread. Click on the page links in the left margin of the text for the problem pages and mark them as problematic in the Page: namespace. This will be represented in the status bar at the top of the transcluded text. Beeswaxcandle (talk) 18:11, 7 June 2025 (UTC)

Documents with numbered paragraphs

I have a document where the paragraphs are numbered throughout the entire document; is there a set (or recommended) way to implement these numbers?

Example page with the numbered paragraphs: Page:The_collapse_of_NATM_tunnels_at_Heathrow_Airport.pdf/9 -- The Navigators (talk) 02:38, 9 June 2025 (UTC)

(Wording error in original question fixed) --The Navigators (talk) 02:51, 9 June 2025 (UTC)
Hello @The Navigators,
Two options come to mind. The first would be to implement these numbers with sidenotes, e.g. Template:Sidenote. The problem is, sidenotes are trouble from a technical point of view, and the likelihood that they render correctly in both page and namespace is probably low. If you see my example edit on Page:The collapse of NATM tunnels at Heathrow Airport.pdf/9, reducing the left margin below the 11em default has led to the line numbers overflowing left (into the wiki toolbar links on the left). This will (sort of) correct itself once transcluded, although it may look worse for wide layouts (e.g. layout 1). The other option, see example edit on Page:The collapse of NATM tunnels at Heathrow Airport.pdf/10, is to use Template:Pline. Adjusting the (now small) margin might not look so great either, but at least it isn't (at present) colliding with any text. It may also be possible to apply changes across the entire work with some custom CSS styles, if you are interested in pursuing the Template:Pline option (e.g. changing the color or margin - please ask if unsure how to go about this).
Regards, TeysaKarlov (talk) 05:03, 9 June 2025 (UTC)
Another possibility: a table, where the left column holds the para numbers and the right column the paras. For unnumbered paras (like titles) you can just leave the left cell empty. — Alien 333 10:59, 9 June 2025 (UTC)
The template {{numbered div/s}} might be of help here. ToxicPea (talk) 19:11, 9 June 2025 (UTC)
The issue would be with the margins: except by putting all paras in a numbered div even when they don't have a number, the unnumbered divs will be missing the left margin. — Alien 333 19:17, 9 June 2025 (UTC)
A table with the necessary styling templates could take the work beyond the expansion limit. Personally, I would use {{fqm}} and {{fsp}} on the single digit numbers for alignment. Beeswaxcandle (talk) 20:23, 9 June 2025 (UTC)
Given no formatting would need to be added to the table from what I can see, I think it wouldn't go above PEIS. The text added per para would be |-\n|[a few digits, less than 10]||. So about 16 chars per para. That's shorter than a single {{fqm}} invocation (109 chars). Styling can be done through index css.
Plus, {{fqm}} and {{fsp}} would mean not having the margins on the whole paragraphs, no? — Alien 333 20:40, 9 June 2025 (UTC)
Looking over the options folks suggested, using Template:Pline seems like it might be the simplest approach. Bonus if the look of the Pline numbers can be modified.--The Navigators (talk) 04:51, 11 June 2025 (UTC)
They have the CSS class wst-pline, so you can target them in index CSS with something like .wst-pline { color: inherit; font-size:inherit}. — Alien 333 07:54, 11 June 2025 (UTC)
There is also {{*!/s}}{{*!/i}}{{*!/e}} for doing block based lists. I wrote these to work around some limitations of wikitext lists, but for a 'list' of numbered paragraphs it would be a 'use-case'. You do have to mark the start of each item though. ShakespeareFan00 (talk) 19:47, 12 June 2025 (UTC)
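Expanded into a block for the index stylesheet (Index:Name of work/styles.css), the {{Pline}} rule suggested above might look like the sketch below. Only the `wst-pline` class comes from the thread; the values in the commented-out alternative are illustrative assumptions, not defaults taken from the template.

```css
/* Make {{Pline}} numbers match the surrounding paragraph text. */
.wst-pline {
  color: inherit;     /* use the paragraph's text color */
  font-size: inherit; /* use the paragraph's font size */
}

/* Alternative (illustrative values only): keep the numbers visually muted.
.wst-pline {
  color: #555;
  font-size: 90%;
}
*/
```

Since the rule lives in the index stylesheet, it applies across the whole work rather than page by page, which suits a document numbered from start to finish.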

Rotated book

Most of the content pages of Christmas tree are turned sideways (as in, the text's put in landscape rather than portrait mode).

Do you think that the rotation is part of the content, or was it just a technical device to allow for larger text? And so, should they be transcribed rotated (which is technically doable) or normally? — Alien 333 10:56, 9 June 2025 (UTC)

What would be cool is if it had a button that would show the as-published layout. Index:Christmas Tree-EEC.djvu Did the book that was scanned have a silver page?--RaboKarbakian (talk) 17:10, 9 June 2025 (UTC)
Maybe change the way a {{class block}} works?--RaboKarbakian (talk) 17:12, 9 June 2025 (UTC)
Technically, there wouldn't be an issue. I'm wondering whether we should. — Alien 333 17:21, 9 June 2025 (UTC)
(@RaboKarbakian, a side note for future cases: it's nice, but don't bother generating files for me; I prefer to do it myself because of my OCR setup and a few other things; and that requires going through PDF first; so I don't use preexisting DJVUs.) — Alien 333 19:52, 11 June 2025 (UTC)
Alien 333: Sorry for the (what for me would have been an) annoyance. I think it should be okay to upload in the same namespace; you know of the "upload a new version" link on the Commons page? I also did some images. I have a pretty good cover and that first image, with the stars and such, I brightened the faded colors and I removed that underline from the "C", because I think it is a librarian mark, just in red. There seems to be an "underlining the first letter of the title and first letter in the last name ritual" that many libraries practice(d). I am tempted to upload my image into your images namespace, but see the annoyance apology.
I am going to upload the cover and put that onto the header, but that can be easily reverted away.--RaboKarbakian (talk) 15:37, 13 June 2025 (UTC)
It's not an annoyance, no; it's just I thought you could spare yourself the effort since I'm not going to use them. If you want, you can reupload the illustrations under the same title if you've got better ones; it's specifically for the whole scans (the djvus) that I really like to do them myself. — Alien 333 15:39, 13 June 2025 (UTC)
Alien, I am unsure what you do to your pdf files, but perhaps you could strip the watermarks from the pages? If I were just picking a version to use, I would have picked the version without Google et al slapped on them.--RaboKarbakian (talk) 16:10, 13 June 2025 (UTC)
I would definitely appreciate watermark stripping. How do you do that? Last time I tried removing the Google watermarks I got the impression that they were actually rasterised onto the JPGs and that they couldn't really be easily removed. I was probably wrong.
On my setup: it's essentially raw images (JP2 if from IA, else JPGs) -img2pdf> pdfs -ocrmypdf> pdfs with OCR -pdf2djvu> djvus.
The in-between PDF conversion is the weak spot, but with my setup I manage to get OCR greatly superior to what I can get on-site or on the internet, and ocrmypdf, as the name says, does only PDFs. ocrodjvu looks promising but I never managed to get it up and running (missing dependencies not in my package manager, IIRC). User:Inductiveload/Scripts/DJVU OCR also looks like a possible improvement, but it's Python 2.x and I haven't bothered trying to update it (definitely should at some point).
To me, getting good OCR is more important than the actual page image; given of course that the image is still clear and legible.
I'd be interested if you have tips for file conversions. Regards, — Alien 333 17:00, 13 June 2025 (UTC)
Removing watermarks can be easily done with Hathi PDF using xpdf tools. If you have evince installed, try typing "pdf" at a cmdline and tab to show what tools you have. It will be obvious from that list. My tesseract is good, but the djvu software seems to be having problems with some characters. I want to fix the first problem with all my computer first to verify that though.
The watermarks are embedded in the Hathi downloads which are not pdf.
If there is a dislike here after non-scan backed works, it is any irrelevant watermarks and/or logos. I have been told I was wrong, but I blame the option to remove the "ugly" Google and Hathi cover pages for IAUpload's brokenness.--RaboKarbakian (talk) 18:54, 13 June 2025 (UTC)
Could you describe precisely your method for removing Google tags? (I do have all the xpdf tools.)
Of course, the watermarks don't really matter that much. It's just a small annoyance. — Alien 333 19:14, 13 June 2025 (UTC)
pdfimages -jp2 -p name_of.pdf but this is only close. pdfimages -h or some variation (--help) should show the syntax better than I can type from memory. It is so easy, I was ashamed I hadn't poked around in the old (really really old) tools before.
Honest! It wasn't me at the time disliking the logo spam. I have come to agree with it. It seems like they were uploading Google pdfs to IA so that IAUpload could convert them. And, I am surprised how much I miss that opinionated Aussie lately.--RaboKarbakian (talk) 19:37, 13 June 2025 (UTC)


Based on the shape of the cover and how the scan looks, I think it was printed rotated. That said, I think it would look and read better on screen if transcribed non-rotated -- the same way we handle photos that are printed "turned." —Tcr25 (talk) 18:40, 9 June 2025 (UTC)
Good point on the images - Thanks. — Alien 333 18:54, 9 June 2025 (UTC)